Page 17 - Big data - Concept and application for telecommunications
The keywords "is recommended" indicate a requirement which is recommended but which is not absolutely
required. Thus this requirement need not be present to claim conformance.
The keywords "can optionally" indicate an optional requirement which is permissible, without implying any
sense of being recommended. This term is not intended to imply that the vendor's implementation must
provide the option and the feature can be optionally enabled by the network operator/service provider.
Rather, it means the vendor may optionally provide the feature and still claim conformance with the
specification.
In the body of this document and its annexes, the words shall, shall not, should, and may sometimes appear,
in which case they are to be interpreted, respectively, as is required to, is prohibited from, is recommended,
and can optionally. The appearance of such phrases or keywords in an appendix or in material explicitly
marked as informative is to be interpreted as having no normative intent.
6 Overview of big data
6.1 Introduction to big data
With the rapid development of information and communications technology (ICT), Internet technologies and
services, huge amounts of data are generated, transmitted and stored at an explosive rate of growth. Data
are generated not only by sensors, cameras or network devices, but also by web pages, email systems, social
networks and many other sources. Datasets are becoming so large or so complex, or are arriving so fast, that
traditional data processing methods and tools are inadequate, and analysing such data efficiently within a
tolerable elapsed time becomes very challenging. The paradigm being developed to resolve these issues is
called big data.
For the purpose of this Recommendation, it is understood that within the big data ecosystem, data types
include structured, semi-structured and unstructured data. Structured data are often stored in databases,
which may be organized according to different models, such as relational, document, key-value or graph
models. Semi-structured data do not conform to the formal structure of a data model, but contain tags or
markers that identify data elements. Unstructured data do not have a pre-defined data model and are
not organized in any defined manner. Data of any of these types can exist in various formats, such as text,
spreadsheet, video, audio, image and map.
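The three data types can be illustrated with a minimal Python sketch. The telecom-flavoured field names (`cell_id`, `rsrp_dbm`, `alarm`) and the sample values are hypothetical and chosen only for illustration; CSV stands in for structured data, JSON for semi-structured data and a free-text sentence for unstructured data. The sketch counts the distinct sets of field names: structured rows share one schema, while semi-structured records carry their own tags and can differ from record to record.

```python
import csv
import io
import json

# Structured data (hypothetical sample): every row follows the same fixed schema,
# as in a relational table.
STRUCTURED = "cell_id,rsrp_dbm\nC1,-95\nC2,-101\n"

# Semi-structured data (hypothetical sample): no fixed schema, but keys (tags)
# identify each value, and records may carry different keys.
SEMI_STRUCTURED = '[{"cell_id": "C1", "rsrp_dbm": -95}, {"cell_id": "C3", "alarm": "link down"}]'

# Unstructured data (hypothetical sample): free text with no pre-defined data model.
UNSTRUCTURED = "Customer reports dropped calls near the station since Monday."


def load_structured(text):
    """Parse CSV rows; every row has exactly the same columns."""
    return list(csv.DictReader(io.StringIO(text)))


def load_semi_structured(text):
    """Parse JSON records; each record may carry a different set of keys."""
    return json.loads(text)


def schema_of(records):
    """Return the set of field names (the 'tags') present in each record."""
    return [frozenset(record) for record in records]


structured = load_structured(STRUCTURED)
semi = load_semi_structured(SEMI_STRUCTURED)

print(len(set(schema_of(structured))))  # → 1 (one shared schema)
print(len(set(schema_of(semi))))        # → 2 (records differ in their tags)
```

Unstructured text has no fields to compare at all, so it can only be processed with techniques such as tokenization or full-text search.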
Big data are used successfully in many fields where traditional methods and tools have become inefficient
and where data processing is characterized by scale (volume), diversity (variety), high speed (velocity) and
possibly other criteria, such as credibility (veracity) or business value. These characteristics, usually
called the Vs, can be explained as follows:
– Volume: refers to the amount of data collected, stored, analysed and visualized that big data
technologies need to handle;
– Variety: refers to different data types and data formats that are processed by big data technologies;
– Velocity: refers to both how fast the data is being collected and how fast the data is processed by
big data technologies to deliver expected results.
NOTE – Additionally, veracity refers to the uncertainty of the data and value refers to the business results
from the gains in new information using big data technologies. Other Vs can be considered as well.
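As a rough illustration, the first three Vs can be turned into simple measurements over a batch of ingested items. The sketch below assumes a hypothetical `Record` type and uses total bytes as a proxy for volume, the number of distinct formats for variety and records per second over the observed time window for velocity; these proxies are illustrative choices, not definitions from this Recommendation. Veracity and value depend on the application and are not measured here.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical ingested item: arrival time (s), payload size (bytes), format."""
    arrival_s: float
    size_bytes: int
    fmt: str


def measure_vs(records):
    """Return illustrative proxies for volume, variety and velocity."""
    if not records:
        return {"volume_bytes": 0, "variety": 0, "velocity_rps": 0.0}
    volume = sum(r.size_bytes for r in records)   # how much data
    variety = len({r.fmt for r in records})       # how many distinct formats
    span = max(r.arrival_s for r in records) - min(r.arrival_s for r in records)
    # Velocity as records per second over the observed window; if all records
    # arrive at the same instant (span == 0), report 0.0 instead of dividing by zero.
    velocity = len(records) / span if span > 0 else 0.0
    return {"volume_bytes": volume, "variety": variety, "velocity_rps": velocity}


batch = [
    Record(0.0, 1200, "json"),
    Record(0.5, 800, "csv"),
    Record(1.0, 50_000, "video"),
    Record(2.0, 300, "json"),
]
print(measure_vs(batch))
# → {'volume_bytes': 52300, 'variety': 3, 'velocity_rps': 2.0}
```

In a real deployment these figures would be computed continuously over sliding windows rather than over one fixed batch.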
Taking into account the characteristics described by the Vs above, big data technologies and services allow
many new challenges to be resolved and create more new opportunities than ever before:
– Heterogeneity and incompleteness: Data processed using big data technologies can lack some attributes or
contain noise introduced during data transmission. Even after data cleaning and error correction, some
incompleteness and some errors in data are likely to remain. These challenges can be managed
during data analysis [b-CRA-BDWP].
– Scale: Processing large and rapidly increasing volumes of data is a challenging task. Historically, the data
scale challenge in data processing was mitigated by the evolution of processing and storage resources.
Nowadays, however, data volumes are scaling faster than these resources can evolve.
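The heterogeneity and incompleteness challenge can be sketched as a cleaning step followed by an analysis step that tolerates the gaps that remain. In this minimal Python example, the `rsrp_dbm` field, the sample records and the plausible range in `VALID_RANGE` are hypothetical; absent, NaN and out-of-range values are all treated as missing, and the average is computed only over the values that survive cleaning, together with a count of how many were missing.

```python
import math
import statistics

# Hypothetical plausible range for a received-signal measurement (dBm).
VALID_RANGE = (-140.0, -40.0)

raw = [
    {"cell_id": "C1", "rsrp_dbm": -95.0},
    {"cell_id": "C2"},                          # missing attribute
    {"cell_id": "C3", "rsrp_dbm": 999.0},       # noise from transmission
    {"cell_id": "C4", "rsrp_dbm": -101.0},
    {"cell_id": "C5", "rsrp_dbm": float("nan")},
]


def clean(record):
    """Return a copy in which absent, NaN or out-of-range values become None."""
    value = record.get("rsrp_dbm")
    low, high = VALID_RANGE
    if value is None or math.isnan(value) or not low <= value <= high:
        value = None
    return {**record, "rsrp_dbm": value}


def robust_mean(records):
    """Average only the values that survived cleaning; also report how many are missing."""
    present = [r["rsrp_dbm"] for r in records if r["rsrp_dbm"] is not None]
    missing = len(records) - len(present)
    mean = statistics.fmean(present) if present else None
    return mean, missing


cleaned = [clean(r) for r in raw]
print(robust_mean(cleaned))  # → (-98.0, 3)
```

Reporting the missing count alongside the result keeps the remaining incompleteness visible to the analysis rather than hiding it.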
Basics of Big data 9