Big Data

The concept of “Big Data” represents one of a long series of milestones in the history of information and communication systems. It reproduces a situation that has been familiar since we first began to speak of the binary coding of information and the digital revolution, and of the information explosion and info-obesity that prepared the advent of the information society. At each of these major turning points, a general effort was mounted to invent, innovate, adapt, and implement solutions and methods to control and tame an overflowing stream of data.

Let us make a quick comparison between the evolution cycles in data processing that materialize in today's concept of Big Data and its associated technologies, including data transmission networks, technological interoperability standards, and data analytics. Without flashing back to the historical origins of the invention of printing five centuries ago, much less to the origins of the invention of writing 5,000 years ago (two major inventions that we may consider primitive versions of "Big Data"), we will focus on the digital era to identify three subsequent forms of "Big Data":

1-From paper to digital media: the invention of printing and the industrialization of book publishing constituted an unprecedented revolution in the mass dissemination of information. It created an information explosion (a paper-based Big Data) that digital technologies later came to encompass and control, first through the invention of personal computers, then through the connection of these computers into local area networks governed by interoperability standards.

2-From the single PC to the Internet: the capacity of desktop computers and local networks could no longer manage exponentially growing amounts of (multimedia) data. The sharing and exchange of resources, and the opening onto a world of virtual networks, became unavoidable. On virtual networks such as the Internet, directories and search engines were created to help control the amount of data produced and exchanged. New standards for metadata and more sophisticated search tools gradually emerged. Dublin Core, a generic metadata schema, resulted from a concern expressed in 1995 by OCLC and NCSA about controlling the exponential amount of data on the Internet (see the sketch after this list). Metadata repositories, intelligent agents, fuzzy logic, push technology, and related tools likewise express the need to optimize access to and use of the vast amount of data available online. A trend toward disciplinary specialization has also taken hold, producing domain-specific standards, frameworks, and tools to help control this huge volume of data.

3-From the classical Internet to Big Data: Big Data is the new reality of information generated by the permanent expansion of mass data systems. In 2013, the daily volume of data created by humans was estimated at 2.5 quintillion bytes, and 90% of the world's data had been created during the preceding two years. "Big Data" is therefore the term used to describe the potential of vast data resources to answer questions previously out of reach. There is broad consensus on the ability of Big Data to supplant traditional approaches thanks to the "3V" criteria (volume, velocity, variety), and an equally broad consensus on its remarkable potential to drive innovation and progress in all areas of economic and cultural activity. To this end, Big Data relies on new technologies and implements its own standards: cloud computing, high-speed grid networks, and data analytics are among the technological arsenal that sustains its emergence in various sectors.
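To make the notion of a generic metadata schema (item 2) more concrete, here is a minimal, illustrative sketch in Python of a Dublin Core record built from the fifteen classic elements. The sample values, titles, and URLs are hypothetical; real systems would typically serialize such a record as XML or RDF, or embed it in web pages as HTML meta tags, as the last lines suggest.

```python
# A minimal, illustrative Dublin Core record: the fifteen classic
# elements mapped to sample (hypothetical) values.
dublin_core_record = {
    "title": "Big Data: A Short History",          # hypothetical resource
    "creator": "Jane Doe",                          # hypothetical author
    "subject": "big data; metadata; internet",
    "description": "An overview of the evolution of data processing.",
    "publisher": "Example Press",                   # hypothetical publisher
    "contributor": "John Roe",
    "date": "2013-05-01",
    "type": "Text",
    "format": "application/pdf",
    "identifier": "https://example.org/docs/big-data-history",
    "source": "Print edition, 2012",
    "language": "en",
    "relation": "https://example.org/series/data-history",
    "coverage": "1995-2013",
    "rights": "CC BY 4.0",
}

# Serialize the record as HTML <meta> tags, one common way of
# embedding Dublin Core in web pages (element names prefixed "DC.").
for element, value in dublin_core_record.items():
    print(f'<meta name="DC.{element}" content="{value}">')
```

The point of the sketch is that the schema is deliberately generic: fifteen broad elements, all optional and repeatable, applicable to any resource on the network rather than to one discipline or document type.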

However, despite general agreement on the current possibilities and limitations of Big Data, a lack of consensus on some important and fundamental issues remains a source of confusion. What attributes define Big Data solutions? How does Big Data differ from the traditional data and application environments we have used so far? What are the essential characteristics of Big Data environments? How do these environments integrate with currently deployed data architectures? And what scientific, technological, and regulatory challenges must be addressed in order to accelerate the deployment of performant and reliable Big Data solutions?