CloudTweaks | Big Data Survey

Big Data Survey

Today’s organizations need to become more collaborative, virtual, adaptive, and agile to succeed in a complex business world. They must be able to respond to change and to market needs. Many organizations have found that the valuable data they possess, and how they use it, can set them apart from their competitors. In fact, Big Data can transform many fields, including business, management, public administration, and science. In 2012, Gartner defined Big Data as “high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making”. Volume refers to the large amount of data, velocity to the speed at which data flows in and out, and variety to the range of data types and sources.

(See Jaspersoft Big Data Survey Results)

According to an industry report prepared by the McKinsey Global Institute, the effective use of Big Data is a key basis of competition and can deliver a new wave of productivity growth. In other words, managing, analysing, visualizing, and extracting useful information from large data sets helps organizations increase operational efficiency, inform strategic direction, develop new products and services, identify new customers and markets, make better decisions, and become more innovative (Davenport, Barth, & Bean, 2012).

Although Big Data brings many attractive opportunities, organizations also face many challenges in data capture, storage, searching, sharing, analysis, and visualization. In recent years, a large number of Big Data techniques and technologies have been developed to address these obstacles. Techniques such as statistics, data mining, machine learning, neural networks, social network analysis, signal processing, pattern recognition, optimization methods, and visualization approaches can be used to process large volumes of data efficiently.

In addition, organizations need platforms or tools to make sense of Big Data, and they should determine which of these can help them meet their business goals. Current tools fall into three classes: batch processing tools, stream processing tools, and interactive analysis tools. The majority of batch processing tools are based on Apache Hadoop, one of the most important software platforms for data-intensive distributed applications. Hadoop can load, store, and query massive data sets on a large, flexible grid of servers, as well as perform advanced analytics. It uses a programming model called Map/Reduce to process and generate large data sets. Map/Reduce breaks a complex problem down into many sub-problems, which are solved separately and in parallel. The solutions to the sub-problems are then combined to produce a solution to the original problem. Although Hadoop can process large amounts of data in parallel, it is a batch system rather than a real-time, high-performance engine. It is therefore less suited to workloads that combine high volume with high velocity and complex data types. For real-time analytics on streaming data, other platforms such as SQLstream, StreamCloud, and Storm can be used.
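To make the Map/Reduce idea described above more concrete, here is a minimal sketch in plain Python. It does not use Hadoop or its API. The word-count task and the function names (map_phase, shuffle, reduce_phase, word_count) are assumptions chosen for illustration. A thread pool stands in for the cluster of servers that a real deployment would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all intermediate values by their key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk is an independent sub-problem, so the map calls
    # can be handed to a pool of workers and run side by side.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, chunks))
    # Combine the partial results into the answer to the original problem.
    groups = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    # Hypothetical input: three small "files" split across workers.
    chunks = ["big data big insight", "data in motion", "big decisions"]
    print(word_count(chunks))
```

Each input chunk is mapped independently, which is what allows the work to be spread across many machines. Only the shuffle and reduce steps need to see results from more than one chunk.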

By Mojgan Afshari