Hand Writing: Data, Data, Everywhere, But Let’s Just Stop And Think

Surely nobody who has the slightest awareness of what’s going on in the world can be unaware of the phrase ‘big data’. Almost every day the newspapers and television make reference to it, and it’s ubiquitous on the web. In November, a Google search for the phrase ‘big data’ yielded 1.8 billion hits. Google Trends shows that the rate of searches for the phrase is now about ten times what it was at the start of 2011.

The phrase defies an exact definition: one can define it in absolute terms (so many gigabytes, petabytes, etc.), in relative terms (relative to your computational resources), or in other ways. The obvious way for data to be big is by having many units (e.g., stars in an astronomical database), but it could also be big in terms of the number of variables (e.g., genomic data), the number of times something is observed (e.g., high-frequency financial data), or by virtue of its complexity (e.g., the number of potential interactions in a social network).

However one defines it, the point about ‘big data’ is the implied promise—of wonderful discoveries concealed within the huge mass, if only one can tease them out. That this is exactly the same promise that data mining made some twenty years ago is no accident. To a large extent, ‘big data’ is merely a media rebranding of ‘data mining’ (and of ‘business analytics’ in commercial contexts), and the media coining of the phrase ‘big data’ goes some way towards explaining the suddenness of the rise in interest.

Broadly speaking, there are two kinds of use of big data. One merely involves searching, sorting, matching, concatenating, and so on. So, for example, we get directions from Google Maps, we learn how far away the next bus is, and we find a shop stocking the item we want. The other use, and my personal feeling is that there are more problems of this kind, involves inference. That is, we don’t actually want to know about the data we have but about data we might have had or might have in the future. What will happen tomorrow? Which medicine will make us better? What is the true value of some attribute? What would have happened had things been different? While computational tools are the keys to the first kind of problem, statistical tools are the keys to the second.

If big data is another take on data mining (looking at it from the resources end, rather than the tool end) then perhaps we can learn from the data mining experience. We might suspect, for example, that interesting and valuable discoveries will be few and far between, that many discoveries will turn out to be uninteresting, or obvious, or already well-known, and that most will be explainable by data errors. For example, big data sets are often accumulated as a side-effect of some other process—calculating how much to charge for a basket of supermarket purchases, deciding what prescription is appropriate for each patient, marking the exams of individual students—so we must be wary of issues such as selection bias. Statisticians are very aware of such things, but others are not.
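To make the selection-bias warning concrete, here is a minimal simulation sketch. Everything in it is a hypothetical assumption for illustration and not taken from the essay: the function name, the normally distributed "weekly spend", and the rule that a shopper's chance of being recorded grows with their spend. It only shows how a data set gathered as a side-effect can misrepresent the population it came from.

```python
import random
import statistics


def simulate_selection_bias(n=100_000, seed=0):
    """Compare a population mean with the mean of a non-randomly recorded subset.

    All numbers are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)

    # Hypothetical population: weekly spend per shopper.
    population = [rng.gauss(50.0, 15.0) for _ in range(n)]

    # Assumed recording mechanism: the probability that a shopper ends up
    # in the database is proportional to spend (clipped to [0, 1]), so
    # heavy spenders are over-represented.
    recorded = [
        x for x in population
        if rng.random() < min(1.0, max(0.0, x / 100.0))
    ]

    return statistics.mean(population), statistics.mean(recorded), len(recorded)


if __name__ == "__main__":
    pop_mean, rec_mean, n_recorded = simulate_selection_bias()
    print(f"population mean spend: {pop_mean:.2f}")
    print(f"recorded mean spend:   {rec_mean:.2f} (from {n_recorded} records)")
```

Under these assumptions the recorded subset is large, yet its mean sits noticeably above the population mean. Collecting more records through the same mechanism would not remove the gap, which is why the way the data were gathered matters for inference.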

As far as errors are concerned, a critical thing about big data is that the computer is a necessary intermediary: the only way you can look at the data is via plots, models, and diagnostics. You cannot examine a massive data set point by point. If data themselves are one step in a mapping from the phenomenon being studied, then looking at those data through the window of the computer is yet another step. No wonder errors and misunderstandings creep in.

Moreover, while there is no doubt that big data opens up new possibilities for discovery, that does not mean that ‘small data’ are redundant. Indeed, I might conjecture an informal theorem: the number of data sets of size n is inversely related to n. There will be vastly more small data sets than big ones, so we should expect proportionately more discoveries to emerge from small data sets.

Neither must we forget that data and information are not the same: it is possible to be data rich but information poor. The manure heap theorem is of relevance here. This mistaken theorem says that the probability of finding a gold coin in a heap of manure tends to 1 as the size of the heap tends towards infinity. Several times, after I’ve given talks about the potential of big data (stressing the need for effective tools, and describing the pitfalls outlined above), I have had people, typically from the commercial world, approach me to say that they’ve employed researchers to study their massive data sets, but to no avail: no useful information has been found.

Finally, the bottom line: to have any hope of extracting anything useful from big data, and to overcome the pitfalls outlined above, effective inferential skills are vital. That is, at the heart of extracting value from big data lies statistics.

By David J Hand

David Hand is Senior Research Investigator and Emeritus Professor of Mathematics at Imperial College, London, and Chief Scientific Advisor to Winton Capital Management. He is a Fellow of the British Academy, and a recipient of the Guy Medal of the Royal Statistical Society. He has served (twice) as President of the Royal Statistical Society, and is on the Board of the UK Statistics Authority. He has published 300 scientific papers and 25 books.

Original post can be seen in the Institute of Mathematical Statistics Bulletin, January/February 2014: bulletin.imstat.org
