Potential Powerful Pitfalls – Big Data, Big Trouble
Information has always been a great source of power. Kingdoms were often won and lost through scout reports, misinformation and treachery. Even now we fear those who hold too much of it and often cry foul in the name of privacy. But if you ask me, as a collective social generation we have willingly forsaken our privacy, so we should all just calm down about it.
Traditionally, we have collected only relevant data and stored it according to some form of sorting process so that we can make sense of it all. For example, if we wanted to know how many babies are born in Manhattan in a day, we would only need to collect data from hospitals and clinics within Manhattan. Data was collected this way because of the limited capacity of our data systems. Big Data aims to change all of that. It is not about collecting just specific data like the number of births in Manhattan; rather, it wants to collect from every hospital in every state and across the whole world, and not just data regarding births. This would provide us with any information we need and allow us to predict future trends more accurately, or so we thought.
But Big Data is prone to some very powerful pitfalls. One is called the ‘lottery paradox’: we tend to give emphasis to something that is very unlikely to happen to us simply because of the payoff, just as in a lottery. There will always be a winner, and though the odds are 175,000,000 to 1 against us, we still hold on to the chance that we get to be that “1”. How this applies to Big Data is simple: the larger the data set, the smaller the chance of finding any one piece of important information that gives the biggest payoff. These small gems within the vast sea of data increase in number but not in frequency; we simply find more of them the more data we have. So we invest more in finding these gems, these “outrageous events”, and when we fail to find what we expect, we wonder whether it was all worth the effort and money.
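The "more gems, same frequency" point is easy to check with a little arithmetic. The sketch below is only an illustration: the one-in-a-million rate and the function name rare_event_stats are assumptions made up for this example, not figures from any real data set.

```python
def rare_event_stats(num_records, one_in):
    """Return (expected_gems, records_per_gem) for a data set.

    num_records: total number of records collected.
    one_in: assumed rarity of a "gem" (one gem per `one_in` records).
    """
    # More records means proportionally more gems...
    expected_gems = num_records / one_in
    # ...but the amount of data to sift through per gem does not shrink.
    records_per_gem = num_records / expected_gems
    return expected_gems, records_per_gem


# Hypothetical rarity: one valuable record per million collected.
ONE_IN = 1_000_000

for size in (1_000_000, 100_000_000, 10_000_000_000):
    gems, per_gem = rare_event_stats(size, ONE_IN)
    print(f"{size:>14,} records: {gems:>8,.0f} gems, "
          f"{per_gem:,.0f} records to sift per gem")
```

Under these assumptions, collecting a hundred times more data yields a hundred times more gems, yet each gem still hides behind the same million records, so the cost of finding one never falls.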
Another pitfall, according to ZapThink, is the ‘more is better’ paradox: the assumption that if a certain quantity of data is good, then more of it is better. This is not necessarily true, as we might simply be encouraging the collection of more irrelevant and redundant data. Then we rely on tools like Hadoop to make sense of all that chaff. But the truth is that no process or software is capable of making sense of everything. Big Data is not about being selective in the collection process, which is a very big downside if you ask me.
By Abdul Salam
He has recently co-authored: Deploying and Managing a Cloud Infrastructure: Real-World Skills for the CompTIA Cloud+ Certification (Wiley).