Passing Big Data Through A Drinking Straw
Big Data has all the corporate heads up and about in excitement since it promises to uncover golden nuggets of information out from an ocean of mundane and redundant data. But here’s the problem sticking everybody in the side, Big Data is big, as in it can reach the levels of “we-can’t-come-up-with-enough-names” bytes big. And with current upload speeds nowhere near as fast as download speeds, all the fancy analytics software and techniques aren’t going to do us any good if we can’t get our data where we need them.
It is called the Skinny Straw or Drinking Straw problem and it is the biggest and most obvious problem being faced by Big Data. The analogy is simple; imagine passing an elephant through a drinking straw. Sure you can grind the elephant into very tiny bits so it can fit through the straw, but how long is that going to take? I admit that was a little gory, the real analogy was filling a swimming pool using a drinking straw, but you get the picture. The straw represents bandwidth and how small it is compared to the amount of data that needs to get to the other side of that straw.
The only real solution we can think of right off the bat is to get a bigger straw, but usually that would require major infrastructure upgrades on the part of the ISP or backbone provider, and we are talking about extreme amounts of cash (or credit if that’s how you roll). There are also the obvious technology limitations, we can upgrade to the best there is and it might not still be 100% enough. Some Big Data providers have tried their own proprietary ideas to try and get around this issue, or at least lessen it to some degree.
Here are some ways and techniques that are being used in the industry right now:
- We have the data compression and de-duplication techniques to make data transfers faster. That’s the “grinding the elephant and pushing it through the straw as fast as possible” solution.
- There is the “tinker with current protocols” direction by combining the reliability of TCP connections and the speed and bandwidth of UDP transfers into something that they call FASP. This ensures that communication is fast and secure while doing away with various handshaking processes that TCP requires.
- We can also work with various protocol optimizations in order to get around the problem. But one way that is really worth mentioning is the tried and tested transfer method –the old SneakerNet approach. Providers that use this method allow their customers to mail their hard drives to the company address so that they can transfer the data and then mail the hard drives back. This method is often faster at moving extremely large amounts of data quickly even taking into consideration the delivery time.
By Abdul Salam
(Image Source: ShutterStock)
He has recently co-authored: Deploying and Managing a Cloud Infrastructure: Real-World Skills for the CompTIA Cloud+ Certification (Wiley).