
Hoarding In The Cloud – Saving Data Responsibly


You can tell a lot about yourself and your co-workers by looking at their home directories, specifically how they organize their data (or don't). Some people have never used folders or created a directory; there is only one place to store things – at the top of their personal storage space. Others go to great lengths to create descriptive hierarchies and even over-categorize their documents, staring like a confused animal, unsure whether the latest report belongs in “…CorporateInternalMarketingDrafts” or “…PersonalDraftsWorkMarketing”. I will not speculate here about what either of these habits says about us as individuals. However, there is a common trait that both camps share – a lack of desire to archive data until the current working capacity reaches its maximum. In other words, no one bothers deleting unnecessary files until presented with the error “Unable to save – Disk is full.”

As disks and storage capacities continue to grow, this behavioral pattern leads us to hoard data simply because we can. We are confronted less and less frequently with the decision to assign value to individual items of data, and so we avoid having to choose one piece over another. We become the technological equivalent of hoarders at worst, or, at best, lazy homeowners who never clean up.

Bottomless Disks

The recent trend toward cloud-storage-based “bottomless disks” will likely exacerbate three related problems. First, the signal-to-noise ratio in our stored data will decrease as more and more low-value data accumulates. Second, inefficient application search features will bog down under large data sets. And third, long-term data-retention issues such as format compatibility will become the bane of document management systems as less and less individual thought goes into what data should be stored and how.

I’ll save the latter two issues for another day. Here, however, I would like to focus on the high- vs. low-value data proposition, since that was our starting point. High-value data is content that is original, or that would take so much time to rebuild that the price of regeneration exceeds the cost of keeping it. Examples include a manuscript, source code for an application or operating system, and video output from a digital rendering compute farm. Low-value data, on the other hand, can easily be regenerated from its high-value counterpart. For example, if we already maintain a revision control repository, then a specific, unchanged revision of the working tree has no additional value. In terms of transfer bandwidth and storage space, the “checkout” is simply noise.
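The repository-vs.-checkout distinction can be sketched in a few lines. The toy “repository” below is a hypothetical stand-in for a real version control store, not any actual VCS; the point is only that the working copy is derived data, fully reproducible from the canonical source.

```python
# Toy model of high- vs. low-value data (hypothetical, not a real VCS):
# the repository holding every revision is the high-value original;
# any individual checkout is low-value, because it can always be
# regenerated from the repository on demand.

repository = {
    1: "draft one",
    2: "draft one, revised",
    3: "final draft",
}

def checkout(repo: dict, revision: int) -> str:
    """Regenerate a working copy from the canonical store."""
    return repo[revision]

# Archiving this string alongside the repository adds no information --
# in storage and bandwidth terms, it is pure noise.
working_copy = checkout(repository, 2)
```

The asymmetry is the whole argument: losing the repository is expensive, while losing any checkout costs only the trivial price of regenerating it.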

The closing question is whether everyone can be trained to “Save Responsibly” or whether we should require a “License to Store in the Cloud”. The counterpoint is that moving to the cloud should require no change in existing behavior; rather, it is our data-management software that should be fixed to scale to ever larger data sets. Personally, as a software developer and as an end user, I believe the answer is a combination of both.

By Gerald (Jerry) Carter

Manager of the Likewise Open project for Likewise Software.

CloudTweaks

Established in 2009, CloudTweaks is recognized as one of the leading authorities in cloud connected technology information, resources and thought leadership services.
