Big-Data

Hoarding In The Cloud – Saving Data Responsibly

Saving Data Responsibly

You can tell a lot about yourself and your co-workers by looking at their home directory, specifically how they organize their data (or not). Some people have never used folders or created a directory. There’s only one place to store things – at the top of their personal storage space. Others go to great lengths to create descriptive hierarchies and even potentially over-categorize their documents, blindly staring like a confused animal unsure of whether the latest reports should go in “…CorporateInternalMarketingDrafts” or “…PersonalDraftsWorkMarketing”. I will not speculate here about what either of these habits says about us as individuals. However, there is a common trait that both camps share – a lack of desire to archive data until the current working capacity reaches its maximum. In other words, no one bothers deleting unnecessary files until presented with the error “Unable to save – Disk is full.”

As disks and storage capacities continue to grow, this behavioral pattern leads us to hoarding of data simply because we can. We are confronted with the decision to assign value to individual items of data less and less frequently and therefore avoid having to choose one piece over the other. We become the technological equivalent to hoarders at worst, or at best lazy house owners who never clean up.

Bottomless Disks

data-transfer

The recent trend for cloud-storage-based “bottom-less disks” will likely exacerbate three related problems. First, the signal-to-noise ratio in our stored data will decrease as more and more low-value data accumulates. Second, inefficient application search features will bog down due to large data sets. And third, long term data retention issues such as format compatibility will become the bane of document management systems as less and less individual thought goes into what and how data should be stored.

I’ll save the latter two issues for another day. However, I would like to focus on high- vs. low-value data proposition since that was our starting point. Expensive data is content that is original or possibly would take an excessive amount of time to rebuild such that the price of regeneration exceeds the cost of the data. Some examples may include a manuscript, source code for an application or operating system, and video output from a digital rendering compute farm. Low value data on the other hand can easily be regenerated from its high value counterpart. For example, if we already maintain a revision control repository, then a specific, unchanged revision has no additional value. In terms of transfer bandwidth and storage space, the “checkout” is simply noise.

The closing question is whether or not everyone can be trained to “Save Responsibly” or if we should require a “License to Store in the Cloud”. The counterpoint is that moving to the cloud should require no change in existing behavior. Rather it is our data management software techniques that should be fixed to scale to larger and larger data sets. Personally, as a software developer and as an end-user, I believe it is a combination of both.

By Gerald (Jerry) Carter

Manager of the Likewise Open project for Likewise Software.

CloudTweaks

Established in 2009, CloudTweaks is recognized as one of the leading authorities in cloud connected technology information and consultancy services.

Are you a cloud services expert in a world of digital transformation? If so, contact us for information on how to become part of our growing cloud consultancy ecosystem.

CONTRIBUTORS

Countdown to GDPR: Preparing for Global Data Privacy Reform

Countdown to GDPR: Preparing for Global Data Privacy Reform

Preparing for Global Data Privacy Reform Multinational businesses who aren’t up to speed on the regulatory requirements of the European ...
What Futuristic Transportation Will Look Like In Your Lifetime

What Futuristic Transportation Will Look Like In Your Lifetime

Futuristic Transportation Being stuck in traffic or late for work because of a hold up on the dreaded commute could ...
AWS S3 Outage & Lessons in Tech Responsibility From Smokey the Bear

AWS S3 Outage & Lessons in Tech Responsibility From Smokey the Bear

AWS S3 Outage & Lessons in Tech Responsibility Earlier this week, AWS S3 had to fight its way back to ...
Bryan Doerr

Cyber-Threats and the Need for Secure Industrial Control Systems

Secure Industrial Control Systems (ICS) Industrial Control Systems (ICS) tend to be “out of sight, out of mind.” These systems ...
Chris Gerva

Why Containers Can’t Solve All Your Problems In The Cloud

Containers and the cloud Docker and other container services are appealing for a good reason - they are lightweight and ...
3 Ways to Protect Users From Ransomware With the Cloud

3 Ways to Protect Users From Ransomware With the Cloud

Protect Users From Ransomware The threat of ransomware came into sharp focus over the course of 2016. Cybersecurity trackers have ...
What You Need To Know About Choosing A Cloud Service Provider

What You Need To Know About Choosing A Cloud Service Provider

Selecting The Right Cloud Services Provider How to find the right partner for cloud adoption on an enterprise scale The ...
Principles of an Effective Cybersecurity Strategy

Principles of an Effective Cybersecurity Strategy

Effective Cybersecurity Strategy A number of trends contribute to today’s reality in which businesses can no longer treat cybersecurity as ...
Two 2017 Trends From A Galaxy Far, Far Away

Two 2017 Trends From A Galaxy Far, Far Away

Reaching For The Stars People who know me know that I’m a huge Star Wars fan. I recently had the ...
The Rise Of BI Data And How To Use It Effectively

The Rise Of BI Data And How To Use It Effectively

The Rise of BI Data Every few years, a new concept or technological development is introduced that drastically improves the ...

NEWS

Deloitte TMT Predictions: Machine Learning Deployments, On-Demand Content and Live Events Will Continue to Drive Growth

Deloitte TMT Predictions: Machine Learning Deployments, On-Demand Content and Live Events Will Continue to Drive Growth

NEW YORK, Dec. 12, 2017 /PRNewswire/ -- Deloitte forecasts double digital growth in machine learning deployments for the enterprise, an increasing worldwide ...
Hackers shut down infrastructure safety system in attack: FireEye

Hackers shut down infrastructure safety system in attack: FireEye

Hackers shut down infrastructure safety system (Reuters) - Hackers likely working for a nation-state recently penetrated the safety system of ...
email as a service

Google Data Analysis, Artificial Intelligence and Predicting Vaccine Scares

Social media trends can predict tipping points in vaccine scares Analyzing trends on Twitter and Google can help predict vaccine ...