CloudTweaks | Q&A With Rob Fox: On-Premise Data, aka “Cloud Cache”

We caught up with Rob Fox, Senior Director of Software Development for Liaison Technologies, about the growing need for businesses and consumers to store and access data in the cloud as quickly as if it were locally stored.

Why are businesses and consumers moving away from on-premise data storage to cloud storage?

Consumers were the early adopters of cloud data storage. For years, they’ve been storing and sharing vast numbers of photos in the cloud with services like Shutterfly and Snapfish, and even Facebook. Newer services like Apple’s iCloud store and sync data, photos, videos and music, and there are a host of cloud-based computer back-up services for individual PCs. Many of these services have been driven by the explosion of mobile computing, which in turn has been made possible by cloud computing.

Until recently, the cloud was primarily thought of as a place to store backup data. This has changed significantly over the past 18 months. With the explosion of mobile applications, Big Data and improved bandwidth, the traditional walls around data have dissolved. In the case of mobile computing, resources such as disk space are limited. In the case of Big Data, organizations simply cannot afford to store copious amounts of data on local hardware. The issue isn’t just the size of the data: elastic storage provisioning models in the cloud make it easy to right-size storage and pay for only what you need, something you simply cannot do on-premise. If you look at how digital music, social media and e-commerce function in 2012, you see that it makes sense for Big Data to exist in the cloud.

What challenges do businesses face when storing Big Data in the cloud?

The challenge of storing Big Data in the cloud is for businesses to be able to access it as quickly as if it were stored on-premise. For years, we’ve been pushing against the limits of Moore’s Law, making faster computers and improving access. Now the focus has moved to where we want to store information, but the challenges are the same. Look at Hadoop (built on HDFS) and related storage technologies, or consumer applications like Spotify that sit on top of these technologies. They try to process data locally, or as if it were local, hence the cloud cache. The trick is to make it seem like the data is local when it is not. That’s why we need the cloud cache: it stores small amounts of data locally, using techniques similar to those of traditional computing.

What’s the best way to implement cloud caching so that it behaves like on-premise caching?

I remember studying memory caching techniques in my computer architecture course in college, learning about how memory is organized and about overall caching strategies. The Level 1 (L1) or primary cache is the first place the processor looks for data, and it is the fastest level of the cache hierarchy. The L1 cache exists directly on the processor (CPU) and is limited in size to data that is accessed often or that is considered critical for quick access.

With data living somewhere else, serving applications and services that require real-time high availability and low latency can be a real challenge. The solution is exactly the same as the L1 cache concept. More specifically, I predict that on-premise storage will simply be a form of high-speed cache. Systems will only store a small subset of Big Data locally. I’m already seeing this with many cloud-hosted audio services that stream MRU (most recently used) or MFU (most frequently used) datasets to local devices for fast access. What is interesting in this model is the ability to access data even when cloud access is not currently available (think of your mobile device in airplane mode).

I have no doubt that at some point, on-premise storage will simply be considered a “cloud cache.” Don’t be surprised if storage on a LAN is considered L1 cache and intermediary cloud storage that is geographically proximal is considered L2 cache, before finally reaching the true source of the data, which, by the way, is probably already federated across many data stores optimized for this kind of access. Regardless of how the cache is eventually constructed, it’s a good mental exercise.
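As a rough sketch of that layering, the Python below chains three tiers: LAN storage, nearby cloud storage, and the true source of the data. A lookup falls through to the next tier on a miss, and each tier keeps a copy on the way back. The `Tier` class and the tier names are hypothetical, and the sketch leaves out eviction and invalidation, which any real design would need.

```python
class Tier:
    """One level of a storage hierarchy (hypothetical sketch, no eviction)."""

    def __init__(self, name, data=None, parent=None):
        self.name = name
        self.data = dict(data or {})
        self.parent = parent  # the next, slower tier, or None for the origin

    def get(self, key):
        """Return (value, name of the tier that already held it)."""
        if key in self.data:
            return self.data[key], self.name
        if self.parent is None:
            raise KeyError(key)
        value, source = self.parent.get(key)  # miss: ask the next tier
        self.data[key] = value                # keep a copy for next time
        return value, source


origin = Tier("origin", {"report": "q3-figures"})
regional = Tier("regional-cloud", parent=origin)  # plays the role of L2
lan = Tier("lan", parent=regional)                # plays the role of L1

print(lan.get("report"))  # ('q3-figures', 'origin')
print(lan.get("report"))  # ('q3-figures', 'lan')
```

The first read travels all the way to the origin. The second is answered on the LAN, and the regional tier also holds a copy for other sites that share it.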

By Robert Fox

Robert Fox is the Senior Director of Software Development at Liaison Technologies, a global provider of secure cloud-based integration and data management services and solutions based in Atlanta. An original contributor to the ebXML 1.0 specification, the former Chair of Marketing and Business Development for ASC ANSI X12, and a co-founder and co-chair of the Connectivity Caucus, he can be reached at [email protected].