Bill Schmarzo

Great Data Scientists Don’t Just Think Outside the Box, They Redefine the Box

Redefine the Box

Special thanks to Michael Shepherd, AI Research Strategist, Dell EMC Services, for his co-authorship. Learn more about Michael at the bottom of this post.

Imagine you wanted to determine how much solar energy could be generated from adding solar cells to a particular house. This is what Google’s Project Sunroof does with Deep Learning. Enter an address and Google uses a Deep Learning framework to estimate how much money you could save in energy costs with solar cells over 20 years (see Figure 1).

Figure 1: Google Project Sunroof

It’s a very cool application of Deep Learning. But let’s assume there “might” be an even better way to estimate solar energy savings. For example, suppose you want to use Deep Learning to estimate how much solar energy you could generate with solar panels on the Golden Gate Bridge (that probably wouldn’t be a very popular decision in San Francisco). The obvious approach would be to analyze several photos of the Golden Gate Bridge and estimate how often the skies are clear based upon cloud coverage.

However, instead of estimating the potential solar energy generation based upon “cloud coverage,” what if we wanted to use “sunlight reflection” to generate the solar energy estimate (see Figure 2)?

Figure 2: Determining Best Predictive Variables for the Golden Gate Bridge

Or maybe you want to test another metric based upon the “sharpness of the shadows” generated by the bridge? Or another metric based upon how many people in the photo are wearing sunglasses? Or yet another metric based upon…

How do you know which of these variables – clouds or reflection or shadows or sunglasses or anything else – is the better predictor of solar energy generation? You try them all!
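One hedged way to picture “try them all” is to score every candidate variable against measured solar output and see which one tracks it best. The sketch below is only an illustration, not how Project Sunroof works. The variable names, the `rank_predictors` helper and the data are all made up (the numbers are synthetic and seeded), and plain Pearson correlation stands in for whatever model comparison a real team would run.

```python
import numpy as np


def rank_predictors(candidates, target):
    """Rank candidate variables by |Pearson correlation| with the target.

    candidates: dict mapping a (hypothetical) variable name to a 1-D sequence.
    target: 1-D sequence of the value we want to predict.
    Returns a list of (name, score) pairs, strongest first.
    """
    target = np.asarray(target, dtype=float)
    scores = {}
    for name, values in candidates.items():
        r = np.corrcoef(np.asarray(values, dtype=float), target)[0, 1]
        scores[name] = abs(r)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic, made-up measurements purely for illustration.
rng = np.random.default_rng(0)
n = 200
solar_output = rng.uniform(0.0, 1.0, n)
candidates = {
    "cloud_coverage": 1.0 - solar_output + rng.normal(0.0, 0.1, n),
    "shadow_sharpness": solar_output + rng.normal(0.0, 0.3, n),
    "sunglasses_count": rng.uniform(0.0, 1.0, n),  # unrelated noise
}

ranking = rank_predictors(candidates, solar_output)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Correlation only catches linear relationships, so a variable that scores poorly here “might” still be useful to a more flexible model. That is exactly why you keep trying.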

This thought process highlights an important behavioral trait of the best data scientists: they have strong imaginative skills for not just “thinking outside the box” but actually redefining the box as they look for variables and metrics that might be better predictors of performance.

The word “might” is a powerful enabler. “Might” is used to say or indicate that something is possible. It’s a data scientist’s most important concept, because “might” gives the data scientist the license to explore, be wrong, learn and try again.

“It Can’t Be Done” Is Not a Data Scientist Term

Andrew Ng, artificial intelligence visionary and fearless leader for many of us, wrote a recent article titled, “What Artificial Intelligence Can and Can’t Do Right Now.” In the article, Andrew states the following:

“Surprisingly, despite AI’s breadth of impact, the types of it being deployed are still extremely limited. Almost all of AI’s recent progress is through one type, in which some input data (A) is used to quickly generate some simple response (B). For example:”

Figure 3: What Machine Learning Can Do

While the use cases are limited today, the creativity with which data scientists are leveraging Big Data and existing Machine Learning and Deep Learning technologies is staggering. Let me give you one example of how data scientists from one of our Services teams at Dell EMC are thinking outside the box to uncover new ways to help our customers avoid issues in their IT environment and create a more effortless support experience.

Predicting Hard Drive Failures

Let’s say that you are capturing more than 260 different pieces of telemetry data several times a minute for the life of a device. Most of these 260+ variables have incomplete or sparse data, the collection timing doesn’t always line up neatly, and getting time continuity across the devices is a major challenge.

If you were using a traditional Machine Learning algorithm, the data science team would have to spend an overwhelming amount of time 1) feature engineering new variables based on domain knowledge, and 2) using trial-and-error to determine which combinations of variables should even be included in the Machine Learning model.

Instead, our Dell EMC Services data scientists used a patent-pending approach to Deep Learning to “pixelate” the data. They turned the 260+ variables into device performance “images.” Once they had created these “images,” the team leveraged a recurrent neural network to find “shapes” and repeatable patterns in seemingly random pixels (see Figure 4).

Figure 4: Pixelating Telemetry Data
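To make the “pixelate” idea a little more concrete, here is a minimal sketch of one way a telemetry table could be turned into a grayscale image. This is not the team’s patent-pending method, whose details aren’t described here. It simply assumes rows are time steps, columns are variables and missing readings are NaN, and it scales each variable to 0-255. The `telemetry_to_image` name and the sample readings are hypothetical.

```python
import numpy as np


def telemetry_to_image(readings):
    """Scale each telemetry variable to 0-255 so the table reads as an image.

    readings: 2-D array-like, rows = time steps, columns = variables,
              with NaN marking a missing reading.
    Returns a uint8 array of the same shape. Missing readings and columns
    with no usable range become 0 (a simplification for this sketch).
    """
    data = np.asarray(readings, dtype=float)
    missing = np.isnan(data)

    # Per-variable min and max, ignoring missing readings.
    lo = np.min(np.where(missing, np.inf, data), axis=0)
    hi = np.max(np.where(missing, -np.inf, data), axis=0)
    span = hi - lo

    # A column is usable only if it has at least two distinct readings.
    usable = np.isfinite(span) & (span > 0)
    safe_lo = np.where(usable, lo, 0.0)
    safe_span = np.where(usable, span, 1.0)

    scaled = (np.where(missing, safe_lo, data) - safe_lo) / safe_span
    scaled = np.where(usable, scaled, 0.0)
    return np.round(scaled * 255).astype(np.uint8)


# Made-up readings: 3 time steps x 3 variables, with gaps.
readings = [
    [10.0, np.nan, 5.0],
    [20.0, 1.0, 5.0],
    [30.0, 3.0, np.nan],
]
image = telemetry_to_image(readings)
print(image)
```

Notice that nobody had to hand-pick which variables matter. Every variable becomes a column of pixels, gaps and all, and the pattern-finding is left to the network.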

A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle. RNNs can use their internal memory to process arbitrary sequences of inputs, which typically makes RNNs ideal for handwriting or speech recognition. Except in this case, instead of trying to decipher handwriting into words, the data science team used the RNN to decipher the seemingly random pixels into a prediction on the state of the device (see Figure 5).

Figure 5: Using RNNs to Identify Shapes and Patterns Buried in the Telemetry Data
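The “internal memory” of an RNN is easier to see in code than in words. The sketch below is a bare-bones recurrent forward pass in NumPy, assuming a simple tanh recurrence and a single sigmoid output that we read as a device health score. It is not the team’s model. The weights are random and untrained, and the `rnn_forward` name, the layer sizes and the input are all hypothetical.

```python
import numpy as np


def rnn_forward(sequence, w_in, w_rec, b_h, w_out, b_out):
    """Run a simple tanh RNN over a sequence and return a score in (0, 1).

    sequence: array of shape (time steps, variables), e.g. one row of
              "pixels" per time step.
    The hidden state h is the network's memory: each step mixes the new
    input with what was remembered from the previous steps.
    """
    h = np.zeros(w_rec.shape[0])
    for x_t in sequence:
        h = np.tanh(w_in @ x_t + w_rec @ h + b_h)
    logit = w_out @ h + b_out
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid


# Random, untrained weights and made-up input, for illustration only.
rng = np.random.default_rng(0)
n_vars, n_hidden, n_steps = 4, 8, 10
w_in = rng.normal(0.0, 0.5, (n_hidden, n_vars))
w_rec = rng.normal(0.0, 0.5, (n_hidden, n_hidden))
b_h = np.zeros(n_hidden)
w_out = rng.normal(0.0, 0.5, n_hidden)
b_out = 0.0

pixels = rng.uniform(0.0, 1.0, (n_steps, n_vars))
score = rnn_forward(pixels, w_in, w_rec, b_h, w_out, b_out)
print(f"untrained score: {score:.3f}")
```

Until the weights are trained on devices with known outcomes, the score means nothing. The point is only that the prediction depends on the whole sequence of rows, not on any single reading.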

I love this example because the team didn’t feel constrained to try to fit the square peg into the round “Machine Learning” hole. Instead, they used Deep Learning in a different context to decipher seemingly random pixels into a prediction of the health of a device. The data scientists didn’t wait until someone developed a better Machine Learning algorithm. Instead, they looked at the wide variety of Machine Learning and Deep Learning tools and algorithms available to them, and applied them to a different, but related use case. If we can predict the health of a device and the potential problems that could occur with that device, then we can also help customers prevent those problems, significantly enhancing their support experience and positively impacting their environment.


One of a data scientist’s most important characteristics is that they refuse to take “it can’t be done” as an answer. They are willing to try different variables and metrics, and different types of advanced analytic algorithms, to see if there is another way to predict performance.

By the way, I included this image just because I thought it was cool. This graphic measures the activity between different IT systems. Just like with data science, this image shows there’s no lack of variables to consider when building your Machine Learning and Deep Learning models!

Want more information on how Dell EMC Services uses data science?

Check out the “Decoding Customer DNA with Data Science” blog by Doug Schmitt, President, Dell EMC Global Services, and watch for the upcoming podcasts “A Conversation with Two Data Geeks” to hear directly from the data scientists behind our transformative technologies.

I would like to thank my co-author Michael Shepherd, AI Research Strategist, Dell EMC Services. Michael holds U.S. patents in both hardware and software and is a Technical Evangelist who provides vision through transformational AI data science. With experience in supply chain, manufacturing and services, he enjoys demonstrating real scenarios with the SupportAssist Intelligence Engine showing how predictive and proactive AI platforms running at the “speed of thought” are feasible in every industry.

By Bill Schmarzo

CTO, IoT and Analytics at Hitachi Vantara (aka “Dean of Big Data”)

Bill Schmarzo is the author of “Big Data: Understanding How Data Powers Big Business” and “Big Data MBA: Driving Business Strategies with Data Science”. He has written white papers, is an avid blogger and is a frequent speaker on the use of Big Data and data science to power an organization’s key business initiatives. He is a University of San Francisco School of Management (SOM) Executive Fellow where he teaches the “Big Data MBA” course. Bill also just completed a research paper on “Determining The Economic Value of Data”. Onalytica recently ranked Bill as #4 Big Data Influencer worldwide.

Bill has over three decades of experience in data warehousing, BI and analytics. Bill authored the Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements. Bill serves on the City of San Jose’s Technology Innovation Board, and on the faculties of The Data Warehouse Institute and Strata.

Previously, Bill was vice president of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of “actionable insights” through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-defining analytic applications.

Bill holds a Master of Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science and Business Administration from Coe College.
