
Bill Schmarzo

Great Data Scientists Don’t Just Think Outside the Box, They Redefine the Box


Special thanks to Michael Shepherd, AI Research Strategist, Dell EMC Services, for his co-authorship. Learn more about Michael at the bottom of this post.

Imagine you wanted to determine how much solar energy could be generated from adding solar cells to a particular house. This is what Google’s Project Sunroof does with Deep Learning. Enter an address and Google uses a Deep Learning framework to estimate how much money you could save in energy costs with solar cells over 20 years (see Figure 1).

Figure 1: Google Project Sunroof

It’s a very cool application of Deep Learning. But let’s assume there “might” be an even better way to estimate solar energy savings. For example, suppose you want to use Deep Learning to estimate how much solar energy you could generate with solar panels on the Golden Gate Bridge (that probably wouldn’t be a very popular decision in San Francisco). The obvious approach would be to analyze several photos of the Golden Gate Bridge and estimate how clear the skies are based upon cloud coverage.

However, instead of estimating the potential solar energy generation based upon “cloud coverage,” what if we wanted to use “sunlight reflection” to generate the solar energy estimate (see Figure 2)?

Figure 2: Determining Best Predictive Variables for the Golden Gate Bridge

Or maybe you want to test another metric based upon the “sharpness of the shadows” generated by the bridge? Or another metric based upon how many people in the photo are wearing sunglasses? Or yet another metric based upon…

How do you know which of these variables – clouds or reflection or shadows or sunglasses or anything else – is the better predictor of solar energy generation? You try them all!
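To make “try them all” concrete, here is a minimal sketch that scores each candidate variable against the target and ranks them. Everything in it is an assumption for illustration: the variable names, the synthetic numbers (made up, not real measurements), and the choice of squared correlation as the score. A real project would score candidates on held-out data and likely with richer models.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation between one candidate variable and the target."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant variable cannot explain anything
    return cov * cov / (var_x * var_y)

def rank_predictors(candidates, target):
    """Score every candidate and return (name, score) pairs, best first."""
    scores = {name: r_squared(values, target) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Synthetic stand-in data: hypothetical variables, invented purely for the demo.
rng = random.Random(0)
energy = [rng.uniform(0.0, 10.0) for _ in range(200)]
candidates = {
    "cloud_coverage": [10.0 - e + rng.gauss(0, 3.0) for e in energy],   # noisy signal
    "shadow_sharpness": [e + rng.gauss(0, 1.0) for e in energy],        # cleaner signal
    "sunglasses_count": [rng.uniform(0.0, 10.0) for _ in energy],       # unrelated noise
}

ranking = rank_predictors(candidates, energy)
for name, score in ranking:
    print(f"{name}: R^2 = {score:.2f}")
```

The point is the loop, not the metric: adding a new idea is one more entry in `candidates`, so being wrong about a variable costs almost nothing.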

This thought process highlights an important behavioral trait of the best data scientists: they have strong imaginative skills for not just “thinking outside the box” but actually redefining the box, in trying to find variables and metrics that might be better predictors of performance.

The word “might” is a powerful enabler. “Might” is used to say or indicate that something is possible. It’s a data scientist’s most important concept, because “might” gives the data scientist the license to explore, be wrong, learn and try again.

“It Can’t Be Done” Is Not a Data Scientist Term

Andrew Ng, artificial intelligence visionary and fearless leader for many of us, wrote a recent article titled, “What Artificial Intelligence Can and Can’t Do Right Now.” In the article, Andrew states the following:

“Surprisingly, despite AI’s breadth of impact, the types of it being deployed are still extremely limited. Almost all of AI’s recent progress is through one type, in which some input data (A) is used to quickly generate some simple response (B). For example:”

Figure 3: What Machine Learning Can Do

While the use cases are limited today, the creativity with which data scientists are leveraging Big Data and existing Machine Learning and Deep Learning technologies is staggering. Let me give you one example of how data scientists from one of our Services teams at Dell EMC are thinking outside the box to uncover new ways to help our customers avoid issues in their IT environment and create a more effortless support experience.

Predicting Hard Drive Failures

Let’s say that you are capturing more than 260 different pieces of telemetry data several times a minute for the life of a device. Most of these variables have incomplete or sparse data, the collection timing doesn’t always line up neatly, and getting time continuity across the devices is a major challenge.

If you were using a traditional Machine Learning algorithm, the data science team would have to spend an overwhelming amount of time 1) feature engineering new variables based on domain knowledge, and 2) using trial-and-error to determine which combinations of variables should even be included in the Machine Learning model.

Instead, our Dell EMC Services data scientists used a patent-pending approach to Deep Learning to “pixelate” the data. They turned the more than 260 variables into device performance “images.” Then, once they had created these “images,” the team leveraged a recurrent neural network to find “shapes” and repeatable patterns in seemingly random pixels (see Figure 4).

Figure 4: Pixelating Telemetry Data
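The details of the team’s patent-pending method are not described here, so the following is only a rough sketch of what “pixelating” telemetry could look like under my own assumptions: readings are dropped into fixed-width time buckets (which sidesteps misaligned collection times), each variable becomes one row of the “image,” values are scaled to 0..1 per variable, and empty buckets stay empty. The function name, parameters, and variable names are all hypothetical.

```python
def pixelate(readings, variables, start, bucket_seconds, n_buckets):
    """Turn sparse (timestamp, variable, value) readings into a 2D grid.

    Rows are variables and columns are fixed-width time buckets. Each cell holds
    the bucket's mean value scaled to 0..1 within its variable, or None when no
    reading fell into that bucket.
    """
    sums = {v: [0.0] * n_buckets for v in variables}
    counts = {v: [0] * n_buckets for v in variables}
    for timestamp, variable, value in readings:
        col = int((timestamp - start) // bucket_seconds)
        if variable in sums and 0 <= col < n_buckets:
            sums[variable][col] += value
            counts[variable][col] += 1

    image = []
    for v in variables:
        means = [s / c if c else None for s, c in zip(sums[v], counts[v])]
        seen = [m for m in means if m is not None]
        lo, hi = (min(seen), max(seen)) if seen else (0.0, 0.0)
        span = (hi - lo) or 1.0  # avoid dividing by zero for flat variables
        image.append([None if m is None else (m - lo) / span for m in means])
    return image

# Hypothetical readings: (seconds since start, variable name, value).
readings = [
    (0, "temp_c", 30.0),
    (20, "temp_c", 34.0),
    (70, "temp_c", 40.0),
    (5, "read_errors", 0.0),
    (130, "read_errors", 4.0),
]
image = pixelate(readings, ["temp_c", "read_errors"], start=0,
                 bucket_seconds=60, n_buckets=3)
for name, row in zip(["temp_c", "read_errors"], image):
    print(name, row)
```

Keeping the gaps as `None` rather than guessing a value is a deliberate choice in this sketch: how to treat missing pixels (zero, carry forward, a separate mask row) is itself one of the things you “might” want to experiment with.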

A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle. RNNs can use their internal memory to process arbitrary sequences of inputs, which typically makes RNNs ideal for handwriting or speech recognition. Except in this case, instead of trying to decipher handwriting into words, the data science team used the RNN to decipher the seemingly random pixels into a prediction on the state of the device (see Figure 5).

Figure 5: Using RNNs to Identify Shapes and Patterns Buried in the Telemetry Data
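To show what “internal memory” means in practice, here is a bare-bones forward pass of a basic (Elman-style) RNN that reads a sequence one time step at a time and ends with a single 0..1 score. This is a sketch, not the team’s model: the weights are random and untrained, the sizes are tiny, and the function and parameter names are my own. A production model would be trained on labeled device histories using a Deep Learning framework.

```python
import math
import random

def rnn_predict(columns, w_in, w_rec, w_out):
    """Run a basic RNN over a sequence and return a score between 0 and 1.

    columns: sequence of input vectors, one per time step.
    w_in:  hidden x input weights.
    w_rec: hidden x hidden weights (the recurrent connections).
    w_out: one output weight per hidden unit.
    """
    hidden = [0.0] * len(w_rec)  # the network's memory, empty at the start
    for x in columns:
        # Each new state mixes the current input with the previous state.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[h], x))
                + sum(w * hj for w, hj in zip(w_rec[h], hidden))
            )
            for h in range(len(hidden))
        ]
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))  # squash to a 0..1 score

# Untrained, random weights: for illustration only.
rng = random.Random(1)
n_inputs, n_hidden = 2, 4
w_in = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
w_rec = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]

# Three time steps of a two-variable "image", one column per step.
columns = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
score = rnn_predict(columns, w_in, w_rec, w_out)
print(f"device-state score: {score:.3f}")
```

Because the hidden state is carried from one column to the next, the same pixels presented in a different order generally produce a different score. That sensitivity to sequence is what lets an RNN pick up patterns that unfold over time.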

I love this example because the team didn’t feel constrained to try to fit the square peg into the round “Machine Learning” hole. Instead, they used Deep Learning in a different context to decipher seemingly random pixels into a prediction of the health of a device. The data scientists didn’t wait until someone developed a better Machine Learning algorithm. Instead, they looked at the wide variety of Machine Learning and Deep Learning tools and algorithms available to them, and applied them to a different, but related use case. If we can predict the health of a device and the potential problems that could occur with that device, then we can also help customers prevent those problems, significantly enhancing their support experience and positively impacting their environment.


One of a data scientist’s most important characteristics is that they refuse to take “it can’t be done” as an answer. They are willing to try different variables and metrics, and different types of advanced analytic algorithms, to see if there is another way to predict performance.

By the way, I included this image just because I thought it was cool. This graphic measures the activity between different IT systems. Just like with data science, this image shows there’s no lack of variables to consider when building your Machine Learning and Deep Learning models!

Want more information on how Dell EMC Services uses data science?

Check out the “Decoding Customer DNA with Data Science” blog by Doug Schmitt, President, Dell EMC Global Services, and watch for the upcoming podcasts “A Conversation with Two Data Geeks” to hear directly from the data scientists behind our transformative technologies.

I would like to thank my co-author Michael Shepherd, AI Research Strategist, Dell EMC Services. Michael holds U.S. patents in both hardware and software and is a Technical Evangelist who provides vision through transformational AI data science. With experience in supply chain, manufacturing and services, he enjoys demonstrating real scenarios with the SupportAssist Intelligence Engine showing how predictive and proactive AI platforms running at the “speed of thought” are feasible in every industry.

Bill Schmarzo

CTO, IoT and Analytics at Hitachi Vantara (aka “Dean of Big Data”)

Bill Schmarzo is the author of “Big Data: Understanding How Data Powers Big Business” and “Big Data MBA: Driving Business Strategies with Data Science.” He has written white papers, is an avid blogger and is a frequent speaker on the use of Big Data and data science to power an organization’s key business initiatives. He is a University of San Francisco School of Management (SOM) Executive Fellow, where he teaches the “Big Data MBA” course. Bill also just completed a research paper on “Determining The Economic Value of Data.” Onalytica recently ranked Bill as the #4 Big Data Influencer worldwide.

Bill has over three decades of experience in data warehousing, BI and analytics. Bill authored the Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements. Bill serves on the City of San Jose’s Technology Innovation Board, and on the faculties of The Data Warehouse Institute and Strata.

Previously, Bill was vice president of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of “actionable insights” through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-defining analytic applications.

Bill holds a Master of Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science and Business Administration from Coe College.
