CloudTweaks | Write Once, Run Anywhere: The IoT Machine Learning Shift From Proprietary Technology To Data

The IoT Machine Learning Shift

While early artificial intelligence (AI) programs were one-trick ponies, typically able to excel at only one task, today the aim is to become a jack of all trades. Or at least, that’s the intention. The goal is to write one program that can solve multi-variant problems without needing to be rewritten when conditions change: write once, run anywhere. Digital heavyweights, notably Amazon, Google, IBM, and Microsoft, are now open sourcing their machine learning (ML) libraries in pursuit of that goal, as competitive pressures shift the basis of differentiation from proprietary technologies to proprietary data.

Machine learning is the study of algorithms that learn from examples and experience, rather than relying on hard-coded rules that do not always adapt well to real-world environments. ABI Research forecasts that ML-based IoT analytics revenues will grow from $2 billion in 2016 to more than $19 billion in 2021, with more than 90% of 2021 revenue attributed to more advanced analytics phases. Yet while ML is an intuitive and organic approach to what was once a very rudimentary and primal way of analyzing data, it is worth noting that the ML/AI model creation process itself can be very complex.
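As a quick check on what the ABI Research forecast implies, the compound annual growth rate (CAGR) over the five years from $2 billion (2016) to $19 billion (2021) works out roughly as follows. Because the forecast says "more than" $19 billion, these figures are lower bounds.

```latex
\text{CAGR} = \left(\frac{19}{2}\right)^{1/5} - 1 = 9.5^{0.2} - 1 \approx 0.569 \approx 57\%\ \text{per year},
\qquad 0.9 \times \$19\text{B} \approx \$17.1\text{B from advanced analytics phases in 2021}
```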

The techniques used to develop machine learning algorithms fall under two umbrellas:

How they learn: based on the type of input data provided to the algorithm (supervised learning, unsupervised learning, reinforcement learning, and semi-supervised learning)

How they work: based on the type of operation, task, or problem performed on I/O data (classification, regression, clustering, anomaly detection, and recommendation engines)
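To make the two umbrellas concrete, here is a minimal Python sketch that runs the same invented vibration readings through two combinations. The first is supervised learning used for classification: a nearest-centroid classifier trained on labeled readings. The second is unsupervised learning used for clustering: a one-dimensional k-means that never sees the labels. The data, the sensor, and the function names are hypothetical and exist only for illustration. Production IoT analytics would rely on a dedicated ML library rather than hand-rolled code.

```python
from statistics import mean

# Hypothetical vibration readings (mm/s RMS) from one IoT sensor; values are invented.
readings = [0.8, 1.1, 0.9, 1.0, 4.2, 3.9, 4.5, 1.2]
labels = ["normal", "normal", "normal", "normal", "fault", "fault", "fault", "normal"]


def train_nearest_centroid(xs, ys):
    """Supervised learning + classification: learn one centroid (mean) per label."""
    return {label: mean(x for x, y in zip(xs, ys) if y == label) for label in set(ys)}


def classify(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def kmeans_1d(xs, k=2, iters=10):
    """Unsupervised learning + clustering: group readings without using labels (k >= 2)."""
    lo, hi = min(xs), max(xs)
    # Start with k centers spread evenly between the smallest and largest reading.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        # Move each center to the mean of its group; keep it in place if the group is empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


centroids = train_nearest_centroid(readings, labels)
print(classify(centroids, 3.7))       # fault
print(kmeans_1d(readings, k=2))       # roughly [1.0, 4.2]
```

Here both approaches recover the same two regimes. Only the supervised one can name them, because it was given labels. The unsupervised one merely discovers that two groups exist.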

Once the basic principles are established, a classifier can be trained to automate the creation of rules for a model. The challenge lies in learning and implementing the complex algorithms required to build these ML models, which can be costly, difficult, and time-consuming.
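The sketch below illustrates a classifier automating rule creation by training a one-feature decision stump. An engineer does not hard-code a temperature cutoff. Instead, the code searches for the threshold that best separates labeled failure and non-failure readings, then prints the result as a readable rule. The motor-temperature history, the feature name, and the helper functions are all hypothetical. A real model would use many features and a proper train/test split.

```python
# Hypothetical labeled history: (motor temperature in degrees C, failed within 24 h?)
history = [(61.0, False), (64.5, False), (66.0, False), (70.2, False),
           (78.9, True), (81.3, True), (74.0, False), (85.6, True)]


def learn_threshold_rule(samples):
    """Train a decision stump for the rule 'predict failure when temperature > threshold'.

    Returns (threshold, training_errors), or (None, ...) if all temperatures are identical.
    """
    values = sorted({t for t, _ in samples})
    # Candidate thresholds: midpoints between consecutive distinct temperatures.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold, best_errors = None, len(samples) + 1
    for threshold in candidates:
        # Count samples where the rule's prediction disagrees with the label.
        errors = sum((t > threshold) != failed for t, failed in samples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


def rule_as_text(threshold, feature="temperature_c"):
    """Render the learned threshold as a human-readable rule."""
    return f"IF {feature} > {threshold:.2f} THEN flag 'likely failure'"


threshold, errors = learn_threshold_rule(history)
print(rule_as_text(threshold))  # IF temperature_c > 76.45 THEN flag 'likely failure'
print(errors)                   # 0
```

When the labeled history changes, the rule can be relearned by rerunning the same code. That is the practical difference from a hard-coded cutoff.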

Engaging the open-source community can bring an order-of-magnitude boost to the development and integration of machine learning technologies without the need to expose proprietary data, a trend that Amazon, Google, IBM, and Microsoft were quick to pioneer.

These four companies have a combined market cap of more than $1 trillion, which dwarfs the annual gross domestic product of more than 90% of the world’s countries. Each also open sourced its own deep learning library in the past 12 to 18 months: Amazon’s Deep Scalable Sparse Tensor Network Engine (DSSTNE; pronounced “destiny”), Google’s TensorFlow, IBM’s SystemML, and Microsoft’s Computational Network Toolkit (CNTK). And others are quickly following suit, including Baidu, Facebook, and OpenAI.

But this is just the beginning. To take the most advanced ML models used in IoT to the next level (artificial intelligence), modeling and neural network toolsets (e.g., syntactic parsers) must improve. Open sourcing such toolsets is again a viable option. Google is taking the lead by open sourcing SyntaxNet, its neural network framework for syntactic parsing, which drives the next evolution in IoT from advanced analytics to smart, autonomous machines.

But should others continue to jump on this bandwagon and shift away from proprietary technology toward proprietary data? Not all companies own the kind of data that Google collects through Android or Search, or that IBM picked up with its acquisition of The Weather Company’s B2B, mobile, and cloud-based web properties. Fortunately, a proprietary data strategy is not the only route to competitive advantage in data and analytics. As more devices get connected, technology will play an increasingly important role in balancing two demands. The first is generating insight from previously untapped datasets. The second is deriving value from the highly variable, high-volume data that comes with these new endpoints, at cloud scale and with zero manual tuning.
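One way to picture deriving value from highly variable, high-volume data without manual tuning is a streaming anomaly detector that learns each sensor’s baseline online instead of relying on hand-set thresholds. The sketch below is an illustration under stated assumptions, not a method the article describes. It keeps a running mean and variance with Welford’s algorithm and flags readings that lie more than an assumed 3 standard deviations from the baseline, after an assumed 10-reading warm-up. The class name and the sample stream are hypothetical.

```python
import math


class StreamingAnomalyDetector:
    """Hypothetical detector: flags readings far from a baseline learned online (Welford's algorithm)."""

    def __init__(self, z_threshold=3.0, warmup=10):
        self.z_threshold = z_threshold  # assumed default cutoff, shared by every sensor
        self.warmup = max(warmup, 2)    # need at least 2 readings for a sample std
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                   # running sum of squared deviations from the mean

    def update(self, x):
        """Score x against the current baseline, then fold x into it. Returns True if anomalous."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.z_threshold:
                is_anomaly = True
        # Welford's online update of the mean and M2.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


detector = StreamingAnomalyDetector()
stream = [20.1, 20.3, 19.8, 20.0, 20.2, 19.9, 20.1, 20.0, 20.4, 19.7, 20.2, 35.0, 20.1]
flags = [detector.update(x) for x in stream]
print([x for x, f in zip(stream, flags) if f])  # [35.0]
```

The cutoff is expressed in standard deviations of each stream’s own history, so the same detector could in principle be attached to many differently scaled sensors without per-sensor thresholds. One design choice worth noting is that flagged readings are still folded into the baseline. A stricter implementation might exclude them.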

Collaboration

Collaborative economics is an important component of the analytics product and service strategies of these four leading digital companies. All of them are seeking to build a greater presence in IoT and, more broadly, in the convergence of the digital and the physical. But “collaboration” should be placed in context. Once one company open sourced its ML libraries, the other companies were forced to release theirs as well. Millions of developers are far more powerful than a few thousand in-house employees. Open sourcing also offers these companies tremendous benefits, because they can use the new tools to enhance their own operations. For example, Baidu’s Paddle ML software is being used in 30 different online and offline Baidu businesses, ranging from health to financial services.

And these companies have other areas to invest in that go beyond analytics toolsets. Identity management services, data exchange services, and data chain of custody are three key areas that will be critical to the growth of IoT and the digital/physical convergence. Pursuing ownership of, or proprietary access to, important data has its appeal. But the new opportunities in the IoT landscape will rely on great technology and on the scale these companies possess. That scale will serve a connected world that, in the decades to come, will reach hundreds of billions of endpoints.
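Data chain of custody can be made concrete with a hash-chained log. This is one common technique for making a record’s handling history tamper-evident, though the article does not prescribe it. Each entry stores the SHA-256 hash of its payload combined with the previous entry’s hash. Altering any earlier entry therefore breaks verification for the rest of the chain. The device and custodian names and the helper functions below are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def record_hash(prev_hash, payload):
    """Hash the previous link together with this payload, using canonical (sorted-key) JSON."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(chain, payload):
    """Add a custody record that is linked to the previous one by its hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev_hash, "hash": record_hash(prev_hash, payload)})


def verify(chain):
    """Return True if every entry still matches its payload and its predecessor."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != record_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_record(chain, {"device": "pump-17", "custodian": "gateway-A", "temp_c": 71.2})
append_record(chain, {"device": "pump-17", "custodian": "cloud-ingest", "temp_c": 71.2})
print(verify(chain))                  # True
chain[0]["payload"]["temp_c"] = 65.0  # tamper with an earlier record
print(verify(chain))                  # False
```

A hash chain only detects tampering. Proving who handled each record would additionally require digital signatures tied to an identity, which is where identity management services come in.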

By Dan Shey