IoT Eye Tracking
Smelling may be the first of the perceptible senses, but the eye is the fastest moving organ in the human body. While the first, second, and third screens have historically exercised the potential of this organ to digest output information, the fourth (wearables) presents an attractive first foray for sight as an input.
Wrist-based, head-worn, and other wearable devices all benefit from the strengths of their predecessors by building on the familiarity of tap, touch, and talk-based interactions. The challenge is that smaller screens limit how much information a device can display and a user can take in, and, until recently, there were few commercially available alternatives to bridge this gap.
While eye-tracking technologies represent a nascent market, a number of players have already started to enter the space, bringing an array of new solutions to enhance the glanceable, intuitive, and more immersive experiences that the fourth screens are designed to deliver.
Companies such as California-based startup Eyefluence, which was acquired by Google in 4Q 2016, and The Eye Tribe, which was acquired by Oculus (Facebook) in late 2016, are on an ambitious, yet attainable, quest to capitalize on this notion by bringing eye-tracking technologies to wearable AR/VR devices. With eye tracking, the idea is to use vision as a vehicle to measure intent. The long-term outlook is for this kind of functionality to complement, rather than compete with, the various components that make more spatially aware, contextual computing solutions an IoT reality.
To compete in this market, it is important to understand how these new technologies can best leverage the Optimal Recognition Point (ORP): the point within a word to which a reader’s eyes naturally gravitate before the brain starts to process the word’s meaning.
When reading traditionally formatted, line-by-line text, the eye jumps from one word to the next, identifying each ORP along the way until a punctuation mark signals the brain to pause and make sense of it all. This is one of the reasons why it’s hard to recite the alphabet or a song backwards; the components that make them whole are learned in sequence.
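The word-by-word scan described above can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's method: the function names `orp_index` and `scan` are hypothetical, and the length-based rule that places the ORP slightly left of a word's center is an assumed heuristic, not something specified in this article.

```python
def orp_index(word):
    """Return an assumed ORP position (0-based character index) for a word.

    Hypothetical heuristic: the ORP sits slightly left of center and
    moves right as the word gets longer.
    """
    n = len(word)
    if n <= 1:
        return 0
    if n <= 5:
        return 1
    if n <= 9:
        return 2
    if n <= 13:
        return 3
    return 4


def scan(text):
    """Walk the text word by word, in order, as a reader's eye would.

    Yields one (word, orp_index, pause) tuple per word; pause is True
    when the word is followed by punctuation, which is the reader's cue
    to stop and make sense of what was just read.
    """
    events = []
    for token in text.split():
        word = token.rstrip(".,;:!?")
        if not word:
            continue
        pause = token != word  # trailing punctuation was stripped
        events.append((word, orp_index(word), pause))
    return events


for word, orp, pause in scan("Look here, then read."):
    print(word, orp, "pause" if pause else "")
```

Because each fixation depends on the words that came before it, the output is only meaningful when read in order, which is the sequential property the next section builds on.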
Positional Memory Matters
When working with sequential data—which is often the case in IoT—positional memory matters.
Regular neural nets (e.g., a feedforward neural network) are generally built around fixed-size inputs and outputs, with data flowing in one direction through the hidden layers. Recurrent neural nets (RNNs) incorporate the concept of memory. To do this, a combination of input data and hidden-layer information from each timestep is used as an input for the next timestep, recursively. It’s this hidden recurrence that adds both the context and the back-end framework that advanced analytics and machine learning rely on, spanning everything from handwriting and image recognition to speech and natural language processing (NLP).
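To make the recurrence concrete, here is a minimal sketch of a simple (Elman-style) recurrent step in plain Python, in which the hidden state from one timestep is fed into the next. The function names, the tanh activation, and the toy weights are illustrative assumptions, not taken from any particular product or library.

```python
import math


def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: h_t = tanh(W_xh * x_t + W_hh * h_prev + b_h)."""
    h_next = []
    for i in range(len(b_h)):
        total = b_h[i]
        # Contribution of the current input.
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        # Contribution of the previous hidden state (the "memory").
        total += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_next.append(math.tanh(total))
    return h_next


def run_rnn(sequence, W_xh, W_hh, b_h):
    """Feed a sequence through the network, carrying the hidden state forward."""
    h = [0.0] * len(b_h)  # start with an empty memory
    states = []
    for x in sequence:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
        states.append(h)
    return states


# Hypothetical toy weights: 1 input feature, 2 hidden units.
W_xh = [[0.5], [-0.3]]
W_hh = [[0.1, 0.2], [0.0, 0.4]]
b_h = [0.0, 0.0]

# The same two inputs in opposite order.
forward = run_rnn([[1.0], [0.0]], W_xh, W_hh, b_h)
backward = run_rnn([[0.0], [1.0]], W_xh, W_hh, b_h)

print(forward[-1])
print(backward[-1])
```

The two runs see identical inputs in a different order and end in different hidden states, which is the positional memory that a feedforward network operating on a single fixed-size input does not have.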
The same overarching methodology is gaining attention from companies like Google, via its 2014 acquisition of DeepMind, as the company’s AI-inspired Garage projects make their way into medical, industrial, and retail operations, among others. This methodology will also be crucial for emerging eye-tracking technology solutions to get on the radar.
In the interim, gesture control will serve as the stepping stone for the forthcoming ubiquity of eye-tracking technologies, particularly as they permeate the broader IoT landscape. The fundamental difference is simple: gestures are based on existing hardware but require the user to learn, initiate, and engage with the device for the interaction to occur; eye-tracking technologies allow the user to achieve the same ends as gesture technologies by creating value from something they already do… look.
By Ryan Martin, Senior Analyst at ABI Research