‘IoT’ is, at its core, a functional requirement for an object to be connected to a network and accessible through some API. From an engineering perspective, it’s a lot like object-oriented programming with real-world objects: ‘things’ are ‘objects’ with properties that can perform functions. In this way, many IoT objects are able to support the recent trend toward serverless architectures by handling at least a portion of their own processing needs.
For IoT devices, processing signals on the same layer in which they are collected is desirable, because it lessens network dependency and output latency. Achieving this effect with deterministic types of processing is fairly straightforward, as it is simply a matter of running the formerly network-accessible program on the local hardware.
Challenges for Machine Learning on the Edge
For non-deterministic types of programs, such as those enabled by modern machine learning techniques, there are a few more considerations. Requisite to these techniques is a training process that is both data heavy and compute intensive. This is a significant constraint to consider because most IoT hardware is purpose-built for collecting and relaying signals, and, therefore, woefully ill-equipped to handle the intensive training process by which these systems “learn”. Once trained, however, such programs shed their dependency on high-end hardware and run perfectly well on minimally equipped systems.
Without on-board training capabilities, these programs can be deployed to process data, but they have no hope improving their output by learning about their inputs.
A Model for Over the Air Updates
The system outlined in this post aims to solve this problem by introducing a data-input, model-output feedback loop between the edge and cloud layers. The system uses networked machines to perform model training in the cloud as new data becomes available and pushes markedly improved models downstream to edge devices efficiently using delta-compression. In effect, this system can use data collection at the edge to build up the volume of data necessary to train a useful model as benchmarked by a given threshold. Once a given performance milestone is reached, either by improvements to the model’s machine learning architecture, or, from newly available training data, the updated model is sent to edge devices so that all future processing can occur at that layer.
An ideal use case for this system would be to build an unsupervised anomaly detection system that could live on-board edge devices. In this use case, the device would initially send all data up the cloud to build a training set. As the dataset grows, models will be trained at regular intervals until a target performance metric crosses a set threshold. An example would be the implementation of an anomaly detection module by use of a k-means clustering algorithm. In this example, an internal evaluation metric, such as the Silhouette coefficient, would be used as the target performance metric. Once the fully trained model is in place on-board the edge device, no further network usage will be necessary to detect signal anomalies.
As a trial for this setup, a convolutional neural network (CNN) trained on the MNIST dataset, is embedded within an iOS application using the native CoreML framework. The model is controlled by a custom SDK that is capable of “hot swapping” the existing model for an updated one without requiring any other changes to the host apps binary (which would require Apple App Store approval).
The initial version of the CNN model is handicapped because it was only trained on 10% of the data, and, thus, is not adequately performant.
As expected, the model’s performance receives a big boost when more training data becomes available. Once the new data is detected by a CD system (Travis CI was used for this example) the model is retrained, tested and, if found to be sufficiently performant, exported according to a configuration file. The new exported model is statically hosted and a message is sent to the SDK to pull down the latest model and swap it out.
Sixgill’s triple agenda for limitless scalability, grinding down latency and continually evolving system intelligence makes edge computing machine learning central to our design for Sense. At the edge, we gain flexibility, preserve functionality when connection is interrupted, boost system adaptability to changing circumstances, and ensure that a real-time ingestion – normalization – exception event identifier – programmatic response system delivers a decision while it’s still relevant to have made one.