Off the Beaten Path – HTM-based Strong AI Beats RNNs and CNNs at Prediction and Anomaly Detection


Summary: This is the second in our “Off the Beaten Path” series looking at innovators in machine learning who have chosen strategies and methods outside of the mainstream.  In this article we look at Numenta’s unique approach to scalar prediction and anomaly detection, based on its own brain research.
Numenta, the machine intelligence company founded in 2005 by Jeff Hawkins of Palm Pilot fame, might well be the poster child for ‘off the beaten path’.  More a research laboratory than a commercial venture, Numenta under Hawkins has been pursuing a strong-AI model of computation that directly models the human brain and, as a result, is intended to be a general purpose solution to all types of machine learning problems.
After swimming against the tide of the ‘narrow’ or ‘weak’ AI approaches represented by deep learning’s CNNs and RNN/LSTMs, Hawkins is starting to see his bet pay off.  There are now benchmarked studies showing that Numenta’s strong AI computational approach can outperform CNN/RNN based deep learning (which Hawkins characterizes as ‘classic’ AI) at scalar prediction (future values of things like commodity, energy, or stock prices) and at anomaly detection.
How Is It Different from Current Deep Learning
The ‘strong AI’ approach pursued by Numenta relies on computational models drawn directly from the brain’s own architecture. 
‘Weak AI’ by contrast, represented by the full panoply of deep neural nets, acknowledges that it is only suggestive of true brain function, but it gets results. 
We are all aware of the successes in image, video, text, and speech analysis that CNNs and RNN/LSTMs have achieved, and that is their primary defense.  They work.  They give good commercial results.  But we are also beginning to recognize their weaknesses: large training sets, susceptibility to noise, long training times, complex setup, inability to adapt to changing data, and time invariance.  These weaknesses are beginning to show us where the limits of their development lie.
Numenta’s computational approach has a few similarities to these and many unique contributions that require those of us involved in deep learning to consider a wholly different computational paradigm.
Hierarchical Temporal Memory (HTM)
It would take several articles of this length to do justice to the methods introduced by Numenta.  Here are the highlights.
HTM and Time:  Hawkins uses the term Hierarchical Temporal Memory (HTM) to describe his overall approach.  The first key to understanding is that HTM relies on data that streams over time.  Quoting from previous interviews, Hawkins says,
“The brain does two things: it does inference, which is recognizing patterns, and it does behavior, which is generating patterns or generating motor behavior.  Ninety-nine percent of inference is time-based – language, audition, touch – it’s all time-based. You can’t understand touch without moving your hand. The order in which patterns occur is very important.”
So the ‘memory’ element of HTM, also called ‘sequence memory’, is how the brain interprets each input in relation to the inputs that came before it.
By contrast, conventional deep learning uses static data and is therefore time invariant.  Even RNN/LSTMs that process speech, which is time based, actually do so on static datasets.
Sparse Distributed Representations (SDRs):  Each momentary input to the brain, for example from the eye or from touch, is understood to be some subset of all the available neurons in that sensor (eye, ear, finger) firing and forwarding that signal upward to other neurons for processing.
Since not all the available neurons fire for each input, the signal sent forward can be seen as a Sparse Distributed Representation (SDR) of those that have fired (the 1s in the binary representation) versus the hundreds or thousands that have not (the 0s in the binary representation).  We know from research that on average only about 2% of neurons fire with any given event, which gives meaning to the term ‘sparse’.
SDRs lend themselves to vector arrays and, because they are sparse, have two interesting characteristics: they can be extensively compressed without losing meaning, and they are very resistant to noise and false positives.
By comparison, deep neural nets fire all neurons in each layer, or at least those that have reached the impulse threshold.  Current researchers acknowledge this as a drawback in moving DNNs much beyond where they are today.
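To make the sparsity and noise-resistance claims concrete, here is a minimal sketch in plain Python.  It is an illustration only, not NuPIC code: the vector size of 2,048 bits with 40 active (about 2%) and the helper names are assumptions chosen for the example.  Each SDR is stored as just the indices of its active bits, which is one simple form of the compression mentioned above.

```python
import random

SDR_SIZE = 2048      # total bits per SDR (assumed size, for illustration only)
ACTIVE_BITS = 40     # roughly 2% of the bits are active

def random_sdr(rng):
    """Return an SDR as the set of indices of its active (1) bits."""
    return frozenset(rng.sample(range(SDR_SIZE), ACTIVE_BITS))

def overlap(a, b):
    """Count the active bits that two SDRs share."""
    return len(a & b)

def add_noise(sdr, fraction, rng):
    """Move a fraction of the active bits to positions that were inactive."""
    n_moved = int(len(sdr) * fraction)
    kept = set(rng.sample(sorted(sdr), len(sdr) - n_moved))
    inactive = [i for i in range(SDR_SIZE) if i not in sdr]
    kept.update(rng.sample(inactive, n_moved))
    return frozenset(kept)

rng = random.Random(42)
original = random_sdr(rng)
noisy = add_noise(original, 0.25, rng)   # corrupt 25% of the active bits
unrelated = random_sdr(rng)

print("sparsity:", ACTIVE_BITS / SDR_SIZE)
print("overlap with noisy copy:", overlap(original, noisy))
print("overlap with unrelated SDR:", overlap(original, unrelated))
```

Even with a quarter of its active bits moved, the noisy copy still shares 30 of 40 bits with the original, while two unrelated random SDRs of this size are expected to share fewer than one bit on average.  That gap is what makes a match hard to produce by accident.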
Learning is Continuous and Unsupervised:  Like CNNs and RNNs, this is a feature generating system that learns from unlabeled data.
When we look at diagrams of CNNs and RNNs, they are typically shown as multiple (deep) layers of neurons that narrow in pyramidal fashion as the signal progresses, presumably discovering features in this self-constricting architecture down to the final classifying layer.
HTM architecture, by contrast, is simply columnar: columns of computational neurons pass information to upward layers, in which pattern discovery and recognition occur organically through the comparison of one SDR (a single time signal) to the others in the signal train.
HTM discovers these patterns very rapidly, with as few as roughly 1,000 SDR observations.  This compares with the hundreds of thousands or millions of observations necessary to train CNNs or RNNs.
The pattern recognition is also unsupervised, and it can recognize and generalize about changes in the pattern as soon as changing inputs occur.  This results in a system that not only trains remarkably quickly but is also self-learning, adaptive, and not confused by changes in the data or by noise.
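The idea of learning continuously from a stream, with no separate training phase and no labels, can be sketched with a toy example.  To be clear, this is not the HTM algorithm: it is a deliberately simplified first-order transition counter, and the class and method names are hypothetical.  It only illustrates the loop of predicting the next input, comparing against what arrives, and learning from it immediately.

```python
from collections import Counter, defaultdict

class OnlineSequenceLearner:
    """Toy first-order sequence memory (hypothetical stand-in, NOT HTM)."""

    def __init__(self):
        # transitions[a][b] counts how often symbol b has followed symbol a
        self.transitions = defaultdict(Counter)
        self.previous = None

    def predict(self):
        """Return the most frequent successor of the last symbol, or None."""
        counts = self.transitions.get(self.previous)
        if not counts:
            return None
        return counts.most_common(1)[0][0]

    def observe(self, symbol):
        """Check the prediction for this step, then learn from the symbol."""
        correct = self.predict() == symbol
        if self.previous is not None:
            self.transitions[self.previous][symbol] += 1
        self.previous = symbol
        return correct

learner = OnlineSequenceLearner()
stream = list("ABCD" * 5)                  # a repeating unlabeled stream
results = [learner.observe(s) for s in stream]

print("correct on first pass:", sum(results[:4]), "of 4")
print("correct on last pass:", sum(results[-4:]), "of 4")
```

The learner gets nothing right on the first pass through the pattern and everything right by the last, without ever being told which answers were correct in advance.  A real HTM system works on SDRs and higher-order context rather than single symbols, so treat this only as an analogy for the learning loop.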
Numenta offers a deep library of explanatory papers and YouTube videos for those wanting to experiment hands-on.
Where Does HTM Excel
HTM has for many years been a work in progress.  That has recently changed.  Numenta has published several peer-reviewed performance papers and established benchmarks in its areas of strength that highlight its superiority over traditional DNNs and other ML methods on particular types of problems.
In general, Numenta says that its technology, represented by its open source project NuPIC (Numenta Platform for Intelligent Computing), currently excels in three areas:
Anomaly Detection in streaming data.  For example:

Highlighting anomalies in the behavior of moving objects, such as tracking a fleet’s movements on a truck by truck basis using geospatial data.
Understanding if human behavior is normal or abnormal on a securities trading floor.
Predicting failure in a complex machine based on data from many sensors.

Scalar Predictions, for example:

Predicting energy usage for a utility on a customer by customer basis.
Predicting New York City taxi passenger demand 2 ½ hours in advance based on a public data stream provided by the New York City Transportation Authority.

Highly Accurate Semantic Search on static and streaming data (these examples are from Cortical.io, a Numenta commercial partner that uses the SDR concept but not NuPIC).

Automate extraction of key information from contracts and other legal documents.
Quickly find similar cases to efficiently solve support requests.
Extract topics from different data sources (e.g. emails, social media) and determine customers’ intent.
Terrorism Prevention: Monitor all social media messages alluding to terrorist activity even if they don’t use known keywords.
Reputation Management: Track all social media posts mentioning a business area or product type without having to type hundreds of keywords.

Two Specific Examples of Performance Better Than DNNs
Taxi Demand Forecast
In this project, the objective was to predict the demand for New York City taxi service 2 ½ hours in advance based on a public data stream provided by the New York City Transportation Authority.  The forecast was based on historical streaming data at 30-minute intervals, using the previous 1,000, 3,000, or 6,000 observations as the basis for a forward projection 5 periods (2 ½ hours) ahead.  The study (which you can see here) compared ARIMA, TDNN, and LSTM to the HTM method; HTM demonstrated the lowest error rate.
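The forecasting setup described above reduces to simple arithmetic: 5 periods of 30 minutes each is 150 minutes, or 2 ½ hours.  The sketch below shows one way such a multi-step problem can be framed, pairing each window of history with the value 5 steps after its last observation.  The function name and the tiny synthetic series are assumptions for illustration; this is not the study’s code or data, and the study’s windows were 1,000 to 6,000 observations rather than 4.

```python
INTERVAL_MINUTES = 30   # spacing of observations, from the article
STEPS_AHEAD = 5         # forecast horizon in periods, from the article

def make_examples(series, window, steps_ahead):
    """Pair each history window with the value steps_ahead after its last point."""
    examples = []
    for end in range(window, len(series) - steps_ahead + 1):
        history = series[end - window:end]
        target = series[end - 1 + steps_ahead]
        examples.append((history, target))
    return examples

horizon_hours = INTERVAL_MINUTES * STEPS_AHEAD / 60

# Synthetic placeholder series (NOT taxi data): the value equals its index,
# which makes the offset between history and target easy to see.
series = list(range(20))
examples = make_examples(series, window=4, steps_ahead=STEPS_AHEAD)
history, target = examples[0]

print("horizon in hours:", horizon_hours)
print("first history window:", history, "-> target:", target)
print("number of examples:", len(examples))
```

In the first example the history ends at index 3 and the target sits at index 8, five periods later.  Any of the compared methods would then be scored on how closely its output matches that target.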
Machine Failure Prediction (Anomaly)
The objective of this test was to compare two of the most popular anomaly detection routines (Twitter ADVec and Etsy Skyline) against HTM in a machine failure scenario.  In this type of IoT application it’s important that the analytics detect all the anomalies present, detect them as far in advance of the failure as possible, trigger no false alarms (false positives), and work with real time data.  A full description of the study can be found here.
The results showed that the Numenta HTM outperformed the other methods by a significant margin. 
Even more significantly, the study reports that the Numenta HTM method identified the potential failure a full 3 hours before the other techniques.
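For readers wondering how a prediction-based system turns a stream into an anomaly signal, here is a minimal sketch.  The article does not give a formula, so the one used here is an assumption on my part: it follows the raw anomaly score as I understand it from Numenta’s published anomaly detection work, namely the fraction of currently active bits that the model did not predict.  The function name and the small example sets are illustrative only.

```python
def raw_anomaly_score(predicted, active):
    """Fraction of active bits that were not predicted (0.0 = expected, 1.0 = novel).

    Assumed formulation: 1 - |predicted & active| / |active|.
    Both arguments are sets of active bit indices.
    """
    if not active:
        return 0.0
    return 1.0 - len(predicted & active) / len(active)

predicted = {1, 5, 9, 12}   # bits the model expected to see next

print(raw_anomaly_score(predicted, {1, 5, 9, 12}))     # fully predicted input
print(raw_anomaly_score(predicted, {1, 5, 20, 21}))    # half predicted
print(raw_anomaly_score(predicted, {30, 31, 32, 33}))  # nothing predicted
```

The three calls print 0.0, 0.5, and 1.0.  Because the score is computed one input at a time, it fits the real-time requirement described above; a production system would typically smooth or threshold these raw scores to limit false alarms.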
You can find other benchmark studies on the Numenta site.
The Path Forward
Several things are worthy of note here since, as we mentioned earlier, the Numenta HTM platform is still a work in progress.
Numenta’s business model currently calls for it to be the center of a commercial ecosystem while retaining its primary research focus.  Numenta currently has two commercially licensed partners.  The first is Cortical.io, which focuses on streaming text and semantic interpretation.  The second is Grok, which has adapted the NuPIC core platform for anomaly detection in all types of IT operational scenarios.  The core NuPIC platform is open source if you’re motivated to experiment with potential commercial applications.
Image and Text Classification
A notable absence from the current list of capabilities is image and text classification.  There are no current plans for Numenta to develop image classification from static data since that is not on the critical path defined by streaming data.  It’s worth noting that others have demonstrated the use of HTM as a superior technique for image classification without using the NuPIC platform.
Near Term Innovation
In my conversation with Christy Maver, Numenta’s VP of Marketing, she said there is a feeling at Numenta that the long period of development may be complete within perhaps a year.  This last push is in the area of sensorimotor integration, which would be the core concept in applying the HTM architecture to robotics.
For commercial development, the focus will be on partners to license the IP.  Even IBM established a Cortical Research Center a few years back, staffed with about 100 researchers, to examine the Numenta HTM approach.  Like so many others now trying to advance AI by more closely modeling the brain, IBM, along with Intel and others, has moved in the direction of specialty chips that fall in the category of neuromorphic or spiking chips.  Brainchip out of Irvine already has a spiking neuromorphic chip in commercial use.  As Maver alluded, there may be a silicon representation of Numenta’s HTM in the future.
Other articles in this series:
Off the Beaten Path – Using Deep Forests to Outperform CNNs and RNNs
About the author:  Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist since 2001.  He can be reached at:
