Deep learning is a prevalent buzzword these days. Some analysts say it will eclipse the other machine learning (ML) methods known today. Others describe its market as a goldmine, saying the technology is poised to take over a number of tasks that currently need human intervention across industries and applications. In today’s column, I’ve tried to simplify the concept of deep learning and offer some observations on its future within the ML ecosystem.
Interestingly, the concept of deep learning isn’t new. It is essentially an extension of the Artificial Neural Network (ANN) algorithms that caught the world’s fancy in the ’90s but were later marginalized. Why? Because ANNs demanded extremely large data sets for training, which wasn’t practical at the time. With deep learning, ANNs have found a new lease of life. How? Deep learning algorithms train multi-layer ANNs, and can manage with smaller data sets than was earlier expected.
Why do we need deep learning? Well, the answer is really simple. The outcome of a learning algorithm can be improved either by using more data or by using better algorithms. Deep learning scales to large data sets better than other ML methods and, for certain applications (we’ll talk about the exceptions in a while), is certainly the better choice. How does that translate into real-life situations? Deep learning works better with unlabeled data, as it goes beyond conventional natural language processing (NLP), which is mostly limited to entity recognition.
How does deep learning compare to the rest of machine learning? In the simplest layman’s terms, deep learning explores the possibilities of neural networks beyond what other, traditional ML tools do. Those who have some familiarity with ML will concur that deep learning algorithms work much better with unlabeled data and, riding on their strong feature extraction (the deep architecture framework), are better suited to pattern recognition (images, text, audio) than other tools. A lot of this can be attributed to their ability to reduce the number of free parameters in the models.
But should we say that deep learning will replace all other algorithmic tools? Not necessarily. For a lot of applications, simpler algorithms such as logistic regression and support vector machines will actually be more efficient. For certain supervised learning problems, though, deep learning may push conventional methods into extinction. There are also workarounds that increase the size of a training data set to make it fit for deep learning algorithms.
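One common way to enlarge a training set is data augmentation: adding transformed copies of the examples you already have. The column doesn’t say which workaround it has in mind, so treat the following Python sketch as an illustration under that assumption. The tiny 3x3 "images", the labels and the helper names (flip_horizontal, shift_right, augment) are all made up for the example.

```python
def flip_horizontal(image):
    """Mirror each row of a 2D image (a list of lists)."""
    return [list(reversed(row)) for row in image]


def shift_right(image, fill=0):
    """Shift every row one pixel to the right, padding on the left."""
    return [[fill] + row[:-1] for row in image]


def augment(dataset):
    """Return the original (image, label) pairs plus two transformed copies of each."""
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        augmented.append((flip_horizontal(image), label))
        augmented.append((shift_right(image), label))
    return augmented


# Hypothetical toy data: two 3x3 binary images with made-up labels.
dataset = [
    ([[0, 1, 0],
      [0, 1, 0],
      [0, 1, 1]], "L-shape"),
    ([[1, 1, 1],
      [0, 0, 0],
      [1, 1, 1]], "bars"),
]

bigger = augment(dataset)
print(len(dataset), "->", len(bigger))
```

The sketch assumes that a flipped or shifted image still deserves its original label. That holds for many photo-style tasks but not for all of them (a mirrored letter may be a different letter), so the transformations have to be chosen per problem.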
Why are proponents excited about deep learning? A primary reason is that it’s a meta-algorithm, unlike linear or kernel models such as support vector machines or logistic regression. This means that deep learning isn’t characterized by any particular loss function and isn’t constrained by a specific formulation. That opens up the algorithm for scientists to play with and add to, more freely than other ML tools allow. Deep learning is already transforming feature learning.
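To make the point about loss functions concrete, here is a minimal Python sketch in which the same small network and the same training loop are reused with two different losses. Everything in it is an assumption made for illustration: the toy target y = x^2, the single hidden layer of three tanh units, the learning rate, and the function names. To keep it short, the gradient is estimated numerically by finite differences, which stands in for the backpropagation a real deep learning library would use.

```python
import math
import random

XS = [-1.0, -0.5, 0.0, 0.5, 1.0]
YS = [x * x for x in XS]          # toy target: y = x^2
HIDDEN = 3                        # hidden units (illustrative choice)


def predict(params, x):
    """One hidden tanh layer followed by a linear output unit."""
    w1 = params[0:HIDDEN]
    b1 = params[HIDDEN:2 * HIDDEN]
    w2 = params[2 * HIDDEN:3 * HIDDEN]
    b2 = params[3 * HIDDEN]
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def squared_error(pred, target):
    return (pred - target) ** 2


def log_cosh(pred, target):
    return math.log(math.cosh(pred - target))


def mean_loss(params, loss_fn):
    """Average the chosen loss over the toy data set."""
    total = sum(loss_fn(predict(params, x), y) for x, y in zip(XS, YS))
    return total / len(XS)


def train(loss_fn, steps=200, lr=0.05, eps=1e-5, seed=0):
    """Plain gradient descent; the loss is a plug-in argument."""
    rng = random.Random(seed)
    params = [rng.uniform(-0.5, 0.5) for _ in range(3 * HIDDEN + 1)]
    start = mean_loss(params, loss_fn)
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            # Central finite difference for parameter i.
            up = params[:i] + [params[i] + eps] + params[i + 1:]
            down = params[:i] + [params[i] - eps] + params[i + 1:]
            diff = mean_loss(up, loss_fn) - mean_loss(down, loss_fn)
            grad.append(diff / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
    return start, mean_loss(params, loss_fn)


for name, loss_fn in (("squared error", squared_error), ("log-cosh", log_cosh)):
    start, end = train(loss_fn)
    print(f"{name}: loss {start:.3f} -> {end:.3f}")
```

Swapping the loss changes nothing else in the code, which is the flexibility the paragraph above describes. A support vector machine, by comparison, is tied to its hinge loss and margin formulation.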
Is deep learning the most brain-like algorithm we have? Analysts who say that deep learning is brain-like and other ML tools aren’t are not exactly telling you the full story. If you’ve heard of Numenta, you’ll know why I’m saying this. Numenta’s cortical algorithm is based on hierarchical temporal memory (HTM), which succeeded the original concept of sparse distributed memory, a mathematical model of long-term human memory. How is Numenta better? Its feature extraction finds patterns both in time (deep learning’s doesn’t) and in space, which brings it closer to imitating the brain. This can be understood a little differently as well. Numenta’s sparse distributed memory uses a binary data representation (typically 10,000 bits), firing the bits sparsely to learn a representation of the data (hence the name). The algorithm compares the bits, tries to find patterns in them and searches for matches. In contrast, deep learning uses float vectors (typically 100 elements for each data item) and applies algorithms such as gradient descent across multiple layers of a network to learn the data representation.
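The bit comparison described above can be sketched in a few lines of Python. This is not Numenta’s code or API, only an illustration of matching sparse binary patterns by counting shared active bits. The 10,000-bit width comes from the text; the 200 active bits (2%), the stored names and the noise level are assumptions chosen for the example.

```python
import random

N_BITS = 10_000     # representation width mentioned in the text
N_ACTIVE = 200      # assumption: roughly 2% of the bits are active


def random_sdr(rng, n_bits=N_BITS, n_active=N_ACTIVE):
    """Represent a sparse binary pattern as the set of its active bit indices."""
    return frozenset(rng.sample(range(n_bits), n_active))


def add_noise(sdr, rng, n_flip, n_bits=N_BITS):
    """Switch off n_flip active bits and switch on n_flip previously inactive ones."""
    active = set(sdr)
    dropped = rng.sample(sorted(active), n_flip)
    active.difference_update(dropped)
    candidates = [i for i in range(n_bits) if i not in sdr]
    active.update(rng.sample(candidates, n_flip))
    return frozenset(active)


def overlap(a, b):
    """Number of active bits two patterns share."""
    return len(a & b)


def best_match(query, stored):
    """Return the name of the stored pattern that overlaps most with the query."""
    return max(stored, key=lambda name: overlap(query, stored[name]))


rng = random.Random(0)
stored = {name: random_sdr(rng) for name in ("cat", "dog", "car")}

# Corrupt 40 of the 200 active bits of "cat" and look it up again.
noisy_cat = add_noise(stored["cat"], rng, n_flip=40)
match = best_match(noisy_cat, stored)
print(match, overlap(noisy_cat, stored["cat"]))
```

Because so few bits are active, two unrelated patterns share only a handful of them by chance, so even a heavily corrupted pattern still overlaps far more with its original than with anything else. A dense float vector would typically be compared with a distance or cosine similarity instead.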
Given this, I often find the ongoing debate between deep learning and Numenta unreasonable. First, let us see what each side argues. For starters, some don’t see much learning in Numenta. Others find Numenta’s HTM apt for unsupervised learning, citing that as a significant advantage over deep learning algorithms. Numenta is also considered to be an online learning method with leaner time and memory requirements. The fact is that Numenta is gaining attention. In April this year, IBM created a research group to test Numenta. But a bigger, and often ignored, fact is that deep learning is a multi-layer model. Most, if not all, neural network models are generally able to solve only one problem type at a time. For multiple problem types, ensemble or mixture models need to be used. Deep learning’s multi-layer model is a significant advantage in that respect, one not matched by others, including Numenta. However, there is speculative literature on temporal pooling and sensorimotor extensions to HTM for multi-layer modeling in Numenta.
Whatever the outcome, these are really interesting times for all of us. There is no doubt that the technology and the tools will continue to evolve, and we might see the debate take a completely different turn a couple of years from now. Demand for better tools will be driven both by tech companies aiming to find meaning in data and by data gatherers, be they mobile app developers or sensor owners. Users will always prefer what can be executed faster and with ease. Deep learning scores heavily on these parameters and is definitely here to stay.