I’ve written in the past about deep learning and machine learning and how the two provide unimaginable opportunities and consequences for innovation and artificial intelligence. Deep learning is a class of machine learning (ML) techniques that combine large neural networks with large data sets.
Deep learning techniques currently achieve state-of-the-art performance in a multitude of problem domains (vision, audio, robotics, and natural language processing, to name a few). Recent advances in deep learning also incorporate ideas from statistical learning, reinforcement learning, and numerical optimization.
In no particular order, here are some product categories made possible with today’s deep learning techniques: customized data compression, compressive sensing, data-driven sensor calibration, offline AI, human-computer interaction, gaming, artistic assistants, unstructured data mining, voice synthesis.
Customized data compression
Suppose you are designing a video conferencing app and want to come up with a lossy encoding scheme to reduce the number of packets you need to send over the Internet. You could use an off-the-shelf codec like H.264, but H.264 is not optimal because it is calibrated for generic video: anything from cat videos to feature films to clouds. It would be nice if we instead had a video codec optimized specifically for FaceTime videos. We could save even more bytes than a generic algorithm by taking advantage of the fact that, most of the time, there is a face in the center of the screen. However, designing such an encoding scheme is tricky. How do we specify where the face is positioned, how much eyebrow hair the subject has, what color their eyes are, the shape of their jaw, etc.? What if their hair is covering one of their eyes? What if there are zero or multiple faces in the picture?
Deep learning can be applied here. Auto-encoders are a type of neural network trained to output a copy of its input data. Learning this “identity mapping” would be trivial if it weren’t for the fact that the hidden layers of the auto-encoder are chosen to be smaller than the input layer. This “information bottleneck” forces the auto-encoder to learn a compressed representation of the data in the hidden layer, which is then decoded back to the original form by the remaining layers in the network.
Through end-to-end training, auto-encoders and other deep learning techniques adapt to the specific nuances of your data. Unlike principal components analysis (PCA), the encoding and decoding steps are not limited to affine transformations. PCA learns an “encoding linear transform”, while auto-encoders learn an “encoding program”. This makes neural nets far more powerful, and allows for complex, domain-specific compression: anything from storing a gazillion selfies on Facebook, to faster YouTube video streaming, to scientific data compression, to reducing the space needed for your personal iTunes library.
Now, imagine if your iTunes library learned a “country music” auto-encoder just to compress your personal music collection!
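To make the bottleneck idea concrete, here is a minimal sketch in plain Python. It is a toy, not a video codec: the data set (points on the line y = 2x), the single hidden unit, the starting weights, and the learning rate are all assumptions I picked for illustration. Because it has no nonlinearity, it is a linear auto-encoder, the simplest member of the family.

```python
# Toy linear auto-encoder: 2 inputs -> 1 hidden unit -> 2 outputs.
# The data, starting weights, and learning rate are illustrative assumptions.

# Points on the line y = 2x, so one number per point is enough to describe them.
DATA = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]


def encode(w, x):
    """Compress a 2-D point into a single code value (the bottleneck)."""
    return w[0] * x[0] + w[1] * x[1]


def decode(u, code):
    """Expand the code back into a 2-D reconstruction."""
    return (u[0] * code, u[1] * code)


def loss(w, u):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in DATA:
        rec = decode(u, encode(w, x))
        total += (x[0] - rec[0]) ** 2 + (x[1] - rec[1]) ** 2
    return total / len(DATA)


def train(steps=500, lr=0.05):
    """Full-batch gradient descent on the reconstruction error."""
    w = [0.1, 0.1]  # encoder weights
    u = [0.1, 0.1]  # decoder weights
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_u = [0.0, 0.0]
        for x in DATA:
            code = encode(w, x)
            rec = decode(u, code)
            err = (rec[0] - x[0], rec[1] - x[1])
            # Gradient of the squared error with respect to the decoder weights.
            for i in range(2):
                grad_u[i] += 2.0 * err[i] * code / len(DATA)
            # Error signal flowing back through the one-unit bottleneck.
            back = 2.0 * (err[0] * u[0] + err[1] * u[1])
            for i in range(2):
                grad_w[i] += back * x[i] / len(DATA)
        w = [w[i] - lr * grad_w[i] for i in range(2)]
        u = [u[i] - lr * grad_u[i] for i in range(2)]
    return w, u


encoder, decoder = train()
```

After training, `decode(decoder, encode(encoder, (0.5, 1.0)))` should land very close to `(0.5, 1.0)`, even though only one number passed through the middle of the network. A real codec would use many more units and nonlinear layers.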
Compressive sensing
Compressive sensing is closely related to the decoding aspects of lossy compression. Many interesting signals have a particular structure to them; that is, the distribution of signals is not completely arbitrary. This means that we don’t actually have to sample at the Nyquist rate in order to obtain a perfect reconstruction of the signal, as long as our decoding algorithm can properly exploit the underlying structure.
Deep learning is applicable here because we can use neural networks to learn the sparse structure without manual feature engineering. Some product applications:
- Super-resolution algorithms (waifu2x), literally an “enhance” button like those from CSI: Miami.
- Using WiFi radio wave interference to see people through walls (MIT Wi-Vi).
- Interpreting 3D structure of an object given incomplete observations (such as a 2D image or partial occlusion).
- More accurate reconstructions from sonar / LIDAR data.
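The claim that structure lets us get away with fewer samples can be shown with a tiny sketch. Everything in it is an assumption for illustration: the signal has 7 entries of which at most one is non-zero, and the 3-row sensing matrix is hand-built (column j holds the binary digits of j + 1). The decoder is a single step of matching pursuit, a classical sparse-recovery method, standing in for the learned decoder a neural network would provide.

```python
# Sparse recovery sketch: 7 unknowns, 3 measurements, at most 1 non-zero entry.
# The sensing matrix and the sparsity assumption are illustrative choices.
import math

N_UNKNOWNS = 7
N_MEASUREMENTS = 3

# Column j holds the binary digits of j + 1, so every column is distinct and non-zero.
COLUMNS = [[float(((j + 1) >> bit) & 1) for bit in range(N_MEASUREMENTS)]
           for j in range(N_UNKNOWNS)]


def measure(signal):
    """Take 3 linear measurements of a length-7 signal."""
    return [sum(COLUMNS[j][i] * signal[j] for j in range(N_UNKNOWNS))
            for i in range(N_MEASUREMENTS)]


def recover(y):
    """One step of matching pursuit: pick the column that best explains y."""
    def score(j):
        col = COLUMNS[j]
        dot = sum(c * v for c, v in zip(col, y))
        return abs(dot) / math.sqrt(sum(c * c for c in col))

    best = max(range(N_UNKNOWNS), key=score)
    col = COLUMNS[best]
    # Least-squares amplitude for the chosen column.
    amplitude = sum(c * v for c, v in zip(col, y)) / sum(c * c for c in col)
    signal = [0.0] * N_UNKNOWNS
    signal[best] = amplitude
    return signal


original = [0.0, 0.0, 0.0, 0.0, 2.5, 0.0, 0.0]
reconstructed = recover(measure(original))
```

Three numbers are enough to rebuild all seven here only because the decoder knows the signal is sparse. Without that assumption the problem is hopelessly underdetermined.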
Data-driven sensor calibration
Good sensors and measurement devices often rely on expensive, precision-manufactured components.
Take digital cameras, for example. Digital cameras assume the glass lens is of a certain “nice” geometry. When taking a picture, the onboard processor solves the light transport equations through the lens to compute the final image.
If the lens is scratched, warped, or shaped like a bunny (instead of a disc), these assumptions are broken and the images no longer turn out well.
Another example: our current decoding models used in MRI and EEG assume the cranium is a perfect sphere in order to keep the math manageable. This sort of works, but sometimes we miss the location of a tumor by a few millimeters. More accurate photographic and MRI imaging ought to compensate for geometric deviations, whether they result from underlying sources or manufacturing defects.
Fortunately, deep learning allows us to calibrate our decoding algorithms with data.
Instead of a one-size-fits-all decoding model (such as a Kalman filter), we can express more complex biases specifically tuned to each patient or each measuring device. If our camera lens is scratched, we can train the decoding software to implicitly compensate for the altered geometry. This means we no longer have to manufacture and align sensors with utmost precision, and this saves a lot of money.
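Here is a minimal sketch of what “calibrating with data” means, under heavy simplification. The device, its readings, and the function names are hypothetical, and a closed-form straight-line fit stands in for the neural network: the point is only that the correction is learned per device from paired measurements rather than designed by hand.

```python
# Per-device calibration learned from data (illustrative stand-in for a neural net).

def fit_calibration(raw_readings, reference_values):
    """Least-squares fit of reference ~= gain * raw + offset for one device."""
    n = len(raw_readings)
    mean_raw = sum(raw_readings) / n
    mean_ref = sum(reference_values) / n
    cov = sum((r - mean_raw) * (t - mean_ref)
              for r, t in zip(raw_readings, reference_values))
    var = sum((r - mean_raw) ** 2 for r in raw_readings)
    gain = cov / var
    offset = mean_ref - gain * mean_raw
    return gain, offset


def calibrate(raw, gain, offset):
    """Apply the learned correction to a new raw reading."""
    return gain * raw + offset


# Hypothetical flawed device: it reports twice the true value plus a bias of 1.
reference = [0.0, 1.0, 2.0, 3.0]
raw = [2.0 * t + 1.0 for t in reference]

gain, offset = fit_calibration(raw, reference)
corrected = calibrate(9.0, gain, offset)  # a raw reading of 9 corresponds to a true value of 4
```

A scratched lens distorts images in ways no straight line can undo, which is where a more expressive model trained the same way would come in.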
In some cases, we can do away with hardware completely and let the decoding algorithm compensate for that; the Columbia Computational Photography lab has developed a kind of camera that doesn’t have a lens. Software-defined imaging, so to speak.
Offline AI
Being able to run AI algorithms without an Internet connection is crucial for apps that have low-latency requirements (e.g. self-driving cars and robotics) or do not have reliable connectivity (e.g. smartphone apps for traveling).
Deep learning is especially suitable for this. After the training phase, neural networks can run the feed-forward step very quickly. Furthermore, it is straightforward to shrink down large neural nets into small ones, until they are portable enough to run on a smartphone (at the expense of some accuracy).
Google has already done this with the offline camera translation feature in the Google Translate app.
Some other possibilities:
- Intelligent assistants (e.g. Siri) that retain some functionality even when offline.
- Wilderness survival app that tells you if that plant is poison ivy, or whether those mushrooms are safe to eat.
- Small drones with on-board TPU chips that can perform simple obstacle avoidance and navigation.
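One common way to shrink a network for on-device use, trading a little accuracy for size as described above, is weight quantization. The sketch below is an assumption on my part about how one might do it, not a description of what Google Translate does: it stores each weight as a small signed integer plus one shared scale factor, and the sample weights are made up.

```python
# Symmetric weight quantization sketch (illustrative; weights are made up).

def quantize(weights, bits=8):
    """Map float weights to signed integers plus one shared scale factor."""
    levels = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / levels if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate float weights at inference time."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]
quantized, scale = quantize(weights)
restored = dequantize(quantized, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now fits in one byte instead of four, and the price is visible in `max_error`: small weights such as 0.003 get rounded away entirely.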
Human-computer interaction
Deep neural networks are the first kind of models that can really see and hear our world with an acceptable level of robustness. This opens up a lot of possibilities for human-computer interaction (HCI).
Cameras can now be used to read sign language and read books aloud to people. In fact, deep neural networks can now describe to us in full sentences what they see. Baidu’s DuLight project is enabling visually-impaired people to see the world around them through a sight-to-speech earpiece.
Dulight–Eyes for visually impaired
We are not limited to vision-based HCI. Deep learning can help calibrate EEG interfaces for paraplegics to interact with computers more rapidly, or provide more accurate decoding tech for projects like Soli.
Gaming
Games are computationally challenging because they run physics simulation, AI logic, rendering, and multiplayer interaction together in real time. Many of these components have at least O(N^2) complexity, so our current algorithms have hit their Moore’s law ceiling.
Deep learning pushes the boundaries on what games are capable of in several ways.
Obviously, there’s the “game AI” aspect. In current video games, the AI logic for non-playable characters (NPCs) is not much more than a bunch of if-then-else statements tweaked to imitate intelligent behavior. This is not clever enough for advanced gamers, and leads to somewhat unchallenging character interaction in single-player mode. Even in multiplayer, a human player is usually the smartest element in the game loop.
This changes with deep learning. Google DeepMind’s AlphaGo has shown us that deep neural networks, combined with policy gradient learning, are powerful enough to beat the strongest of human players at complex games like Go. The deep learning techniques that drive AlphaGo may soon enable NPCs that can exploit the player’s weaknesses and provide a more engaging gaming experience. Game data from other players can be sent to the cloud to train the AI, letting it learn from its own mistakes.
Another application of deep learning in games is physics simulation. Instead of simulating fluids and particles from first principles, perhaps we can turn the nonlinear dynamics problem into a regression problem. For instance, if we train a neural net to learn the physical rules that govern fluid dynamics, we can evaluate it quickly during gameplay without having to solve the Navier-Stokes equations at large scale in real time.
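For a feel of what policy gradient learning does, here is a deliberately tiny sketch, nowhere near AlphaGo. It assumes a hypothetical NPC choosing between two moves whose win rates against a particular player are already known (0.2 and 0.8, numbers I made up), and it follows the exact gradient of the expected reward instead of estimating it from sampled games as a real system would.

```python
# Exact policy gradient on a two-move toy problem (win rates are made up).
import math


def softmax(prefs):
    """Turn move preferences into a probability distribution."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(win_rates, steps=200, lr=1.0):
    """Gradient ascent on the expected win rate of a softmax policy."""
    prefs = [0.0] * len(win_rates)
    for _ in range(steps):
        probs = softmax(prefs)
        expected = sum(p * r for p, r in zip(probs, win_rates))
        # d(expected reward)/d(pref_a) = prob_a * (reward_a - expected reward)
        prefs = [pref + lr * p * (r - expected)
                 for pref, p, r in zip(prefs, probs, win_rates)]
    return softmax(prefs)


# Hypothetical NPC with two moves; the second one exploits this player's weakness.
policy = train_policy([0.2, 0.8])
```

The policy starts out indifferent and ends up choosing the second move almost all the time, which is the “exploit the player’s weaknesses” behavior in miniature.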
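The “simulation as regression” idea can be sketched on a much simpler system than a fluid. Below, the slow reference simulator is a damped spring advanced with 1000 small Euler steps per frame, and the learned surrogate is a 2x2 matrix fitted by least squares to input/output pairs from that simulator. The spring constants, the training grid, and the function names are all assumptions for illustration. This toy system happens to be linear, so a linear fit is enough; a fluid would need a nonlinear model such as a neural net.

```python
# Learning a fast surrogate for a slow simulator (toy damped spring, made-up constants).

def simulate_frame(x, v, substeps=1000, frame_dt=0.1, k=4.0, c=0.5):
    """Reference simulator: advance one frame using many small Euler steps."""
    dt = frame_dt / substeps
    for _ in range(substeps):
        x, v = x + v * dt, v + (-k * x - c * v) * dt
    return x, v


def fit_surrogate(samples):
    """Least-squares fit of a 2x2 matrix M such that next_state ~= M * state."""
    sxx = sxv = svv = 0.0
    ys = [[0.0, 0.0], [0.0, 0.0]]  # sum of outer products next_state * state^T
    for (x, v), (nx, nv) in samples:
        sxx += x * x
        sxv += x * v
        svv += v * v
        ys[0][0] += nx * x
        ys[0][1] += nx * v
        ys[1][0] += nv * x
        ys[1][1] += nv * v
    det = sxx * svv - sxv * sxv
    inv = [[svv / det, -sxv / det], [-sxv / det, sxx / det]]
    return [[sum(ys[i][m] * inv[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def predict(matrix, x, v):
    """Surrogate: one small matrix multiply instead of 1000 substeps."""
    return (matrix[0][0] * x + matrix[0][1] * v,
            matrix[1][0] * x + matrix[1][1] * v)


# Training data: run the slow simulator on a small grid of starting states.
states = [(x, v) for x in (-1.0, 0.0, 1.0) for v in (-1.0, 0.0, 1.0)]
samples = [(s, simulate_frame(*s)) for s in states]
surrogate = fit_surrogate(samples)

fast = predict(surrogate, 0.3, -0.7)
slow = simulate_frame(0.3, -0.7)
```

The surrogate agrees with the simulator on a state it never saw during training, at a tiny fraction of the cost per frame.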