John Anderton: Why'd you catch that?
Danny Witwer: Because it was going to fall.
John Anderton: You're certain?
Danny Witwer: Yeah.
John Anderton: But it didn't fall. You caught it. The fact that you prevented it from happening doesn't change the fact that it was going to happen.
—Minority Report (2002)
A 1981 study by James McClelland and David Rumelhart at the University of California, San Diego, showed that the human brain processes information by generating a hypothesis about its input and then updating that hypothesis as data arrives from the senses.* They demonstrated that people identify letters more accurately when the letters appear in the context of words than when they appear without that context.
In 1999, neuroscientists Rajesh Rao and Dana Ballard created a computational model of vision that replicated many well-established receptive field effects.* The paper demonstrated that a generative model of a scene (top-down processing) could receive feedback via error signals (how much the visual input deviated from the prediction) and use those signals to update the prediction. This process of building a generative model of the scene is called predictive coding, whereby the brain constructs higher-level information and fills in the gaps left by the sensory input.
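To make the loop concrete, here is a minimal Python sketch of the predictive coding idea; the toy sensory values and the simple error-correction rule are illustrative assumptions, not Rao and Ballard's actual model:

```python
import numpy as np

# Toy predictive coding loop: a higher level holds a prediction of the
# sensory input and corrects it in proportion to the error signal that
# flows back from below.
sensory_input = np.array([0.9, 0.1, 0.8, 0.3])  # what the "retina" reports
prediction = np.zeros(4)                        # top-down guess, starts blank
learning_rate = 0.5

for step in range(10):
    error = sensory_input - prediction    # bottom-up error signal
    prediction += learning_rate * error   # top-down prediction is updated
    print(step, np.round(prediction, 3))  # prediction converges on the input
```

After a few iterations, the prediction matches the input and the error signal shrinks toward zero, which is the sense in which the model "explains" its sensory data.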
Figure: An example of a sentence that has flipped words. The brain uses predictive coding to correct them.
An example of predictive coding occurs when you read a sentence that contains a reversed word or an extra letter in the middle, as in the image above. The brain erases the error, and the sentence seems correct. This happens because the brain expects the wording to be correct when it first encounters the sentence. As the brain processes the sentence, it predicts what should be written and sends that information downstream to the lower levels of the brain. Predictive coding operates not only on sentences but also in many other systems inside the brain.
Figure: Predictive coding at work in the brain, predicting what image falls within the blind spot of people's eyes.
The human eye has a blind spot, caused by the absence of visual receptors at the point on the retina where the optic nerve, which transmits information to the visual cortex, exits the eye. The blind spot produces no image in the brain, yet people do not notice the gap because the brain fills it in, just as it corrects a misspelled word in a sentence. The brain supplies the missing part of the image even though it is not there, filling in images and correcting words subconsciously.
Figure: Demonstration of the blind spot. Close one eye and focus the other on the letter R. Place your eye a distance from the screen approximately equal to three times the distance between the R and the L. Move your eye towards or away from the screen until you notice the letter L disappear.
To demonstrate that the blind spot is present in your own eyes, position yourself at a distance from the screen equal to roughly three times the distance between the R and the L in the figure above; for example, if the letters are 10 cm apart, start about 30 cm from the screen. Close one eye and focus the other on the appropriate letter: if the right eye is open, focus on the R, and vice versa. Move your head closer to or farther from the screen until the other letter disappears due to the eye's blind spot.*
Yann LeCun, the Chief Artificial Intelligence Scientist at Facebook AI Research and a pioneer of convolutional neural networks (CNNs), is working on making predictive coding work in computers.*
In computer science, predictive coding refers to a neural network model that generates and updates a model of its environment, predicting what will happen next.
LeCun's technique is called predictive learning, a name that alludes to the fact that it tries to predict what is going to happen in the near future as well as fill in the gaps when information is incomplete or incorrect.* He developed the technique using generative adversarial networks to create a video of what is most likely to happen next. To achieve that, LeCun's software analyzed video frames and, based on those, created the next frames of the video. The technique minimizes how different the generated frames are from the frames that actually follow, a measurement known as distance. For example, if the generated frames contain an image of a cat and the original frames do not, then the distance between the frames is high; if they contain very similar elements, then the distance is small. Currently, the technique can predict up to eight frames into the future, but it is not hard to imagine machines one day predicting future outcomes better than humans.
Figure: The first frame comes from a real video, and a machine predicts the next step of the video in the second frame.
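The "distance" described above can be made concrete as a per-pixel error between a generated frame and the frame that actually followed. A minimal sketch, assuming frames are stored as NumPy arrays (the random arrays here merely stand in for real video frames):

```python
import numpy as np

# Two 64x64 grayscale frames; the random arrays stand in for a frame the
# model generated and the frame that actually occurred next in the video.
rng = np.random.default_rng(0)
real_frame = rng.random((64, 64))
generated_frame = rng.random((64, 64))

# Mean squared error is one common distance: identical frames score 0,
# and the score grows the more the frames differ.
distance = np.mean((generated_frame - real_frame) ** 2)
print(f"distance between frames: {distance:.4f}")
```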
The hippocampus is not only responsible for remembering but also for planning and future thinking, that is, constructing potential scenarios. Patients with hippocampal damage have difficulty imagining the future and are unable to describe fictitious scenes. Moreover, functional magnetic resonance imaging (fMRI) indicates multiple brain areas, including the hippocampus, engaged during remembering as well as imagining events.
Research shows that reversed hippocampal replay more frequently represents novel as opposed to familiar environments. This effect, measured by coactivations of cell pairs, was more pronounced on the first day of exposure to a novel environment than on subsequent days.
Generative adversarial networks serve as a way to construct images and scenarios. In a sense, GANs decode information: techniques exist to generate images from a few parameters, for example, an image of a smiling woman.
Similarly, the process of remembering or imagining the future, carried out by the hippocampus, is sometimes triggered by the prefrontal cortex and can be seen as decoding information from a set of parameters. A GAN consists of two neural networks, a generator that decodes a latent code into data and a discriminator that judges the result; an autoencoder, similarly, pairs a network that encodes information with one that decodes it. In the same way, the human brain has two circuits: one that encodes information from the hippocampus to the prefrontal cortex and one that decodes it in the other direction. It would be no surprise if a mechanism like the one that trains GANs (and autoencoders) were at work in the human brain.
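To make the two-network structure concrete, here is a minimal PyTorch sketch; the layer sizes and the 28-by-28 image shape are arbitrary illustrative choices, not any published architecture:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Decodes a few latent parameters into a flattened 28x28 image."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, 28 * 28), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Scores how likely an image is to be real rather than generated."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(28 * 28, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

# A few latent parameters in, an image out: the decoding direction the
# analogy above compares to the prefrontal cortex cueing the hippocampus.
z = torch.randn(1, 16)
fake_image = Generator()(z)
print(fake_image.shape)  # torch.Size([1, 784])
```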
GANs could serve to simulate the real world and are already used to create synthetic images and videos. The problem is that most of the best AI systems are built for game engines. Some argue that AI systems work so well in games because a game engine is its own version of the world, which means the systems can practice and learn in a virtual environment.
In the real world, for example, a self-driving system cannot drive a car off a cliff thousands of times to learn. Driving off a cliff is fatal, and a system that does so even once cannot work in the real world. Some say that to train an artificial intelligence system, it is therefore necessary to train it in a simulated world. Supervised and unsupervised learning algorithms typically need to see on the order of a thousand examples or more of what they are trying to learn, and reinforcement learning algorithms likewise must practice and learn through many trials. Either researchers must create more efficient algorithms that can learn from fewer examples, or they must reproduce many situations in which the system can acquire experience.
For games, you can use the game engine itself to train the system, since all the constraints are defined there and the engine already simulates many of the possible scenarios. So, if you design an AI agent to perform in a game, the agent can play out the many different variations it wants to test and figure out the best move to make in the future.
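As an illustration of how cheap failure is inside a simulator, here is a sketch that assumes the Gymnasium library, using its CartPole task as a stand-in for a game engine; the random policy is a placeholder for an agent that would actually learn from the rewards:

```python
import gymnasium as gym

# In a simulated environment, an agent can fail thousands of times at no
# cost. CartPole stands in for a game engine here.
env = gym.make("CartPole-v1")

for episode in range(3):
    observation, info = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()  # try any move, safely
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    print(f"episode {episode}: reward {total_reward}")

env.close()
```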
The problem with AI agents in the real world is that reality is much more challenging to simulate than a game. There is no clear way to recreate the real world and test hypotheses in it. GANs might help solve this problem: LeCun is already using them to predict future video frames, and they may end up being used for longer-term predictions as well. And it would be no coincidence if the brain used the same system for imagination and memory recall.
Humans may run simulations of possible scenarios in their minds and learn from those scenes. For example, they can imagine driving a car and the different situations that would arise from the actions they take: what would happen if they turned left instead of right? Some people argue that for computers to function as well as humans, they need to do something similar, that is, given a few variables like turning left or right, simulate the scenario, play it out, and use the result to choose the best action to take in the future.
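A toy sketch of that kind of imagined rollout, where a made-up dictionary stands in for whatever predictive model of the world the agent has learned; the states, actions, and scores are all hypothetical:

```python
def simulate(state, action):
    """Hypothetical world model: predicts the outcome and score of an action."""
    outcomes = {
        ("intersection", "turn_left"): ("traffic_jam", -5),
        ("intersection", "turn_right"): ("clear_road", +10),
    }
    return outcomes[(state, action)]

state = "intersection"
candidates = ["turn_left", "turn_right"]

# Play each scenario out in "imagination" and keep the best-scoring action.
best_action = max(candidates, key=lambda a: simulate(state, a)[1])
print(best_action)  # turn_right
```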