If after I die, people want to write my biography, there is nothing simpler. They only need two dates: the date of my birth and the date of my death. Between one and another, every day is mine.
Fernando Pessoa*
The birth of artificial intelligence began with the early development of neural networks, including Frank Rosenblatt's creation of the perceptron model and the first demonstrations of supervised learning. Another milestone was the Georgetown-IBM experiment, an early language translation system. Finally, the end of the beginning was marked by the Dartmouth Conference, at which artificial intelligence was officially launched as a field of computer science, leading to the first government funding of AI.
In 1943, Warren S. McCulloch, a neurophysiologist, and Walter Pitts, a mathematical prodigy, created the concept of artificial neural networks. They designed their system based on how our brains work, patterning it after the biological model of how neurons (brain cells) work with each other. Neurons interact through their extremities, firing signals via their axons across synapses to neighboring neurons' dendrites. Depending on the voltage of this electrical charge, the receiving neuron either fires a new electrical pulse to the next set of neurons or does not.
Figure: Artificial neural networks are based on the simple principle of electrical charges and how they are passed in the brain.
The hard part of building the right artificial neural network, that is, one that achieves the task you are trying to solve, is figuring out what voltage one neuron should pass to another and what it takes for a neuron to fire.
Both the voltages and the firing criteria become variables that need to be determined for the model. In an artificial neural network, the voltage that is passed from neuron to neuron is called a weight. These weights need to be trained so that the artificial neural network performs the task at hand. One of the earliest ways to do this is called Hebbian learning, which we'll talk about next.
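To make this concrete, here is a minimal sketch, in Python and with made-up numbers, of a single artificial neuron: the weights stand in for the voltages passed between neurons, and a threshold decides whether the neuron fires.

```python
# A minimal sketch of one artificial neuron: a weighted sum of inputs
# compared against a firing threshold. The weights and threshold are the
# values that training must determine; the numbers here are illustrative.

def neuron_fires(inputs, weights, threshold):
    # The weighted sum plays the role of the incoming "voltage."
    total = sum(x * w for x, w in zip(inputs, weights))
    # The neuron fires (outputs 1) only if the signal is strong enough.
    return 1 if total >= threshold else 0

# Example: two inputs, hand-picked weights and threshold.
print(neuron_fires([1, 0], weights=[0.6, 0.4], threshold=0.5))  # fires: 1
print(neuron_fires([0, 1], weights=[0.6, 0.4], threshold=0.5))  # does not fire: 0
```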
In 1947, around the same time that Arthur Samuel was working on the first computer program that would beat a state checkers champion, Donald Hebb, a Canadian psychologist with a PhD from Harvard University, became a Professor of Psychology at McGill University. Hebb would later develop one of the first theories of how learning occurs in neural networks.
In 1949, Hebb developed a theory known as Hebbian learning, which proposes an explanation for how our neurons fire and change when we learn something new. It states that when one neuron fires to another, the connection between them develops or enlarges. That means that whenever two neurons are active together, because of some sensory input or other reason, these neurons tend to become associated.
Therefore, the connections among neurons strengthen or grow when the neurons fire together, making the link between the two neurons harder to break. Hebb argued that this is how humans learn. Hebbian learning, the process of strengthening connections between neurons that fire together, was the earliest way to train artificial neural networks, but other techniques later became predominant.
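Hebb's idea is often summarized as "neurons that fire together wire together." A minimal sketch of that rule, with an assumed learning rate and invented activity values, might look like this:

```python
# Hebbian update sketch: the weight between two neurons grows in
# proportion to how strongly the two fire together.

def hebbian_update(weight, pre_activity, post_activity, learning_rate=0.1):
    # If both neurons are active at the same time, the connection strengthens.
    return weight + learning_rate * pre_activity * post_activity

w = 0.2
for _ in range(5):                 # the two neurons fire together repeatedly
    w = hebbian_update(w, pre_activity=1.0, post_activity=1.0)
print(round(w, 2))                 # 0.7 -- the link has grown stronger
```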
The way a network of neurons becomes associated with a memory or some pattern that causes all these neurons to fire together became known as an engram. Gordon Allport defines engrams as, "If the inputs to a system cause the same pattern of activity to occur repeatedly, the set of active elements constituting that pattern will become increasingly strongly inter-associated. That is, each element will tend to turn on every other element and (with negative weights) to turn off the elements that do not form part of the pattern. To put it another way, the pattern as a whole will become 'auto-associated.' We may call a learned (auto-associated) pattern an engram."*
With these models in mind, in the summer of 1951, Marvin Minsky, together with two other scientists, developed the Stochastic Neural Analog Reinforcement Calculator (SNARC), a machine with a randomly connected neural network of approximately 40 artificial neurons.* The SNARC was built to try to find the exit from a maze in which the machine played the part of a rat.
Minsky, with the help of George Miller, an American psychologist from Harvard, built the neural network out of vacuum tubes and motors. The machine first proceeded randomly; then the correct choices were reinforced by making it easier for the machine to make those choices again, increasing their probability compared to other paths. The device worked, and the imaginary rat found a path to the exit. It turned out that, by an electronic accident, they could simulate two or three rats in the maze at the same time, and they all found the exit.
Minsky thought that if he "could build a big enough network, with enough memory loops, it might get lucky and acquire the ability to envision things in its head."* In 1954, Minsky published his PhD thesis, presenting a mathematical model of neural networks and its application to the brain-model problem.*
This work inspired young students to pursue a similar idea. They sent him letters asking why he did not build a nervous system based on neurons to simulate human intelligence. Minsky figured that this was either a bad idea or would take thousands or millions of neurons to make work.* And at the time, he could not afford to attempt building a machine like that.
In 1956, Frank Rosenblatt implemented an early demonstration of a neural network that could learn how to sort simple images into categories, like triangles and squares.*
Figure: Frank Rosenblatt* and an image with 20x20 pixels.
He built a computer with eight simulated neurons, made from motors and dials, connected to 400 light detectors. Each of the neurons received a set of signals from the light detectors and spat out either a 0 or 1 depending on what those signals added up to.
Rosenblatt used a method called supervised learning, which is a way of saying that the data the software looks at also has information identifying what type of data it is. For example, if you want to classify images of apples, the software would be shown photos of apples together with the tag "apple." This approach is much like how toddlers learn to recognize basic images.
Figure: The Mark I Perceptron.
The perceptron is a supervised learning algorithm for binary classification. Binary classifiers are functions that determine whether an input, which can be a vector of numbers, belongs to a particular class.
The perceptron algorithm was first implemented on the Mark I Perceptron. It was connected to a camera that used a 20x20 grid of cadmium sulfide* photocells* producing a 400-pixel image. Different combinations of input features could be experimented with using a patchboard. The array of potentiometers on the right* implemented the adaptive weights.*
Rosenblatt's perceptrons classified images into different categories: triangles, squares, or circles. The New York Times featured his work with the headline "Electronic 'Brain' Teaches Itself."* His work established the principles of neural networks. Rosenblatt predicted that perceptrons would soon be capable of feats like greeting people by name. The problem, however, was that his algorithm did not work with multiple layers of neurons due to the exponential nature of the learning algorithm: it required too much time for perceptrons to converge to what engineers wanted them to learn. This was eventually solved, years later, by a new algorithm called backpropagation, which we'll cover in the section on deep learning.
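For illustration, here is a sketch of the standard perceptron learning rule for a single-layer binary classifier (not Rosenblatt's hardware, and using an invented toy dataset): the weights are nudged only when a prediction disagrees with the supervised label.

```python
# Sketch of the perceptron learning rule: a binary classifier whose
# weights are corrected whenever a prediction disagrees with the label.
# The tiny dataset below is invented for illustration (AND-like labels).

def predict(weights, bias, inputs):
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= 0 else 0

def train(samples, labels, epochs=10, lr=0.1):
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for inputs, label in zip(samples, labels):
            error = label - predict(weights, bias, inputs)
            # Only misclassified examples change the weights.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels  = [0, 0, 0, 1]                      # linearly separable, so it converges
weights, bias = train(samples, labels)
print([predict(weights, bias, s) for s in samples])  # [0, 0, 0, 1]
```

Because this toy data is linearly separable, a single layer of weights is enough; the multilayer case described next is what the original algorithm could not handle.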
A multilayer neural network consists of three or more layers of artificial neuronsβan input layer, an output layer, and at least one hidden layerβarranged so that the output of one layer becomes the input of the next layer.
Figure: A multilayer neural network.
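As a rough sketch of what "the output of one layer becomes the input of the next layer" means, here is a forward pass through a tiny network with one hidden layer; the weights and the step-like activation are placeholders, not a trained model.

```python
# Sketch of a forward pass through a tiny multilayer network:
# input layer -> one hidden layer -> output layer. The weights are
# arbitrary placeholders; learning them is what backpropagation later solved.

def layer(inputs, weights, biases):
    # Each neuron: weighted sum of all inputs plus a bias,
    # passed through a simple threshold activation.
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = bias + sum(w * x for w, x in zip(neuron_weights, inputs))
        outputs.append(1.0 if total >= 0 else 0.0)
    return outputs

inputs = [1.0, 0.0]                               # input layer (2 values)
hidden = layer(inputs, weights=[[0.5, -0.4], [-0.3, 0.8]], biases=[-0.2, 0.1])
output = layer(hidden, weights=[[1.0, -1.0]], biases=[-0.5])  # output layer (1 neuron)
print(hidden, output)                             # [1.0, 0.0] [1.0]
```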
The Georgetown-IBM experiment translated English sentences into Russian and back into English. This demonstration of machine translation took place in 1954 and was intended to attract not only public interest but also funding.* The system specialized in organic chemistry and was quite limited, with only six grammar rules. An IBM 701 mainframe computer, designed by Nathaniel Rochester and launched in April 1953, ran the experiment.*
A feature article in the New York Times read, "A public demonstration of what is believed to be the first successful use of a machine to translate meaningful texts from one language to another took place here yesterday afternoon. This may be the culmination of centuries of search by scholars for a mechanical translator."
Figure: The Georgetown-IBM experiment translated 250 sentences from English to Russian.
The demo worked in some cases, but it failed for most of the sentences. A way of verifying if the machine translated a phrase correctly was to translate it from English to Russian and then back into English. If the sentence had the same meaning or was similar to the original, then the translation worked. But in the experiment, many sentences ended up different from the original and with an entirely new meaning. For example, given the original sentence "The spirit is willing, but the flesh is weak," the result was "The whiskey is strong, but the meat is rotten."
The system simply could not understand the meaning, or semantics, of the sentence, making mistakes in translation as a result. The errors mounted, completely losing the original message.
AI was defined as a field of research in computer science at a conference held at Dartmouth College in the summer of 1956. Marvin Minsky, John McCarthy, Claude Shannon, and Nathaniel Rochester organized the conference. They would become known as the "founding fathers" of artificial intelligence.
At the conference, these researchers wrote a proposal to the US government for funding. They divided the field into six subfields of interest: computers, natural language processing, neural networks, theory of computation, abstraction, and creativity.
Figure: From left to right: Trenchard More, John McCarthy, Marvin Minsky, Oliver Selfridge, and Ray Solomonoff.
At the conference, many predicted that a machine as intelligent as a human being would exist in no more than a generation, about 25 years. As you know, that was an overestimation of how quickly artificial intelligence would develop. The workshop lasted six weeks and started the funding boom in AI, which continued for 16 years until what would be called the First AI Winter.
The Defense Advanced Research Projects Agency (DARPA) provided most of the money that went into the field during the period known as the Golden Years of artificial intelligence.
During this βgoldenβ period, the early AI pioneers set out to teach computers to do the same complicated mental tasks that humans do, breaking them into five subfields: reasoning, knowledge representation, planning, natural language processing (NLP), and perception.
These general-sounding terms do have specific technical meanings, still in use today:
Reasoning. When humans are presented with a problem, we can work through a solution using reasoning. This area covers all the tasks involved in that process. Examples include playing chess, solving algebra problems, proving geometry theorems, and diagnosing diseases.
Knowledge representation. In order to solve problems, hold conversations, and understand people, computers must have knowledge about the real world, and that knowledge must somehow be represented inside the computer. What are objects? What are people? What is speech? Specific computer languages were invented for the purpose of programming these things into the computer, with Lisp being the most famous. The engineers building Siri had to solve this problem for it to respond to requests; a small illustrative sketch follows this list.
Planning. Robots must be able to navigate in the world we live in, and that takes planning. Computers must figure out, for example, how to move from point A to point B, how to understand what a door is, and where it is safe to go. This problem is critical for self-driving cars so they can drive around roads.
Natural language processing. Speaking and understanding a language, and forming and understanding sentences are skills needed for machines to communicate with humans. The Georgetown-IBM experiment was an early demonstration of work in this area.
Perception. To interact with the world, computers must be able to perceive it, that is, they need to be able to see, hear, and feel things. Sight was one of the first tasks that computer scientists tackled. The Rosenblatt perceptron was the first system to address such a problem.
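As promised above, here is a toy sketch of knowledge representation, written in Python for consistency with the other sketches here rather than in Lisp: facts are stored as simple relations, and one hand-written rule chains them together. All names and facts below are invented for illustration.

```python
# A toy sketch of knowledge representation: facts about the world stored
# as relations, plus one inference rule that chains "is_a" links together.
# Real systems of the era used Lisp and far richer structures.

facts = {
    ("is_a", "siri", "assistant"),
    ("is_a", "assistant", "software"),
    ("can", "software", "run_on_computers"),
}

def is_a(thing, category):
    # Follow "is_a" links transitively, e.g. siri -> assistant -> software.
    if ("is_a", thing, category) in facts:
        return True
    return any(is_a(middle, category)
               for rel, subj, middle in facts
               if rel == "is_a" and subj == thing)

print(is_a("siri", "software"))  # True: inferred through two is_a links
```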
The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.
Edsger Dijkstra
The Golden Years of AI started with the development of Micro-Worlds by Marvin Minsky as well as John McCarthy's development of Lisp, the first programming language optimized for artificial intelligence. This era was marked by the creation of the first chatbot, ELIZA, and Shakey, the first robot to move around on its own.
The years after the Dartmouth Conference were an era of discovery. The programs developed during this time were, to most people, simply astonishing. The next 18 years, from 1956 to 1974, were known as the Golden Years.* Most of the work developed in this era was done inside laboratories in universities across the United States. These years marked the development of the important AI labs at the Massachusetts Institute of Technology (MIT), Stanford, Carnegie Mellon University, and Yale. DARPA funded most of this research.*