Roboticist; Panasonic Professor of Robotics (emeritus), MIT; Founder, Chairman & CTO, Rethink Robotics; Author, Flesh and Machines
Mistaking Performance For Competence Misleads Estimates Of AI's 21st Century Promise And Danger


"Think" and "intelligence" are both what Marvin Minsky has called suitcase words. They are words into which we pack many meanings so that we can talk about complex issues in a shorthand way. When we look inside these words we find many different aspects, mechanisms, and levels of understanding. This makes answering the perennial questions of "can machines think?" or "when will machines reach human level intelligence?" fraught with danger. The suitcase words are used to cover both specific performance demonstrations by machines and the more general competence that humans have. People are getting confused, generalizing from performance to competence, and grossly overestimating the real capabilities of machines, both today and over the next few decades.

In 1997 a supercomputer beat world chess champion Garry Kasparov in a six-game match. Today there are dozens of programs that run on laptop computers and hold higher chess ratings than any human has ever achieved. Computers can definitely perform better than humans at playing chess. But they have nowhere near human level competence at chess.

All chess-playing programs use Turing's brute-force tree-search method with heuristic evaluation. By the seventies, computers were fast enough that this approach overwhelmed other AI programs that tried to play chess with processes emulating how people reported thinking about their next move, and so those approaches were largely abandoned.
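The brute-force method can be sketched in a few lines. Here the "game" is just a hand-built tree of heuristic leaf scores, not real chess, and the tree and values are invented for illustration; the point is that the program chooses moves purely by comparing evaluations of subtrees, with no notion of why a move is good.

```python
# A minimal sketch of brute-force game-tree search with heuristic
# evaluation. A leaf is a number (its heuristic score, from the
# maximizing player's point of view); an interior node is a list of
# child nodes. Real chess programs add depth limits and alpha-beta
# pruning, but the principle is the same.

def minimax(node, maximizing):
    """Return the best achievable heuristic score from `node`."""
    if isinstance(node, (int, float)):   # frontier: apply the heuristic
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# Two candidate moves for the maximizer; the opponent then picks the
# reply that is worst for us, so the right choice is the subtree whose
# *minimum* leaf score is largest.
tree = [[3, 12], [2, 9]]
best = max(range(len(tree)), key=lambda i: minimax(tree[i], False))
print(best)  # → 0: subtree [3, 12], whose worst case (3) beats [2, 9]'s (2)
```

Notice that the program can report which move scored best, but contains nothing that could explain the move, generalize from it, or teach it.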

Today's chess programs have no way of saying why a particular move is "better" than another, save that it moves the game to a part of the tree where the opponent has fewer good options. A human player can make generalizations, describe why certain types of moves are good, and use those explanations to teach another player. Brute-force programs cannot teach in that way, except by serving as a sparring partner; it is up to the human to make the inferences and analogies, and to do any learning, on their own. The chess program doesn't know that it is outsmarting the person, doesn't know that it is a teaching aid, doesn't know that it is playing something called chess, nor even what "playing" is. Making brute-force chess playing perform better than any human gets us no closer to competence in chess.

Now consider deep learning, which has caught people's imaginations over the last year or so. It is an update to backpropagation, a thirty-year-old learning algorithm very loosely based on abstracted models of neurons. Layers of neurons map from a signal, such as the amplitude of a sound wave or the brightness of pixels in an image, to increasingly higher-level descriptions of the full meaning of the signal, such as words for sounds, or objects in images. Originally backpropagation could only practically work with two or three layers of neurons, so it was necessary to hand-design preprocessing steps that converted the signals into more structured data before applying the learning algorithms. The new versions work with more layers of neurons, making the networks deeper, hence the name deep learning. Now the early processing steps are also learned, and without misguided human design biases the new algorithms are spectacularly better than the algorithms of just three years ago. That is why they have caught people's imaginations. The new versions rely on massive amounts of computer power in server farms, and on very large data sets that did not formerly exist, but critically, they also rely on new scientific innovations.
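The mechanics of backpropagation can themselves be sketched briefly. This is a toy two-layer network, the shallow kind the text says was all that was originally practical, trained on a single made-up example; the weights, input, and target are invented for illustration, and the only point is the forward pass, the error gradient flowing backward through the layers, and the resulting weight updates.

```python
import math

# Invented weights for a tiny network: 2 inputs -> 2 hidden -> 1 output.
w1 = [[0.1, -0.2], [0.3, 0.4]]   # w1[i][j]: input i to hidden unit j
w2 = [0.5, -0.5]                  # hidden unit j to the single output

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x):
    """Forward pass: signal -> hidden features -> output probability."""
    h = [sigmoid(x[0] * w1[0][j] + x[1] * w1[1][j]) for j in range(2)]
    p = sigmoid(sum(h[j] * w2[j] for j in range(2)))
    return h, p

def train_step(x, y, lr=1.0):
    """One backpropagation step on the squared error (p - y)^2."""
    h, p = forward(x)
    dp = 2 * (p - y) * p * (1 - p)           # gradient at the output
    for j in range(2):
        dh = dp * w2[j] * h[j] * (1 - h[j])  # gradient pushed back a layer
        w2[j] -= lr * dp * h[j]              # layer-2 weight update
        for i in range(2):
            w1[i][j] -= lr * dh * x[i]       # layer-1 weight update

x, y = [1.0, 0.0], 1.0
before = (forward(x)[1] - y) ** 2
for _ in range(50):
    train_step(x, y)
after = (forward(x)[1] - y) ** 2
print(after < before)  # → True: the error shrinks as gradient steps accumulate
```

With more layers, the same backward chain of gradients is simply extended, which is the step from backpropagation to deep learning; making that work in practice required the compute, data, and innovations the paragraph above describes.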

A well-known example of their performance is labeling an image, in English, as showing a baby with a stuffed toy. When a person looks at the image, that is what they see too. The algorithm has performed very well at labeling the image, far better than AI practitioners, only five years ago, would have predicted for 2014. But the algorithm does not have the full competence of a person who could label that same image.

The learning algorithm knows there is a baby in the image, but it doesn't know the structure of a baby, and it doesn't know where the baby is in the image. A current deep learning algorithm can only assign to each pixel a probability that that pixel is part of a baby. Whereas a person can see that the baby occupies the middle quarter of the image, today's algorithm has only a probabilistic idea of its spatial extent. It cannot apply an exclusionary rule and say that non-zero-probability pixels at opposite extremes of the image cannot both be part of the same baby. If we look inside the neuron layers it might be that one of the higher-level learned features is an eye-like patch of image, and another is a foot-like patch, but the current algorithm has no way of representing the constraints on where eyes and feet can validly appear relative to each other in an image, and could be fooled by a grotesque collage of baby body parts, labeling it a baby. No person would be fooled, and furthermore a person would immediately know exactly what it was: a grotesque collage of baby body parts. Moreover, the current algorithm is completely useless for telling a robot where to go in space to pick up that baby, where to hold a bottle and feed the baby, or where to reach to change its diaper. Today's algorithm has nothing like human level competence in understanding images.
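The weakness of purely per-pixel probabilities can be made concrete with a toy example. This is not a real vision model; the grid of "baby" probabilities below is invented. The point is that any score built only from the bag of independent per-pixel probabilities is identical for the original image and for a scrambled collage of its parts, so such a score cannot, by itself, encode the spatial structure a person sees instantly.

```python
# Invented per-pixel probabilities that each pixel belongs to a "baby",
# on a tiny 4x4 image. The values are exact binary fractions so sums
# compare exactly regardless of summation order.
pixel_probs = [
    [0.0,   0.125, 0.125, 0.0],
    [0.125, 0.875, 0.75,  0.125],
    [0.125, 0.75,  0.875, 0.125],
    [0.0,   0.125, 0.125, 0.0],
]

def mean_prob(grid):
    """An aggregate 'baby-ness' score using only per-pixel values."""
    flat = [p for row in grid for p in row]
    return sum(flat) / len(flat)

# A "grotesque collage": the same parts, spatially rearranged.
collage = [pixel_probs[2], pixel_probs[0], pixel_probs[3], pixel_probs[1]]

# The rearrangement is invisible to any bag-of-pixels score.
print(mean_prob(pixel_probs) == mean_prob(collage))  # → True
```

Encoding the exclusionary rules the text describes, such as where eyes may sit relative to feet, requires representing relationships between regions, which is exactly what a flat set of independent pixel probabilities cannot do.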

Work is underway to add focus of attention and handling of consistent spatial structure to deep learning. That is the hard work of science and research, and we really have no idea how hard it will be, nor how long it will take, nor whether the whole approach will reach a fatal dead end. It took thirty years to go from backpropagation to deep learning, but along the way many researchers were sure there was no future in backpropagation. They were wrong, but it would not have been surprising if they had been right, as we knew all along that the backpropagation algorithm is not what happens inside people's heads.

The fears of runaway AI systems either conquering humans or making them irrelevant are not even remotely well grounded. Misled by suitcase words, people are making category errors about the fungibility of capabilities. These category errors are comparable to seeing more efficient internal combustion engines appear and concluding that warp drives are just around the corner.