Two of the biggest names in AI/learning, Noam Chomsky--who needs no introduction--and Peter Norvig--Director of Research at Google and well-known author of AI textbooks--have been debating their favored approaches with increasing acrimony, as detailed in the linked article ("Norvig vs. Chomsky and the Fight for the Future of AI" by Kevin Gold, tor.com, 11Jun2011). To oversimplify, their positions are:
- Chomsky has spent half a century building an ever more complex universal grammar for how languages are put together, as a mechanism for "understanding" language, and argues that humans (at least) have a mechanism for applying this grammar to enable language and learning, by mapping the parts of language onto concepts and information.
- Norvig has--quite successfully--shown that ignoring grammar and simply using neural network technology along with massive amounts of data (like what Google hoovers up on the internet) can map anything in one language onto anything in another. This approach has been used not only for language translation, but also to create the quite impressive Jeopardy-playing contraption "Watson" at IBM, and for many other technologies that seek to allow machine "understanding".
Chomsky, one of the old guard, wishes for an elegant theory of intelligence and language that looks past human fallibility to try to see simple structure underneath. Norvig, meanwhile, represents the new philosophy: truth by statistics, and simplicity be damned. Disillusioned with simple models, or even Chomsky’s relatively complex models, Norvig has of late been arguing that with enough data, attempting to fit any simple model at all is pointless. The disagreement between the two men points to how the rise of the Internet poses the same challenge to artificial intelligence that it has to human intelligence: why learn anything when you can look it up?
Chomsky started the current argument with some remarks made at a symposium commemorating MIT's 150th birthday. According to MIT's Technology Review,

> Chomsky derided researchers in machine learning who use purely statistical methods to produce behavior that mimics something in the world, but who don’t try to understand the meaning of that behavior. Chomsky compared such researchers to scientists who might study the dance made by a bee returning to the hive, and who could produce a statistically based simulation of such a dance without attempting to understand why the bee behaved that way. “That’s a notion of [scientific] success that’s very novel. I don’t know of anything like it in the history of science,” said Chomsky.

Gold summarizes the Google camp's response:

> ...with enough data from the internet, you can reason statistically about what the next word in a sentence will be, right down to its conjugation, without necessarily knowing any grammatical rules or word meanings at all. The limited understanding employed in this approach is why machine translation occasionally delivers amusingly bad results. But the Google approach to this problem is not to develop a more sophisticated understanding of language; it is to try to get more data, and build bigger lookup tables. Perhaps somewhere on the internet, somebody has said exactly what you are saying right now, and all we need to do is go find it. AIs attempting to use language in this way are like elementary school children googling the answers to their math homework: they might find the answer, but one can’t help but feel it doesn’t serve them well in the long term.

> In his essay, Norvig argues that there are ways of doing statistical reasoning that are more sophisticated than looking at just the previous one or two words, even if they aren’t applied as often in practice. But his fundamental stance, which he calls the “algorithmic modeling culture,” is to believe that “nature’s black box cannot necessarily be described by a simple model.” He likens Chomsky’s quest for a more beautiful model to Platonic mysticism, and he compares Chomsky to Bill O’Reilly in his lack of satisfaction with answers that work. (emphasis Buffy)

What occurred to me in reading this is that the two sides are really talking right past one another, mainly because they lack agreement on what the *goal* of all this is. Chomsky is closer to understanding this (which is no surprise to me, since he's a cognitive scientist, not just a computer scientist), while Norvig is obviously arguing for "going with what works."
When I think about how to reconcile these two points of view, I see a huge difference in the two combatants' goals:
- Chomsky is looking for a base of code that logically describes knowledge about the world: something the software can learn in an abstract way and then apply *precisely* because it is described abstractly. That is, the software actually "understands" how things work and can "use" that knowledge in new and creative ways.
- Norvig is looking for a way to gather enough data that programs produce "correct results" while "understanding" is really irrelevant: no matter what you want the software to do, there's an analogue out there that's close enough that you'll get the right result a percentage of the time proportional to the amount of modeling data you can get your hands on.
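The next-word-prediction idea at the heart of Norvig's approach can be sketched as a toy bigram model. This is purely an illustrative example (the corpus and function names are mine, and real systems use orders of magnitude more data plus smoothing), but it shows both the trick and its limitation: the model answers only when something "close" exists in its training set.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for each word, which words follow it in the corpus."""
    following = defaultdict(Counter)
    words = corpus.split()
    for word, nxt in zip(words, words[1:]):
        following[word][nxt] += 1
    return following

def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    if word not in following:
        return None  # nothing "close" in the learning set -> no answer
    return following[word].most_common(1)[0][0]

corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # "cat" -- the most common follower of "the"
print(predict_next(model, "dog"))  # None -- never seen, so the model is mute
```

No grammar, no meaning: just counting. Which is exactly the point both sides are arguing about.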
When I think about Norvig's argument I can't help but think of the old saw that enough monkeys typing on enough typewriters will eventually produce the entire works of Shakespeare--the problem being the definition of the word "enough". Norvig's logic completely depends on there being (almost) nothing new under the sun, to borrow from Ecclesiastes. Unless you have something "close" to a desired result in your learning set, your neural network is unlikely to produce that result.
This really points at the more limited definition of "intelligence" that we get from Turing: if an observer cannot distinguish between a human and a computer, then we can call the computer "intelligent". Watson playing Jeopardy was an excellent example of this, and we all marveled at how human Watson could be *in real time*. But given enough observing time, enough of those hilariously off answers would creep in to let the observer fail Watson eventually. All that Norvig's approach of "just get more data to learn from" does is increase the amount of observation needed before the hilarious failure.
Moreover it cannot be overemphasized that projects like Watson require huge amounts of time and resources to solve an *extremely limited problem set*. Yes, Ken Jennings no longer even has to think because of all the money he won on Jeopardy, but just knowing how to play Jeopardy would not allow him to write his autobiography or be the producer of another game show. It is exactly that inability to *use* all that "knowledge" that is the limitation of the statistical approach.
Now the other important point here is that Norvig's preferred technology is no more of a dead end than Chomsky's: remember that the brain is a huge neural network, and it *does indeed* implement the more sophisticated form of intelligence that Chomsky is seeking to harness. But in the brain, the neural network is used to *implement* a conceptual framework for that intelligence. Once you recognize that silicon-ware vs. wet-ware is a *platform issue*, you realize the substrate has nothing to do with the program implemented on top of it (in either silicon or neurons) to deliver "generalized intelligence." Norvig is using the neuron model directly to sift through data, simply to ensure that within some well-defined problem set "reasonable" answers are obtained. Going outside the problem set runs into the same problems that the grammar/logic folks ran into decades ago, with the recognition that "world knowledge" is needed to achieve "generalized intelligence".
That is to say, having neural networks is not sufficient to achieve such generalized intelligence; there has to be something programmed into the network that actually implements a system for dealing with abstract concepts. Since a brain could do it, it could indeed be all neural nets, but it might take 200 million years of trials to develop, just as it did to get to our brains. Chomsky is in essence arguing that if we can figure out what that "system for dealing with abstract concepts" is, we could short-circuit the process and maybe get it done in our lifetimes--and also deal with a nagging hole in the "pure neural network" approach, one coming not from the computer or cognitive science fields, but from that other favorite topic of Chomsky's: public policy, which I will get to in a second.
Having spent quite a bit of time with both technologies, I have to say I get really tired of the debate, because as I see it, any true generalized intelligence is going to require BOTH approaches. Neural networks are excellent for tuning and optimizing behavior using real-world feedback loops to implement solutions for limited problem sets. But if you're going to put the big pieces together, you absolutely are going to need logical/semantic programming.
The public policy issue that has flared recently has to do with how we deal with "robots" that are autonomous. The two most notable examples are self-driving cars and military drones. With both there is an increasing desire to have these operate without human intervention, either because the human cannot be trusted (e.g., a driver who's had too many to drink), or because human intervention is increasingly impractical (military drones needing to act without human input due to communications delays). The question becomes: legally, morally, and practically, when can we ENTIRELY trust the computer to "do the right thing?" Isaac Asimov famously posited that we needed to logically program in his Laws of Robotics, but it's not entirely clear how it's possible to merge such logic into a black-box neural network with any assurance that the logic would be obeyed, when the network could have some data that simply avoids all the tests for adherence to that logic.
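One way to picture the merging problem is an explicit rule layer that can veto the black box's output. The sketch below is purely hypothetical (`black_box_policy`, the sensor fields, and the rule are all invented for illustration); it shows the easy part--layering inspectable logic on top--while the hard part remains exactly what is described above: guaranteeing the learned model cannot route around the checks.

```python
def black_box_policy(sensor_data):
    """Stand-in for an opaque learned model: maps sensor readings to an action."""
    return {"action": "proceed", "confidence": 0.97}

def violates_rules(action, sensor_data):
    """Explicit, inspectable logic layered on top of the black box."""
    if sensor_data.get("human_in_path") and action == "proceed":
        return True  # a First Law-style check: never endanger a human
    return False

def decide(sensor_data):
    proposal = black_box_policy(sensor_data)
    if violates_rules(proposal["action"], sensor_data):
        return "safe_stop"  # the logical layer overrides the network
    return proposal["action"]

print(decide({"human_in_path": False}))  # proceed
print(decide({"human_in_path": True}))   # safe_stop
```

Note that the guarantee here lives entirely in `violates_rules`; if the black box also controls the sensors, or the rules miss a case, the veto never fires--which is the nagging hole in question.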
Unfortunately, it seems to me that the breakthrough of merging logical/conceptual frameworks with statistically based modules is exactly what we need before we have "real artificial intelligence."
People locked into such scientific battles like to hear "you're both right" even less than "he's right and you're wrong." But let's hope for (and lobby for!) just that sort of change in thinking.
Colorless green ideas sleep furiously,