SAIL Room - 111 Levin Building
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology
On what you can't learn from (merely) all the data in the world, and what else is needed
Recent successes with recurrent neural networks (RNNs) and other big-data techniques in AI applications raise the question of whether similar approaches might explain human language acquisition. How far can language data alone take us, with little other structure? I will first describe some experiments testing RNN models developed by Google that can perform some truly impressive feats in language technology, yet fail a number of basic tests of syntactic and semantic understanding that cognitive scientists have long been interested in, as well as some new benchmarks that we have developed. They often fail for interesting reasons: the mismatch between their linear (sequential) processing architecture and the hierarchical structure of thought, their emphasis on character-level modeling as opposed to words and phrases, and their lack of interfaces to core cognition outside language. Their successes and failures illustrate how both advocates and critics of early statistical language learning were correct: Chomsky, Gleitman, and Pinker were right after all, but Elman and Hinton were also right. They were just right about different things, and we can learn much by reinterpreting these early debates. As a way forward, I argue for combining smart statistics with more structured, hierarchical representations that interface to a cognitively grounded semantics. I report some promising results, although we are far from being able to implement this at the scale Google requires. I will also sketch ideas for how RNNs can make these more structured approaches work better, with the hope of integrating these often-opposing traditions to best make progress.
The talk will begin at 12:00pm. A pizza lunch will be served at 11:45am.