IN GRAMMATICA NIHIL NISI EX CORPORE
|Shimon Edelman||Zach Solan||Eytan Ruppin||David Horn|
Building a computational model. The objective here is to construct an explicit computational model of language learning and processing that can acquire distributional information from an English-language corpus and support efficient abstraction of syntactic knowledge from the raw representation. The basic data structure is a directed graph, with vertices corresponding initially to words, and later to morphemes extracted using novel probabilistic/structural principles. Our research shows this data structure to be computationally tractable and easy to train. As an essentially distributed network architecture, such a graph also has the advantage of a certain biological relevance.
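To make the data structure concrete, here is a minimal sketch of a directed word graph built from a toy corpus. It is an illustration only, not the project's implementation: the boundary markers, the function names `build_word_graph` and `transition_probability`, the use of successor counts as edge weights, and the three example sentences are all assumptions introduced here.

```python
from collections import defaultdict

# Hypothetical sentence-boundary vertices (an assumption of this sketch).
BEGIN, END = "<begin>", "<end>"


def build_word_graph(sentences):
    """Build a directed graph {word: {next_word: count}}.

    Each tokenized sentence is treated as a path through the graph,
    and every edge records how often one word directly follows another.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        path = [BEGIN, *tokens, END]
        for src, dst in zip(path, path[1:]):
            graph[src][dst] += 1
    return graph


def transition_probability(graph, src, dst):
    """Estimate P(dst | src) from edge counts; 0.0 if src has no successors."""
    successors = graph.get(src, {})
    total = sum(successors.values())
    return successors.get(dst, 0) / total if total else 0.0


# Toy corpus, invented for illustration.
corpus = [
    "the dog runs".split(),
    "the cat runs".split(),
    "the dog sleeps".split(),
]
graph = build_word_graph(corpus)
```

With this toy corpus, `graph["the"]["dog"]` holds the number of times "dog" follows "the", and `transition_probability(graph, "the", "dog")` gives the corresponding relative frequency. This is the simplest kind of distributional information such a graph can carry. The later move from words to morphemes is not modeled in this sketch.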
Training the model on corpus data. The model is trained on corpora from the CHILDES compendium [MacWhinney and Snow 1985], which contains material especially relevant to the acquisition of language by children, such as transcripts of parent-child conversations and of popular children's TV shows. We monitor the progress of the model's learning at every stage, and compare it, both qualitatively and quantitatively, with the progress of children learning language, as documented by published studies in developmental psychology, such as [Bates and Goodman 1999]. Special attention is paid to the parallel development of lexical and syntactic performance features. The trained model is being subjected to the same tests as our human subjects.
Behavioral studies. In the psycholinguistic part of this research, we work on characterizing the performance of adult human subjects in a variety of syntax-related tasks, including grammaticality judgment, sentence comprehension, and structural similarity scaling (preliminary results from the first class of tasks are already available). The patterns of performance (both across conditions within each experiment, and across experiments) are compared to those produced by the computational model mentioned above.
Game-theoretic evolutionary studies. The distributional approach to the representation of linguistic knowledge has a natural affinity to population-related concepts, which makes it an ideal testbed for the exploration of evolutionary aspects of the emergence of language. We employ methods of evolutionary game theory, using real language data, to study the ability of our graph-based model of syntax to support the coordination of structured representations in a population of communicating agents. Such coordination is a prerequisite for the emergence of a compositional vehicle of communication, which transcends mere exchange of isolated shared-reference symbols, and which thereby becomes a language endowed with a systematic syntax.
SUPPORT. The project is partially funded by a grant from the US-Israel Binational Science Foundation (2002-2006).
JOINING IN. We are looking to recruit graduate students to work on various aspects of this project, psycholinguistic and others.