Unsupervised syntactic category learning from child-directed speech

Bibliographic Details
Main Author: Wichrowska, Olga N
Other Authors: Robert C. Berwick.
Format: Others
Language: English
Published: Massachusetts Institute of Technology 2011
Online Access: http://hdl.handle.net/1721.1/62756
Description
Summary: Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 57-59).

The goal of this research was to discover what kinds of syntactic categories can be learned by distributional analysis of the linear context of words, specifically in child-directed speech. The motivation is that the categories children use may well differ from adult categories. There is some evidence that distributional analysis could support some aspects of language acquisition, though very strong arguments exist that it is not sufficient for acquiring grammar. These experiments help identify what kind of information can be learned from linear context and statistics alone. This thesis reports the results of three established automatic syntactic category learning algorithms on a small, edited input set of child-directed speech from the CHILDES database. Hierarchical clustering, K-Means analysis, and an implementation of a substitution algorithm are each used to assign syntactic categories to words based on their linear distributional context. Overall, open classes (nouns, verbs, adjectives) were reliably categorized, and some methods were able to distinguish prepositions, adverbs, subjects vs. objects, and verbs by subcategorization frame. The main barrier between these methods and human-like categorization is their inability to handle ambiguity, which is omnipresent in natural language and poses an important problem for future models of syntactic category acquisition.

by Olga N. Wichrowska.
M.Eng.
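
The summary describes assigning categories to words from their linear distributional context. As a rough illustration only, and not the thesis's own implementation, the Python sketch below builds a context vector for each word and groups words by average-link hierarchical clustering on cosine similarity. The toy sentences, the function names, and the choice of "immediate left and right neighbor" as the context window are all assumptions made for this example; the thesis's corpus, features, and parameters are not given in this record.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences):
    """Map each word to counts of its immediate left and right neighbors.

    Assumption: "linear context" is taken to be a one-word window on each side.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            vectors[tokens[i]][("L", tokens[i - 1])] += 1
            vectors[tokens[i]][("R", tokens[i + 1])] += 1
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def average_link(c1, c2, vectors):
    """Mean pairwise similarity between the members of two clusters."""
    pairs = [(x, y) for x in c1 for y in c2]
    return sum(cosine(vectors[x], vectors[y]) for x, y in pairs) / len(pairs)


def cluster(vectors, n_clusters):
    """Greedy agglomerative clustering: merge the most similar pair until
    only n_clusters groups remain."""
    clusters = [[w] for w in sorted(vectors)]
    while len(clusters) > n_clusters:
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: average_link(clusters[ij[0]], clusters[ij[1]], vectors),
        )
        # j > i, so removing index j leaves index i valid.
        clusters[i] = clusters[i] + clusters.pop(j)
    return clusters


# Hypothetical toy input, invented for this sketch (not CHILDES data).
toy_corpus = [
    "the dog sleeps",
    "the cat sleeps",
    "the dog eats",
    "the cat eats",
    "a dog runs",
    "a cat runs",
]

groups = cluster(context_vectors(toy_corpus), 3)
for group in groups:
    print(sorted(group))
```

On this toy input the determiners, nouns, and verbs share no context features with one another, so they separate cleanly. Real child-directed speech is far noisier, and a word that belongs to more than one category still receives a single vector here, which is the ambiguity problem the summary identifies as the main barrier.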