Unsupervised syntactic category learning from child-directed speech
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 57-59). The goal of this research was to discover what kinds of syntactic categories can be l...
Main Author: | Wichrowska, Olga N |
---|---|
Other Authors: | Robert C. Berwick |
Format: | Others |
Language: | English |
Published: | Massachusetts Institute of Technology, 2011 |
Subjects: | Electrical Engineering and Computer Science |
Online Access: | http://hdl.handle.net/1721.1/62756 |
id |
ndltd-MIT-oai-dspace.mit.edu-1721.1-62756 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-MIT-oai-dspace.mit.edu-1721.1-62756 2019-05-02T16:26:24Z Unsupervised syntactic category learning from child-directed speech Wichrowska, Olga N Robert C. Berwick. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 57-59). The goal of this research was to discover what kinds of syntactic categories can be learned using distributional analysis on linear context of words, specifically in child-directed speech. The idea behind this is that the categories used by children could very well be different from adult categories. There is some evidence that distributional analysis could be used for some aspects of language acquisition, though very strong arguments exist for why it is not enough to acquire grammar. These experiments can help identify what kind of data can be learned from linear context and statistics only. This paper reports the results of three established automatic syntactic category learning algorithms on a small, edited input set of child-directed speech from the CHILDES database. Hierarchical clustering, K-Means analysis, and an implementation of a substitution algorithm are all used to assign syntactic categories to words based on their linear distributional context. Overall, open classes (nouns, verbs, adjectives) were reliably categorized, and some methods were able to distinguish prepositions, adverbs, subjects vs. objects, and verbs by subcategorization frame. 
The main barrier standing between these methods and human-like categorization is the inability to deal with the ambiguity that is omnipresent in natural language and poses an important problem for future models of syntactic category acquisition. by Olga N. Wichrowska. M.Eng. 2011-05-09T15:30:47Z 2011-05-09T15:30:47Z 2010 2010 Thesis http://hdl.handle.net/1721.1/62756 717716094 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 59 p. application/pdf Massachusetts Institute of Technology |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
Electrical Engineering and Computer Science. |
description |
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 57-59). The goal of this research was to discover what kinds of syntactic categories can be learned using distributional analysis on the linear context of words, specifically in child-directed speech. The motivation is that the categories used by children could very well be different from adult categories. There is some evidence that distributional analysis could be used for some aspects of language acquisition, though very strong arguments exist for why it is not enough to acquire grammar. These experiments can help identify what kind of data can be learned from linear context and statistics only. This paper reports the results of running three established automatic syntactic category learning algorithms on a small, edited input set of child-directed speech from the CHILDES database. Hierarchical clustering, K-Means analysis, and an implementation of a substitution algorithm are all used to assign syntactic categories to words based on their linear distributional context. Overall, open classes (nouns, verbs, adjectives) were reliably categorized, and some methods were able to distinguish prepositions, adverbs, subjects vs. objects, and verbs by subcategorization frame. The main barrier standing between these methods and human-like categorization is their inability to deal with ambiguity, which is omnipresent in natural language and poses an important problem for future models of syntactic category acquisition. by Olga N. Wichrowska. M.Eng. |
author2 |
Robert C. Berwick. |
author_facet |
Robert C. Berwick. Wichrowska, Olga N |
author |
Wichrowska, Olga N |
author_sort |
Wichrowska, Olga N |
title |
Unsupervised syntactic category learning from child-directed speech |
publisher |
Massachusetts Institute of Technology |
publishDate |
2011 |
url |
http://hdl.handle.net/1721.1/62756 |