A Thesaurus-Based Semantic Classification of English Collocations

Master's === National Tsing Hua University === Institute of Information Systems and Applications === 96 === New computational tools for extracting collocations are a great boon to language learners and lexicographers alike. This paper proposes a new method for organizing the very numerous collocates that these tools can return into semantic thesaurus ca...

Bibliographic Details
Main Authors: Kate H. Kao, 高紅雯
Other Authors: Jason S. Chang
Format: Others
Language: en_US
Published: 2008
Online Access: http://ndltd.ncl.edu.tw/handle/71929995396508573779
id ndltd-TW-096NTHU5394003
record_format oai_dc
spelling ndltd-TW-096NTHU5394003 2015-10-13T14:08:35Z http://ndltd.ncl.edu.tw/handle/71929995396508573779 A Thesaurus-Based Semantic Classification of English Collocations 建立英語分類搭配詞典:搭配詞之語意分類與標示 (Building a Classified English Collocation Dictionary: Semantic Classification and Labeling of Collocates) Kate H. Kao 高紅雯 Master's National Tsing Hua University Institute of Information Systems and Applications 96 Jason S. Chang 張俊盛 2008 thesis 79 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Tsing Hua University === Institute of Information Systems and Applications === 96 === New computational tools for extracting collocations are a great boon to language learners and lexicographers alike. This paper proposes a new method for organizing the very numerous collocates that these tools can return into semantic thesaurus categories. The approach introduces a thesaurus-based semantic classification model that automatically learns semantic relations in order to classify adjective-noun (A-N) and verb-noun (V-N) collocations into different categories. Because they are most relevant to language learners, the research focuses on the frequent patterns of collocation errors: A-N and V-N collocation pairs. Our model uses a random walk over the vertices and edges of a weighted graph derived from WordNet semantic relations. We compute a stationary distribution over semantic labels via an iterative graph algorithm. The semantic label of a collocate is scored by a novel divergence measure that imposes a thesaurus structure on collocation reference tools. In our experiments, the resulting semantic relatedness measure is the WordNet-based measure most highly correlated with human similarity judgments. The evaluation is conducted on a set of collocations whose collocates involve varying levels of abstractness, taken from the collocation usage boxes of the Macmillan English Dictionary. We also present an experimental evaluation on a collection of 150 multiple-choice questions from the TOEFL synonym test, which is commonly used as a similarity benchmark. The experimental results show that a thesaurus structure is successfully imposed, helps enhance collocation production for L2 learners, and significantly outperforms existing collocation reference tools. The resulting semantic classification is closely consistent with human judgments, which serve as fairly refined examples for evaluating the model. The methodology improves the performance of collocation reference tools and imposes semantic structure on collocations, which is a good starting point for a much improved and more useful presentation of collocations. It also has positive consequences for the robustness of semantic classification of collocations, an attractive feature for organizing broad-coverage machine-readable data to be merged for catalogued uses in natural language processing.
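
The random walk and stationary distribution mentioned in the description can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the edge weights, the restart scheme, or the divergence measure, so the function names (`stationary_distribution`, `rank_labels`), the restart probability of 0.15, and the toy graph with its weights are all assumptions made for illustration. The sketch uses a random walk with restart, iterated until the distribution stops changing, and then ranks label vertices by the probability mass they receive.

```python
from typing import Dict, List, Tuple

# Weighted graph: vertex -> {neighbor: edge weight}. Hypothetical structure.
Graph = Dict[str, Dict[str, float]]


def stationary_distribution(graph: Graph, seed: str, restart: float = 0.15,
                            tol: float = 1e-12, max_iter: int = 10_000) -> Dict[str, float]:
    """Random walk with restart from `seed`, iterated to a fixed point."""
    nodes = sorted(set(graph) | {v for nbrs in graph.values() for v in nbrs})
    if seed not in nodes:
        raise KeyError(seed)
    dist = {n: 0.0 for n in nodes}
    dist[seed] = 1.0
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        nxt[seed] = restart  # with probability `restart`, jump back to the seed
        for u in nodes:
            nbrs = graph.get(u, {})
            total = sum(nbrs.values())
            if total <= 0:
                # Vertex with no outgoing weight: return its mass to the seed.
                nxt[seed] += (1.0 - restart) * dist[u]
                continue
            for v, w in nbrs.items():
                # Move along each edge in proportion to its weight.
                nxt[v] += (1.0 - restart) * dist[u] * w / total
        delta = sum(abs(nxt[n] - dist[n]) for n in nodes)
        dist = nxt
        if delta < tol:
            break
    return dist


def rank_labels(dist: Dict[str, float], labels: List[str]) -> List[Tuple[str, float]]:
    """Sort candidate label vertices by stationary probability, highest first."""
    return sorted(((lab, dist.get(lab, 0.0)) for lab in labels),
                  key=lambda pair: pair[1], reverse=True)


# Toy undirected graph; words, labels, and weights are invented for illustration.
toy_graph: Graph = {
    "heavy": {"Magnitude": 3.0, "intense": 1.0, "strong": 0.5},
    "intense": {"heavy": 1.0, "Magnitude": 2.0},
    "strong": {"heavy": 0.5, "Strength": 3.0},
    "Magnitude": {"heavy": 3.0, "intense": 2.0},
    "Strength": {"strong": 3.0},
}

if __name__ == "__main__":
    dist = stationary_distribution(toy_graph, seed="heavy")
    for label, score in rank_labels(dist, ["Magnitude", "Strength"]):
        print(f"{label}: {score:.4f}")
```

In this sketch a collocate such as "heavy" is the seed vertex and the candidate thesaurus categories are ordinary vertices in the same graph. A real system would build the graph from WordNet relations and would compare label distributions with the thesis's own divergence measure, which is not reproduced here.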
author2 Jason S. Chang
author_facet Jason S. Chang
Kate H. Kao
高紅雯
author Kate H. Kao
高紅雯
spellingShingle Kate H. Kao
高紅雯
A Thesaurus-Based Semantic Classification of English Collocations
author_sort Kate H. Kao
title A Thesaurus-Based Semantic Classification of English Collocations
title_short A Thesaurus-Based Semantic Classification of English Collocations
title_full A Thesaurus-Based Semantic Classification of English Collocations
title_fullStr A Thesaurus-Based Semantic Classification of English Collocations
title_full_unstemmed A Thesaurus-Based Semantic Classification of English Collocations
title_sort thesaurus-based semantic classification of english collocations
publishDate 2008
url http://ndltd.ncl.edu.tw/handle/71929995396508573779
work_keys_str_mv AT katehkao athesaurusbasedsemanticclassificationofenglishcollocations
AT gāohóngwén athesaurusbasedsemanticclassificationofenglishcollocations
AT katehkao jiànlìyīngyǔfēnlèidāpèicídiǎndāpèicízhīyǔyìfēnlèiyǔbiāoshì
AT gāohóngwén jiànlìyīngyǔfēnlèidāpèicídiǎndāpèicízhīyǔyìfēnlèiyǔbiāoshì
AT katehkao thesaurusbasedsemanticclassificationofenglishcollocations
AT gāohóngwén thesaurusbasedsemanticclassificationofenglishcollocations
_version_ 1717749319990247424