A Multi-layered Resolution for Disambiguation: Insight from Mandarin Verbs

Master's === National Chiao Tung University === Master's Program in Foreign Literatures and Linguistics === 94 === Abstract This study explores how multiple senses of polysemous words could be distinguished. It proposes a hybrid and corpus-based linguistic model and specifies the procedures to build an automatic tagger for sense disambiguation based on Mandarin verbs. It...


Bibliographic Details
Main Authors: Yaling Hsu, 徐雅苓
Other Authors: Me-Chun Liu
Format: Others
Language: en_US
Published: 2006
Online Access: http://ndltd.ncl.edu.tw/handle/90695539503509672760
id ndltd-TW-094NCTU5462006
record_format oai_dc
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Chiao Tung University === Master's Program in Foreign Literatures and Linguistics === 94 === Abstract

This study explores how the multiple senses of polysemous words can be distinguished. It proposes a hybrid, corpus-based linguistic model and specifies the procedures for building an automatic tagger for sense disambiguation based on Mandarin verbs. It seeks to provide a linguistically motivated solution for detecting meaning with the aid of linguistic theories such as Frame Semantics (Fillmore and Atkins 1992), Construction Grammar (Goldberg 1996) and discourse analysis (Hopper and Thompson 1980).

Being an essential property of the lexicon, polysemy is the key to understanding the interplay between syntax, semantics and pragmatics. Although polysemy has been investigated in a number of approaches, including classical feature analysis, prototype theory, the frame-based approach, the relational approach, and so on, a systematic and applicable solution is still lacking. Recently, working on Mandarin lexical semantics, Liu and Wu (2004) proposed a frame-based perspective that views polysemy as belonging to different ‘frames’, as defined by Fillmore and Atkins (1992). Making use of the distinctions in frame elements and their grammatical realizations, Liu and Wu (2004) are able to show that semantic differences may be attributed to the different semantic frames the verb belongs to, following the ‘one sense, one frame’ hypothesis. However, there are cases where two separate meanings of the same verb show exactly the same surface patterns with the same sets of frame elements. For example, in the case of the motion verb NA 拿, two separate senses may end up with the same number and pattern of frame elements, as shown in (1):

(1) Agent < V < Theme:
a. …病人[Agent] 拿 著 健保卡[Theme] 上門… (sense 1 ‘carrying’)
   bing ren na zhe jian bao ka shang men
   patient take ZHE health insurance card up door
   ‘The patient carried the health insurance card to the counter.’
b. …我[Agent]可不可以順道 拿 個 研究學位[Theme]? (sense 2 ‘getting’)
   wo ke bu ke yi shun dao na ge yan jiu xue wei
   I can not can by the way take CL research academic degree
   ‘By the way, can I get an academic research degree?’

It is therefore apparent that a purely frame-based approach may be insufficient in dealing with polysemes. When frame elements fail to provide determining clues, what else should be taken into consideration? The model proposed in this study calls for consideration of two other variables: colloconstructions and contextual dependencies. This study aims to propose a hybrid multi-module solution that identifies the most appropriate lexical sense in various expressions of a polyseme. The hybrid approach can be viewed as a sense-disambiguating model based on three steps: 1) frame-based distinction, 2) colloconstruction distinction, and 3) contextual dependence distinction.

The study is based on naturally occurring data extracted from the Sinica Balanced Corpus, which was established by the CKIP (Chinese Knowledge and Information Processing) group at Academia Sinica and is open to the public at the Internet site: http://www.sinica.edu.tw/ftms-bin/kiwi.sh/. Given the high frequency of occurrences of the target words, only 200 entries are examined closely for the discussion. Corpus data provide explicit and implicit distributional tendencies which may go beyond native speakers’ intuition.

Using corpus data as the input, the first step of the proposed model is to identify the senses of a polysemous word that correspond to distinctions in semantic frames, following FrameNet. The data extracted from the Sinica Corpus can be roughly classified into several frames by their basic patterns of expressing the core frame elements (arguments).

When distinctions in frame elements and their basic patterns fail, senses are further identified by the second module, colloconstruction. In this step, attention is paid to the collocational patterns of non-core arguments. These non-core arguments can be classified into various syntactic categories, such as adverbials, adjectives, aspectual markers, and so forth. Frequent collocates, whether grammatical or lexical, are identified with each individual sense.

When colloconstruction fails to provide any decisive cues, the third module, contextual information, is called upon. In this module, the relevant contextual elements are thoroughly searched to establish a relational link within or across clausal boundaries. The relational link may be established by any semantic or pragmatic association between the polyseme and the contextual element that can be found in a larger semantic taxonomy, such as SUMO synsets (translated in BOW).

To demonstrate the model, four sets of verbs (zou 走, na 拿, ting 聽, kan 看) are used as illustrations. By redefining polysemy with operational mechanisms, this study provides a theoretically grounded linguistic model for developing a computational system for sense disambiguation.
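The three-step procedure described in the abstract can be sketched as a fallback cascade. The following Python sketch is illustrative only and is not the tagger built in the thesis: the `Instance` class, the function names, and all three lookup tables are hypothetical, and the cue entries are loosely modelled on example (1) rather than taken from the thesis's corpus findings. Each module either returns a single sense or defers to the next one.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Instance:
    """One corpus occurrence of a target verb (hypothetical representation)."""
    verb: str
    frame_pattern: str                                # surface order of core frame elements
    collocates: Set[str] = field(default_factory=set)  # non-core items around the verb
    context: Set[str] = field(default_factory=set)     # words within or across clauses


# Hypothetical tables, for illustration only (not data from the thesis).
FRAME_SENSES: Dict[Tuple[str, str], Set[str]] = {
    # Both senses of 拿 share this pattern, so step 1 cannot decide.
    ("拿", "Agent<V<Theme"): {"carrying", "getting"},
}
COLLOCATE_CUES: Dict[Tuple[str, str], str] = {
    ("拿", "著"): "carrying",   # assumed cue: aspectual marker
}
CONTEXT_CUES: Dict[Tuple[str, str], str] = {
    ("拿", "學位"): "getting",  # assumed cue: associated contextual element
}


def _unique_cue(verb: str, items: Set[str],
                table: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Return a sense only if all matching cues agree on exactly one sense."""
    hits = {table[(verb, item)] for item in items if (verb, item) in table}
    return hits.pop() if len(hits) == 1 else None


def by_frame(inst: Instance) -> Optional[str]:
    """Step 1: decide only if the frame pattern maps to a single sense."""
    senses = FRAME_SENSES.get((inst.verb, inst.frame_pattern), set())
    return next(iter(senses)) if len(senses) == 1 else None


def by_colloconstruction(inst: Instance) -> Optional[str]:
    """Step 2: look for a decisive non-core collocate."""
    return _unique_cue(inst.verb, inst.collocates, COLLOCATE_CUES)


def by_context(inst: Instance) -> Optional[str]:
    """Step 3: look for a decisive contextual association."""
    return _unique_cue(inst.verb, inst.context, CONTEXT_CUES)


MODULES = [
    ("frame", by_frame),
    ("colloconstruction", by_colloconstruction),
    ("context", by_context),
]


def disambiguate(inst: Instance) -> Tuple[Optional[str], Optional[str]]:
    """Try each module in order; return (sense, deciding module) or (None, None)."""
    for name, module in MODULES:
        sense = module(inst)
        if sense is not None:
            return sense, name
    return None, None


if __name__ == "__main__":
    a = Instance("拿", "Agent<V<Theme", {"著"}, {"病人", "健保卡"})
    b = Instance("拿", "Agent<V<Theme", {"個"}, {"研究", "學位"})
    print(disambiguate(a))
    print(disambiguate(b))
```

Ordering the modules from the most structural evidence to the most contextual mirrors the abstract's "when X fails, Y is called upon" wording; a real system would need cue tables derived from the corpus and a policy for conflicting cues.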
author2 Me-Chun Liu
author_facet Me-Chun Liu
Yaling Hsu
徐雅苓
author Yaling Hsu
徐雅苓
title A Multi-layered Resolution for Disambiguation: Insight from Mandarin Verbs
publishDate 2006
url http://ndltd.ncl.edu.tw/handle/90695539503509672760
spelling ndltd-TW-094NCTU5462006 2015-11-23T04:02:53Z http://ndltd.ncl.edu.tw/handle/90695539503509672760 A Multi-layered Resolution for Disambiguation: Insight from Mandarin Verbs 歧義現象的多層次分析架構-由中文動詞出發 Yaling Hsu 徐雅苓 Master's National Chiao Tung University Master's Program in Foreign Literatures and Linguistics 94 Me-Chun Liu 劉美君 2006 thesis 107 en_US