Monolingual and Multilingual Link Detection

Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 90 === Link detection is a task in the Topic Detection and Tracking (TDT) project. We participated in the TDT 2001 evaluation and focused on the monolingual and multilingual link detection tasks. We used the TDT 2 corpus as training data, and evaluated the per...

Bibliographic Details
Main Authors: Chen, Ying-Ju, 陳盈如
Other Authors: Chen, Hsin-Hsi
Format: Others
Language: en_US
Published: 2002
Online Access: http://ndltd.ncl.edu.tw/handle/52547391068760890190
id ndltd-TW-090NTU00392039
record_format oai_dc
spelling ndltd-TW-090NTU00392039 2015-10-13T14:38:19Z http://ndltd.ncl.edu.tw/handle/52547391068760890190 Monolingual and Multilingual Link Detection 單語和多語新聞相關性偵測之研究 Chen, Ying-Ju 陳盈如 Master's National Taiwan University Graduate Institute of Computer Science and Information Engineering 90 Chen, Hsin-Hsi 陳信希 2002 thesis 61 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 90 === Link detection is a task in the Topic Detection and Tracking (TDT) project. We participated in the TDT 2001 evaluation and focused on the monolingual and multilingual link detection tasks. We used the TDT 2 corpus as training data and evaluated performance on the augmented version of the TDT 3 corpus. The link detection task is to decide whether two stories discuss the same topic. In this thesis, we discuss story representation and run experiments to investigate the effects of story expansion and topic segmentation. We then extend the monolingual model to a multilingual model, for which we discuss the translation issue. The experimental results show that nouns, verbs, adjectives, and compound nouns are useful for representing news stories. Story expansion using historical information is helpful, and we find that assigning expanded terms half of their original weights works better. We introduce topic segmentation into the link detection task, and the results show that it has only a small effect. In the multilingual task, a translation model is needed to capture the differences between languages. For multilingual pairs, we translate the Chinese stories into English. For Chinese pairs, we employ CILIN for thesaurus expansion. Stories in different languages have different similarity distributions, and using thresholds to model these differences is shown to be workable. Finally, we show that performance on the multilingual task is very close to that on the monolingual task.
author2 Chen, Hsin-Hsi
author_facet Chen, Hsin-Hsi
Chen, Ying-Ju
陳盈如
author Chen, Ying-Ju
陳盈如
author_sort Chen, Ying-Ju
title Monolingual and Multilingual Link Detection
publishDate 2002
url http://ndltd.ncl.edu.tw/handle/52547391068760890190
_version_ 1717755465016803328
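
The decision procedure outlined in the abstract (represent each story as weighted terms, add expanded terms at half weight, compare the two stories, and apply a threshold that depends on the language pair) might be sketched as follows. This is only an illustration under stated assumptions: the record does not say which similarity measure the thesis uses, so cosine similarity is swapped in here, and the threshold values, the function names, and the reading of "half of the original weights" as a fixed 0.5 per occurrence are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical per-language-pair thresholds; the record gives no actual values.
THRESHOLDS = {("en", "en"): 0.20, ("zh", "zh"): 0.15, ("en", "zh"): 0.10}


def term_vector(terms, expanded_terms=(), expansion_weight=0.5):
    """Build a weighted bag of terms for one story.

    Original terms count 1.0 per occurrence; expanded terms count
    `expansion_weight` (0.5 is one reading of "half of the original weights").
    """
    vec = Counter()
    for term in terms:
        vec[term] += 1.0
    for term in expanded_terms:
        vec[term] += expansion_weight
    return vec


def cosine(a, b):
    """Cosine similarity of two term vectors (assumed measure, not confirmed)."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_linked(vec_a, lang_a, vec_b, lang_b, thresholds=THRESHOLDS):
    """Decide whether two stories discuss the same topic.

    The threshold is looked up by the unordered language pair, reflecting the
    observation that different language pairs have different similarity
    distributions.
    """
    key = tuple(sorted((lang_a, lang_b)))
    return cosine(vec_a, vec_b) >= thresholds[key]
```

In this sketch a Chinese story in a multilingual pair would be translated into English terms before `term_vector` is called, and a Chinese pair could pass thesaurus-derived terms through `expanded_terms`; both steps are left outside the code because the record does not describe how they were implemented.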