Monolingual and Multilingual Link Detection
Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 90 === Link detection is a task in the Topic Detection and Tracking (TDT) project. We participated in the TDT 2001 evaluation and focused on the monolingual and multilingual link detection tasks. We used the TDT 2 corpus as training data and evaluated the per...
Main Authors: | Chen, Ying-Ju, 陳盈如 |
---|---|
Other Authors: | Chen, Hsin-Hsi |
Format: | Others |
Language: | en_US |
Published: | 2002 |
Online Access: | http://ndltd.ncl.edu.tw/handle/52547391068760890190 |
id |
ndltd-TW-090NTU00392039 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-090NTU00392039 2015-10-13T14:38:19Z http://ndltd.ncl.edu.tw/handle/52547391068760890190 Monolingual and Multilingual Link Detection 單語和多語新聞相關性偵測之研究 Chen, Ying-Ju 陳盈如 Master's National Taiwan University Graduate Institute of Computer Science and Information Engineering 90 Link detection is a task in the Topic Detection and Tracking (TDT) project. We participated in the TDT 2001 evaluation and focused on the monolingual and multilingual link detection tasks. We used the TDT 2 corpus as training data and evaluated performance on the augmented version of the TDT 3 corpus. The link detection task is to decide whether two stories discuss the same topic. In this thesis, we discuss story representation. We conduct experiments to investigate the effects of story expansion and topic segmentation. We then extend the monolingual model to a multilingual model, for which we discuss the translation issue. The experimental results show that nouns, verbs, adjectives, and compound nouns are useful for representing news stories. Story expansion using historical information is helpful, and we find that assigning expanded terms half of their original weights works better. We introduce topic segmentation into the link detection task, and the results show that it has only a small effect. In the multilingual task, a translation model is needed to capture the differences between languages. For multilingual pairs, we translate the Chinese stories into English. For Chinese pairs, we use the CILIN thesaurus for thesaurus expansion. Stories in different languages have different similarity distributions. Using thresholds to model these differences is shown to be workable. Finally, we show that performance on the multilingual task is very close to that on the monolingual task. Chen, Hsin-Hsi 陳信希 2002 學位論文 ; thesis 61 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 90 === Link detection is a task in the Topic Detection and Tracking (TDT) project. We participated in the TDT 2001 evaluation and focused on the monolingual and multilingual link detection tasks. We used the TDT 2 corpus as training data and evaluated performance on the augmented version of the TDT 3 corpus. The link detection task is to decide whether two stories discuss the same topic.
In this thesis, we discuss story representation. We conduct experiments to investigate the effects of story expansion and topic segmentation. We then extend the monolingual model to a multilingual model, for which we discuss the translation issue.
The experimental results show that nouns, verbs, adjectives, and compound nouns are useful for representing news stories. Story expansion using historical information is helpful, and we find that assigning expanded terms half of their original weights works better. We introduce topic segmentation into the link detection task, and the results show that it has only a small effect. In the multilingual task, a translation model is needed to capture the differences between languages. For multilingual pairs, we translate the Chinese stories into English. For Chinese pairs, we use the CILIN thesaurus for thesaurus expansion. Stories in different languages have different similarity distributions. Using thresholds to model these differences is shown to be workable. Finally, we show that performance on the multilingual task is very close to that on the monolingual task.
|
author2 |
Chen, Hsin-Hsi |
author_facet |
Chen, Hsin-Hsi Chen, Ying-Ju 陳盈如 |
author |
Chen, Ying-Ju 陳盈如 |
spellingShingle |
Chen, Ying-Ju 陳盈如 Monolingual and Multilingual Link Detection |
author_sort |
Chen, Ying-Ju |
title |
Monolingual and Multilingual Link Detection |
title_short |
Monolingual and Multilingual Link Detection |
title_full |
Monolingual and Multilingual Link Detection |
title_fullStr |
Monolingual and Multilingual Link Detection |
title_full_unstemmed |
Monolingual and Multilingual Link Detection |
title_sort |
monolingual and multilingual link detection |
publishDate |
2002 |
url |
http://ndltd.ncl.edu.tw/handle/52547391068760890190 |
work_keys_str_mv |
AT chenyingju monolingualandmultilinguallinkdetection AT chényíngrú monolingualandmultilinguallinkdetection AT chenyingju dānyǔhéduōyǔxīnwénxiāngguānxìngzhēncèzhīyánjiū AT chényíngrú dānyǔhéduōyǔxīnwénxiāngguānxìngzhēncèzhīyánjiū |
_version_ |
1717755465016803328 |
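
Illustrative sketch: the abstract describes three ideas that fit in a few lines of code: representing a story as weighted terms, adding expanded terms at half the weight of the original term, and deciding "linked or not" with a threshold that depends on the language pair. The Python below is a minimal sketch under stated assumptions, not the thesis's implementation. The abstract does not name the weighting scheme or the similarity function, so raw term frequency and cosine similarity are assumptions here; the `related` map stands in for whatever source of expansion terms is used; and the threshold values and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical per-language-pair thresholds; the values are illustrative only.
THRESHOLDS = {"en-en": 0.20, "zh-zh": 0.15, "en-zh": 0.10}


def term_weights(terms):
    """Weight each term by its raw frequency (an assumed weighting scheme)."""
    return {term: float(count) for term, count in Counter(terms).items()}


def expand(weights, related, factor=0.5):
    """Add related terms at `factor` times the weight of their source term.

    `related` maps a term to a list of expansion terms; it is a stand-in
    for the expansion source, which the abstract does not specify in detail.
    """
    expanded = dict(weights)
    for term, weight in weights.items():
        for extra in related.get(term, []):
            expanded[extra] = expanded.get(extra, 0.0) + factor * weight
    return expanded


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def linked(a, b, pair_type):
    """Decide whether two stories are linked, using a pair-specific threshold."""
    return cosine(a, b) >= THRESHOLDS[pair_type]
```

Keeping one threshold per pair type is one simple way to reflect the observation that stories in different languages have different similarity distributions; in practice the thresholds would be tuned on training data rather than fixed by hand.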