Cross-Lingual Text Categorization: A Training-corpus Translation-based Approach

Master's === National Sun Yat-sen University === Department of Information Management === 93 === Text categorization deals with the automatic learning of a text categorization model from a training set of preclassified documents on the basis of their contents and the assignment of unclassified documents to appropriate categories. Most existing text cate...


Bibliographic Details
Main Authors:	Kai-hsiang Hsu, 許凱翔
Other Authors:	Chih-Ping Wei
Format:	Others
Language:	en_US
Published:	2005
Online Access:	http://ndltd.ncl.edu.tw/handle/29566553950618841626

id	ndltd-TW-093NSYS5396042
record_format	oai_dc
spelling	ndltd-TW-093NSYS5396042 2015-12-23T04:08:13Z http://ndltd.ncl.edu.tw/handle/29566553950618841626 Cross-Lingual Text Categorization: A Training-corpus Translation-based Approach 跨語言文件自動分類之研究:以翻譯訓練文集建立跨語言分類之方法 Kai-hsiang Hsu 許凱翔 Master's, National Sun Yat-sen University, Department of Information Management, 93 Text categorization deals with the automatic learning of a text categorization model from a training set of preclassified documents on the basis of their contents and the assignment of unclassified documents to appropriate categories. Most existing text categorization techniques deal with monolingual documents (i.e., all documents are written in one language) during text categorization model learning and category assignment (or prediction). However, with the globalization of business environments and advances in Internet technology, an organization or individual often generates or acquires, and subsequently archives, documents in different languages, thus creating the need for cross-lingual text categorization (CLTC). Existing studies on CLTC focus on the prediction-corpus translation-based approach, which lacks a systematic mechanism for reducing translation noise, thus limiting its cross-lingual categorization effectiveness. Motivated by the need to provide more effective CLTC support, we design a training-corpus translation-based CLTC approach. Using the prediction-corpus translation-based approach as the performance benchmark, our empirical evaluation results show that our proposed CLTC approach achieves significantly better classification effectiveness than the benchmark approach does in both Chinese Chih-Ping Wei 魏志平 2005 thesis 49 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === National Sun Yat-sen University === Department of Information Management === 93 === Text categorization deals with the automatic learning of a text categorization model from a training set of preclassified documents on the basis of their contents and the assignment of unclassified documents to appropriate categories. Most existing text categorization techniques deal with monolingual documents (i.e., all documents are written in one language) during text categorization model learning and category assignment (or prediction). However, with the globalization of business environments and advances in Internet technology, an organization or individual often generates or acquires, and subsequently archives, documents in different languages, thus creating the need for cross-lingual text categorization (CLTC). Existing studies on CLTC focus on the prediction-corpus translation-based approach, which lacks a systematic mechanism for reducing translation noise, thus limiting its cross-lingual categorization effectiveness. Motivated by the need to provide more effective CLTC support, we design a training-corpus translation-based CLTC approach. Using the prediction-corpus translation-based approach as the performance benchmark, our empirical evaluation results show that our proposed CLTC approach achieves significantly better classification effectiveness than the benchmark approach does in both Chinese
author2	Chih-Ping Wei
author_facet	Chih-Ping Wei Kai-hsiang Hsu 許凱翔
author	Kai-hsiang Hsu 許凱翔
spellingShingle	Kai-hsiang Hsu 許凱翔 Cross-Lingual Text Categorization: A Training-corpus Translation-based Approach
author_sort	Kai-hsiang Hsu
title	Cross-Lingual Text Categorization: A Training-corpus Translation-based Approach
title_short	Cross-Lingual Text Categorization: A Training-corpus Translation-based Approach
title_full	Cross-Lingual Text Categorization: A Training-corpus Translation-based Approach
title_fullStr	Cross-Lingual Text Categorization: A Training-corpus Translation-based Approach
title_full_unstemmed	Cross-Lingual Text Categorization: A Training-corpus Translation-based Approach
title_sort	cross-lingual text categorization: a training-corpus translation-based approach
publishDate	2005
url	http://ndltd.ncl.edu.tw/handle/29566553950618841626
