Tools for Automatic Language Classification

Bibliographic Details
Main Authors: Nai-fan Hsiao, 蕭乃凡
Other Authors: Sun Wu
Format: Others
Language: zh-TW
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/92126373168389321757
Description
Summary: Master's thesis === National Chung Cheng University === Graduate Institute of Computer Science and Information Engineering === 98 === With the globalization of information, communication across countries and cultures has become frequent, and language/encoding classification now appears almost everywhere. From text editors to web browsers, mail systems, and information retrieval, it is a small but important tool in computer science. In this thesis we develop a language/encoding classification tool. The classification method combines an encoding scheme check, statistical analysis of high-frequency terms, and Unicode encoding table lookup. It adopts TF-IDF training, multi-pattern matching, and a weighted scoring mechanism. The tool is also implemented as a network service, providing remote access for distributed computing environments.
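
The three stages named in the summary (encoding scheme check, Unicode table lookup, weighted scoring of high-frequency terms) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the candidate encoding list, the code point ranges, the function names, and the term weights are all hypothetical placeholders. In particular, the weights are hand-picked stand-ins for values that TF-IDF training would produce, and a plain word count replaces the multi-pattern matching the thesis mentions.

```python
from collections import Counter

# Hypothetical candidate encodings, tried in this order.
CANDIDATE_ENCODINGS = ["ascii", "utf-8", "big5", "shift_jis"]

# Hypothetical term weights; placeholders for TF-IDF-trained values.
TERM_WEIGHTS = {
    "en": {"the": 1.0, "and": 0.8, "of": 0.8},
    "de": {"der": 1.0, "und": 0.9, "die": 0.9},
    "fr": {"le": 1.0, "et": 0.8, "les": 0.9},
}


def valid_encodings(data: bytes) -> list:
    """Encoding scheme check: keep encodings that decode without error."""
    ok = []
    for enc in CANDIDATE_ENCODINGS:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue
        ok.append(enc)
    return ok


def script_of(ch: str) -> str:
    """Unicode table lookup: map a character to a coarse script name."""
    cp = ord(ch)
    if 0x4E00 <= cp <= 0x9FFF:
        return "han"
    if 0x3040 <= cp <= 0x30FF:
        return "kana"
    if 0xAC00 <= cp <= 0xD7AF:
        return "hangul"
    if 0x0400 <= cp <= 0x04FF:
        return "cyrillic"
    if ch.isalpha() and cp < 0x0250:
        return "latin"
    return "other"


def score_terms(text: str) -> dict:
    """Weighted scoring: sum weight * count over each language's terms."""
    counts = Counter(text.lower().split())
    return {
        lang: sum(w * counts[term] for term, w in weights.items())
        for lang, weights in TERM_WEIGHTS.items()
    }


def classify(data: bytes) -> tuple:
    """Return (encoding, dominant script, language or None)."""
    encodings = valid_encodings(data)
    if not encodings:
        return (None, None, None)
    encoding = encodings[0]  # first candidate that decodes cleanly
    text = data.decode(encoding)
    scripts = Counter(script_of(c) for c in text if not c.isspace())
    script = scripts.most_common(1)[0][0] if scripts else "other"
    language = None
    if script == "latin":
        # Only Latin-script text is scored against the term table here.
        scores = score_terms(text)
        best = max(scores, key=scores.get)
        if scores[best] > 0:
            language = best
    return (encoding, script, language)
```

Under these assumptions, `classify(b"the cat and the dog")` picks the first encoding that decodes the bytes and then scores the text against each language's term table. A real classifier would also need to disambiguate byte sequences that are valid in several multi-byte encodings, which this sketch does not attempt.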