Summary: | Master's === National Chung Cheng University === Graduate Institute of Computer Science and Information Engineering === 98 === With the globalization of information, communication across countries and cultures has become frequent. Language/encoding classification now appears almost everywhere: in text editors, web browsers, mail systems, and information retrieval, it is a small but important tool in computer science.
In this thesis, we develop a language/encoding classification tool. The classification method combines an encoding scheme check, statistical analysis of high-frequency terms, and Unicode encoding table lookup. It adopts TF-IDF training, multi-pattern matching, and a weighted scoring mechanism. In addition, the tool is implemented as a network service, providing remote access in distributed computing environments.
|
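The pipeline described in the summary (an encoding scheme check followed by weighted scoring of high-frequency terms) can be illustrated with a minimal sketch. This is not the thesis implementation: the candidate encodings, the term lists, and the weights below are hypothetical, hand-picked stand-ins for what TF-IDF training would produce, and plain substring counting stands in for a real multi-pattern matcher.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical candidate encodings (illustrative, not taken from the thesis).
CANDIDATE_ENCODINGS = ("utf-8", "big5", "gb2312", "shift_jis")

# Hypothetical per-language weights for high-frequency terms.
# In a trained system these would come from TF-IDF statistics.
TERM_WEIGHTS: Dict[str, Dict[str, float]] = {
    "zh-TW": {"的": 1.0, "這": 2.0, "們": 1.5},
    "zh-CN": {"的": 1.0, "这": 2.0, "们": 1.5},
    "ja": {"の": 1.5, "です": 2.0, "ます": 2.0},
}


def valid_decodings(
    data: bytes, encodings: Sequence[str] = CANDIDATE_ENCODINGS
) -> Dict[str, str]:
    """Encoding scheme check: keep only encodings that decode without error."""
    results: Dict[str, str] = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            continue  # byte sequence is illegal under this encoding
    return results


def score_text(text: str) -> Dict[str, float]:
    """Weighted scoring: sum (occurrences * weight) of each language's terms."""
    return {
        lang: sum(text.count(term) * weight for term, weight in weights.items())
        for lang, weights in TERM_WEIGHTS.items()
    }


def classify(
    data: bytes, encodings: Sequence[str] = CANDIDATE_ENCODINGS
) -> Optional[Tuple[str, str, float]]:
    """Return the (encoding, language, score) triple with the highest score.

    Returns None when no candidate encoding can decode the data.
    Ties are resolved in favor of the first candidate encountered.
    """
    best: Optional[Tuple[str, str, float]] = None
    for enc, text in valid_decodings(data, encodings).items():
        for lang, score in score_text(text).items():
            if best is None or score > best[2]:
                best = (enc, lang, score)
    return best
```

A call such as `classify("這是我們的書".encode("utf-8"))` tries each candidate encoding and scores every successful decoding. Because a byte sequence can decode without error under a wrong encoding, the scoring step, not the decoding step alone, selects the final answer.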