A Study of Chinese Abbreviations

Master's === National Yunlin University of Science and Technology (國立雲林科技大學) === Master's Program, Department of Information Management === 93 === Abbreviations are commonly used in Chinese text. For instance, ‘台灣鐵路局’ is often shortened to ‘台鐵局’. This transformation saves time and is convenient, but it also poses challenges for Chinese text processing. In keyword...

Bibliographic Details
Main Authors: Chuan-Pu Yang, 楊顓溥
Other Authors: Chuen-Min Huang
Format: Others
Language: en_US
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/67470349424647052988
id ndltd-TW-093YUNT5396036
record_format oai_dc
spelling ndltd-TW-093YUNT5396036 2015-10-13T11:54:00Z http://ndltd.ncl.edu.tw/handle/67470349424647052988 A Study of Chinese Abbreviations 中文縮寫詞研究 Chuan-Pu Yang 楊顓溥 Master's National Yunlin University of Science and Technology (國立雲林科技大學) Master's Program, Department of Information Management 93 Chuen-Min Huang 黃純敏 2005 thesis 53 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Yunlin University of Science and Technology (國立雲林科技大學) === Master's Program, Department of Information Management === 93 === Abbreviations are commonly used in Chinese text. For instance, ‘台灣鐵路局’ is often shortened to ‘台鐵局’. This transformation saves time and is convenient, but it also poses challenges for Chinese text processing. In a keyword-based information retrieval system, searching with the abbreviated form and with the original form usually returns different results, even though both have the same meaning. Abbreviations also clearly affect Chinese word segmentation, automatic document clustering, and term weighting. To solve this semantic ambiguity problem, we propose an approach that links the two forms and automatically constructs an abbreviation list from a corpus without any fixed dictionary. In this study, we conduct three major experiments on 8,500 documents from a news website. Each experiment is a two-way process, going from the original form to the abbreviated form and back. In the first experiment, we employ a Maximum Entropy Model, which uses many contextual “features” to locate the best candidate. In the second experiment, we attempt to recover original forms from their abbreviations. The third experiment aims to find abbreviations from their original forms. The precision ratios reach 80%-90%, 70%, and 80%, respectively.
author2 Chuen-Min Huang
author_facet Chuen-Min Huang
Chuan-Pu Yang
楊顓溥
author Chuan-Pu Yang
楊顓溥
spellingShingle Chuan-Pu Yang
楊顓溥
A Study of Chinese Abbreviations
author_sort Chuan-Pu Yang
title A Study of Chinese Abbreviations
title_short A Study of Chinese Abbreviations
title_full A Study of Chinese Abbreviations
title_fullStr A Study of Chinese Abbreviations
title_full_unstemmed A Study of Chinese Abbreviations
title_sort study of chinese abbreviations
publishDate 2005
url http://ndltd.ncl.edu.tw/handle/67470349424647052988
work_keys_str_mv AT chuanpuyang astudyofchineseabbreviations
AT yángzhuānpǔ astudyofchineseabbreviations
AT chuanpuyang zhōngwénsuōxiěcíyánjiū
AT yángzhuānpǔ zhōngwénsuōxiěcíyánjiū
AT chuanpuyang studyofchineseabbreviations
AT yángzhuānpǔ studyofchineseabbreviations
_version_ 1716850833581146112