A Study of Chinese Abbreviations
Master's === National Yunlin University of Science and Technology === Master's Program, Department of Information Management === 93 === Abbreviations are common in Chinese text. For instance, ‘台灣鐵路局’ (Taiwan Railways Administration) is often shortened to ‘台鐵局’. This kind of shortening saves time and is convenient, but it also creates challenges for Chinese text processing. In keyword...
Main Authors: | Chuan-Pu Yang, 楊顓溥 |
---|---|
Other Authors: | Chuen-Min Huang |
Format: | Others |
Language: | en_US |
Published: | 2005 |
Online Access: | http://ndltd.ncl.edu.tw/handle/67470349424647052988 |
id |
ndltd-TW-093YUNT5396036 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-093YUNT5396036 2015-10-13T11:54:00Z http://ndltd.ncl.edu.tw/handle/67470349424647052988 A Study of Chinese Abbreviations 中文縮寫詞研究 Chuan-Pu Yang 楊顓溥 Master's National Yunlin University of Science and Technology Master's Program, Department of Information Management 93 Chuen-Min Huang 黃純敏 2005 degree thesis ; thesis 53 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Yunlin University of Science and Technology === Master's Program, Department of Information Management === 93 === Abbreviations are common in Chinese text. For instance, ‘台灣鐵路局’ (Taiwan Railways Administration) is often shortened to ‘台鐵局’. This kind of shortening saves time and is convenient, but it also creates challenges for Chinese text processing. In a keyword-based information retrieval system, searching with the abbreviated form and with the original form usually returns different results, even though both forms have the same meaning. Abbreviations also clearly affect Chinese word segmentation, automatic document clustering, and term weighting.
To resolve this semantic ambiguity, we propose an approach that links the two forms and automatically constructs an abbreviation list from a corpus, without relying on any fixed dictionary.
In this study, we conduct three major experiments on 8,500 documents from a news website. Each experiment is a two-way process, going from the original form to the abbreviated form and back. In the first experiment, we employ a Maximum Entropy Model, which uses many contextual “features” to locate the best candidate. In the second experiment, we attempt to recover original forms from their abbreviations. The third experiment aims to find abbreviations from their original forms. The precision rates reach 80%-90%, 70%, and 80%, respectively.
|
author2 |
Chuen-Min Huang |
author_facet |
Chuen-Min Huang Chuan-Pu Yang 楊顓溥 |
author |
Chuan-Pu Yang 楊顓溥 |
spellingShingle |
Chuan-Pu Yang 楊顓溥 A Study of Chinese Abbreviations |
author_sort |
Chuan-Pu Yang |
title |
A Study of Chinese Abbreviations |
title_short |
A Study of Chinese Abbreviations |
title_full |
A Study of Chinese Abbreviations |
title_fullStr |
A Study of Chinese Abbreviations |
title_full_unstemmed |
A Study of Chinese Abbreviations |
title_sort |
study of chinese abbreviations |
publishDate |
2005 |
url |
http://ndltd.ncl.edu.tw/handle/67470349424647052988 |
work_keys_str_mv |
AT chuanpuyang astudyofchineseabbreviations AT yángzhuānpǔ astudyofchineseabbreviations AT chuanpuyang zhōngwénsuōxiěcíyánjiū AT yángzhuānpǔ zhōngwénsuōxiěcíyánjiū AT chuanpuyang studyofchineseabbreviations AT yángzhuānpǔ studyofchineseabbreviations |
_version_ |
1716850833581146112 |
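
The retrieval problem described in the abstract (the abbreviated and the original form returning different results) can be illustrated with a minimal Python sketch. It is not the thesis's method, which relies on a Maximum Entropy Model with contextual features. It only assumes that an abbreviation keeps characters of the original form in their original order, as ‘台鐵局’ does for ‘台灣鐵路局’, and that a list of linked pairs is already available. The function names and the `PAIRS` list are hypothetical; the single pair is the example given in the abstract.

```python
def is_ordered_subsequence(abbr: str, full: str) -> bool:
    """Return True if every character of abbr occurs in full, in the same order."""
    remaining = iter(full)
    # `ch in remaining` consumes the iterator up to the match, which enforces order.
    return all(ch in remaining for ch in abbr)


def expand_query(term: str, pairs: list[tuple[str, str]]) -> set[str]:
    """Return the search term together with any linked original or abbreviated form."""
    expanded = {term}
    for full, abbr in pairs:
        if term in (full, abbr):
            expanded.update((full, abbr))
    return expanded


# Hypothetical abbreviation list; this pair is the example from the abstract.
PAIRS = [("台灣鐵路局", "台鐵局")]

if __name__ == "__main__":
    print(is_ordered_subsequence("台鐵局", "台灣鐵路局"))
    print(expand_query("台鐵局", PAIRS))
```

Expanding a query with both forms before a keyword search is one simple way to make the two forms retrieve the same documents. The subsequence check is only a candidate filter under the stated assumption and would accept many false pairs on its own.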