A Hybrid Chinese Segmentation System for Finding Long Terms and New Terms
Master's === Yuan Ze University === Department of Information Management === 99 === This study proposes a hybrid Chinese segmentation method. First, we segment the documents using two segmentation methods, High-Frequency Maximum Matching (HFMM) and CKIP. Second, we verify the HFMM-generated long terms using part of speech (POS) given by...
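As a rough illustration of the dictionary-based matching step mentioned in the abstract, the following is a minimal sketch of plain forward maximum matching. It is not the thesis's HFMM: how the high-frequency component is built is not described in this record, and the function name `forward_maximum_matching` and the toy dictionary are assumptions made only for this example.

```python
def forward_maximum_matching(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation.

    At each position, take the longest substring found in `dictionary`;
    fall back to a single character when nothing matches.
    """
    if max_len is None:
        # Longest dictionary entry bounds the candidate window.
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
toy_dictionary = {"中文", "斷詞", "系統", "斷詞系統", "混合"}
print(forward_maximum_matching("中文混合斷詞系統", toy_dictionary))
```

Because the longest match wins, the long term 斷詞系統 is kept whole rather than split into 斷詞 and 系統, which is the kind of long-term candidate the abstract says is then checked against POS rules.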
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Others |
Language: | zh-TW |
Published: | 2011 |
Online Access: | http://ndltd.ncl.edu.tw/handle/74114384622240439792 |
id |
ndltd-TW-099YZU05396009 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-099YZU05396009 2016-04-13T04:16:58Z http://ndltd.ncl.edu.tw/handle/74114384622240439792 A Hybrid Chinese Segmentation System for Finding Long Terms and New Terms 一個產生長詞與新詞的中文混合斷詞系統 Yu-Shyang Lin 林渝翔 Master's Yuan Ze University Department of Information Management 99 This study proposes a hybrid Chinese segmentation method. First, we segment the documents using two segmentation methods, High-Frequency Maximum Matching (HFMM) and CKIP. Second, we verify the HFMM-generated long terms using the part of speech (POS) tags given by CKIP and a set of POS combination rules. Finally, we find new terms that CKIP generally does not generate. The experimental results on the Sinica corpus show that the proposed method achieves a certain level of precision, recall, and F1-measure. Once the manually selected long terms are added to the Sinica corpus, our method performs much better than other segmentation methods. In addition, the experimental results on Google News show that we can obtain 7.5 new terms on average from news articles in 3 categories. The average accuracy of the new terms reached 80.82%, indicating that the proposed method can also find new terms accurately. Cheng-Jye Luh 陸承志 2011 thesis 61 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others
|
sources |
NDLTD |
description |
Master's === Yuan Ze University === Department of Information Management === 99 === This study proposes a hybrid Chinese segmentation method. First, we segment the documents using two segmentation methods, High-Frequency Maximum Matching (HFMM) and CKIP. Second, we verify the HFMM-generated long terms using the part of speech (POS) tags given by CKIP and a set of POS combination rules. Finally, we find new terms that CKIP generally does not generate.
The experimental results on the Sinica corpus show that the proposed method achieves a certain level of precision, recall, and F1-measure. Once the manually selected long terms are added to the Sinica corpus, our method performs much better than other segmentation methods.
In addition, the experimental results on Google News show that we can obtain 7.5 new terms on average from news articles in 3 categories. The average accuracy of the new terms reached 80.82%, indicating that the proposed method can also find new terms accurately.
|
author2 |
Cheng-Jye Luh |
author_facet |
Cheng-Jye Luh Yu-Shyang Lin 林渝翔 |
author |
Yu-Shyang Lin 林渝翔 |
author_sort |
Yu-Shyang Lin |
title |
A Hybrid Chinese Segmentation System for Finding Long Terms and New Terms |
publishDate |
2011 |
url |
http://ndltd.ncl.edu.tw/handle/74114384622240439792 |
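The description above reports precision, recall, and F1-measure for the segmentation results. The sketch below shows one common way to compute these for word segmentation, by comparing character spans of predicted words against a gold segmentation. Whether the thesis scores words this way is an assumption; the helper names `to_spans` and `segmentation_prf` and the sample data are hypothetical.

```python
def to_spans(tokens):
    """Convert a token list into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for token in tokens:
        spans.add((start, start + len(token)))
        start += len(token)
    return spans


def segmentation_prf(gold_tokens, predicted_tokens):
    """Return (precision, recall, F1) over exactly matching word spans."""
    gold = to_spans(gold_tokens)
    predicted = to_spans(predicted_tokens)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: the gold standard keeps the long term whole,
# while the predicted segmentation splits it in two.
gold = ["中文", "混合", "斷詞系統"]
predicted = ["中文", "混合", "斷詞", "系統"]
precision, recall, f1 = segmentation_prf(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this toy case, splitting one long term costs both precision and recall, which is consistent with the description's remark that scores change once manually selected long terms are added to the reference corpus.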