Exploring the Influences of Lexical Sources and Term Weights
Master's thesis === National Chengchi University === Department of Computer Science === 95 (ROC academic year, 2006–07) === Legal information systems for languages other than Chinese have been studied intensively for many years. Topics under discussion include judgment assistance, legal document classification, and similar-case search. This thesis studies...
Main Authors: | Cheng, Jen-Hao 鄭人豪 |
---|---|
Other Authors: | Liu, Chao-Lin 劉昭麟 |
Format: | Others |
Language: | zh-TW |
Published: | 2007 |
Online Access: | http://ndltd.ncl.edu.tw/handle/92986151610399472963 |
id | ndltd-TW-095NCCU5394020 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-095NCCU5394020 2016-05-25T04:14:05Z http://ndltd.ncl.edu.tw/handle/92986151610399472963 Exploring the Influences of Lexical Sources and Term Weights 中文詞彙集的來源與權重對中文裁判書分類成效的影響 (The Influence of Lexicon Sources and Weights on the Classification Effectiveness of Chinese Judgment Documents) Cheng, Jen-Hao 鄭人豪 Master's National Chengchi University Department of Computer Science 95 Liu, Chao-Lin 劉昭麟 2007 thesis 136 zh-TW |
collection | NDLTD |
language | zh-TW |
format | Others |
sources | NDLTD |
description |
Master's thesis === National Chengchi University === Department of Computer Science === 95 (ROC academic year, 2006–07) === Legal information systems for languages other than Chinese have been studied intensively for many years. Topics under discussion include judgment assistance, legal document classification, and similar-case search. This thesis studies the classification of Chinese judgment documents.
I use phrases as document indices and compare the influence of different lexical sources used to segment Chinese text. One lexical source is a general-purpose machine-readable dictionary, HowNet; the other is a set of terms extracted algorithmically from legal documents. Building on the concept of tf-idf, I design two phrase-weighting schemes: tpf-idf and tpf-icf.
In the experiments, I use the k-nearest-neighbor (kNN) method to classify Chinese judgment documents into seven categories by prosecution reason: larceny (竊盜), robbery (搶奪), robbery by threatening or disabling the victims (強盜), receiving stolen property (贓物), causing bodily harm (傷害), intimidation (恐嚇), and gambling (賭博). To achieve high accuracy with low rejection rates, I examine the distribution of similarities among the training documents to select appropriate parameters. I also conduct an analogous set of experiments that classify gambling cases by the legal articles they cite.
To improve classification, I apply an introspective learning technique to adjust phrase weights. I evaluate the effect of these weight adjustments, in both the prosecution-reason and the cited-article experiments, by observing intra-cluster and inter-cluster similarity.
|
author2 | Liu, Chao-Lin |
author_facet | Liu, Chao-Lin Cheng, Jen-Hao 鄭人豪 |
author | Cheng, Jen-Hao 鄭人豪 |
author_sort | Cheng, Jen-Hao |
title | Exploring the Influences of Lexical Sources and Term Weights |
publishDate | 2007 |
url | http://ndltd.ncl.edu.tw/handle/92986151610399472963 |
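The description field names two phrase weights derived from tf-idf, tpf-idf and tpf-icf, without defining them. The sketch below is one plausible reading, not the thesis's definition: it assumes tpf is the raw count of a phrase in a document, idf is log(N/df) over N training documents, and icf is log(|C|/cf), where cf is the number of categories whose documents contain the phrase. The function names `tpf_idf` and `tpf_icf` are hypothetical, and documents are assumed to be already segmented into phrases.

```python
import math
from collections import Counter

def tpf_idf(docs):
    """Weight each phrase by its in-document frequency times log(N / df).

    docs: list of documents, each a list of phrases (already segmented).
    Returns one {phrase: weight} dict per document.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each phrase once per document
    return [
        {p: f * math.log(n / df[p]) for p, f in Counter(doc).items()}
        for doc in docs
    ]

def tpf_icf(docs, labels):
    """Weight each phrase by its in-document frequency times log(|C| / cf).

    cf is the number of distinct categories whose training documents
    contain the phrase (an assumed analogue of document frequency).
    """
    categories_of = {}
    for doc, label in zip(docs, labels):
        for p in doc:
            categories_of.setdefault(p, set()).add(label)
    n_categories = len(set(labels))
    return [
        {p: f * math.log(n_categories / len(categories_of[p]))
         for p, f in Counter(doc).items()}
        for doc in docs
    ]
```

Under these assumptions, a phrase that occurs in every category receives an icf of zero, so tpf-icf favours phrases concentrated in a few prosecution categories, whereas tpf-idf only considers how many documents contain a phrase.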
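The kNN experiments in the description trade accuracy against a rejection rate. A minimal sketch of how such a classifier could reject documents follows, assuming cosine similarity over sparse phrase-weight vectors, similarity-weighted voting among the k nearest training documents, and a rejection rule that declines to classify when the most similar neighbour falls below a threshold. The function `knn_classify`, its parameters `k` and `min_sim`, and their default values are illustrative assumptions; the abstract says only that parameters were chosen by inspecting the similarity distribution of the training documents.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity of two sparse {phrase: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_classify(query, train_vecs, train_labels, k=3, min_sim=0.2):
    """Return the predicted label, or None to reject the document.

    k and min_sim are illustrative; in practice they would be tuned
    by inspecting the similarity distribution of the training set.
    """
    neighbours = sorted(
        ((cosine(query, v), label) for v, label in zip(train_vecs, train_labels)),
        reverse=True,
    )[:k]
    # Reject when even the closest training document is not similar enough.
    if not neighbours or neighbours[0][0] < min_sim:
        return None
    votes = defaultdict(float)
    for sim, label in neighbours:
        votes[label] += sim  # similarity-weighted vote
    return max(votes, key=votes.get)
```

Raising `min_sim` rejects more documents and typically makes the accepted predictions more reliable; lowering it does the opposite, which is the trade-off the abstract describes.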
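The description evaluates weight adjustment through intra-cluster and inter-cluster similarity but gives neither the formulas nor the introspective update rule. The sketch below assumes one common definition: the mean pairwise cosine similarity between documents of the same category (intra) and between documents of different categories (inter). It uses a hypothetical helper `scale_phrase` to mimic a weight adjustment; this helper is not the introspective learning procedure itself, only a way to compare the two measures before and after a change.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two sparse {phrase: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def mean(xs):
    return sum(xs) / len(xs) if xs else 0.0

def intra_inter_similarity(vecs, labels):
    """Mean pairwise cosine within categories and across categories."""
    intra, inter = [], []
    for (u, lu), (v, lv) in combinations(zip(vecs, labels), 2):
        (intra if lu == lv else inter).append(cosine(u, v))
    return mean(intra), mean(inter)

def scale_phrase(vecs, phrase, factor):
    """Hypothetical adjustment: multiply one phrase's weight in every vector."""
    return [{t: (w * factor if t == phrase else w) for t, w in v.items()}
            for v in vecs]
```

A helpful adjustment should raise the intra-cluster value, lower the inter-cluster value, or both. For example, down-weighting a phrase that appears in every category tends to lower inter-cluster similarity while leaving within-category similarity largely intact.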