A Feature Selection Method Based on Text Segmentation of E-Books
Master's === National Cheng Kung University === Institute of Information Management === 98 === With the exponential growth of information technology and the Internet, paper books can be transformed into e-books. People can get these e-books from the Internet and download them with an e-reader, which makes it more convenient to absorb knowledge from books. However, the num...
Main Authors: | Ming-WeiLai, 賴銘偉 |
---|---|
Other Authors: | Hei-Chia Wang |
Format: | Others |
Language: | zh-TW |
Published: | 2010 |
Online Access: | http://ndltd.ncl.edu.tw/handle/02023913522965767774 |
id |
ndltd-TW-098NCKU5396018 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-098NCKU5396018 2015-11-06T04:03:45Z http://ndltd.ncl.edu.tw/handle/02023913522965767774 A Feature Selection Method Based on Text Segmentation of E-Books 基於文件分段之電子書特徵選取 Ming-WeiLai 賴銘偉 Master's National Cheng Kung University Institute of Information Management 98 With the exponential growth of information technology and the Internet, paper books can be transformed into e-books. People can get these e-books from the Internet and download them with an e-reader, which makes it more convenient to absorb knowledge from books. However, the number of e-books has become very large, and classifying them costs a lot of time and energy. Traditional classification approaches, such as decision trees, k-nearest neighbor, naïve Bayes, and support vector machines, usually select feature words from the content. These words form a feature space. The longer the article, the more feature words it is likely to generate and the higher the dimension of the feature space, which complicates the subsequent classification process. The classification process therefore filters out undesirable feature words through feature selection. However, e-books are usually much longer than general articles. With traditional approaches, e-books generate a large number of feature words, which complicates the subsequent classification process, and important feature words may even be lost because the long content reduces their overall weights. Therefore, we present a novel feature selection approach that applies a text segmentation algorithm. With this algorithm, an e-book can be cut into several segments. We analyze the importance of all words in these segments and select the important feature words for every segment. We expect that the feature words selected by our approach can improve the accuracy of classification. Hei-Chia Wang 王惠嘉 2010 Degree thesis ; thesis 90 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Cheng Kung University === Institute of Information Management === 98 === With the exponential growth of information technology and the Internet, paper books can be transformed into e-books. People can get these e-books from the Internet and download them with an e-reader, which makes it more convenient to absorb knowledge from books. However, the number of e-books has become very large, and classifying them costs a lot of time and energy.
Traditional classification approaches, such as decision trees, k-nearest neighbor, naïve Bayes, and support vector machines, usually select feature words from the content. These words form a feature space. The longer the article, the more feature words it is likely to generate and the higher the dimension of the feature space, which complicates the subsequent classification process. The classification process therefore filters out undesirable feature words through feature selection. However, e-books are usually much longer than general articles. With traditional approaches, e-books generate a large number of feature words, which complicates the subsequent classification process, and important feature words may even be lost because the long content reduces their overall weights.
Therefore, we present a novel feature selection approach that applies a text segmentation algorithm. With this algorithm, an e-book can be cut into several segments. We analyze the importance of all words in these segments and select the important feature words for every segment. We expect that the feature words selected by our approach can improve the accuracy of classification.
|
author2 |
Hei-Chia Wang |
author_facet |
Hei-Chia Wang Ming-WeiLai 賴銘偉 |
author |
Ming-WeiLai 賴銘偉 |
spellingShingle |
Ming-WeiLai 賴銘偉 A Feature Selection Method Based on Text Segmentation of E-Books |
author_sort |
Ming-WeiLai |
title |
A Feature Selection Method Based on Text Segmentation of E-Books |
title_short |
A Feature Selection Method Based on Text Segmentation of E-Books |
title_full |
A Feature Selection Method Based on Text Segmentation of E-Books |
title_fullStr |
A Feature Selection Method Based on Text Segmentation of E-Books |
title_full_unstemmed |
A Feature Selection Method Based on Text Segmentation of E-Books |
title_sort |
feature selection method based on text segmentation of e-books |
publishDate |
2010 |
url |
http://ndltd.ncl.edu.tw/handle/02023913522965767774 |
work_keys_str_mv |
AT mingweilai afeatureselectionmethodbasedontextsegmentationofebooks AT làimíngwěi afeatureselectionmethodbasedontextsegmentationofebooks AT mingweilai jīyúwénjiànfēnduànzhīdiànzishūtèzhēngxuǎnqǔ AT làimíngwěi jīyúwénjiànfēnduànzhīdiànzishūtèzhēngxuǎnqǔ AT mingweilai featureselectionmethodbasedontextsegmentationofebooks AT làimíngwěi featureselectionmethodbasedontextsegmentationofebooks |
_version_ |
1718125296995008512 |
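
The description above outlines segment-level feature selection: cut a long text into segments, score each word within its segment, and keep the top words per segment. The following is a minimal Python sketch of that idea, not the thesis's actual method. The record does not name the segmentation algorithm or the weighting scheme, so fixed-size word windows and a TF-IDF score computed across segments are used here as stand-in assumptions, and the function names `split_into_segments` and `select_features` are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_segments(text, words_per_segment=50):
    """Stand-in segmenter (assumption): fixed-size windows of lowercase words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [words[i:i + words_per_segment]
            for i in range(0, len(words), words_per_segment)]


def select_features(segments, top_k=3):
    """Return the top_k words of each segment, ranked by TF-IDF across segments."""
    n = len(segments)
    # Segment frequency: the number of segments in which each word occurs.
    seg_freq = Counter(word for seg in segments for word in set(seg))
    selected = []
    for seg in segments:
        term_freq = Counter(seg)
        # Words that occur in every segment get log(1) = 0 and rank last.
        scores = {
            word: (count / len(seg)) * math.log(n / seg_freq[word])
            for word, count in term_freq.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        selected.append(ranked[:top_k])
    return selected
```

Because each word is scored relative to its own segment, a word concentrated in one chapter of a long book can still rank highly there, even if its weight over the whole book would be small.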