A Feature Selection Method Based on Text Segmentation of E-Books

Master's === National Cheng Kung University === Institute of Information Management === 98 === With the exponential growth of information technology and the Internet, paper books can be converted into e-books. Readers can obtain these e-books from the Internet and download them to an e-reader, which makes it more convenient to absorb knowledge from books. However, the num...

Bibliographic Details
Main Authors: Ming-Wei Lai, 賴銘偉
Other Authors: Hei-Chia Wang
Format: Others
Language: zh-TW
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/02023913522965767774
id ndltd-TW-098NCKU5396018
record_format oai_dc
spelling ndltd-TW-098NCKU5396018 2015-11-06T04:03:45Z http://ndltd.ncl.edu.tw/handle/02023913522965767774 A Feature Selection Method Based on Text Segmentation of E-Books 基於文件分段之電子書特徵選取 Ming-Wei Lai 賴銘偉 Master's National Cheng Kung University Institute of Information Management 98 Hei-Chia Wang 王惠嘉 2010 thesis 90 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Cheng Kung University === Institute of Information Management === 98 === With the exponential growth of information technology and the Internet, paper books can be converted into e-books. Readers can obtain these e-books from the Internet and download them to an e-reader, which makes it more convenient to absorb knowledge from books. However, the number of e-books has become very large, and classifying them costs a great deal of time and energy. Traditional classification approaches such as decision trees, k-nearest neighbor, naïve Bayes, and support vector machines usually select feature words from the content, and these words form a feature space. The longer the document, the more feature words it is likely to generate and the higher the dimension of the feature space, which complicates the subsequent classification process. The classification process therefore filters out undesirable feature words through feature selection. E-books, however, are usually much longer than general articles. With traditional approaches, an e-book generates a large number of feature words, which complicates the subsequent classification process, and important feature words may even be lost because the length of the content reduces their overall weights. We therefore present a novel feature selection approach that applies a text segmentation algorithm. With this algorithm, an e-book is cut into several segments. We analyze the importance of every word within these segments and select the important feature words for each segment. We expect the feature words selected by our approach to improve classification accuracy.
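The segment-then-select idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the segmentation algorithm or the word-importance measure, so everything below is an assumption made for illustration: fixed-size token windows stand in for the text segmentation algorithm, and a TF-IDF-style weight (with segments playing the role of documents) stands in for word importance. The function names and the `segment_size` and `top_k` parameters are hypothetical and are not taken from the thesis.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract runs of letters (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def split_into_segments(tokens, segment_size):
    """Stand-in segmenter: fixed-size windows, NOT a topical segmentation algorithm."""
    return [tokens[i:i + segment_size] for i in range(0, len(tokens), segment_size)]


def select_features(text, segment_size=50, top_k=3):
    """Return the union of the top_k highest-weighted words of each segment."""
    segments = split_into_segments(tokenize(text), segment_size)
    n = len(segments)
    # Number of segments in which each word occurs at least once.
    seg_freq = Counter(word for seg in segments for word in set(seg))
    selected = []
    for seg in segments:
        counts = Counter(seg)
        # Assumed TF-IDF-style weight: frequency inside the segment, scaled down
        # for words that appear in many segments.
        weights = {
            w: (c / len(seg)) * (math.log((1 + n) / (1 + seg_freq[w])) + 1)
            for w, c in counts.items()
        }
        # Sort by descending weight; ties are broken alphabetically.
        ranked = sorted(weights, key=lambda w: (-weights[w], w))
        for w in ranked[:top_k]:
            if w not in selected:
                selected.append(w)
    return selected


if __name__ == "__main__":
    toy_book = "cat cat cat the dog dog dog the"
    print(select_features(toy_book, segment_size=4, top_k=1))
```

Because each segment is ranked on its own, a word that dominates one segment can still be selected even when its share of the whole book is small, which is the weight-dilution problem the abstract describes for long texts.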