Applying the Association Rules to Refine the VSM-based Document Clustering

Master's === Chung Yuan Christian University === Graduate Institute of Information Management === 90 === Information now grows as fast as cells divide; being able to retrieve, organize, and present this fast-growing information efficiently will be the key to success. Clustering has been investigated for organizing and classifying information aut...


Bibliographic Details
Main Authors:	Ming-Hsuan Chung, 鍾明璇
Other Authors:	Wei-Ping Lee
Format:	Others
Language:	zh-TW
Published:	2002
Online Access:	http://ndltd.ncl.edu.tw/handle/96636445690727464302

id	ndltd-TW-090CYCU5396028
record_format	oai_dc
spelling	ndltd-TW-090CYCU5396028 2015-10-13T17:35:24Z http://ndltd.ncl.edu.tw/handle/96636445690727464302 Applying the Association Rules to Refine the VSM-based Document Clustering 應用關聯規則技術有效輔助以向量空間模型為基礎之文件群集法 Ming-Hsuan Chung 鍾明璇 Master's, Chung Yuan Christian University, Graduate Institute of Information Management, 90 Wei-Ping Lee 李維平 2002 thesis 69 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === Chung Yuan Christian University === Graduate Institute of Information Management === 90 === Information now grows as fast as cells divide; being able to retrieve, organize, and present this fast-growing information efficiently will be the key to success. Clustering has been investigated as a way to organize and classify information automatically according to its features. Applied to document data, it can improve precision or recall in information retrieval systems and allow the system to organize and present information efficiently. Document clustering has also been used to automatically generate hierarchical clusters of documents (e.g., the automatic generation of a taxonomy of Web documents like the one provided by Yahoo!). Traditional document clustering involves two phases: first, feature extraction maps each document or record to a point in the vector space model; then a specific clustering algorithm groups the points into clusters. However, the vector space model has an inherent defect: it cannot differentiate relationships among the terms in documents, which may cause errors in subsequent operations. This study therefore proposes using association rules, a data mining technique, to compensate for this inadequacy of traditional document clustering and improve clustering quality. Association rules are used to mine the relationships between terms in documents and thereby remedy the shortcomings of the vector space model. Finally, we conducted experiments on the Reuters-21578 corpus comparing the proposed document clustering method with the traditional one; the results show that the proposed method generates higher-quality clusters than the traditional method.
In the future, we plan to apply the proposed method to other clustering algorithms based on the vector space model in order to further improve clustering quality.
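The abstract does not say how the mined rules are fed back into the vector space model, so the following is only a minimal sketch under stated assumptions, not the thesis's actual method. It builds TF-IDF vectors, mines pairwise term rules by support and confidence, and then (as a hypothetical refinement step) adds weight to a rule's consequent term whenever its antecedent appears in a document. The function names, the thresholds, the `alpha` damping factor, and the toy documents are all illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf_vectors(docs):
    """Phase 1 of VSM clustering: map each tokenized document to a sparse vector."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (+1) so that very common terms keep a non-zero weight.
        vectors.append({t: tf[t] * (math.log(n / df[t]) + 1.0) for t in tf})
    return vectors


def mine_term_rules(docs, min_support=0.4, min_confidence=0.7):
    """Mine rules x -> y between single terms, treating each document as a transaction."""
    n = len(docs)
    transactions = [set(doc) for doc in docs]
    single = Counter(t for tr in transactions for t in tr)
    pair = Counter(p for tr in transactions for p in combinations(sorted(tr), 2))
    rules = {}
    for (a, b), count in pair.items():
        if count / n < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / single[x]
            if confidence >= min_confidence:
                rules[(x, y)] = confidence
    return rules


def refine_vectors(vectors, rules, alpha=0.5):
    """Hypothetical refinement: if x is in a document and x -> y holds, boost y."""
    refined = []
    for vec in vectors:
        new_vec = dict(vec)
        for (x, y), confidence in rules.items():
            if x in vec:
                new_vec[y] = new_vec.get(y, 0.0) + alpha * confidence * vec[x]
        refined.append(new_vec)
    return refined


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


# Toy corpus (made up for illustration; not taken from Reuters-21578).
docs = [
    ["oil", "crude", "price"],
    ["oil", "crude", "barrel"],
    ["oil", "crude", "opec"],
    ["oil", "tanker"],
    ["crude", "refinery"],
    ["wheat", "grain", "harvest"],
    ["wheat", "grain", "export"],
]

vectors = tf_idf_vectors(docs)
rules = mine_term_rules(docs)
refined = refine_vectors(vectors, rules)

print("rules:", sorted(rules))
print("plain   sim(doc 3, doc 4):", cosine(vectors[3], vectors[4]))
print("refined sim(doc 3, doc 4):", cosine(refined[3], refined[4]))
```

In this toy corpus, documents 3 and 4 share no terms, so their plain cosine similarity is zero. After the refinement step they become comparable because "oil" and "crude" are linked by a mined rule. Either set of vectors could then be passed to any vector-space clustering algorithm for the second phase.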
author2	Wei-Ping Lee
author_facet	Wei-Ping Lee Ming-Hsuan Chung 鍾明璇
author	Ming-Hsuan Chung 鍾明璇
author_sort	Ming-Hsuan Chung
title	Applying the Association Rules to Refine the VSM-based Document Clustering
publishDate	2002
url	http://ndltd.ncl.edu.tw/handle/96636445690727464302
work_keys_str_mv	AT minghsuanchung applyingtheassociationrulestorefinethevsmbaseddocumentclustering AT zhōngmíngxuán applyingtheassociationrulestorefinethevsmbaseddocumentclustering AT minghsuanchung yīngyòngguānliánguīzéjìshùyǒuxiàofǔzhùyǐxiàngliàngkōngjiānmóxíngwèijīchǔzhīwénjiànqúnjífǎ AT zhōngmíngxuán yīngyòngguānliánguīzéjìshùyǒuxiàofǔzhùyǐxiàngliàngkōngjiānmóxíngwèijīchǔzhīwénjiànqúnjífǎ
_version_	1717782876202729472


Similar Items