Applying the Association Rules to Refine the VSM-based Document Clustering

Master's === 中原大學 === 資訊管理研究所 === 90 === Nowadays, information grows as fast as cells divide; the ability to retrieve, organize, and present this fast-growing information efficiently will be the key to success. Clustering has been investigated for organizing and classifying information aut...

Bibliographic Details
Main Authors: Ming-Hsuan Chung, 鍾明璇
Other Authors: Wei-Ping Lee
Format: Others
Language: zh-TW
Published: 2002
Online Access: http://ndltd.ncl.edu.tw/handle/96636445690727464302
id ndltd-TW-090CYCU5396028
record_format oai_dc
spelling ndltd-TW-090CYCU5396028 2015-10-13T17:35:24Z http://ndltd.ncl.edu.tw/handle/96636445690727464302 Applying the Association Rules to Refine the VSM-based Document Clustering 應用關聯規則技術有效輔助以向量空間模型為基礎之文件群集法 Ming-Hsuan Chung 鍾明璇 Master's 中原大學 資訊管理研究所 90 Wei-Ping Lee 李維平 2002 學位論文 ; thesis 69 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === 中原大學 === 資訊管理研究所 === 90 === Nowadays, information grows as fast as cells divide; the ability to retrieve, organize, and present this fast-growing information efficiently will be the key to success. Clustering has been investigated as a way to organize and classify information automatically according to selected features. Applied to document data, it can improve precision or recall in information retrieval systems and allow a system to organize and present information efficiently. Furthermore, document clustering has been used to generate hierarchical clusters of documents automatically (e.g., the automatic generation of a taxonomy of Web documents like that provided by Yahoo!). Traditional document clustering involves two phases: first, feature extraction maps each document or record to a point in the vector space model; then a specific clustering algorithm groups the points into clusters. However, the vector space model has an inherent defect: it cannot capture relationships among the terms in documents, which may cause errors in the subsequent operations. Therefore, this study proposes to use association rules, a data mining technique, to compensate for the inadequacy of traditional document clustering and improve the quality of clustering. The study uses association rules to mine the relationships between terms in documents and thereby remedies this shortcoming of the vector space model. Finally, we conducted experiments on the Reuters-21578 corpus comparing the proposed document clustering method with the traditional one, and showed that the proposed method generates higher-quality clusters than the traditional method. In the future, we plan to apply the proposed method to other clustering algorithms based on the vector space model in order to further improve the quality of clustering.
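The description above can be illustrated with a minimal sketch. It is not the thesis's implementation: this record does not say how the mined rules are folded back into the document vectors, so the refine step below (adding confidence-weighted weight to associated terms) is an assumption made only for illustration, as are all function names, the thresholds, and the toy documents. The sketch covers the vector space mapping and the rule mining; any clustering algorithm based on the vector space model could then consume the resulting similarities.

```python
import math
from collections import Counter
from itertools import combinations


def term_vector(tokens):
    """Phase 1 (VSM): map a tokenized document to raw term-frequency weights."""
    return dict(Counter(tokens))


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mine_pair_rules(docs, min_support, min_confidence):
    """Return {(a, b): confidence} for single-term rules a -> b.

    Each document is treated as one transaction (its set of distinct terms).
    """
    n = len(docs)
    term_sets = [set(d) for d in docs]
    single = Counter(t for s in term_sets for t in s)
    pairs = Counter(p for s in term_sets for p in combinations(sorted(s), 2))
    rules = {}
    for (a, b), count in pairs.items():
        if count / n < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / single[x]
            if confidence >= min_confidence:
                rules[(x, y)] = confidence
    return rules


def refine(vec, rules):
    """ASSUMED refinement (not from the thesis): for each rule a -> b whose
    antecedent a occurs in the document, add confidence * weight(a) to b."""
    out = dict(vec)
    for (a, b), confidence in rules.items():
        if a in vec:
            out[b] = out.get(b, 0.0) + confidence * vec[a]
    return out


# Hypothetical toy corpus; documents 2 and 3 share no terms.
docs = [
    ["oil", "crude", "price"],
    ["oil", "crude", "barrel"],
    ["oil", "price"],
    ["crude", "barrel"],
    ["wheat", "grain"],
]
vectors = [term_vector(d) for d in docs]
rules = mine_pair_rules(docs, min_support=0.3, min_confidence=0.6)

plain = cosine(vectors[2], vectors[3])
refined = cosine(refine(vectors[2], rules), refine(vectors[3], rules))
print(f"plain={plain:.3f} refined={refined:.3f}")
```

In this toy corpus the plain vectors of documents 2 and 3 have zero similarity because they share no terms, while the refined vectors overlap through the mined rules linking "oil" and "crude". That is the kind of term relationship the description says the plain vector space model cannot capture.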
author2 Wei-Ping Lee
author_facet Wei-Ping Lee
Ming-Hsuan Chung
鍾明璇
author Ming-Hsuan Chung
鍾明璇
spellingShingle Ming-Hsuan Chung
鍾明璇
Applying the Association Rules to Refine the VSM-based Document Clustering
author_sort Ming-Hsuan Chung
title Applying the Association Rules to Refine the VSM-based Document Clustering
title_sort applying the association rules to refine the vsm-based document clustering
publishDate 2002
url http://ndltd.ncl.edu.tw/handle/96636445690727464302