Applying the Association Rules to Refine the VSM-based Document Clustering
Master's === Chung Yuan Christian University === Graduate Institute of Information Management === 90 === Information now grows as fast as cells divide, and the ability to retrieve, organize, and present this fast-growing information efficiently is a key to success. Clustering has been studied as a way to organize and classify information aut...
Main Authors: | Ming-Hsuan Chung, 鍾明璇 |
---|---|
Other Authors: | Wei-Ping Lee |
Format: | Others |
Language: | zh-TW |
Published: | 2002 |
Online Access: | http://ndltd.ncl.edu.tw/handle/96636445690727464302 |
id |
ndltd-TW-090CYCU5396028 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-090CYCU5396028 2015-10-13T17:35:24Z http://ndltd.ncl.edu.tw/handle/96636445690727464302 Applying the Association Rules to Refine the VSM-based Document Clustering 應用關聯規則技術有效輔助以向量空間模型為基礎之文件群集法 Ming-Hsuan Chung 鍾明璇 Master's Chung Yuan Christian University Graduate Institute of Information Management 90 Information now grows as fast as cells divide, and the ability to retrieve, organize, and present this fast-growing information efficiently is a key to success. Clustering has been studied as a way to organize and classify information automatically according to its features. Applied to document data, it can improve precision or recall in information retrieval systems and lets a system organize and present information efficiently. Document clustering has also been used to generate hierarchical clusters of documents automatically (e.g., a taxonomy of Web documents like the one provided by Yahoo!). Traditional document clustering involves two phases: first, feature extraction maps each document or record to a point in the vector space model; then a clustering algorithm groups the points into clusters. However, the vector space model has an inherent defect: it cannot capture relationships between the terms in a document, which may cause errors in the subsequent operations. This study therefore proposes using association rules, a data mining technique, to compensate for this inadequacy of traditional document clustering and improve clustering quality. Association rules are used to mine the relationships between terms in documents and thereby remedy the shortcomings of the vector space model.
Finally, we conducted experiments on the Reuters-21578 corpus comparing the proposed document clustering method with the traditional one; the results show that the proposed method generates higher-quality clusters than the traditional method. In the future, we plan to apply the proposed method to other clustering algorithms based on the vector space model to further improve clustering quality. Wei-Ping Lee 李維平 2002 thesis 69 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === Chung Yuan Christian University === Graduate Institute of Information Management === 90 === Information now grows as fast as cells divide, and the ability to retrieve, organize, and present this fast-growing information efficiently is a key to success.
Clustering has been studied as a way to organize and classify information automatically according to its features. Applied to document data, it can improve precision or recall in information retrieval systems and lets a system organize and present information efficiently. Document clustering has also been used to generate hierarchical clusters of documents automatically (e.g., a taxonomy of Web documents like the one provided by Yahoo!). Traditional document clustering involves two phases: first, feature extraction maps each document or record to a point in the vector space model; then a clustering algorithm groups the points into clusters. However, the vector space model has an inherent defect: it cannot capture relationships between the terms in a document, which may cause errors in the subsequent operations. This study therefore proposes using association rules, a data mining technique, to compensate for this inadequacy of traditional document clustering and improve clustering quality.
Association rules are used to mine the relationships between terms in documents and thereby remedy the shortcomings of the vector space model. Finally, we conducted experiments on the Reuters-21578 corpus comparing the proposed document clustering method with the traditional one; the results show that the proposed method generates higher-quality clusters than the traditional method. In the future, we plan to apply the proposed method to other clustering algorithms based on the vector space model to further improve clustering quality.
|
author2 |
Wei-Ping Lee |
author_facet |
Wei-Ping Lee Ming-Hsuan Chung 鍾明璇 |
author |
Ming-Hsuan Chung 鍾明璇 |
spellingShingle |
Ming-Hsuan Chung 鍾明璇 Applying the Association Rules to Refine the VSM-based Document Clustering |
author_sort |
Ming-Hsuan Chung |
title |
Applying the Association Rules to Refine the VSM-based Document Clustering |
title_short |
Applying the Association Rules to Refine the VSM-based Document Clustering |
title_full |
Applying the Association Rules to Refine the VSM-based Document Clustering |
title_fullStr |
Applying the Association Rules to Refine the VSM-based Document Clustering |
title_full_unstemmed |
Applying the Association Rules to Refine the VSM-based Document Clustering |
title_sort |
applying the association rules to refine the vsm-based document clustering |
publishDate |
2002 |
url |
http://ndltd.ncl.edu.tw/handle/96636445690727464302 |
work_keys_str_mv |
AT minghsuanchung applyingtheassociationrulestorefinethevsmbaseddocumentclustering AT zhōngmíngxuán applyingtheassociationrulestorefinethevsmbaseddocumentclustering AT minghsuanchung yīngyòngguānliánguīzéjìshùyǒuxiàofǔzhùyǐxiàngliàngkōngjiānmóxíngwèijīchǔzhīwénjiànqúnjífǎ AT zhōngmíngxuán yīngyòngguānliánguīzéjìshùyǒuxiàofǔzhùyǐxiàngliàngkōngjiānmóxíngwèijīchǔzhīwénjiànqúnjífǎ |
_version_ |
1717782876202729472 |
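
The abstract describes the approach only in outline: documents become points in a vector space, and association rules between terms are mined to compensate for the model's blindness to term relationships. The record does not say how the rules are actually fed back into the vectors, so the following is only a minimal sketch under stated assumptions, not the thesis's method. It assumes term-frequency weights, rules restricted to single-term pairs `a -> b`, and a hypothetical `expand` step (with an invented `alpha` parameter) that credits a rule's consequent term whenever its antecedent occurs. All function names and the toy documents are illustrative.

```python
import math
from collections import Counter
from itertools import permutations


def tf_vector(tokens):
    """Map a tokenized document to a sparse term-frequency vector."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse vectors (term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mine_pair_rules(docs, min_support=0.5, min_confidence=0.8):
    """Mine rules a -> b between single terms; each document is one transaction."""
    n = len(docs)
    term_sets = [set(d) for d in docs]
    term_count = Counter(t for s in term_sets for t in s)
    pair_count = Counter()
    for s in term_sets:
        for a, b in permutations(sorted(s), 2):
            pair_count[(a, b)] += 1
    rules = {}
    for (a, b), c in pair_count.items():
        support = c / n                  # fraction of documents containing a and b
        confidence = c / term_count[a]   # fraction of a's documents that also contain b
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def expand(vec, rules, alpha=0.5):
    """Assumed refinement step: add weight for b when a rule a -> b fires."""
    out = dict(vec)
    for (a, b), conf in rules.items():
        if a in vec:
            out[b] = out.get(b, 0.0) + alpha * conf * vec[a]
    return out


# Toy corpus (illustrative only; the thesis used Reuters-21578).
docs = [
    ["stock", "market", "shares"],
    ["stock", "market", "trade"],
    ["market", "shares", "stock"],
    ["wheat", "grain", "harvest"],
]
rules = mine_pair_rules(docs)

a = tf_vector(["stock"])
b = tf_vector(["market"])
print("plain VSM similarity:   ", cosine(a, b))
print("rule-expanded similarity:", cosine(expand(a, rules), expand(b, rules)))
```

In this sketch, two documents that share no terms but whose terms are strongly associated in the corpus end up closer together after expansion. The expanded vectors could then be passed to any clustering algorithm that operates on the vector space model, which is the direction the abstract names as future work.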