Document Clustering using Self-Organizing Maps
Cluster analysis of textual documents is a common technique for better filtering, navigation, understanding and comprehension of large document collections. Document clustering is an autonomous method that separates a large heterogeneous document collection into smaller, more homogeneous sub-col...
Main Authors: | Muhammad Rafi, Muhammad Waqar, Hareem Ajaz, Umar Ayub, Muhammad Danish |
---|---|
Format: | Article |
Language: | English |
Published: | Brno University of Technology, 2017-06-01 |
Series: | Mendel |
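Subjects: | Document Clustering, Text Mining, Neural Network, Unsupervised Learning, Self-Organizing Maps, Layered Approach |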
Online Access: | https://mendel-journal.org/index.php/mendel/article/view/61 |
id |
doaj-7a281e0326174f859749f380313293cf |
---|---|
record_format |
Article |
spelling |
doaj-7a281e0326174f859749f380313293cf2021-07-21T07:38:49ZengBrno University of TechnologyMendel1803-38142571-37012017-06-0123110.13164/mendel.2017.1.11161Document Clustering using Self-Organizing MapsMuhammad RafiMuhammad WaqarHareem AjazUmar AyubMuhammad Danish Cluster analysis of textual documents is a common technique for better filtering, navigation, understanding and comprehension of large document collections. Document clustering is an autonomous method that separates a large heterogeneous document collection into smaller, more homogeneous sub-collections called clusters. A self-organizing map (SOM) is a type of artificial neural network (ANN) that can be used to perform autonomous self-organization of a high-dimensional feature space into low-dimensional projections called maps. It is considered a good method for clustering, as both require unsupervised processing. In this paper, we propose a multi-layer, multi-feature SOM to cluster documents. The paper implements a SOM using four layers, with lexical terms, phrases and sequences in the bottom layers respectively, combining all of them at the top layer. The documents are processed to extract these features to feed the SOM. The internal weights and interconnections between the features (neurons) of these layers settle automatically through iterations with a small learning rate to discover the actual clusters. We have performed an extensive set of experiments on standard text mining datasets such as NEWS20, Reuters and WebKB, with F-Measure and Purity as evaluation measures. The evaluation gives encouraging results and outperforms some of the existing approaches. We conclude that a SOM with multiple features (lexical terms, phrases and sequences) and multiple layers can be very effective in producing high-quality clusters on large document collections. https://mendel-journal.org/index.php/mendel/article/view/61Document ClusteringText MiningNeural NetworkUnsupervised LearningSelf-Organizing MapsLayered Approach |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Muhammad Rafi Muhammad Waqar Hareem Ajaz Umar Ayub Muhammad Danish |
author_sort |
Muhammad Rafi |
title |
Document Clustering using Self-Organizing Maps |
publisher |
Brno University of Technology |
series |
Mendel |
issn |
1803-3814 2571-3701 |
publishDate |
2017-06-01 |
description |
Cluster analysis of textual documents is a common technique for better filtering, navigation, understanding
and comprehension of large document collections. Document clustering is an autonomous method
that separates a large heterogeneous document collection into smaller, more homogeneous sub-collections called
clusters. A self-organizing map (SOM) is a type of artificial neural network (ANN) that can be used to perform
autonomous self-organization of a high-dimensional feature space into low-dimensional projections called maps. It
is considered a good method for clustering, as both require unsupervised processing. In this paper, we
propose a multi-layer, multi-feature SOM to cluster documents. The paper implements a SOM using
four layers, with lexical terms, phrases and sequences in the bottom layers respectively, combining all of them at
the top layer. The documents are processed to extract these features to feed the SOM. The internal weights
and interconnections between the features (neurons) of these layers settle automatically through iterations with a small
learning rate to discover the actual clusters. We have performed an extensive set of experiments on standard text
mining datasets such as NEWS20, Reuters and WebKB, with F-Measure and Purity as evaluation measures. The
evaluation gives encouraging results and outperforms some of the existing approaches. We conclude that a SOM
with multiple features (lexical terms, phrases and sequences) and multiple layers can be very effective in producing
high-quality clusters on large document collections.
|
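The description above says that SOM weights settle over iterations with a small learning rate. As a rough illustration only, the sketch below trains a flat, single-layer SOM on toy feature vectors. It is not the four-layer, multi-feature model of the paper: the grid size, the linear decay schedules, the Gaussian neighbourhood, and the names `train_som` and `best_matching_unit` are all assumptions made for this example.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the neuron whose weights are closest to x."""
    def sq_dist(pos):
        return sum((w - v) ** 2 for w, v in zip(weights[pos], x))
    return min(weights, key=sq_dist)


def train_som(docs, rows=2, cols=2, iterations=200, lr0=0.1, sigma0=1.0, seed=0):
    """Train a flat SOM on equal-length vectors; returns {(row, col): weights}."""
    rng = random.Random(seed)
    # Assumption: each neuron starts as a copy of a randomly chosen document.
    weights = {(r, c): list(rng.choice(docs))
               for r in range(rows) for c in range(cols)}
    for t in range(iterations):
        decay = 1.0 - t / iterations      # falls linearly from 1 towards 0
        lr = lr0 * decay                  # small, shrinking learning rate
        sigma = sigma0 * decay + 1e-3     # shrinking neighbourhood radius
        x = rng.choice(docs)              # present one document per iteration
        bmu = best_matching_unit(weights, x)
        for pos, w in weights.items():
            # Neurons near the winner on the grid are pulled more strongly.
            grid_d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
            for i in range(len(w)):
                w[i] += lr * h * (x[i] - w[i])
    return weights


# Toy term-weight vectors (hypothetical data, not from the paper's datasets).
docs = [[1.0, 1.0, 0.0, 0.0], [0.9, 1.0, 0.1, 0.0],
        [0.0, 0.0, 1.0, 1.0], [0.0, 0.1, 0.9, 1.0]]
som = train_som(docs)
clusters = [best_matching_unit(som, d) for d in docs]
```

Here each document's cluster is simply the grid position of its best-matching neuron. A layered variant along the lines the description outlines would presumably train separate maps on term, phrase and sequence features and combine them, but those details are not given in this record.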
topic |
Document Clustering Text Mining Neural Network Unsupervised Learning Self-Organizing Maps Layered Approach |
url |
https://mendel-journal.org/index.php/mendel/article/view/61 |
work_keys_str_mv |
AT muhammadrafi documentclusteringusingselforganizingmaps AT muhammadwaqar documentclusteringusingselforganizingmaps AT hareemajaz documentclusteringusingselforganizingmaps AT umarayub documentclusteringusingselforganizingmaps AT muhammaddanish documentclusteringusingselforganizingmaps |
_version_ |
1721292958622285824 |
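The description names Purity as one of the evaluation measures. A minimal sketch of the usual definition follows: for each cluster, count the documents of its most frequent true class, sum these counts, and divide by the total number of documents. The function name `purity` and the toy labels are assumptions for illustration, and the record does not state which exact variant the authors used.

```python
from collections import Counter


def purity(cluster_ids, class_labels):
    """Fraction of documents that fall in the majority class of their cluster."""
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one document is required")
    # Count how often each true class occurs inside each cluster.
    by_cluster = {}
    for cluster, label in zip(cluster_ids, class_labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    # Sum the size of the dominant class in every cluster.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical example: two clusters, each with one misplaced document.
example_clusters = [0, 0, 0, 1, 1, 1]
example_labels = ["sport", "sport", "news", "news", "news", "sport"]
score = purity(example_clusters, example_labels)
```

Purity ranges from just above 0 to 1, and it rises trivially as the number of clusters grows, which is one reason it is commonly reported alongside a second measure such as F-Measure.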