Document Clustering using Self-Organizing Maps

Cluster analysis of textual documents is a common technique for better filtering, navigation, understanding and comprehension of large document collections. Document clustering is an autonomous method that separates a large heterogeneous document collection into smaller, more homogeneous sub-col...


Bibliographic Details
Main Authors: Muhammad Rafi, Muhammad Waqar, Hareem Ajaz, Umar Ayub, Muhammad Danish
Format: Article
Language: English
Published: Brno University of Technology 2017-06-01
Series: Mendel
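Subjects: Document Clustering, Text Mining, Neural Network, Unsupervised Learning, Self-Organizing Maps, Layered Approach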
Online Access: https://mendel-journal.org/index.php/mendel/article/view/61
id doaj-7a281e0326174f859749f380313293cf
record_format Article
spelling doaj-7a281e0326174f859749f380313293cf 2021-07-21T07:38:49Z eng Brno University of Technology Mendel 1803-3814 2571-3701 2017-06-01 23110.13164/mendel.2017.1.11161 Document Clustering using Self-Organizing Maps Muhammad Rafi, Muhammad Waqar, Hareem Ajaz, Umar Ayub, Muhammad Danish Cluster analysis of textual documents is a common technique for better filtering, navigation, understanding and comprehension of large document collections. Document clustering is an autonomous method that separates a large heterogeneous document collection into smaller, more homogeneous sub-collections called clusters. A self-organizing map (SOM) is a type of artificial neural network (ANN) that can be used to perform autonomous self-organization of a high-dimensional feature space into low-dimensional projections called maps. It is considered a good method for clustering, as both require unsupervised processing. In this paper, we propose a multi-layer, multi-feature SOM to cluster documents. The paper implements a SOM using four layers, with lexical terms, phrases and sequences in the bottom layers respectively, all combined at the top layer. The documents are processed to extract these features to feed the SOM. The internal weights and interconnections between the features (neurons) of these layers automatically settle through iterations with a small learning rate to discover the actual clusters. We have performed an extensive set of experiments on standard text mining datasets such as NEWS20, Reuters and WebKB, with F-Measure and Purity as evaluation measures. The evaluation gives encouraging results, and the approach outperforms some of the existing approaches. We conclude that a SOM with multiple features (lexical terms, phrases and sequences) and multiple layers can be very effective in producing high-quality clusters on large document collections. https://mendel-journal.org/index.php/mendel/article/view/61 Document Clustering, Text Mining, Neural Network, Unsupervised Learning, Self-Organizing Maps, Layered Approach
collection DOAJ
language English
format Article
sources DOAJ
author Muhammad Rafi
Muhammad Waqar
Hareem Ajaz
Umar Ayub
Muhammad Danish
spellingShingle Muhammad Rafi
Muhammad Waqar
Hareem Ajaz
Umar Ayub
Muhammad Danish
Document Clustering using Self-Organizing Maps
Mendel
Document Clustering
Text Mining
Neural Network
Unsupervised Learning
Self-Organizing Maps
Layered Approach
author_facet Muhammad Rafi
Muhammad Waqar
Hareem Ajaz
Umar Ayub
Muhammad Danish
author_sort Muhammad Rafi
title Document Clustering using Self-Organizing Maps
title_short Document Clustering using Self-Organizing Maps
title_full Document Clustering using Self-Organizing Maps
title_fullStr Document Clustering using Self-Organizing Maps
title_full_unstemmed Document Clustering using Self-Organizing Maps
title_sort document clustering using self-organizing maps
publisher Brno University of Technology
series Mendel
issn 1803-3814
2571-3701
publishDate 2017-06-01
description Cluster analysis of textual documents is a common technique for better filtering, navigation, understanding and comprehension of large document collections. Document clustering is an autonomous method that separates a large heterogeneous document collection into smaller, more homogeneous sub-collections called clusters. A self-organizing map (SOM) is a type of artificial neural network (ANN) that can be used to perform autonomous self-organization of a high-dimensional feature space into low-dimensional projections called maps. It is considered a good method for clustering, as both require unsupervised processing. In this paper, we propose a multi-layer, multi-feature SOM to cluster documents. The paper implements a SOM using four layers, with lexical terms, phrases and sequences in the bottom layers respectively, all combined at the top layer. The documents are processed to extract these features to feed the SOM. The internal weights and interconnections between the features (neurons) of these layers automatically settle through iterations with a small learning rate to discover the actual clusters. We have performed an extensive set of experiments on standard text mining datasets such as NEWS20, Reuters and WebKB, with F-Measure and Purity as evaluation measures. The evaluation gives encouraging results, and the approach outperforms some of the existing approaches. We conclude that a SOM with multiple features (lexical terms, phrases and sequences) and multiple layers can be very effective in producing high-quality clusters on large document collections.
topic Document Clustering
Text Mining
Neural Network
Unsupervised Learning
Self-Organizing Maps
Layered Approach
url https://mendel-journal.org/index.php/mendel/article/view/61
work_keys_str_mv AT muhammadrafi documentclusteringusingselforganizingmaps
AT muhammadwaqar documentclusteringusingselforganizingmaps
AT hareemajaz documentclusteringusingselforganizingmaps
AT umarayub documentclusteringusingselforganizingmaps
AT muhammaddanish documentclusteringusingselforganizingmaps
_version_ 1721292958622285824
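
Illustrative note: the description field in this record mentions a SOM whose weights settle over iterations with a small learning rate, and Purity as an evaluation measure. The block below is only a minimal sketch of those two ideas, assuming a plain single-layer, one-dimensional SOM; it is not the four-layer, multi-feature architecture of the paper. All names (`train_som`, `best_matching_unit`, `purity`), the parameter values, and the toy term-frequency vectors are hypothetical and chosen for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def best_matching_unit(weights, x):
    """Return the index of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def train_som(vectors, n_neurons=4, epochs=50, lr=0.1, radius=1.0, seed=0):
    """Train a 1-D SOM (a chain of neurons) with a fixed, small learning rate.

    Assumption: fixed lr and radius for simplicity; practical SOMs usually
    decay both over time.
    """
    rng = random.Random(seed)
    # Initialise each neuron from a randomly chosen input vector.
    weights = [list(rng.choice(vectors)) for _ in range(n_neurons)]
    for _ in range(epochs):
        for x in rng.sample(vectors, len(vectors)):  # one shuffled pass
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood on the chain: nearer neurons move more.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


def purity(cluster_ids, labels):
    """Fraction of documents that carry the majority label of their cluster."""
    by_cluster = defaultdict(list)
    for c, y in zip(cluster_ids, labels):
        by_cluster[c].append(y)
    majority = sum(Counter(ys).most_common(1)[0][1] for ys in by_cluster.values())
    return majority / len(labels)


# Toy term-frequency vectors and labels (hypothetical, not from the paper).
docs = [[3, 0, 1], [4, 1, 0], [0, 3, 4], [1, 4, 3]]
labels = ["sport", "sport", "tech", "tech"]

som = train_som(docs, n_neurons=2)
clusters = [best_matching_unit(som, d) for d in docs]
print(clusters, purity(clusters, labels))
```

Each document is assigned to the neuron it matches best, and that neuron index serves as the cluster id. Purity is then computed against known class labels, which is possible only on labelled benchmark collections such as those named in the description.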