Document Clustering using Self-Organizing Maps

Cluster analysis of textual documents is a common technique for better filtering, navigation, understanding and comprehension of large document collections. Document clustering is an autonomous method that separates a large heterogeneous document collection into smaller, more homogeneous sub-col...


Bibliographic Details
Main Authors:	Muhammad Rafi, Muhammad Waqar, Hareem Ajaz, Umar Ayub, Muhammad Danish
Format:	Article
Language:	English
Published:	Brno University of Technology 2017-06-01
Series:	Mendel
Subjects:	Document Clustering, Text Mining, Neural Network, Unsupervised Learning, Self-Organizing Maps, Layered Approach
Online Access:	https://mendel-journal.org/index.php/mendel/article/view/61

id	doaj-7a281e0326174f859749f380313293cf
record_format	Article
spelling	doaj-7a281e0326174f859749f380313293cf; 2021-07-21T07:38:49Z; eng; Brno University of Technology; Mendel; 1803-3814; 2571-3701; 2017-06-01; 23110.13164/mendel.2017.1.11161; Document Clustering using Self-Organizing Maps; Muhammad Rafi, Muhammad Waqar, Hareem Ajaz, Umar Ayub, Muhammad Danish; https://mendel-journal.org/index.php/mendel/article/view/61; Document Clustering, Text Mining, Neural Network, Unsupervised Learning, Self-Organizing Maps, Layered Approach
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Muhammad Rafi, Muhammad Waqar, Hareem Ajaz, Umar Ayub, Muhammad Danish
author_sort	Muhammad Rafi
title	Document Clustering using Self-Organizing Maps
publisher	Brno University of Technology
series	Mendel
issn	1803-3814 2571-3701
publishDate	2017-06-01
description	Cluster analysis of textual documents is a common technique for better filtering, navigation, understanding and comprehension of large document collections. Document clustering is an autonomous method that separates a large heterogeneous document collection into smaller, more homogeneous sub-collections called clusters. A self-organizing map (SOM) is a type of artificial neural network (ANN) that can be used to perform autonomous self-organization of a high-dimensional feature space into low-dimensional projections called maps. It is considered well suited to clustering because both require unsupervised processing. In this paper, we propose a multi-layer, multi-feature SOM to cluster documents. The paper implements a SOM using four layers, with lexical terms, phrases and sequences in the bottom layers respectively, and all of them combined at the top. The documents are processed to extract these features, which are fed to the SOM. The internal weights and interconnections between the features (neurons) of these layers settle automatically through iterations with a small learning rate to discover the actual clusters. We have performed an extensive set of experiments on standard text mining datasets such as NEWS20, Reuters and WebKB, using F-Measure and Purity as evaluation measures. The evaluation gives encouraging results and outperforms some of the existing approaches. We conclude that a SOM with multiple features (lexical terms, phrases and sequences) and multiple layers can be very effective in producing high-quality clusters on large document collections.
topic	Document Clustering, Text Mining, Neural Network, Unsupervised Learning, Self-Organizing Maps, Layered Approach
url	https://mendel-journal.org/index.php/mendel/article/view/61

Similar Items