A new AntTree-based algorithm for clustering short-text corpora

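Short-text clustering is a very important research area due to the current tendency for people to use "small-language", e.g. in blogs, text messaging and others. In some recent works, new bio-inspired clustering algorithms have been proposed to deal with this difficult problem, and novel uses of Internal Clustering Validity Measures have also been presented. In this work, a new AntTree-based approach is proposed for this task. It integrates information on the Silhouette Coefficient and the concept of attraction of a cluster in different stages of the clustering process. The proposal achieves results comparable to the best reported results in this area, showing an interesting stability in the quality of the results and presenting some interesting capabilities as a general improvement method for arbitrary clustering approaches.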
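
The description of this article refers to the Silhouette Coefficient as one ingredient of the proposed approach. As background only, the following is a minimal Python sketch of the standard silhouette definition, s(i) = (b(i) - a(i)) / max(a(i), b(i)), computed from a precomputed distance matrix. It is not the authors' AntTree-based procedure; the function names and the convention of scoring singleton clusters as 0 are assumptions made for illustration.

```python
from collections import defaultdict


def silhouette_scores(dist, labels):
    """Per-item silhouette s(i) = (b - a) / max(a, b) from a distance matrix.

    dist   -- square list-of-lists, dist[i][j] is the distance between items i and j
    labels -- cluster label of each item (hypothetical input format)
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        # Assumed convention: singleton clusters (or a single cluster) score 0.
        if not own or len(clusters) < 2:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the item's own cluster.
        a = sum(dist[i][j] for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def silhouette_coefficient(dist, labels):
    """Global silhouette: the mean of the per-item scores."""
    scores = silhouette_scores(dist, labels)
    return sum(scores) / len(scores)
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest items that sit closer to another cluster than to their own. For short texts, `dist` would typically hold a document distance such as one minus cosine similarity, which is likewise an assumption here rather than something stated in the record.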


Bibliographic Details
Main Authors: Marcelo Luis Errecalde, Diego Alejandro Ingaramo, Paolo Rosso
Format: Article
Language: English
Published: Postgraduate Office, School of Computer Science, Universidad Nacional de La Plata 2010-04-01
Series: Journal of Computer Science and Technology
Subjects: internal validity measures, anttree, short-text clustering, bio-inspired algorithms, silhouette coefficient
Online Access: https://journal.info.unlp.edu.ar/JCST/article/view/708
id doaj-ebbb03d42c06452597644bf3f6ba7323
record_format Article
spelling doaj-ebbb03d42c06452597644bf3f6ba7323
2021-05-05T13:55:14Z
eng
Postgraduate Office, School of Computer Science, Universidad Nacional de La Plata
Journal of Computer Science and Technology
1666-6046
1666-6038
2010-04-01
100117402
A new AntTree-based algorithm for clustering short-text corpora
Marcelo Luis Errecalde 0
Diego Alejandro Ingaramo 1
Paolo Rosso 2
Development and Research Laboratory in Computational Intelligence (LIDIC), Universidad Nacional de San Luis, San Luis, Argentina
Development and Research Laboratory in Computational Intelligence (LIDIC), Universidad Nacional de San Luis, San Luis, Argentina
Natural Language Engineering Lab., ELiRF, Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Valencia, Spain
Short-text clustering is a very important research area due to the current tendency for people to use "small-language", e.g. in blogs, text messaging and others. In some recent works, new bio-inspired clustering algorithms have been proposed to deal with this difficult problem, and novel uses of Internal Clustering Validity Measures have also been presented. In this work, a new AntTree-based approach is proposed for this task. It integrates information on the Silhouette Coefficient and the concept of attraction of a cluster in different stages of the clustering process. The proposal achieves results comparable to the best reported results in this area, showing an interesting stability in the quality of the results and presenting some interesting capabilities as a general improvement method for arbitrary clustering approaches.
https://journal.info.unlp.edu.ar/JCST/article/view/708
internal validity measures
anttree
short-text clustering
bio-inspired algorithms
silhouette coefficient
collection DOAJ
language English
format Article
sources DOAJ
author Marcelo Luis Errecalde
Diego Alejandro Ingaramo
Paolo Rosso
title A new AntTree-based algorithm for clustering short-text corpora
publisher Postgraduate Office, School of Computer Science, Universidad Nacional de La Plata
series Journal of Computer Science and Technology
issn 1666-6046
1666-6038
publishDate 2010-04-01
description Short-text clustering is a very important research area due to the current tendency for people to use "small-language", e.g. in blogs, text messaging and others. In some recent works, new bio-inspired clustering algorithms have been proposed to deal with this difficult problem, and novel uses of Internal Clustering Validity Measures have also been presented. In this work, a new AntTree-based approach is proposed for this task. It integrates information on the Silhouette Coefficient and the concept of attraction of a cluster in different stages of the clustering process. The proposal achieves results comparable to the best reported results in this area, showing an interesting stability in the quality of the results and presenting some interesting capabilities as a general improvement method for arbitrary clustering approaches.
topic internal validity measures
anttree
short-text clustering
bio-inspired algorithms
silhouette coefficient
url https://journal.info.unlp.edu.ar/JCST/article/view/708