Cluster Analysis of Discussions on Internet Forums

The growth of textual content on internet forums over the last decade has been immense, which has resulted in users struggling to find relevant information quickly and conveniently. The activity of finding information in large data collections is known as information retrieval, and many tools...

Bibliographic Details
Main Author:	Holm, Rasmus
Format:	Others
Language:	English
Published:	Linköpings universitet, Artificiell intelligens och integrerad datorsystem 2016
Subjects:	Cluster Analysis Text Mining Internet Forum Computer Sciences Datavetenskap (datalogi)
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-129934

id	ndltd-UPSALLA1-oai-DiVA.org-liu-129934
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-liu-129934; 2018-01-11T05:11:30Z; Cluster Analysis of Discussions on Internet Forums; eng; Klusteranalys av Diskussioner på Internetforum; Holm, Rasmus; Linköpings universitet, Artificiell intelligens och integrerad datorsystem; 2016; Cluster Analysis; Text Mining; Internet Forum; Computer Sciences; Datavetenskap (datalogi); Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-129934; application/pdf; info:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Cluster Analysis Text Mining Internet Forum Computer Sciences Datavetenskap (datalogi)
description	The growth of textual content on internet forums over the last decade has been immense, which has resulted in users struggling to find relevant information quickly and conveniently. The activity of finding information in large data collections is known as information retrieval, and many tools and techniques have been developed to tackle common problems. Cluster analysis is a technique for grouping similar objects into smaller groups (clusters) such that the objects within a cluster are more similar to each other than to objects in other clusters. We have investigated two clustering algorithms, Graclus and Non-Exhaustive Overlapping k-means (NEO-k-means), on textual data taken from Reddit, a social network service. One difficulty with these algorithms is that both take an input parameter controlling how many clusters to find. We have used a greedy modularity maximization algorithm to estimate the number of clusters that exist in discussion threads. We have shown that it is possible to find subtopics within discussions and that, in terms of execution time, Graclus has a clear advantage over NEO-k-means.
author	Holm, Rasmus
title	Cluster Analysis of Discussions on Internet Forums
publisher	Linköpings universitet, Artificiell intelligens och integrerad datorsystem
publishDate	2016
url	http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-129934
_version_	1718604426150674432
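
The description field above mentions using greedy modularity maximization to estimate how many clusters a discussion thread contains. The following is only a minimal sketch of that general idea, not the thesis's implementation: it assumes a thread has already been turned into an undirected graph whose nodes are comments and whose edges link related comments (that construction, and the name `greedy_modularity_communities`, are assumptions made for illustration). It repeatedly merges the pair of communities with the largest positive modularity gain and stops when no merge improves modularity; the number of communities left is the estimate of k.

```python
def greedy_modularity_communities(edges):
    """Naive greedy modularity maximization on an undirected edge list.

    Hypothetical helper for illustration. Returns a list of node sets;
    its length can serve as an estimate of the number of clusters.
    """
    m = len(edges)
    if m == 0:
        return []

    nodes = {n for edge in edges for n in edge}
    community = {n: {n} for n in nodes}   # representative -> member nodes
    degree = {n: 0 for n in nodes}        # representative -> degree sum
    between = {}                          # {a, b} -> edges between a and b

    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if u != v:
            key = frozenset((u, v))
            between[key] = between.get(key, 0) + 1

    while True:
        best_gain, best_pair = 0.0, None
        for key, e_ab in between.items():
            a, b = tuple(key)
            # Modularity change when merging communities a and b.
            gain = e_ab / m - degree[a] * degree[b] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # no merge increases modularity

        a, b = best_pair
        community[a] |= community.pop(b)
        degree[a] += degree.pop(b)

        # Re-point every edge count that touched b so it touches a.
        merged = {}
        for key, count in between.items():
            if key == frozenset((a, b)):
                continue  # these edges are now internal to a
            x, y = tuple(key)
            x = a if x == b else x
            y = a if y == b else y
            new_key = frozenset((x, y))
            merged[new_key] = merged.get(new_key, 0) + count
        between = merged

    return list(community.values())


# Toy graph: two triangles joined by a single edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
communities = greedy_modularity_communities(toy_edges)
estimated_k = len(communities)
```

The resulting `estimated_k` could then be passed as the cluster-count parameter to an algorithm that requires one. This quadratic-time version is meant for readability; production implementations use priority queues to find the best merge faster.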

Similar Items