A General Framework for Discovering Multiple Data Groupings
Clustering helps users gain insights from their data by discovering hidden structures in an unsupervised way. Unlike classification tasks that are evaluated using well-defined target labels, clustering is an intrinsically subjective task as it depends on the interpretation, need and interest of user...
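Main Author: | Sweidan, Dirar |
---|---|
Format: | Others |
Language: | English |
Published: | Högskolan i Halmstad, Akademin för informationsteknologi, 2018 |
Subjects: | machine learning; unsupervised learning; data mining; clustering; multiple-clusterings; clustering algorithm; Engineering and Technology (Teknik och teknologier); Computer Systems (Datorsystem) |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-38047 |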
id |
ndltd-UPSALLA1-oai-DiVA.org-hh-38047 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UPSALLA1-oai-DiVA.org-hh-38047; 2018-09-26T05:55:33Z; A General Framework for Discovering Multiple Data Groupings; eng; Sweidan, Dirar; Högskolan i Halmstad, Akademin för informationsteknologi; 2018; machine learning; unsupervised learning; data mining; clustering; multiple-clusterings; clustering algorithm; Engineering and Technology; Teknik och teknologier; Computer Systems; Datorsystem; Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-38047; application/pdf; info:eu-repo/semantics/openAccess |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
machine learning; unsupervised learning; data mining; clustering; multiple-clusterings; clustering algorithm; Engineering and Technology; Teknik och teknologier; Computer Systems; Datorsystem |
description |
Clustering helps users gain insights from their data by discovering hidden structures in an unsupervised way. Unlike classification tasks, which are evaluated against well-defined target labels, clustering is an intrinsically subjective task, as it depends on the interpretation, needs, and interests of users. In many real-world applications, multiple meaningful clusterings can be hidden in the data, and different users are interested in exploring different perspectives and use cases of the same data. Despite this, most existing clustering techniques attempt to produce only a single clustering of the data, which can be too restrictive. In this thesis, a general method is proposed to discover multiple alternative clusterings of the data and let users select the clustering(s) they are most interested in. To cover a large set of possible clustering solutions, a diverse set of clusterings is first generated from various projections of the data. Then, similar clusterings are found, filtered, and aggregated into representative clusterings, so that the user only has to explore a small, non-redundant set of them. We compare the proposed method against others and analyze its advantages and disadvantages on artificial and real-world datasets, as well as on images, which enable a visual assessment of the meaningfulness of the discovered clustering solutions. In addition, the various techniques used in the method are studied and analyzed extensively. Results show that the proposed method is able to discover multiple interesting and meaningful clustering solutions. |
author |
Sweidan, Dirar |
title |
A General Framework for Discovering Multiple Data Groupings |
publisher |
Högskolan i Halmstad, Akademin för informationsteknologi |
publishDate |
2018 |
url |
http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-38047 |
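
The description field outlines three stages: generate a diverse set of clusterings from projections of the data, find similar clusterings, and reduce each group of similar clusterings to one representative. The following is a minimal Python sketch of that outline, not the thesis's implementation. Everything concrete in it is an assumption made for illustration: the "projections" are plain feature subsets, the base clusterer is a small k-means, similarity is the Rand index, grouping is a greedy threshold pass, and the representative is the group's medoid rather than a true aggregate. The names `kmeans`, `rand_index`, `multiple_clusterings`, and the toy `data` are hypothetical.

```python
import itertools
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters keep their center.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def rand_index(a, b):
    """Fraction of point pairs that two labelings treat the same way (together or apart)."""
    pairs = list(itertools.combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def multiple_clusterings(data, k=2, threshold=0.9, seed=0):
    rng = random.Random(seed)
    n_features = len(data[0])

    # Stage 1 (assumed): one clustering per feature subset, used as a stand-in for "projections".
    candidates = []
    for size in range(1, n_features + 1):
        for subset in itertools.combinations(range(n_features), size):
            projected = [tuple(row[d] for d in subset) for row in data]
            candidates.append(kmeans(projected, k, rng))

    # Stage 2 (assumed): greedy grouping; a clustering joins the first group whose
    # first member it matches with a Rand index of at least `threshold`.
    groups = []
    for labels in candidates:
        for group in groups:
            if rand_index(labels, group[0]) >= threshold:
                group.append(labels)
                break
        else:
            groups.append([labels])

    # Stage 3 (assumed): represent each group by its medoid, i.e. the member
    # with the highest total similarity to the other members of the group.
    return [max(g, key=lambda x: sum(rand_index(x, y) for y in g)) for g in groups]


# Toy data with two independent groupings: one along x (near 0 vs. near 10)
# and one along y (near 0 vs. near 5).
data = [
    (0.0, 0.0), (0.1, 5.0), (0.2, 0.1), (0.3, 5.1),
    (10.0, 0.2), (10.1, 5.2), (10.2, 0.3), (10.3, 5.3),
]
representatives = multiple_clusterings(data)
for labels in representatives:
    print(labels)
```

On this toy data the x-based and y-based groupings disagree on most point pairs, so they fall into separate groups and both survive as representatives. Enumerating all feature subsets grows exponentially with the number of features, so a real dataset would need sampled or learned projections instead.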