Efficient K-means clustering and the importance of seeding

Data clustering is the process of grouping data elements based on some aspect of similarity between the elements in the group. Clustering has many applications, such as data compression, data mining, pattern recognition, and machine learning, and there are many different clustering methods. This paper...

Bibliographic Details
Main Authors:	ELIASSON, PHILIP; ROSÉN, NIKLAS
Format:	Others
Language:	English
Published:	KTH, Skolan för datavetenskap och kommunikation (CSC) 2013
Subjects:	Computer Sciences Datavetenskap (datalogi)
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-134910

id	ndltd-UPSALLA1-oai-DiVA.org-kth-134910
record_format	oai_dc
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Computer Sciences Datavetenskap (datalogi)
description	Data clustering is the process of grouping data elements based on some aspect of similarity between the elements in the group. Clustering has many applications, such as data compression, data mining, pattern recognition, and machine learning, and there are many different clustering methods. This paper examines the k-means method of clustering and how the choice of initial seeding affects the result. Lloyd’s algorithm is used as a baseline and is compared to an improved algorithm that uses kd-trees. Two seeding methods are compared: random seeding and partial clustering seeding.
author	ELIASSON, PHILIP; ROSÉN, NIKLAS
title	Efficient K-means clustering and the importance of seeding
publisher	KTH, Skolan för datavetenskap och kommunikation (CSC)
publishDate	2013
url	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-134910

Similar Items