CVIC: Cluster Validation Using Instance-Based Confidences

As unlabeled data becomes increasingly available, the need for robust data mining techniques increases as well. Clustering is a common data mining tool which seeks to find related, independent patterns in data called clusters. The cluster validation problem addresses the question of how well a given...

Bibliographic Details
Main Author:	LeBaron, Dean M
Format:	Others
Published:	BYU ScholarsArchive 2015
Subjects:	clustering validation cluster confidence supervised learners Computer Sciences
Online Access:	https://scholarsarchive.byu.edu/etd/5736 https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=6735&context=etd

id	ndltd-BGMYU2-oai-scholarsarchive.byu.edu-etd-6735
record_format	oai_dc
spelling	ndltd-BGMYU2-oai-scholarsarchive.byu.edu-etd-6735 2019-05-16T03:27:18Z CVIC: Cluster Validation Using Instance-Based Confidences LeBaron, Dean M 2015-11-01T07:00:00Z text application/pdf https://scholarsarchive.byu.edu/etd/5736 https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=6735&context=etd http://lib.byu.edu/about/copyright/ All Theses and Dissertations BYU ScholarsArchive clustering validation cluster confidence supervised learners Computer Sciences
collection	NDLTD
format	Others
sources	NDLTD
topic	clustering validation cluster confidence supervised learners Computer Sciences
description	As unlabeled data becomes increasingly available, the need for robust data mining techniques increases as well. Clustering is a common data mining tool that seeks to find related, independent patterns in data, called clusters. The cluster validation problem addresses the question of how well a given clustering fits the data set. We present CVIC (cluster validation using instance-based confidences), which assigns a confidence score to each individual instance, as opposed to more traditional methods, which focus on the clusters themselves. CVIC trains supervised learners to recreate the clustering, and each instance is scored based on the learners' output, which corresponds to the confidence that the instance was clustered correctly. One consequence of validating instances individually is the ability to direct users to instances in a cluster that are either potentially misclustered or correctly clustered. Instances with low confidences can be either manually inspected or reclustered, and instances with high confidences can be automatically labeled. We compare CVIC to three competing methods for assigning confidence scores and report results showing that CVIC's scores yield higher average precision and recall for detecting misclustered and correctly clustered instances across five clustering algorithms on twenty data sets, including handwritten historical image data provided by Ancestry.com.
author	LeBaron, Dean M
title	CVIC: Cluster Validation Using Instance-Based Confidences
publisher	BYU ScholarsArchive
publishDate	2015
url	https://scholarsarchive.byu.edu/etd/5736 https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=6735&context=etd
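
The description above says that CVIC trains supervised learners to recreate a clustering and scores each instance from the learners' output. The following is a minimal sketch of that general idea only, not the thesis's method: the record does not say which learners are used or how their outputs are combined. The k-nearest-neighbor learner, the fold-based hold-out, the vote-share score, and the names knn_vote_share and instance_confidences are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def knn_vote_share(train, query, label, k=3):
    """Fraction of the k nearest training points that carry `label`.

    Assumption: a plain k-NN learner stands in for whatever supervised
    learners the thesis actually uses.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(lbl for _, lbl in nearest)
    return votes[label] / len(nearest)


def instance_confidences(points, labels, folds=5, k=3, seed=0):
    """Score each instance by how strongly a learner trained on the other
    folds agrees with the cluster label that instance was given."""
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)
    scores = [0.0] * len(points)
    for f in range(folds):
        held_out = set(order[f::folds])
        # Train only on instances outside the current fold.
        train = [(points[i], labels[i]) for i in order if i not in held_out]
        for i in held_out:
            scores[i] = knn_vote_share(train, points[i], labels[i], k)
    return scores


# Toy data (hypothetical): two tight groups, plus one point that lies in
# the first group but was assigned to cluster 1.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
          (0.2, 0.8)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

scores = instance_confidences(points, labels)
suspect = min(range(len(scores)), key=scores.__getitem__)
print("lowest-confidence instance:", suspect, points[suspect], scores[suspect])
```

In this sketch the lowest-scoring instances are the candidates for manual inspection or reclustering, and the highest-scoring ones are the candidates for automatic labeling, mirroring the two uses named in the description.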
