Scalable nearest neighbour methods for high dimensional data

For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbour matches to high dimensional vectors that represent the training data.


Bibliographic Details
Main Author:	Muja, Marius
Language:	English
Published:	University of British Columbia 2013
Online Access:	http://hdl.handle.net/2429/44402

id	ndltd-UBC-oai-circle.library.ubc.ca-2429-44402
record_format	oai_dc
spelling	ndltd-UBC-oai-circle.library.ubc.ca-2429-44402 2018-01-05T17:26:35Z Scalable nearest neighbour methods for high dimensional data Muja, Marius Science, Faculty of Computer Science, Department of Graduate 2013-04-30T17:53:12Z 2013-05-01T09:28:15Z 2013 2013-11 Text Thesis/Dissertation http://hdl.handle.net/2429/44402 eng Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ University of British Columbia
collection	NDLTD
language	English
sources	NDLTD
description	For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbour matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbour matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this thesis, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbour algorithm and its parameters depend on the dataset characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular dataset. In order to scale to very large datasets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbour matching framework that can be used with any of the algorithms described in the thesis. All this research has been released as an open source library called FLANN (Fast Library for Approximate Nearest Neighbours), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbour matching. === Science, Faculty of === Computer Science, Department of === Graduate
author	Muja, Marius
title	Scalable nearest neighbour methods for high dimensional data
publisher	University of British Columbia
publishDate	2013
url	http://hdl.handle.net/2429/44402

