Double Distance-Calculation-Pruning for Similarity Search

Many modern applications deal with complex data, for which retrieval by similarity plays an important role. The main mechanisms for comparing complex data are based on similarity predicates. Such data are usually embedded in metric spaces, where distance functions are employed to express similarity and a lower b...
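The lower-bound property the abstract refers to, and the upper-bound property the paper combines it with, are both standard consequences of the triangle inequality. For any element p of a metric space (S, d), used here as a reference point (the symbol p is notation introduced for illustration, and the paper's exact formulation may differ):

```latex
% For all x, y, p in a metric space (S, d):
\lvert d(x,p) - d(y,p) \rvert \;\le\; d(x,y) \;\le\; d(x,p) + d(y,p)
```

For a range predicate with radius r, the left inequality shows that d(x,y) > r whenever |d(x,p) - d(y,p)| > r, so the comparison can be skipped; the right inequality shows that d(x,y) <= r whenever d(x,p) + d(y,p) <= r, so the pair qualifies without computing d(x,y).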


Bibliographic Details
Main Authors:	Ives Renê Venturini Pola, Fernanda Paula Barbosa Pola, Danilo Medeiros Eler
Format:	Article
Language:	English
Published:	MDPI AG 2018-05-01
Series:	Information
Subjects:	information retrieval; similarity joins; metric indexing
Online Access:	http://www.mdpi.com/2078-2489/9/5/124

id	doaj-6467fa238dc0491c88ab7fa033a01eff
record_format	Article
doi	10.3390/info9050124
citation	Information, Volume 9, Issue 5, Article 124 (2018-05-01)
affiliations	Ives Renê Venturini Pola: Department of Informatics, Federal University of Technology-UTFPR, 85503390 Pato Branco, PR, Brazil; Fernanda Paula Barbosa Pola: Department of Mathematics, Federal University of Technology-UTFPR, 85503390 Pato Branco, PR, Brazil; Danilo Medeiros Eler: São Paulo State University-UNESP, Rua Roberto Simonsen, 305. Bairro: Centro Educacional, 9060-900 Presidente Prudente, SP, Brazil
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Ives Renê Venturini Pola, Fernanda Paula Barbosa Pola, Danilo Medeiros Eler
title	Double Distance-Calculation-Pruning for Similarity Search
publisher	MDPI AG
series	Information
issn	2078-2489
publishDate	2018-05-01
description	Many modern applications deal with complex data, for which retrieval by similarity plays an important role. The main mechanisms for comparing complex data are based on similarity predicates. Such data are usually embedded in metric spaces, where distance functions are employed to express similarity and a lower-bound property is commonly used to avoid distance calculations. Retrieval by similarity is implemented through unary and binary operators. Most studies have aimed at improving the efficiency of unary operators, using either metric access methods or mathematical properties to prune parts of the search space during query answering. Studies on binary operators for similarity joins also aim to improve efficiency, and most of them use only the metric lower-bound property for pruning. However, they depend on query parameters such as the range radius. In this paper, we propose a generic concept that uses both the lower- and upper-bound properties from Metric Spaces Theory to avoid more element comparisons. The concept can be applied to any existing similarity retrieval method. We analyze the resulting increase in pruning power and show an example of its application to classical nested-loop join algorithms. A practical evaluation over both synthetic and real data sets shows that our method reduces the number of distance evaluations in similarity joins.
topic	information retrieval; similarity joins; metric indexing
url	http://www.mdpi.com/2078-2489/9/5/124
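The description mentions combining the metric lower- and upper-bound properties to avoid distance evaluations in classical nested-loop similarity joins. The following is a minimal Python sketch of that general idea under stated assumptions; it is not the authors' algorithm. It assumes a single reference element `pivot`, uses Euclidean distance via `math.dist` as a stand-in metric, and introduces a hypothetical `range_join` function. Pairs whose lower bound exceeds the radius are discarded, pairs whose upper bound is within the radius are accepted outright, and only the remaining pairs cost an explicit distance evaluation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Example metric only; any metric distance function could be passed instead."""
    return math.dist(a, b)


def range_join(
    xs: Sequence[Point],
    ys: Sequence[Point],
    radius: float,
    pivot: Point,  # hypothetical single reference element (an assumption of this sketch)
    dist: Callable[[Point, Point], float] = euclidean,
) -> Tuple[List[Tuple[int, int]], int]:
    """Nested-loop range join: returns index pairs (i, j) with dist(xs[i], ys[j]) <= radius,
    plus the number of pairwise distance evaluations that could not be avoided."""
    # Distances to the pivot are computed once per element and are not counted below.
    dx = [dist(x, pivot) for x in xs]
    dy = [dist(y, pivot) for y in ys]
    pairs: List[Tuple[int, int]] = []
    evaluated = 0
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            if abs(dx[i] - dy[j]) > radius:
                continue  # lower bound exceeds the radius: the pair cannot qualify
            if dx[i] + dy[j] <= radius:
                pairs.append((i, j))  # upper bound within the radius: the pair must qualify
                continue
            evaluated += 1  # neither bound decides the pair, so compute the distance
            if dist(x, y) <= radius:
                pairs.append((i, j))
    return pairs, evaluated


if __name__ == "__main__":
    xs = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    ys = [(0.0, 1.0), (9.0, 0.0), (20.0, 0.0)]
    print(range_join(xs, ys, radius=2.0, pivot=(0.0, 0.0)))
    # prints ([(0, 0), (1, 0), (2, 1)], 1)
```

In this toy run, two of the three result pairs are confirmed by the upper bound alone, six pairs are discarded by the lower bound, and only one pair requires an explicit distance evaluation. A lower-bound-only join would have had to evaluate the two pairs that the upper bound accepted here.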


Similar Items