A Novel Accuracy and Similarity Search Structure Based on Parallel Bloom Filters

In high-dimensional spaces, accurate and similarity search at low computing and storage cost remains a difficult research topic, and efficiency must be traded off against accuracy. In this paper, we propose a new structure, Similar-PBF-PHT, to represent the items of a high-dimensional set and r...

Bibliographic Details
Main Authors: Chunyan Shuai, Hengcheng Yang, Xin Ouyang, Siqi Li, Zheng Chen
Format: Article
Language: English
Published: Hindawi Limited 2016-01-01
Series: Computational Intelligence and Neuroscience
Online Access: http://dx.doi.org/10.1155/2016/4075257
id doaj-e583ea1789724918a9c79265500aa185
record_format Article
spelling doaj-e583ea1789724918a9c79265500aa185
2020-11-24T22:01:43Z
eng
Hindawi Limited
Computational Intelligence and Neuroscience
1687-5265
1687-5273
2016-01-01
2016
10.1155/2016/4075257
4075257
A Novel Accuracy and Similarity Search Structure Based on Parallel Bloom Filters
Chunyan Shuai (0): Faculty of Electric Power Engineering, Kunming University of Science and Technology, Kunming 650051, China
Hengcheng Yang (1): Faculty of Electric Power Engineering, Kunming University of Science and Technology, Kunming 650051, China
Xin Ouyang (2): Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650051, China
Siqi Li (3): Faculty of Electric Power Engineering, Kunming University of Science and Technology, Kunming 650051, China
Zheng Chen (4): Faculty of Transportation Engineering, Kunming University of Science and Technology, Kunming 650051, China
collection DOAJ
language English
format Article
sources DOAJ
author Chunyan Shuai
Hengcheng Yang
Xin Ouyang
Siqi Li
Zheng Chen
title A Novel Accuracy and Similarity Search Structure Based on Parallel Bloom Filters
publisher Hindawi Limited
series Computational Intelligence and Neuroscience
issn 1687-5265
1687-5273
publishDate 2016-01-01
description In high-dimensional spaces, accurate and similarity search at low computing and storage cost remains a difficult research topic, and efficiency must be traded off against accuracy. In this paper, we propose a new structure, Similar-PBF-PHT, to represent the items of a high-dimensional set and to retrieve accurate and similar items. The Similar-PBF-PHT contains three parts: parallel Bloom filters (PBFs), parallel hash tables (PHTs), and a bitmatrix. Experiments show that the Similar-PBF-PHT is effective for membership queries and K-nearest neighbors (K-NN) search. For accurate queries, the Similar-PBF-PHT has a low hit false positive probability (FPP) and acceptable memory costs. For K-NN queries, the average overall ratio and rank-i ratio of the Hamming distance are accurate, and the ratios of the Euclidean distance are acceptable. Retrieving accurate and similar items costs CPU time rather than I/O, and the structure can handle different data formats, not only numerical values.
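The description names parallel Bloom filters as one part of the structure but does not say how they are built. As a rough illustration of the general idea only, the sketch below keeps one ordinary Bloom filter per dimension and answers a membership query by checking every dimension. All class names, the SHA-256 hashing scheme, and the default sizes are assumptions made for this example; it is not the authors' design, and it leaves out the parallel hash tables and the bitmatrix entirely.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over byte strings (illustrative, not from the paper)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + value).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: bytes) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, value: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


class ParallelBloomFilters:
    """One Bloom filter per dimension; an item is a fixed-length tuple.

    Hypothetical helper: values are encoded with repr(), so numbers and
    strings can share the same code path.
    """

    def __init__(self, dims: int, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.filters = [BloomFilter(num_bits, num_hashes) for _ in range(dims)]

    @staticmethod
    def _encode(attr) -> bytes:
        return repr(attr).encode("utf-8")

    def add(self, item) -> None:
        if len(item) != len(self.filters):
            raise ValueError("item has the wrong number of dimensions")
        for bloom, attr in zip(self.filters, item):
            bloom.add(self._encode(attr))

    def might_contain(self, item) -> bool:
        # True means "possibly present"; False means "definitely absent".
        if len(item) != len(self.filters):
            return False
        return all(
            self._encode(attr) in bloom
            for bloom, attr in zip(self.filters, item)
        )


pbf = ParallelBloomFilters(dims=3)
pbf.add((1.5, "red", 42))
pbf.add((2.0, "blue", 7))
print(pbf.might_contain((1.5, "red", 42)))   # inserted item
print(pbf.might_contain((1.5, "blue", 42)))  # mixes attributes of two items
```

Both queries report a possible match. The second item was never inserted, but each of its attribute values was stored in its own dimension, so per-dimension filters on their own cannot tell which values belonged to the same item. The description lists parallel hash tables and a bitmatrix as further parts of the structure; this record does not explain what role they play.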
url http://dx.doi.org/10.1155/2016/4075257
_version_ 1725838962484838400