A Novel Accuracy and Similarity Search Structure Based on Parallel Bloom Filters
Main Authors: | Chunyan Shuai, Hengcheng Yang, Xin Ouyang, Siqi Li, Zheng Chen |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2016-01-01 |
Series: | Computational Intelligence and Neuroscience |
Online Access: | http://dx.doi.org/10.1155/2016/4075257 |
id |
doaj-e583ea1789724918a9c79265500aa185 |
---|---|
affiliations |
Kunming University of Science and Technology, Kunming 650051, China (Faculty of Electric Power Engineering; Faculty of Information Engineering and Automation; Faculty of Transportation Engineering) |
collection |
DOAJ |
author |
Chunyan Shuai, Hengcheng Yang, Xin Ouyang, Siqi Li, Zheng Chen |
issn |
1687-5265 1687-5273 |
description |
In high-dimensional spaces, accurate and similarity search at low computing and storage cost remains a difficult research problem, and it requires a trade-off between efficiency and accuracy. In this paper, we propose a new structure, Similar-PBF-PHT, that represents the items of a high-dimensional set and retrieves both accurate (exact) and similar items. Similar-PBF-PHT consists of three parts: parallel Bloom filters (PBFs), parallel hash tables (PHTs), and a bit matrix. Experiments show that Similar-PBF-PHT is effective for both membership queries and K-nearest-neighbor (K-NN) search. For accurate queries, Similar-PBF-PHT achieves a low hit false positive probability (FPP) at an acceptable memory cost. For K-NN queries, the average overall ratio and rank-i ratio are accurate under the Hamming distance and acceptable under the Euclidean distance. Retrieving accurate and similar items consumes CPU time rather than I/O time, and the structure handles different data formats, not only numerical values. |
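The description names the building blocks and metrics without detail. As a rough illustration only, this sketch shows a single standard Bloom filter (not the article's parallel Bloom filters, hash tables, or bit matrix), the textbook approximation of its false positive probability, and the overall ratio and rank-i ratio as they are commonly defined in approximate nearest-neighbor evaluation, here under the Hamming distance. The names `BloomFilter`, `theoretical_fpp`, `hamming`, and `rank_ratios`, the SHA-256 double-hashing scheme, and the assumption that the article uses these ratio definitions are all hypothetical choices for this sketch, not details taken from the article.

```python
import hashlib
import math


class BloomFilter:
    """A single standard Bloom filter with m bits and k hash functions.

    Illustrative only: this is not the article's Similar-PBF-PHT structure.
    """

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, item: str) -> list:
        # Derive k indices from one SHA-256 digest via double hashing,
        # (h1 + i * h2) mod m; a hypothetical choice for this sketch.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step nonzero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item: str) -> bool:
        # False: definitely not inserted. True: inserted, or a false positive.
        return all(self.bits[p] for p in self._positions(item))


def theoretical_fpp(m: int, k: int, n: int) -> float:
    """Textbook approximation (1 - e^(-kn/m))^k after n insertions."""
    return (1.0 - math.exp(-k * n / m)) ** k


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two non-negative integer codes."""
    return bin(a ^ b).count("1")


def rank_ratios(returned: list, exact: list, q: int):
    """Rank-i ratios d(o_i, q) / d(o_i*, q) and their mean, the overall ratio.

    Assumed definition (common in approximate K-NN evaluation), not taken
    from the article. Both lists must be sorted by distance to q, and no
    true neighbor may be at distance 0, where the ratio is undefined.
    """
    ratios = [hamming(o, q) / hamming(e, q) for o, e in zip(returned, exact)]
    return ratios, sum(ratios) / len(ratios)


# Usage
bf = BloomFilter(m=1024, k=4)
for word in ["apple", "banana", "cherry"]:
    bf.add(word)
print(bf.might_contain("apple"))  # True: Bloom filters have no false negatives
print(round(theoretical_fpp(1024, 4, 100), 4))  # 0.0109
print(rank_ratios([0b0001, 0b0111], [0b0001, 0b0011], 0b0000))  # ([1.0, 1.5], 1.25)
```

With m = 1024 bits, k = 4 hash functions, and n = 100 stored items, the approximation gives an FPP of roughly 1.1%, while a `False` answer is always a true negative. An overall ratio of 1 would mean the returned neighbors are exactly as close to the query as the true K nearest neighbors.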