The Bloom Filter Design for Numerical Range Query
Master's === National Chiao Tung University === Institute of Network Engineering === 96 === A Bloom filter is a simple, space-efficient randomized data structure for concisely representing a data set. Its randomized nature has great potential for distributed network systems, and it supports membership queries with a small false positive rat...
Main Authors: | Kun-Yang Fan 范坤揚 |
---|---|
Other Authors: | Ming-Feng Chang 張明峰 |
Format: | Others |
Language: | en_US |
Published: | 2008 |
Online Access: | http://ndltd.ncl.edu.tw/handle/19742805573464119458 |
id |
ndltd-TW-096NCTU5726063 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-096NCTU5726063 2015-10-13T13:51:51Z http://ndltd.ncl.edu.tw/handle/19742805573464119458 The Bloom Filter Design for Numerical Range Query 在布隆過濾器下改善範圍搜尋方法 (Improving Range Search Methods with Bloom Filters) Kun-Yang Fan 范坤揚 Master's National Chiao Tung University Institute of Network Engineering 96 Ming-Feng Chang 張明峰 2008 thesis 50 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Chiao Tung University === Institute of Network Engineering === 96 === A Bloom filter is a simple, space-efficient randomized data structure for concisely representing a data set. Its randomized nature has great potential for distributed network systems, and it supports membership queries with a small false positive rate, which is the probability that the filter reports an element as present when it is not in the data set. Many studies have sought to improve the accuracy of Bloom filters by reducing the false positive rate. However, little research has been done on Bloom filter design for numerical range queries. Since a Bloom filter can represent only a limited number of elements, the false positive rate increases dramatically when a large range of numerical attribute values is inserted. In this thesis we present efficient Bloom filter designs for numerical ranges. First, the Division scheme reduces the number of elements inserted by grouping the numerical range into divisions, i.e., numbers in the same division are treated as the same element. Second, the Overlapping scheme reduces the number of bits set in the Bloom filter by overlapping the inserted bits of consecutive numbers. Third, the Division and Overlapping scheme combines the techniques of the two. An analytic model is used to derive the false positive rates of the schemes, and computer simulations verify the correctness of the model. Moreover, the optimal configuration of a Bloom filter representing a numeric range of a single attribute, i.e., the configuration that minimizes the false positive rate, can be obtained. A heuristic algorithm has been developed to obtain near-optimal configurations for multiple attributes. The Division and Overlapping scheme extends Bloom filter design to numerical range queries, where a traditional Bloom filter cannot be used. |
author2 |
Ming-Feng Chang |
author_facet |
Ming-Feng Chang Kun-Yang Fan 范坤揚 |
author |
Kun-Yang Fan 范坤揚 |
spellingShingle |
Kun-Yang Fan 范坤揚 The Bloom Filter Design for Numerical Range Query |
author_sort |
Kun-Yang Fan |
title |
The Bloom Filter Design for Numerical Range Query |
title_sort |
bloom filter design for numerical range query |
publishDate |
2008 |
url |
http://ndltd.ncl.edu.tw/handle/19742805573464119458 |
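
The Division scheme described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the class `BloomFilter`, the helpers `insert_range` and `query_number`, the `div:` key prefix, the SHA-256 double-hashing construction, and the parameters `m`, `k`, and `division_size` are all hypothetical names and choices made for illustration. The sketch assumes the inserted range is aligned to division boundaries; how the thesis treats partially covered divisions is not stated in this record.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter with m bits and k hash positions per element."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, chosen for readability

    def _positions(self, item: str) -> list[int]:
        # Assumption: derive k positions from one SHA-256 digest by double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a nonzero step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


def insert_range(bf: BloomFilter, low: int, high: int, division_size: int) -> int:
    """Insert [low, high] as division indices instead of individual numbers.

    Returns the number of elements actually inserted into the filter.
    """
    first = low // division_size
    last = high // division_size
    for division in range(first, last + 1):
        bf.add(f"div:{division}")
    return last - first + 1


def query_number(bf: BloomFilter, value: int, division_size: int) -> bool:
    """A number is looked up through the division it falls into."""
    return f"div:{value // division_size}" in bf


if __name__ == "__main__":
    bf = BloomFilter(m=1024, k=4)
    inserted = insert_range(bf, 1000, 1999, division_size=100)
    print(inserted)                       # 10 elements instead of 1000
    print(query_number(bf, 1500, 100))    # True: inserted values are never missed
```

Because every number in a division maps to the same element, a 1000-value range costs only 10 insertions here, which is the reduction in inserted elements the abstract attributes to the Division scheme. The trade-off in this sketch is granularity: any value sharing a division with an inserted value is reported as present.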