The Bloom Filter Design for Numerical Range Query

Master's === National Chiao Tung University === Institute of Network Engineering === 96 === A Bloom filter is a simple, space-efficient randomized data structure for concisely representing a data set. Its randomized nature has great potential for distributed network systems, and it supports membership queries with a small false positive rat...


Bibliographic Details
Main Authors: Kun-Yang Fan, 范坤揚
Other Authors: Ming-Feng Chang
Format: Others
Language: en_US
Published: 2008
Online Access: http://ndltd.ncl.edu.tw/handle/19742805573464119458
id ndltd-TW-096NCTU5726063
record_format oai_dc
spelling ndltd-TW-096NCTU5726063 2015-10-13T13:51:51Z http://ndltd.ncl.edu.tw/handle/19742805573464119458 The Bloom Filter Design for Numerical Range Query 在布隆過濾器下改善範圍搜尋方法 Kun-Yang Fan 范坤揚 Master's National Chiao Tung University Institute of Network Engineering 96 Ming-Feng Chang 張明峰 2008 thesis 50 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Chiao Tung University === Institute of Network Engineering === 96 === A Bloom filter is a simple, space-efficient randomized data structure for concisely representing a data set. Its randomized nature has great potential for distributed network systems, and it supports membership queries with a small false positive rate, that is, the probability that the Bloom filter reports an element as present when it is not in the data set. Many studies have sought to improve the accuracy of Bloom filters by reducing the false positive rate. However, little research has been done on Bloom filter design for numerical range queries. Since a Bloom filter can represent only a limited number of elements, inserting a large range of numerical attribute values into a Bloom filter increases the false positive rate dramatically. In this thesis we present efficient Bloom filter designs for numerical ranges. First, the Division scheme reduces the number of elements inserted by grouping the numerical range into divisions, i.e., numbers in the same division are treated as the same element. Second, the Overlapping scheme reduces the number of bits set in the Bloom filter by overlapping the inserted bits of consecutive numbers. Third, the Division and Overlapping scheme combines the techniques of the two aforementioned schemes. An analytic model was used to derive the false positive rates of the schemes, and computer simulations were used to verify the correctness of the analytic model. Moreover, the optimal configuration of a Bloom filter representing a numeric range of a single attribute, i.e., the one that minimizes the false positive rate, can be obtained. A heuristic algorithm has been developed to obtain near-optimal configurations for multiple attributes. The Division and Overlapping scheme extends Bloom filter design to numerical range queries, where a traditional Bloom filter cannot be used.
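The Division idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the class and function names (BloomFilter, division_id, insert_range, might_contain), the salted SHA-256 hashing, the fixed division width, and the parameters m, k, and width are all assumptions made for illustration. The Overlapping scheme is not shown, because the abstract does not give enough detail to reconstruct it.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: m bit positions, k hash functions (illustrative)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, chosen for clarity

    def _positions(self, item: str):
        # Assumption: derive k positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


def division_id(value: int, width: int) -> int:
    """Map a number to its division; numbers in one division share one id."""
    return value // width


def insert_range(bf: BloomFilter, lo: int, hi: int, width: int) -> int:
    """Insert the inclusive range [lo, hi] as one element per division.

    Returns the number of elements inserted, which is the count of divisions
    touched rather than the count of numbers in the range.
    """
    first, last = division_id(lo, width), division_id(hi, width)
    for d in range(first, last + 1):
        bf.add(f"div:{d}")
    return last - first + 1


def might_contain(bf: BloomFilter, value: int, width: int) -> bool:
    """Membership test for a single number via its division id."""
    return f"div:{division_id(value, width)}" in bf


bf = BloomFilter(m=1024, k=3)
inserted = insert_range(bf, 1000, 1999, width=100)  # 10 divisions, not 1000 numbers
print(inserted, might_contain(bf, 1500, width=100))
```

In this sketch a range of 1000 numbers costs only 10 insertions, which is the effect the abstract attributes to the Division scheme. The trade-off is that a range whose endpoints do not align with division boundaries also matches the other numbers in its first and last divisions, so the division width has to be balanced against the filter's own false positive rate.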
author2 Ming-Feng Chang
author_facet Ming-Feng Chang
Kun-Yang Fan
范坤揚
author Kun-Yang Fan
范坤揚
title The Bloom Filter Design for Numerical Range Query
publishDate 2008
url http://ndltd.ncl.edu.tw/handle/19742805573464119458