Near-Optimal Data Structure for Approximate Range Emptiness Problem in Information-Centric Internet of Things

Bibliographic Details
Main Authors: Xiujun Wang, Zhi Liu, Yan Gao, Xiao Zheng, Xianfu Chen, Celimuge Wu
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/8633895/
Description
Summary: The approximate range emptiness problem requires a memory-efficient data structure D that approximately represents a set S of n distinct elements chosen from a large universe U = {0, 1, ⋯, N-1} and answers emptiness queries of the form “S ∩ [a; b] = ∅?” for an interval [a; b] of length L (a, b ∈ U), with a false positive rate ε. Such a structure D can be kept in high-speed memory and can quickly determine, approximately, whether a query interval is empty. It is therefore crucial for online query processing in information-centric Internet of Things (IoT) applications, where data are continuously generated by a large number of resource-constrained sensors or readers and then processed in the network. However, existing work on the approximate range emptiness problem considers only the simple case in which the set S is static, which makes it unsuitable for continuously generated IoT data. In this paper, we study the approximate range emptiness problem over sliding windows in IoT data streams, denoted the ε-ARESD-problem, where both insertion and deletion are allowed. We first prove that, given a sliding window size n and an interval length L, any data structure for the ε-ARESD-problem needs at least n log₂(nL/ε) + Θ(n) bits of memory. We then propose a data structure and prove that its memory usage is within a factor of 1.33 of this lower bound. Extensive simulation results show that our data structure is more efficient than the baseline approach.
ISSN: 2169-3536
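
As a rough illustration of the quantities named in the summary, the Python sketch below evaluates the leading term n log₂(nL/ε) of the stated lower bound and implements an exact (not approximate) sliding-window emptiness check that can serve as a correctness reference. It is not the data structure proposed in the article. The names `lower_bound_leading_bits` and `ExactWindowEmptiness` are hypothetical, the Θ(n) term is omitted because its constant is not given in the summary, and the window is assumed to hold the n most recent items of the stream.

```python
import bisect
import math
from collections import deque


def lower_bound_leading_bits(n, L, eps):
    """Leading term n * log2(n * L / eps) of the lower bound in the summary.

    The additive Theta(n) term is left out (its constant is not stated there).
    """
    if n < 1 or L < 1 or not 0 < eps < 1:
        raise ValueError("need n >= 1, L >= 1 and 0 < eps < 1")
    return n * math.log2(n * L / eps)


class ExactWindowEmptiness:
    """Exact reference for queries 'S ∩ [a; b] = ∅?' over the last n items.

    Assumption: this is a plain baseline for checking answers, with no
    false positives and no attempt at memory efficiency.
    """

    def __init__(self, n):
        self.n = n
        self.order = deque()     # items in arrival order
        self.sorted_items = []   # same items, kept sorted for range lookups

    def insert(self, x):
        # Evict the oldest item once the window already holds n items.
        if len(self.order) == self.n:
            old = self.order.popleft()
            del self.sorted_items[bisect.bisect_left(self.sorted_items, old)]
        self.order.append(x)
        bisect.insort(self.sorted_items, x)

    def is_empty(self, a, b):
        # Find the smallest stored item >= a and test whether it exceeds b.
        i = bisect.bisect_left(self.sorted_items, a)
        return i == len(self.sorted_items) or self.sorted_items[i] > b


if __name__ == "__main__":
    bits = lower_bound_leading_bits(n=1024, L=16, eps=2 ** -6)
    print(bits, 1.33 * bits)  # leading term, and 1.33 times that term

    window = ExactWindowEmptiness(n=3)
    for item in (5, 20, 40, 60):
        window.insert(item)   # 5 is evicted when 60 arrives
    print(window.is_empty(0, 10), window.is_empty(35, 45))
```

An approximate structure meeting the summary's definition would answer "empty" whenever this reference does not find an item in [a; b] only with probability constrained by ε on truly empty intervals, while using far fewer bits than storing the items themselves.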