Learning-based frequency estimation algorithms

Estimating the frequencies of elements in a data stream is a fundamental task in data analysis and machine learning. The problem is typically addressed using streaming algorithms which can process very large data using limited storage. Today's streaming algorithms, however, cannot exploit patte...


Bibliographic Details
Main Authors: Hsu, Chen-Yu (Author), Indyk, Piotr (Author), Katabi, Dina (Author), Vakilian, Ali (Author)
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format: Article
Language: English
Published: ICLR, 2021-01-20T16:09:57Z.
Online Access: Get fulltext (https://hdl.handle.net/1721.1/129467)
LEADER 01711 am a22002173u 4500
001 129467
042 |a dc 
100 1 0 |a Hsu, Chen-Yu  |e author 
100 1 0 |a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science  |e contributor 
700 1 0 |a Indyk, Piotr  |e author 
700 1 0 |a Katabi, Dina  |e author 
700 1 0 |a Vakilian, Ali  |e author 
245 0 0 |a Learning-based frequency estimation algorithms 
260 |b ICLR,   |c 2021-01-20T16:09:57Z. 
856 |z Get fulltext  |u https://hdl.handle.net/1721.1/129467 
520 |a Estimating the frequencies of elements in a data stream is a fundamental task in data analysis and machine learning. The problem is typically addressed using streaming algorithms, which can process very large data using limited storage. Today's streaming algorithms, however, cannot exploit patterns in their input to improve performance. We propose a new class of algorithms that automatically learn relevant patterns in the input data and use them to improve their frequency estimates. The proposed algorithms combine the benefits of machine learning with the formal guarantees available through algorithm theory. We prove that our learning-based algorithms have lower estimation errors than their non-learning counterparts. We also evaluate our algorithms on two real-world datasets and empirically demonstrate their performance gains. 
520 |a National Science Foundation (U.S.). Transdisciplinary Research in Principles of Data Science (Award 1740751) 
520 |a National Science Foundation (U.S.). Algorithms in the Field (Award 1535851) 
546 |a en 
655 7 |a Article 
773 |t 7th International Conference on Learning Representations, ICLR 2019