Estimating entropy of distributions in constant space

We consider the task of estimating the entropy of $k$-ary distributions from samples in the streaming model, where space is limited. Our main contribution is an algorithm that requires $O\left(\frac{k \log^2(1/\varepsilon)}{\varepsilon^3}\right)$ samples and only a constant number, $O(1)$, of memory words, and outputs a $\pm\varepsilon$ estimate of $H(p)$...
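
For reference, the target quantity is the Shannon entropy of a distribution $p$ over $k$ symbols. Rewriting it as an expectation over a single draw $X \sim p$ (a standard identity, not specific to this paper) suggests why a streaming estimate can get by with a few counters: each term depends only on $p(X)$, and one natural way to approximate that value is to count later occurrences of $X$ in the stream.

```latex
H(p) = \sum_{x=1}^{k} p(x) \log \frac{1}{p(x)}
     = \mathbb{E}_{X \sim p}\!\left[ \log \frac{1}{p(X)} \right]
```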


Bibliographic Details
Main Author: Indyk, Piotr (Author)
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format: Article
Language: English
Published: Neural Information Processing Systems Foundation, 2021-01-25T19:53:49Z.
Online Access: https://hdl.handle.net/1721.1/129554
LEADER 01436 am a22001693u 4500
001 129554
042 |a dc 
100 1 0 |a Indyk, Piotr  |e author 
100 1 0 |a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science  |e contributor 
245 0 0 |a Estimating entropy of distributions in constant space 
260 |b Neural Information Processing Systems Foundation,   |c 2021-01-25T19:53:49Z. 
856 |z Get fulltext  |u https://hdl.handle.net/1721.1/129554 
520 |a We consider the task of estimating the entropy of $k$-ary distributions from samples in the streaming model, where space is limited. Our main contribution is an algorithm that requires $O\left(\frac{k \log^2(1/\varepsilon)}{\varepsilon^3}\right)$ samples and only a constant number, $O(1)$, of memory words, and outputs a $\pm\varepsilon$ estimate of $H(p)$. Without space limitations, the sample complexity has been established as $S(k, \varepsilon) = \Theta\left(\frac{k}{\varepsilon \log k} + \frac{\log^2 k}{\varepsilon^2}\right)$, which is sub-linear in the domain size $k$, and the current algorithms that achieve optimal sample complexity also require nearly-linear space in $k$. Our algorithm partitions $[0, 1]$ into intervals and estimates the entropy contribution of probability values in each interval. The intervals are designed to trade off the bias and variance of these estimates.
520 |a National Science Foundation (U.S.). Computing and Communication Foundation (Grant 657471) 
546 |a en 
655 7 |a Article 
773 |t Advances in Neural Information Processing Systems
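
The abstract describes the method only at a high level. As a rough illustration of why constant memory can suffice, the following Python sketch implements a simplified estimator built on the identity $H(p) = \mathbb{E}_{X \sim p}[\log 1/p(X)]$. It draws one symbol, counts how often that symbol reappears in the next `n_count` samples, and averages $\log\frac{n+1}{c+1}$ over `n_reps` repetitions. This is an assumption-laden simplification and not the authors' algorithm. It omits the partition of $[0, 1]$ into intervals that the abstract credits with trading off bias and variance, so its error is not controlled to $\pm\varepsilon$. The function name `estimate_entropy_constant_space` and the parameters `n_count` and `n_reps` are hypothetical.

```python
import math
import random
from typing import Callable, Hashable


def estimate_entropy_constant_space(
    sample: Callable[[], Hashable],
    n_count: int,
    n_reps: int,
) -> float:
    """Simplified constant-memory entropy estimate in nats (hypothetical helper).

    Not the paper's interval-based algorithm: a plain plug-in estimator of
    E[log 1/p(X)] that keeps only a running sum, one stored symbol, and one
    counter, i.e. O(1) words, while reading n_reps * (n_count + 1) samples.
    """
    total = 0.0  # running sum of per-repetition estimates of log(1/p(x))
    for _ in range(n_reps):
        x = sample()  # draw x ~ p; x is the only symbol we remember
        count = 0     # number of times x reappears in the next n_count samples
        for _ in range(n_count):
            if sample() == x:
                count += 1
        # count ~ Binomial(n_count, p(x)). (count + 1) / (n_count + 1) is an
        # add-one estimate of p(x) that avoids log(0); the resulting log term
        # is biased, and this sketch makes no attempt to correct that bias.
        total += math.log((n_count + 1) / (count + 1))
    return total / n_reps


if __name__ == "__main__":
    rng = random.Random(0)
    k = 8  # uniform distribution over k symbols, so H(p) = ln k
    est = estimate_entropy_constant_space(
        lambda: rng.randrange(k), n_count=1000, n_reps=1000
    )
    print(f"estimate = {est:.3f} nats, true H = ln {k} = {math.log(k):.3f}")
```

For a uniform distribution over a handful of symbols, the estimate should land close to $\ln k$. For skewed distributions with many rare symbols, the add-one plug-in becomes noticeably biased. That is the kind of bias/variance trade-off the abstract says the interval design is meant to handle.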