Testing Closeness of Discrete Distributions

Bibliographic Details
Main Authors: Batu, Tugkan (Author), Fortnow, Lance (Author), Rubinfeld, Ronitt (Author), Smith, Warren D. (Author), White, Patrick (Author)
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format: Article
Language: English
Published: Association for Computing Machinery (ACM), 2014-10-08T15:03:28Z.
Online Access: http://hdl.handle.net/1721.1/90630
LEADER 01673 am a22002173u 4500
001 90630
042 |a dc 
100 1 0 |a Batu, Tugkan  |e author 
100 1 0 |a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science  |e contributor 
100 1 0 |a Rubinfeld, Ronitt  |e contributor 
700 1 0 |a Fortnow, Lance  |e author 
700 1 0 |a Rubinfeld, Ronitt  |e author 
700 1 0 |a Smith, Warren D.  |e author 
700 1 0 |a White, Patrick  |e author 
245 0 0 |a Testing Closeness of Discrete Distributions 
260 |b Association for Computing Machinery (ACM),   |c 2014-10-08T15:03:28Z. 
856 |z Get fulltext  |u http://hdl.handle.net/1721.1/90630 
520 |a Given samples from two distributions over an n-element set, we wish to test whether these distributions are statistically close. We present an algorithm which uses a number of independent samples from each distribution that is sublinear in n, specifically $O(n^{2/3}\varepsilon^{-8/3}\log n)$, runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the cases when the distance between the distributions is small (less than $\max\{\varepsilon^{4/3}n^{-1/3}/32,\ \varepsilon n^{-1/2}/4\}$) or large (more than $\varepsilon$) in $\ell_1$ distance. This result can be compared to the lower bound of $\Omega(n^{2/3}\varepsilon^{-2/3})$ for this problem given by Valiant [2008]. Our algorithm has applications to the problem of testing whether a given Markov process is rapidly mixing. We present sublinear algorithms for several variants of this problem as well. 
546 |a en_US 
655 7 |a Article 
773 |t Journal of the ACM
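The abstract states that the tester runs in time linear in the sample size and makes no structural assumptions, but it does not describe how the test statistic is computed. As a hedged illustration only, and not the authors' algorithm, the sketch below shows collision counting, one common way to get a linear-time (expected, via hashing) unbiased estimate of the squared ℓ2 distance between two distributions from samples. It does not address the extra work an ℓ1 tester needs (for example, ||p − q||₁ can be as large as √n · ||p − q||₂), nor the thresholds quoted in the abstract. The function name `collision_l2_sq_estimate` and the demo distributions are hypothetical.

```python
import random
from collections import Counter
from math import comb


def collision_l2_sq_estimate(xs, ys):
    """Estimate ||p - q||_2^2 from samples xs ~ p and ys ~ q via collision counts.

    Hypothetical helper for illustration. With independent samples:
      E[self-collisions in xs] = C(m, 2) * ||p||_2^2
      E[self-collisions in ys] = C(k, 2) * ||q||_2^2
      E[cross-collisions]      = m * k * <p, q>
    so the returned value is an unbiased estimate of
    ||p||^2 + ||q||^2 - 2<p, q> = ||p - q||_2^2 (it may be slightly negative).
    """
    m, k = len(xs), len(ys)
    if m < 2 or k < 2:
        raise ValueError("need at least two samples from each distribution")
    cx, cy = Counter(xs), Counter(ys)
    # Pairs i < j with xs[i] == xs[j]: an element seen c times contributes C(c, 2).
    self_x = sum(comb(c, 2) for c in cx.values())
    self_y = sum(comb(c, 2) for c in cy.values())
    # Pairs (i, j) with xs[i] == ys[j]; Counter returns 0 for keys it has not seen.
    cross = sum(c * cy[v] for v, c in cx.items())
    return self_x / comb(m, 2) + self_y / comb(k, 2) - 2 * cross / (m * k)


if __name__ == "__main__":
    rng = random.Random(0)
    n, m = 10, 20_000
    p_samples = [rng.randrange(n) for _ in range(m)]      # uniform on {0, ..., 9}
    q_same = [rng.randrange(n) for _ in range(m)]         # same distribution as p
    q_far = [n + rng.randrange(n) for _ in range(m)]      # uniform on a disjoint support
    print(collision_l2_sq_estimate(p_samples, q_same))    # close to 0.0
    print(collision_l2_sq_estimate(p_samples, q_far))     # close to 0.2 (= 0.1 + 0.1)
```

Counting with a hash map keeps the work proportional to the number of samples rather than to n, which is the property a sublinear-sample tester needs; a real tester would compare such a statistic against a threshold derived from its analysis.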