Characterizing the Asymptotic Per-Symbol Redundancy of Memoryless Sources over Countable Alphabets in Terms of Single-Letter Marginals
The minimum expected number of bits needed to describe a random variable is its entropy, assuming knowledge of the distribution of the random variable. On the other hand, universal compression describes data supposing that the underlying distribution is unknown, but that it belongs to a known set P of distributions. However, since universal descriptions are not matched exactly to the underlying distribution, the number of bits they use on average is higher, and the excess over the entropy is the redundancy. In this paper, we study the redundancy incurred by the universal description of strings of positive integers (Z+), the strings being generated independently and identically distributed (i.i.d.) according to an unknown distribution over Z+ in a known collection P. We first show that if describing a single symbol incurs finite redundancy, then P is tight, but that the converse does not always hold. If a single symbol can be described with finite worst-case regret (a more stringent formulation than redundancy above), then it is known that describing length n i.i.d. strings incurs only vanishing (to zero) redundancy per symbol as n increases. In contrast, we show it is possible that the description of a single symbol from an unknown distribution in P incurs finite redundancy, yet the description of length n i.i.d. strings incurs a constant (> 0) redundancy per symbol encoded. We then show a sufficient condition on single-letter marginals under which length n i.i.d. samples incur vanishing redundancy per symbol encoded.
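For orientation, the key terms in the abstract correspond to standard information-theoretic quantities; the notation below is a sketch for the reader and is not taken from the paper itself. For a collection $\mathcal{P}$ of distributions over $\mathbb{Z}_+$, the expected redundancy of the best universal description is

$$R(\mathcal{P}) \;=\; \inf_{q}\,\sup_{p\in\mathcal{P}} D(p\,\|\,q) \;=\; \inf_{q}\,\sup_{p\in\mathcal{P}} \sum_{x\in\mathbb{Z}_+} p(x)\log\frac{p(x)}{q(x)},$$

the excess of the expected description length over the entropy, while the worst-case regret is

$$\hat{R}(\mathcal{P}) \;=\; \inf_{q}\,\sup_{p\in\mathcal{P}}\,\sup_{x\in\mathbb{Z}_+} \log\frac{p(x)}{q(x)} \;=\; \log\sum_{x\in\mathbb{Z}_+}\sup_{p\in\mathcal{P}} p(x),$$

so finite regret implies finite redundancy but not conversely. For length-$n$ i.i.d. strings the same quantities are taken over the product distributions $p^n$ on $\mathbb{Z}_+^n$, giving a redundancy $R_n(\mathcal{P})$, and the per-symbol redundancy is $R_n(\mathcal{P})/n$. A collection $\mathcal{P}$ is tight if for every $\varepsilon>0$ there is a finite set $A\subset\mathbb{Z}_+$ with $p(A)\ge 1-\varepsilon$ for all $p\in\mathcal{P}$.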
Main Authors: | Maryam Hosseini, Narayana Santhanam |
---|---|
Affiliation: | Department of Electrical Engineering, University of Hawaii at Manoa, Honolulu, HI 96822, USA |
Format: | Article |
Language: | English |
Published: | MDPI AG, 2014-07-01 |
Series: | Entropy |
ISSN: | 1099-4300 |
Subjects: | universal compression; redundancy; large alphabets; tightness; redundancy-capacity theorem |
Online Access: | http://www.mdpi.com/1099-4300/16/7/4168 |