On the Randomness of Compressed Data

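It seems reasonable to expect from a good compression method that its output should not be further compressible, because it should behave essentially like random data. We investigate this premise for a variety of known lossless compression techniques, and find that, surprisingly, there is much variability in the randomness, depending on the chosen method. Arithmetic coding seems to produce perfectly random output, whereas that of Huffman or Ziv-Lempel coding still contains many dependencies. In particular, the output of Huffman coding has already been proven to be random under certain conditions, and we present evidence here that arithmetic coding may produce an output that is identical to that of Huffman.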

Bibliographic Details
Main Authors:	Shmuel T. Klein, Dana Shapira
Format:	Article
Language:	English
Published:	MDPI AG 2020-04-01
Series:	Information
Subjects:	data compression; Huffman coding; arithmetic coding; Ziv-Lempel coding
Online Access:	https://www.mdpi.com/2078-2489/11/4/196

id	doaj-08f460d445ba435b860a9ec1ed8f975f
record_format	Article
spelling	doaj-08f460d445ba435b860a9ec1ed8f975f | 2020-11-25T02:26:48Z | eng | MDPI AG | Information | 2078-2489 | 2020-04-01 | 11 | 196 | 196 | 10.3390/info11040196 | On the Randomness of Compressed Data | Shmuel T. Klein (0), Dana Shapira (1) | (0) Computer Science Department, Bar Ilan University, Ramat-Gan 5290002, Israel | (1) Computer Science Department, Data Science and Artificial Intelligence Center, Ariel University, Ariel 40700, Israel | It seems reasonable to expect from a good compression method that its output should not be further compressible, because it should behave essentially like random data. We investigate this premise for a variety of known lossless compression techniques, and find that, surprisingly, there is much variability in the randomness, depending on the chosen method. Arithmetic coding seems to produce perfectly random output, whereas that of Huffman or Ziv-Lempel coding still contains many dependencies. In particular, the output of Huffman coding has already been proven to be random under certain conditions, and we present evidence here that arithmetic coding may produce an output that is identical to that of Huffman. | https://www.mdpi.com/2078-2489/11/4/196 | data compression; Huffman coding; arithmetic coding; Ziv-Lempel coding
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Shmuel T. Klein; Dana Shapira
author_facet	Shmuel T. Klein; Dana Shapira
author_sort	Shmuel T. Klein
title	On the Randomness of Compressed Data
title_sort	on the randomness of compressed data
publisher	MDPI AG
series	Information
issn	2078-2489
publishDate	2020-04-01
description	It seems reasonable to expect from a good compression method that its output should not be further compressible, because it should behave essentially like random data. We investigate this premise for a variety of known lossless compression techniques, and find that, surprisingly, there is much variability in the randomness, depending on the chosen method. Arithmetic coding seems to produce perfectly random output, whereas that of Huffman or Ziv-Lempel coding still contains many dependencies. In particular, the output of Huffman coding has already been proven to be random under certain conditions, and we present evidence here that arithmetic coding may produce an output that is identical to that of Huffman.
topic	data compression; Huffman coding; arithmetic coding; Ziv-Lempel coding
url	https://www.mdpi.com/2078-2489/11/4/196

Similar Items