On the Randomness of Compressed Data
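It seems reasonable to expect from a good compression method that its output should not be further compressible, because it should behave essentially like random data. We investigate this premise for a variety of known lossless compression techniques, and find that, surprisingly, there is much variability in the randomness, depending on the chosen method. Arithmetic coding seems to produce perfectly random output, whereas that of Huffman or Ziv-Lempel coding still contains many dependencies. In particular, the output of Huffman coding has already been proven to be random under certain conditions, and we present evidence here that arithmetic coding may produce an output that is identical to that of Huffman.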
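The premise in the abstract, that well-compressed output should look random and resist further compression, can be probed informally. The following is a minimal sketch, not the authors' methodology: it uses Python's standard `zlib` module (DEFLATE, which combines Ziv-Lempel matching with Huffman coding) as a stand-in compressor, and two crude proxies for randomness, namely the empirical byte entropy and the size ratio after compressing a second time. The function names `byte_entropy`, `recompression_ratio`, and `make_sample`, as well as the sample vocabulary, are assumptions introduced here for illustration.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Empirical zero-order entropy of `data` in bits per byte (maximum 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def recompression_ratio(data: bytes, level: int = 9) -> float:
    """Return len(compress(compress(data))) / len(compress(data)) using zlib."""
    once = zlib.compress(data, level)
    twice = zlib.compress(once, level)
    return len(twice) / len(once)


def make_sample(num_words: int = 5000, seed: int = 0) -> bytes:
    """Hypothetical test input: words drawn at random from a small vocabulary."""
    vocabulary = ["data", "compression", "Huffman", "arithmetic",
                  "coding", "random", "output", "entropy"]
    rng = random.Random(seed)
    text = " ".join(rng.choice(vocabulary) for _ in range(num_words))
    return text.encode("ascii")


if __name__ == "__main__":
    sample = make_sample()
    once = zlib.compress(sample, 9)
    print(f"original:   {len(sample)} bytes, {byte_entropy(sample):.3f} bits/byte")
    print(f"compressed: {len(once)} bytes, {byte_entropy(once):.3f} bits/byte")
    print(f"recompression ratio: {recompression_ratio(sample):.3f}")
```

A ratio close to 1 and a byte entropy close to 8 bits per byte only indicate that this particular compressor cannot exploit whatever structure remains. They do not show that the output is random in the stronger sense the article investigates, where dependencies may survive that a byte-oriented second pass cannot see.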
Main Authors: | Shmuel T. Klein, Dana Shapira |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-04-01 |
Series: | Information |
Subjects: | data compression, Huffman coding, arithmetic coding, Ziv-Lempel coding |
Online Access: | https://www.mdpi.com/2078-2489/11/4/196 |
id |
doaj-08f460d445ba435b860a9ec1ed8f975f |
---|---|
record_format |
Article |
spelling |
doaj-08f460d445ba435b860a9ec1ed8f975f; 2020-11-25T02:26:48Z; eng; MDPI AG; Information; 2078-2489; 2020-04-01; 11; 196; 196; 10.3390/info11040196; On the Randomness of Compressed Data; Shmuel T. Klein (0), Dana Shapira (1); (0) Computer Science Department, Bar Ilan University, Ramat-Gan 5290002, Israel; (1) Computer Science Department, Data Science and Artificial Intelligence Center, Ariel University, Ariel 40700, Israel; It seems reasonable to expect from a good compression method that its output should not be further compressible, because it should behave essentially like random data. We investigate this premise for a variety of known lossless compression techniques, and find that, surprisingly, there is much variability in the randomness, depending on the chosen method. Arithmetic coding seems to produce perfectly random output, whereas that of Huffman or Ziv-Lempel coding still contains many dependencies. In particular, the output of Huffman coding has already been proven to be random under certain conditions, and we present evidence here that arithmetic coding may produce an output that is identical to that of Huffman.; https://www.mdpi.com/2078-2489/11/4/196; data compression, Huffman coding, arithmetic coding, Ziv-Lempel coding |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Shmuel T. Klein, Dana Shapira |
title |
On the Randomness of Compressed Data |
publisher |
MDPI AG |
series |
Information |
issn |
2078-2489 |
publishDate |
2020-04-01 |
description |
It seems reasonable to expect from a good compression method that its output should not be further compressible, because it should behave essentially like random data. We investigate this premise for a variety of known lossless compression techniques, and find that, surprisingly, there is much variability in the randomness, depending on the chosen method. Arithmetic coding seems to produce perfectly random output, whereas that of Huffman or Ziv-Lempel coding still contains many dependencies. In particular, the output of Huffman coding has already been proven to be random under certain conditions, and we present evidence here that arithmetic coding may produce an output that is identical to that of Huffman. |
topic |
data compression, Huffman coding, arithmetic coding, Ziv-Lempel coding |
url |
https://www.mdpi.com/2078-2489/11/4/196 |