Lossless text compression using GPT-2 language model and Huffman coding
Modern daily-life activities, together with advances in telecommunication, generate large volumes of information. Storing this data on digital devices or transmitting it over the Internet is challenging, which makes data compression necessary, and research on data compression has therefore become a topic of great interest. Because compressed data is generally smaller than the original, compression saves storage and increases transmission speed. In this article, we propose a lossless text compression technique that combines the GPT-2 language model with Huffman coding. In the proposed method, the Burrows-Wheeler transform and a list of keys are first used to reduce the length of the original text file; the GPT-2 language model and Huffman coding are then applied for encoding. The proposed method is compared with state-of-the-art text compression techniques and shows a gain in compression ratio over them.
Main Authors: | Rahman, Md. Atiqur; Hamada, Mohamed (School of Computer Science and Engineering, The University of Aizu) |
---|---|
Format: | Article |
Language: | English |
Published: | EDP Sciences, 2021-01-01 |
Series: | SHS Web of Conferences |
ISSN: | 2261-2424 |
DOI: | 10.1051/shsconf/202110204013 |
Source: | DOAJ (record id doaj-af6f5e98372d4eaaa405edb2bd903277) |
Online Access: | https://www.shs-conferences.org/articles/shsconf/pdf/2021/13/shsconf_etltc2021_04013.pdf |
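As a rough, self-contained illustration of the Huffman-coding stage mentioned in the abstract, the Python sketch below builds a prefix code from a symbol probability table and encodes a short token sequence with it. The probability values, the token names, and the `build_huffman_code` helper are illustrative assumptions standing in for the next-token probabilities a model such as GPT-2 might supply; this is not the authors' implementation.

```python
import heapq

def build_huffman_code(probs):
    """Build a prefix code {symbol: bitstring} from a symbol->probability mapping."""
    # Heap entries are (probability, tie_breaker, subtree); a subtree is either
    # a symbol (leaf) or a (left, right) pair of subtrees.
    heap = [(p, i, sym) for i, (sym, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # two least probable subtrees
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next_id, (left, right)))
        next_id += 1
    code = {}
    def assign(node, prefix):
        if isinstance(node, tuple):          # internal node: recurse left/right
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                                # leaf: record the codeword
            code[node] = prefix or "0"       # handle a one-symbol alphabet
    assign(heap[0][2], "")
    return code

# Hypothetical next-token distribution (a stand-in for GPT-2 output probabilities).
probs = {"the": 0.45, "a": 0.25, "cat": 0.15, "dog": 0.10, "zebra": 0.05}
code = build_huffman_code(probs)
tokens = ["the", "cat", "the", "dog"]        # toy token sequence to encode
bits = "".join(code[t] for t in tokens)
print(code)
print(bits)
```

More probable tokens receive shorter codewords, which is the intuition behind conditioning the code on language-model predictions: when the model assigns high probability to the tokens that actually occur, the encoded output gets shorter.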