LOSSLESS TEXT COMPRESSION FOR UNICODE TAMIL DOCUMENTS
Data compression for the world's languages, including Indian languages, is in high demand. Tamil is one of the longest-surviving classical languages in the world. The use of Tamil for communication and storage has increased due to the digitization of government documents...
Main Authors: | B Vijayalakshmi, N Sasirekha |
---|---|
Format: | Article |
Language: | English |
Published: | ICT Academy of Tamil Nadu, 2018-01-01 |
Series: | ICTACT Journal on Soft Computing |
Subjects: | Compression, Decompression, Unicode, ASCII and Substitution |
Online Access: | http://ictactjournals.in/ArticleDetails.aspx?id=3293 |
id |
doaj-2a2f554c37114b27a5f42962ac71292e |
---|---|
record_format |
Article |
spelling |
doaj-2a2f554c37114b27a5f42962ac71292e; 2020-11-25T00:22:23Z; eng; ICT Academy of Tamil Nadu; ICTACT Journal on Soft Computing; 0976-6561; 2229-6956; 2018-01-01; 8; 2; 1635; 1640; 10.21917/ijsc.2018.0227; LOSSLESS TEXT COMPRESSION FOR UNICODE TAMIL DOCUMENTS; B Vijayalakshmi; N Sasirekha; Vidyasagar College of Arts and Science, India; Vidyasagar College of Arts and Science, India; Data compression for the world's languages, including Indian languages, is in high demand. Tamil is one of the longest-surviving classical languages in the world. The use of Tamil for communication and storage has increased due to the digitization of government documents and orders. The lossless text compression process for Tamil documents involves substituting ASCII characters for Unicode Tamil characters, since an ASCII character occupies one byte whereas a Unicode character occupies between 1 and 4 bytes, depending on the encoding used to store the file. The decompression process is the reverse of the compression technique, i.e., replacing ASCII characters with Unicode characters. This paper describes the architecture of the compression and decompression processes for Tamil text documents.; http://ictactjournals.in/ArticleDetails.aspx?id=3293; Compression, Decompression, Unicode, ASCII and Substitution |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
B Vijayalakshmi; N Sasirekha |
title |
LOSSLESS TEXT COMPRESSION FOR UNICODE TAMIL DOCUMENTS |
publisher |
ICT Academy of Tamil Nadu |
series |
ICTACT Journal on Soft Computing |
issn |
0976-6561; 2229-6956 |
publishDate |
2018-01-01 |
description |
Data compression for the world's languages, including Indian languages, is in high demand. Tamil is one of the longest-surviving classical languages in the world. The use of Tamil for communication and storage has increased due to the digitization of government documents and orders. The lossless text compression process for Tamil documents involves substituting ASCII characters for Unicode Tamil characters, since an ASCII character occupies one byte whereas a Unicode character occupies between 1 and 4 bytes, depending on the encoding used to store the file. The decompression process is the reverse of the compression technique, i.e., replacing ASCII characters with Unicode characters. This paper describes the architecture of the compression and decompression processes for Tamil text documents. |
topic |
Compression, Decompression, Unicode, ASCII and Substitution |
url |
http://ictactjournals.in/ArticleDetails.aspx?id=3293 |
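
The substitution idea in the description can be illustrated with a minimal Python sketch. This is not the authors' method: the record does not give the paper's substitution table, so the mapping below is an assumption made for illustration. It sends each code point of the Unicode Tamil block (U+0B80 to U+0BFF) to a single byte in the range 0x80 to 0xFF, which is an extended single-byte value rather than strict 7-bit ASCII, and passes ASCII characters through unchanged. The names `compress`, `decompress`, `TAMIL_START`, and `TAMIL_END` are hypothetical.

```python
TAMIL_START = 0x0B80  # first code point of the Unicode Tamil block
TAMIL_END = 0x0BFF    # last code point of the Unicode Tamil block


def compress(text: str) -> bytes:
    """Replace every character with one byte (assumed mapping, not the paper's)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)  # ASCII passes through unchanged
        elif TAMIL_START <= cp <= TAMIL_END:
            out.append(0x80 + (cp - TAMIL_START))  # Tamil block -> 0x80..0xFF
        else:
            # Anything else has no single-byte code in this sketch.
            raise ValueError(f"unsupported character U+{cp:04X}")
    return bytes(out)


def decompress(data: bytes) -> str:
    """Reverse the substitution to recover the original Unicode text."""
    return "".join(
        chr(b) if b < 0x80 else chr(TAMIL_START + (b - 0x80)) for b in data
    )


sample = "தமிழ் text"
packed = compress(sample)
assert decompress(packed) == sample  # round trip is lossless

# Byte counts: UTF-8, UTF-16 (little-endian, no BOM), and the packed form.
print(len(sample.encode("utf-8")), len(sample.encode("utf-16-le")), len(packed))
```

For this sample, the five Tamil code points take three bytes each in UTF-8, so the string occupies 20 bytes in UTF-8 and 10 bytes after substitution. The sketch is lossless only for text made up of ASCII and Tamil-block characters; any other character raises an error rather than being silently dropped.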