LOSSLESS TEXT COMPRESSION FOR UNICODE TAMIL DOCUMENTS

Data compression for the world's languages, including Indian languages, is in high demand. Tamil is one of the longest-surviving classical languages in the world. Use of Tamil for communication and storage has increased with the digitization of government documents...

Bibliographic Details
Main Authors: B Vijayalakshmi, N Sasirekha
Format: Article
Language: English
Published: ICT Academy of Tamil Nadu 2018-01-01
Series: ICTACT Journal on Soft Computing
Subjects: Compression, Decompression, Unicode, ASCII and Substitution
Online Access: http://ictactjournals.in/ArticleDetails.aspx?id=3293
id doaj-2a2f554c37114b27a5f42962ac71292e
record_format Article
spelling doaj-2a2f554c37114b27a5f42962ac71292e
2020-11-25T00:22:23Z
eng
ICT Academy of Tamil Nadu
ICTACT Journal on Soft Computing
0976-6561
2229-6956
2018-01-01
8
2
1635
1640
10.21917/ijsc.2018.0227
LOSSLESS TEXT COMPRESSION FOR UNICODE TAMIL DOCUMENTS
B Vijayalakshmi (Vidyasagar College of Arts and Science, India)
N Sasirekha (Vidyasagar College of Arts and Science, India)
http://ictactjournals.in/ArticleDetails.aspx?id=3293
Compression
Decompression
Unicode
ASCII and Substitution
collection DOAJ
language English
format Article
sources DOAJ
author B Vijayalakshmi
N Sasirekha
title LOSSLESS TEXT COMPRESSION FOR UNICODE TAMIL DOCUMENTS
publisher ICT Academy of Tamil Nadu
series ICTACT Journal on Soft Computing
issn 0976-6561
2229-6956
publishDate 2018-01-01
description Data compression for the world's languages, including Indian languages, is in high demand. Tamil is one of the longest-surviving classical languages in the world. Use of Tamil for communication and storage has increased with the digitization of government documents and orders. Lossless text compression for Tamil documents involves substituting an ASCII character for each Unicode Tamil character, since an ASCII character occupies one byte whereas a Unicode character occupies between 1 and 4 bytes, depending on the encoding used to store the file. Decompression reverses the process, i.e., it replaces the ASCII characters with the corresponding Unicode characters. This paper describes the architecture of the compression and decompression process for Tamil text documents.
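The substitution idea in the description can be illustrated with a minimal sketch. The abstract does not give the authors' mapping table, so the mapping below is an assumption made purely for illustration: each code point of the Unicode Tamil block (U+0B80 to U+0BFF) is stored as one byte in the range 0x80 to 0xFF, and plain ASCII characters pass through unchanged. The names `compress`, `decompress`, `TAMIL_START` and `TAMIL_END` are hypothetical and do not come from the paper.

```python
# Hypothetical one-byte substitution for Tamil text; NOT the paper's mapping.
TAMIL_START = 0x0B80  # first code point of the Unicode Tamil block
TAMIL_END = 0x0BFF    # last code point of the Unicode Tamil block


def compress(text: str) -> bytes:
    """Store every supported character as exactly one byte."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)  # ASCII is kept as-is
        elif TAMIL_START <= cp <= TAMIL_END:
            out.append(cp - TAMIL_START + 0x80)  # Tamil block -> 0x80..0xFF
        else:
            raise ValueError(f"unsupported character U+{cp:04X}")
    return bytes(out)


def decompress(data: bytes) -> str:
    """Reverse the substitution: map each byte back to its code point."""
    return "".join(
        chr(b) if b < 0x80 else chr(b - 0x80 + TAMIL_START) for b in data
    )


# The word "Tamil" in Tamil script (five code points) followed by ASCII text.
sample = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd text"
packed = compress(sample)
assert decompress(packed) == sample  # the round trip is lossless
print(len(sample.encode("utf-8")), "bytes in UTF-8,", len(packed), "bytes packed")
```

Each Tamil code point takes three bytes in UTF-8 and one byte after this substitution, which is where the saving comes from. Text containing characters outside ASCII and the Tamil block is rejected here rather than handled, since the abstract does not say how the paper treats such input.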
topic Compression
Decompression
Unicode
ASCII and Substitution
url http://ictactjournals.in/ArticleDetails.aspx?id=3293