File compression using probabilistic grammars and LR parsing

Data compression, the reduction in size of the physical representation of data being stored or transmitted, has long been of interest both as a research topic and as a practical technique. Different methods are used for encoding different classes of data files. The purpose of this research is to com...


Bibliographic Details
Main Author:	Al-Hussaini, Adil M. M.
Published:	Loughborough University 1982
Subjects:	621.3822 Information theory & coding theory
Online Access:	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.331772

id	ndltd-bl.uk-oai-ethos.bl.uk-331772
record_format	oai_dc
spelling	ndltd-bl.uk-oai-ethos.bl.uk-331772; 2017-10-04T03:27:28Z; File compression using probabilistic grammars and LR parsing; Al-Hussaini, Adil M. M.; 1982; 621.3822 Information theory & coding theory; Loughborough University; http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.331772; https://dspace.lboro.ac.uk/2134/7296; Electronic Thesis or Dissertation
collection	NDLTD
sources	NDLTD
topic	621.3822 Information theory & coding theory
spellingShingle	621.3822 Information theory & coding theory Al-Hussaini, Adil M. M. File compression using probabilistic grammars and LR parsing
description	Data compression, the reduction in size of the physical representation of data being stored or transmitted, has long been of interest both as a research topic and as a practical technique. Different methods are used for encoding different classes of data files. The purpose of this research is to compress a class of highly redundant data files whose contents are partially described by a context-free grammar (i.e. text files containing computer programs). An encoding technique is developed for the removal of structural dependency due to the context-free structure of such files. The technique depends on a type of LR parsing method called LALR(K) (Lookahead LR). The encoder also pays particular attention to the encoding of editing characters, comments, names and constants. The encoded data maintains the exact information content of the original data. Hence, a decoding technique (depending on the same parsing method) is developed to recover the original information from its compressed representation. The technique is demonstrated by compressing Pascal programs. An optimal coding scheme (based on Huffman codes) is used to encode the parsing alternatives in each parsing state. The decoder uses these codes during the decoding phase. Huffman codes, based on the probability of the symbols concerned, are also used when coding editing characters, comments, names and constants. The sizes of the parsing tables (and subsequently the encoding tables) were considerably reduced by splitting them into a number of sub-tables. The minimum and the average code length of the average program are derived from two different matrices. These matrices are constructed from a probabilistic grammar and the language generated by this grammar. Finally, various comparisons are made with a related encoding method by using a simple context-free language.
author	Al-Hussaini, Adil M. M.
author_facet	Al-Hussaini, Adil M. M.
author_sort	Al-Hussaini, Adil M. M.
title	File compression using probabilistic grammars and LR parsing
title_sort	file compression using probabilistic grammars and lr parsing
publisher	Loughborough University
publishDate	1982
url	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.331772
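
Illustrative sketch (not taken from the thesis): the description says that Huffman codes are used to encode the parsing alternatives in each parsing state, and that the decoder uses the same codes. The Python fragment below shows that idea in miniature. The state names, alternatives and probabilities in STATE_PROBS are hypothetical toy values, not the Pascal grammar or LALR tables of the thesis, and the decoder is simply handed the sequence of states, whereas a real decoder would derive each next state by running the parser on what it has decoded so far.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Return {symbol: bitstring} for a dict of symbol -> probability."""
    if len(probs) == 1:
        # A state with a single alternative needs no bits at all.
        return {next(iter(probs)): ""}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing their codes.
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


# Hypothetical toy tables: parsing state -> probabilities of its alternatives.
STATE_PROBS = {
    "stmt": {"assign": 0.6, "if": 0.3, "while": 0.1},
    "expr": {"name": 0.5, "const": 0.5},
    "end": {"semicolon": 1.0},
}

# One separate Huffman code per parsing state.
CODES = {state: huffman_code(p) for state, p in STATE_PROBS.items()}


def encode(decisions):
    """Encode a list of (state, chosen alternative) pairs as a bit string."""
    return "".join(CODES[state][choice] for state, choice in decisions)


def decode(bits, states):
    """Recover the choices, given the states the parser passes through."""
    out, pos = [], 0
    for state in states:
        inverse = {b: s for s, b in CODES[state].items()}
        end = pos
        # Extend the candidate code word one bit at a time; because the
        # code is prefix-free, the first match is the right one.
        while bits[pos:end] not in inverse:
            if end >= len(bits):
                raise ValueError("truncated bit string")
            end += 1
        out.append((state, inverse[bits[pos:end]]))
        pos = end
    return out
```

In this toy setting a likely alternative such as "assign" costs one bit, a rare one such as "while" costs two, and a state with only one alternative costs nothing, which is the source of the saving when the code is chosen per state rather than per character.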


Similar Items