Database Streaming Compression on Memory-Limited Machines

Dynamic Huffman compression algorithms operate on data-streams with a bounded symbol list. With these algorithms, the complete list of symbols must be contained in main memory or secondary storage. A horizontal format transaction database that is streaming can have a very large item list. Many nodes...

Full description

Bibliographic Details
Main Author: Bruccoleri, Damon F.
Format: Others
Published: NSUWorks 2018
Subjects:
Online Access:https://nsuworks.nova.edu/gscis_etd/1031
https://nsuworks.nova.edu/cgi/viewcontent.cgi?article=2032&context=gscis_etd
Description
Summary:Dynamic Huffman compression algorithms operate on data-streams with a bounded symbol list. With these algorithms, the complete list of symbols must be contained in main memory or secondary storage. A horizontal format transaction database that is streaming can have a very large item list. Many nodes tax both the processing hardware primary memory size, and the processing time to dynamically maintain the tree. This research investigated Huffman compression of a transaction-streaming database with a very large symbol list, where each item in the transaction database schema’s item list is a symbol to compress. The constraint of a large symbol list is, in this research, equivalent to the constraint of a memory-limited machine. A large symbol set will result if each item in a large database item list is a symbol to compress in a database stream. In addition, database streams may have some temporal component spanning months or years. Finally, the horizontal format is the format most suited to a streaming transaction database because the transaction IDs are not known beforehand This research prototypes an algorithm that will compresses a transaction database stream. There are several advantages to the memory limited dynamic Huffman algorithm. Dynamic Huffman algorithms are single pass algorithms. In many instances a second pass over the data is not possible, such as with streaming databases. Previous dynamic Huffman algorithms are not memory limited, they are asymptotic to O(n), where n is the number of distinct item IDs. Memory is required to grow to fit the n items. The improvement of the new memory limited Dynamic Huffman algorithm is that it would have an O(k) asymptotic memory requirement; where k is the maximum number of nodes in the Huffman tree, k < n, and k is a user chosen constant. The new memory limited Dynamic Huffman algorithm compresses horizontally encoded transaction databases that do not contain long runs of 0’s or 1’s.