Engineering compact dynamic data structures and in-memory data mining
Compact and succinct data structures use space that approaches the information-theoretic lower bound on the space that is required to represent the data. In practice, their memory footprint is orders of magnitude smaller than normal data structures and at the same time they are competitive in speed....
Main Author: | Poyias, Andreas |
---|---|
Other Authors: | Raman, Rajeev ; Fung, Stanley |
Published: | University of Leicester, 2018 |
Subjects: | 004 |
Online Access: | https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.745834 |
id |
ndltd-bl.uk-oai-ethos.bl.uk-745834 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-bl.uk-oai-ethos.bl.uk-745834 2019-03-05T15:47:05Z Engineering compact dynamic data structures and in-memory data mining Poyias, Andreas Raman, Rajeev ; Fung, Stanley 2018 004 University of Leicester https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.745834 http://hdl.handle.net/2381/42282 Electronic Thesis or Dissertation |
collection |
NDLTD |
sources |
NDLTD |
topic |
004 |
spellingShingle |
004 Poyias, Andreas Engineering compact dynamic data structures and in-memory data mining |
description |
Compact and succinct data structures use space that approaches the information-theoretic lower bound on the space required to represent the data. In practice, their memory footprint is orders of magnitude smaller than that of conventional data structures, while they remain competitive in speed. A main drawback of many of these data structures is that they do not support dynamic operations efficiently: it can be exceedingly expensive to rebuild a static data structure each time an update occurs. In this thesis, we propose a number of novel compact dynamic data structures, including m-Bonsai, a compact tree representation, and compact dynamic rewritable (CDRW) arrays, a compact representation of variable-length bit-strings. These data structures answer queries efficiently and perform updates quickly while maintaining their small memory footprint. In addition to designing these data structures, we analyze them theoretically, implement them, and test them to show their good practical performance. Many data mining algorithms require data structures that can query and dynamically update data in memory. One such algorithm is FP-growth, one of the fastest algorithms for Frequent Itemset Mining, which is one of the most fundamental problems in data mining. FP-growth reads the entire data set into memory, updates its data structures in memory, and performs a series of queries on the given data. We propose a compact implementation of the FP-growth algorithm, PFP-growth. Based on our experimental evaluation, our implementation is one order of magnitude more space efficient than the classic implementation of FP-growth and 2 to 3 times more space efficient than a more recent, carefully engineered implementation. At the same time, it is competitive in terms of speed. |
author2 |
Raman, Rajeev ; Fung, Stanley |
author_facet |
Raman, Rajeev ; Fung, Stanley Poyias, Andreas |
author |
Poyias, Andreas |
author_sort |
Poyias, Andreas |
title |
Engineering compact dynamic data structures and in-memory data mining |
title_short |
Engineering compact dynamic data structures and in-memory data mining |
title_full |
Engineering compact dynamic data structures and in-memory data mining |
title_fullStr |
Engineering compact dynamic data structures and in-memory data mining |
title_full_unstemmed |
Engineering compact dynamic data structures and in-memory data mining |
title_sort |
engineering compact dynamic data structures and in-memory data mining |
publisher |
University of Leicester |
publishDate |
2018 |
url |
https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.745834 |
work_keys_str_mv |
AT poyiasandreas engineeringcompactdynamicdatastructuresandinmemorydatamining |
_version_ |
1718997256916434944 |
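
The description notes that FP-growth loads the data into memory and then updates and queries an in-memory structure. As a rough illustration only, the sketch below builds a plain pointer-based FP-tree in Python. It is not the thesis's PFP-growth and does not use m-Bonsai or CDRW arrays; the names `FPNode` and `build_fp_tree`, the tie-breaking rule, and the sample transactions are assumptions made for this example.

```python
from collections import Counter


class FPNode:
    """One node of a plain pointer-based FP-tree (not a compact layout)."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def build_fp_tree(transactions, min_support):
    """Build an FP-tree over items occurring in >= min_support transactions."""
    # Pass 1: count in how many transactions each item occurs.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    root = FPNode(None)
    node_count = 0
    # Pass 2: insert each transaction, keeping frequent items only, ordered by
    # descending support (ties broken by item name, an assumption made here)
    # so that transactions with common items share a prefix path.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                node_count += 1
            child.count += 1
            node = child
    return root, frequent, node_count


# Hypothetical toy data: four transactions over items a-d.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "d"],
]
root, frequent, node_count = build_fp_tree(transactions, min_support=2)
```

Each node here carries a Python object, a dictionary, and a parent pointer, which is the kind of per-node overhead that compact representations aim to reduce.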