Dynamic Hot Data Identification Using a Stack Distance Approximation

Though various applications such as flash memory, cache, storage systems, and even indexing for enterprise big data search, adopt hot data identification schemes, relatively little research has been expended into holistically examining alternative strategies. Rather, researchers tend to classify hot...
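
For readers unfamiliar with the recency measure named in the title, the following is a minimal sketch of the exact, textbook stack distance computation. It is not the approximation algorithm proposed in the article. It assumes the common convention that the stack distance of an access is the number of distinct other items referenced since the previous access to the same item. The function name `stack_distances` and the use of `None` for first accesses are illustrative choices, not taken from the article.

```python
from typing import Hashable, Iterable, List, Optional


def stack_distances(trace: Iterable[Hashable]) -> List[Optional[int]]:
    """Return the exact LRU stack distance of every access in `trace`.

    A first access has no previous reference, so it is reported as None.
    """
    stack: List[Hashable] = []  # least recently used first, most recent last
    distances: List[Optional[int]] = []
    for item in trace:
        if item in stack:
            pos = stack.index(item)  # linear scan of the LRU stack
            # Items above `pos` are the distinct items touched since the
            # previous access to `item`.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)
        stack.append(item)  # `item` becomes the most recently used entry
    return distances


if __name__ == "__main__":
    print(stack_distances(["a", "b", "c", "a", "b", "b"]))
```

Each access scans a stack that holds every distinct item seen so far, so both time per access and memory grow with the number of distinct items. This is the kind of cost that the abstract says the stack distance calculation traditionally incurs.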


Bibliographic Details
Main Authors: Hyeonji Ha, Daeun Shim, Hyeyin Lee, Dongchul Park
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects:
SSD
Online Access: https://ieeexplore.ieee.org/document/9443093/
id doaj-4816ec01554c4e8790c39101e6330ddf
record_format Article
spelling doaj-4816ec01554c4e8790c39101e6330ddf
2021-06-07T23:00:51Z
eng
IEEE
IEEE Access
2169-3536
2021-01-01
9
79889
79903
10.1109/ACCESS.2021.3084851
9443093
Dynamic Hot Data Identification Using a Stack Distance Approximation
Hyeonji Ha (0)
Daeun Shim (1)
Hyeyin Lee (2)
Dongchul Park (3)
https://orcid.org/0000-0001-6553-7448
Department of Software, Sookmyung Women's University, Seoul, South Korea
Department of Software, Sookmyung Women's University, Seoul, South Korea
Department of Software, Sookmyung Women's University, Seoul, South Korea
Department of Software, Sookmyung Women's University, Seoul, South Korea
Though various applications such as flash memory, cache, storage systems, and even indexing for enterprise big data search, adopt hot data identification schemes, relatively little research has been expended into holistically examining alternative strategies. Rather, researchers tend to classify hot data simplistically by considering one or more frequency metrics, thereby disregarding recency, which is also an important consideration. In practice, different workloads mandate different treatment to achieve effective hot data decisions. This paper proposes a dynamic hot data identification scheme that adopts a workload stack distance approximation. Stack distance is a good recency measure, but it traditionally requires high computational complexity as well as additional space. Since stack distance calculation efficiency is a core component for our dynamic feature design, this paper additionally proposes a stack distance approximation algorithm that significantly reduces both computation and space requirements. To our knowledge, the proposed scheme is the first dynamic hot data identification scheme which judiciously assigns more weight to either recency or frequency based on workload characteristics. Our experiments with diverse realistic workloads demonstrate that our stack distance approximation achieves excellent accuracy (up to a 0.1% error rate) and our dynamic scheme improves performance by as much as 49.8%.
https://ieeexplore.ieee.org/document/9443093/
Bloom filter
flash memory
hot data
hot data identification
SSD
stack distance
collection DOAJ
language English
format Article
sources DOAJ
author Hyeonji Ha
Daeun Shim
Hyeyin Lee
Dongchul Park
title Dynamic Hot Data Identification Using a Stack Distance Approximation
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2021-01-01
description Though various applications such as flash memory, cache, storage systems, and even indexing for enterprise big data search, adopt hot data identification schemes, relatively little research has been expended into holistically examining alternative strategies. Rather, researchers tend to classify hot data simplistically by considering one or more frequency metrics, thereby disregarding recency, which is also an important consideration. In practice, different workloads mandate different treatment to achieve effective hot data decisions. This paper proposes a dynamic hot data identification scheme that adopts a workload stack distance approximation. Stack distance is a good recency measure, but it traditionally requires high computational complexity as well as additional space. Since stack distance calculation efficiency is a core component for our dynamic feature design, this paper additionally proposes a stack distance approximation algorithm that significantly reduces both computation and space requirements. To our knowledge, the proposed scheme is the first dynamic hot data identification scheme which judiciously assigns more weight to either recency or frequency based on workload characteristics. Our experiments with diverse realistic workloads demonstrate that our stack distance approximation achieves excellent accuracy (up to a 0.1% error rate) and our dynamic scheme improves performance by as much as 49.8%.
topic Bloom filter
flash memory
hot data
hot data identification
SSD
stack distance
url https://ieeexplore.ieee.org/document/9443093/