A streaming algorithms approach to approximating hit rate curves

In this work, we study systems with two levels of memory: a fixed-size cache and a backing store, each of which contains blocks. In order to serve an IO request, the block must be in the cache. If the block is already in the cache when it is requested, the request is a cache hit. Otherwise it is a c...


Bibliographic Details
Main Author: Drudi, Zachary
Language: English
Published: University of British Columbia 2014
Online Access: http://hdl.handle.net/2429/50486
id ndltd-UBC-oai-circle.library.ubc.ca-2429-50486
record_format oai_dc
spelling ndltd-UBC-oai-circle.library.ubc.ca-2429-50486 2018-01-05T17:27:44Z A streaming algorithms approach to approximating hit rate curves Drudi, Zachary
Science, Faculty of Computer Science, Department of Graduate 2014-09-30T14:41:43Z 2014-09-30T14:41:43Z 2014 2014-11 Text Thesis/Dissertation http://hdl.handle.net/2429/50486 eng Attribution-NonCommercial-NoDerivs 2.5 Canada http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ University of British Columbia
collection NDLTD
language English
sources NDLTD
description In this work, we study systems with two levels of memory: a fixed-size cache and a backing store, each of which contains blocks. In order to serve an IO request, the block must be in the cache. If the block is already in the cache when it is requested, the request is a cache hit. Otherwise it is a cache miss, and the block must be brought into the cache. If the cache is full, a block must be evicted from the cache to make room for the new block. A replacement policy determines which block to evict. In this work, we consider only the LRU policy. An LRU cache evicts the block that was least recently requested. A trace is a sequence of blocks, representing a stream of IO requests. For a given trace, a hit rate curve maps each cache size to the fraction of requests in the trace that a cache of that size would serve as hits. Hit rate curves have been used to design storage systems, partition memory among competing processes, detect phases in a trace, and dynamically adjust heap size in garbage-collected applications. The first algorithm to compute the hit rate curve of a trace in a single pass was given by Mattson et al. in 1970. A long line of work has improved on this initial algorithm. The main contribution of our work is the presentation and formal analysis of two algorithms to approximate hit rate curves. Inspired by recent results in the streaming algorithms community on the distinct elements problem, we use memory-efficient probabilistic counters to estimate the number of distinct blocks in a subsequence of the trace, which allows us to approximate the hit rate curve using sublinear space. We also formally state some variants of the hit rate curve approximation problem that our algorithms solve, and derive lower bounds on the space complexity of these problems using tools from communication complexity. Science, Faculty of; Computer Science, Department of; Graduate
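As an illustration of the definitions in the description above, the following is a minimal sketch that computes an exact LRU hit rate curve from a trace using stack distances. It is a naive, linear-space baseline in the spirit of single-pass stack-distance methods, not the approximation algorithms of the thesis, which rely on probabilistic counters. The function name `lru_hit_rate_curve` and its parameters are assumptions made for this example.

```python
def lru_hit_rate_curve(trace, max_size):
    """Return [h_1, ..., h_max_size], where h_k is the LRU hit rate at cache size k.

    Hypothetical helper for illustration only: exact and naive, not the
    sublinear-space approximation described in the abstract.
    """
    if not trace:
        return [0.0] * max_size

    stack = []  # blocks ordered from most to least recently requested
    hits_at_distance = [0] * (max_size + 1)

    for block in trace:
        if block in stack:
            # Stack distance: number of distinct blocks requested since the
            # previous request for this block, counting the block itself.
            distance = stack.index(block) + 1
            if distance <= max_size:
                hits_at_distance[distance] += 1
            stack.remove(block)
        stack.insert(0, block)

    # A request hits in an LRU cache of size k exactly when its stack
    # distance is at most k, so the curve is a running sum over distances.
    curve = []
    hits = 0
    for k in range(1, max_size + 1):
        hits += hits_at_distance[k]
        curve.append(hits / len(trace))
    return curve


if __name__ == "__main__":
    print(lru_hit_rate_curve(["a", "b", "a", "c", "b", "a"], 3))
```

The stack distance of a request is a distinct-element count over a subsequence of the trace, which is the quantity the abstract says is estimated with probabilistic counters in order to reduce space.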
author Drudi, Zachary
title A streaming algorithms approach to approximating hit rate curves
publisher University of British Columbia
publishDate 2014
url http://hdl.handle.net/2429/50486