Predictor Virtualization: Teaching Old Caches New Tricks

To improve application performance, modern processors rely on prediction-based hardware optimizations such as data prefetching and branch prediction. These optimizations store application metadata in on-chip predictor tables and use it to anticipate and optimize for future application behavior. As application footprints grow, predictor tables must scale for predictors to remain effective. An important challenge in processor design is deciding which hardware optimizations to implement and how many resources to dedicate to each. Traditionally, processor architects take a one-size-fits-all approach to predictor-based hardware optimizations: for each optimization, a fixed portion of the on-chip resources is allocated to predictor storage. This approach often leads to sub-optimal designs in which: 1) resources are wasted on applications that do not benefit from a particular predictor or that require only small predictor tables, or 2) predictors under-perform for applications that need larger predictor tables than can be built under area, latency, and power constraints.

This thesis introduces Predictor Virtualization (PV), a framework that uses the conventional processor memory hierarchy to store the application metadata used by speculative hardware optimizations. Doing so makes it possible to emulate large, more accurate predictor tables, which in turn leads to higher application performance. PV exploits the current trend of unprecedentedly large on-chip secondary caches and allocates, on demand, a small portion of the cache capacity to store predictor metadata, adjusting to each application's need for predictor resources. As a consequence, PV is a pay-as-you-go technique that emulates large predictor tables without increasing the dedicated storage overhead.

To demonstrate the benefits of virtualizing hardware predictors, we present virtualized designs for three different hardware optimizations: a state-of-the-art data prefetcher, conventional branch target buffers, and an object-pointer prefetcher. While each of these predictors exhibits different characteristics that lead to a different virtualized design, virtualization improves the cost-performance trade-off for all three. PV increases the utility of traditional processor caches: in addition to acting as accelerators for slow off-chip memories, on-chip caches are leveraged to increase the effectiveness of predictor-based hardware optimizations.
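The core mechanism the abstract describes, a small dedicated predictor structure backed by a much larger table that lives in the regular memory hierarchy, can be illustrated with a toy model. The C++ sketch below is a hedged illustration only, not the thesis's hardware design: the `VirtualizedPredictor` class, its entry layout, and all sizes are invented for this example, and a plain array stands in for the cache lines that PV would reserve in the L2 on demand.

```cpp
// Minimal sketch of the Predictor Virtualization idea (illustrative only):
// a small dedicated predictor cache backed by a larger "virtualized" table
// that, in real hardware, would occupy on-demand space in the L2 cache.
#include <cstdint>
#include <iostream>
#include <vector>

struct PredictorEntry {
    uint64_t tag = 0;      // identifies which PC this entry describes
    uint64_t target = 0;   // predicted value (e.g., branch target or stride)
    bool valid = false;
};

class VirtualizedPredictor {
public:
    VirtualizedPredictor(size_t cacheEntries, size_t tableEntries)
        : cache_(cacheEntries), table_(tableEntries) {}

    // Look up a prediction for `pc`. A hit in the small dedicated cache is
    // fast; a miss falls back to the large backing table, standing in for
    // the extra latency of fetching predictor metadata from the L2.
    PredictorEntry lookup(uint64_t pc) {
        PredictorEntry& line = cache_[pc % cache_.size()];
        if (line.valid && line.tag == pc) {
            ++fastHits_;
            return line;
        }
        ++slowFills_;                       // would cost an L2 access in hardware
        line = table_[pc % table_.size()];  // install into the dedicated cache
        return line;
    }

    // Record an observed outcome, updating both levels (write-through).
    void update(uint64_t pc, uint64_t target) {
        PredictorEntry e;
        e.tag = pc;
        e.target = target;
        e.valid = true;
        cache_[pc % cache_.size()] = e;
        table_[pc % table_.size()] = e;
    }

    size_t fastHits() const { return fastHits_; }
    size_t slowFills() const { return slowFills_; }

private:
    std::vector<PredictorEntry> cache_;  // small, dedicated on-chip storage
    std::vector<PredictorEntry> table_;  // large table virtualized into cache space
    size_t fastHits_ = 0;
    size_t slowFills_ = 0;
};

int main() {
    // 64 dedicated entries emulating a 4096-entry table (sizes are arbitrary).
    VirtualizedPredictor pred(/*cacheEntries=*/64, /*tableEntries=*/4096);
    for (uint64_t pc = 0; pc < 1000; ++pc) pred.update(pc, pc + 4);
    for (uint64_t pc = 0; pc < 1000; ++pc) pred.lookup(pc);
    std::cout << "fast hits: " << pred.fastHits()
              << ", slow fills: " << pred.slowFills() << '\n';
}
```

The pay-as-you-go property the abstract claims corresponds, in this sketch, to the backing table: an application that fits in the small dedicated cache never touches it, while one with a large predictor footprint transparently spills into, and refills from, the cache-resident storage.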

Bibliographic Details
Main Author: Burcea, Ioana Monica
Other Authors: Moshovos, Andreas
Language: en_ca
Published: 2012
Subjects: predictor virtualization; hardware optimizations; processor caches
Online Access: http://hdl.handle.net/1807/32674