Cooperative hardware/software caching for next-generation memory systems

The memory system remains a major performance bottleneck in modern and future architectures. In this dissertation, we propose a hardware/software cooperative approach and demonstrate its effectiveness. This approach combines the global yet imperfect view of the compiler with the timely yet narrow-sc...


Bibliographic Details
Main Author: Wang, Zhenlin
Language: ENG
Published: ScholarWorks@UMass Amherst 2004
Subjects: Computer science
Online Access: https://scholarworks.umass.edu/dissertations/AAI3118338
id ndltd-UMASS-oai-scholarworks.umass.edu-dissertations-3900
record_format oai_dc
spelling ndltd-UMASS-oai-scholarworks.umass.edu-dissertations-3900 2020-12-02T14:31:05Z 2004-01-01T08:00:00Z text https://scholarworks.umass.edu/dissertations/AAI3118338 Doctoral Dissertations Available from Proquest ENG ScholarWorks@UMass Amherst Computer science
collection NDLTD
language ENG
sources NDLTD
topic Computer science
description The memory system remains a major performance bottleneck in modern and future architectures. In this dissertation, we propose a hardware/software cooperative approach and demonstrate its effectiveness. This approach combines the global yet imperfect view of the compiler with the timely yet narrow-scope context of the hardware. It relies on a light-weight extension to the instruction set architecture to convey compile-time knowledge (hints) to the hardware. The hardware then uses these hints to make better decisions. Our work shows that a cooperative hardware/software approach to (1) cache replacement, (2) prefetching, and (3) their combination eliminates or tolerates much of the memory performance bottleneck.
(1) Our work enhances cache replacement decisions using compiler hints. The compiler detects which data will or will not be reused and annotates loads accordingly. The compiler sets one bit (the evict-me bit) to denote a preferred eviction candidate. On a miss, the cache replacement algorithm preferentially replaces a cache line with its evict-me bit set. Otherwise, it follows the LRU policy. The evict-me replacement scheme improves cache replacement decisions and is effective in both L1 and L2 caches.
(2) We also use compiler hints to direct aggressive hardware region prefetching and content-aware pointer prefetching. The original SRP (scheduled region prefetching) engine queues prefetching requests on every outstanding L2 miss and tolerates latencies at the cost of dramatically increasing the memory traffic. GRP (guided region prefetching) enhances SRP by restricting prefetching to compiler-marked loads. Our compiler algorithms effectively mark spatial reuses across the SPEC CPU2000 benchmarks, and thus GRP achieves the performance of SRP with only one eighth of the additional traffic.
(3) The evict-me cache replacement scheme helps alleviate the side effects of cache pollution introduced by useless region prefetches. The combination of evict-me caching and region prefetching further improves cache performance.
These results demonstrate significant promise for overcoming the memory bottleneck with cooperative hardware/software techniques.
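The replacement rule in part (1) of the abstract can be illustrated with a minimal Python sketch of a single cache set. This is a simplified model under stated assumptions, not the dissertation's implementation: the class name EvictMeSet, its methods, the rule that the hint on the most recent load overwrites the stored bit, and the choice of the least recently used line among several marked lines are all hypothetical.

```python
from collections import OrderedDict


class EvictMeSet:
    """One cache set: lines with the evict-me bit set are replaced first, else LRU."""

    def __init__(self, ways):
        self.ways = ways
        # Maps tag -> evict-me bit; order runs from least to most recently used.
        self.lines = OrderedDict()

    def access(self, tag, evict_me=False):
        """Simulate one load; return True on a hit, False on a miss."""
        hit = tag in self.lines
        if hit:
            # A hit makes the line the most recently used one.
            self.lines.move_to_end(tag)
        elif len(self.lines) >= self.ways:
            # Miss in a full set: choose a victim and drop it.
            self.lines.pop(self._victim())
        # Assumption: the hint carried by the latest load overwrites the bit.
        self.lines[tag] = evict_me
        return hit

    def _victim(self):
        # Prefer the least recently used line whose evict-me bit is set.
        for tag, marked in self.lines.items():
            if marked:
                return tag
        # No marked line: fall back to plain LRU.
        return next(iter(self.lines))


if __name__ == "__main__":
    cache_set = EvictMeSet(ways=2)
    cache_set.access("A")                 # miss
    cache_set.access("B", evict_me=True)  # miss, B is marked
    cache_set.access("C")                 # miss; evicts B, not the LRU line A
    print(sorted(cache_set.lines))        # ['A', 'C']
```

Under plain LRU the third access would have evicted A, the oldest line. With the hint, the marked line B is replaced instead, so A stays resident for later reuse.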
author Wang, Zhenlin
title Cooperative hardware/software caching for next-generation memory systems
publisher ScholarWorks@UMass Amherst
publishDate 2004
url https://scholarworks.umass.edu/dissertations/AAI3118338