Adaptive Cache Replacement Policies to Increase Hit Rate of a Base Register Cache

Bibliographic Details
Main Authors: Hsieh, Kai-Chung, 謝凱仲
Other Authors: Chung, Ping-Chung
Format: Others
Language: en_US
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/y2b8ky
id ndltd-TW-102NCTU5394102
record_format oai_dc
spelling ndltd-TW-102NCTU53941022019-05-15T21:50:57Z http://ndltd.ncl.edu.tw/handle/y2b8ky Adaptive Cache Replacement Policies to Increase Hit Rate of a Base Register Cache 利用適應性的多個快取置換策略來提高一個基址暫存器的快取的命中率 Hsieh, Kai-Chung 謝凱仲 Master's, National Chiao Tung University, Institute of Computer Science and Engineering, 102 Load-to-use latency is one of the key factors that influence microprocessor performance, because load instructions usually have long execution latencies and account for a large portion of the instructions executed at run time. Executing load instructions early is one way to reduce the load-to-use latency. To execute a load instruction early, the first issue is how to determine its effective address early. To resolve this issue, several mechanisms use a special storage to keep the values of the registers recently used as the base or index components of effective addresses, so that effective addresses can be calculated ahead of time. These calculation mechanisms can be speculative or non-speculative. To obtain both the benefit of a low hardware requirement for the special storage and the benefit of non-speculative mechanisms, which need no recovery method for the effective address calculation, we propose a new mechanism for early execution of load instructions that combines the previous mechanisms for reducing the load-to-use latency. The new mechanism uses a small cache to keep the values of the registers recently used as the base component of effective addresses; this cache is called the Base Register Cache (BRC). In the original mechanism, the BRC is managed by the LRU (Least Recently Used) replacement policy. Most of the time LRU performs well, but it uses cache space inefficiently when the recencies of most references are greater than the cache size (such references are called far reuses). A solution is to retain some registers in the cache long enough for them to contribute cache hits; the LFU (Least Frequently Used) replacement policy has been shown to perform optimally under this reference characteristic. This thesis focuses on how to make the LRU and LFU policies cooperate to increase the hit rate of the BRC. We found an efficient technique used for memory caches, called CRFP, which combines the LRU and LFU policies by adaptively selecting between them, and tried to apply it to the BRC. However, we observed that on some benchmarks CRFP still achieves only the same hit rate as LRU even when LFU has a higher hit rate. We therefore propose an analysis method (mis-matched selection analysis) to find the reasons why CRFP is not well adapted to the BRC, and we also propose methods to adapt CRFP to the BRC. The final evolution of our proposed mechanisms (SCRFP-SC-TagAA) yields the highest average hit-rate improvement on the baseline 4-entry fully-associative BRC: 1.82% over LRU and 1.1% over CRFP. On individual benchmarks, SCRFP-SC-TagAA also achieves the highest hit-rate improvement among the evolutions of our proposed mechanisms. In particular, in the benchmark group where LRU has low hit rates, SCRFP-SC-TagAA improves the hit rate over the better of LRU and LFU by up to 11.48%, and by 5.45% on the average hit rate of this benchmark group. We also estimate the effectiveness of our proposed mechanisms in terms of the ratio of the average hit rate to the hardware overhead (the cost-performance ratio). The third evolution of our proposed mechanisms (SCRFP) achieves a cost-performance ratio 5.45% higher than that of CRFP. Chung, Ping-Chung 鍾崇斌 2013 學位論文 ; thesis 42 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Chiao Tung University === Institute of Computer Science and Engineering === 102 === Load-to-use latency is one of the key factors that influence microprocessor performance, because load instructions usually have long execution latencies and account for a large portion of the instructions executed at run time. Executing load instructions early is one way to reduce the load-to-use latency. To execute a load instruction early, the first issue is how to determine its effective address early. To resolve this issue, several mechanisms use a special storage to keep the values of the registers recently used as the base or index components of effective addresses, so that effective addresses can be calculated ahead of time. These calculation mechanisms can be speculative or non-speculative. To obtain both the benefit of a low hardware requirement for the special storage and the benefit of non-speculative mechanisms, which need no recovery method for the effective address calculation, we propose a new mechanism for early execution of load instructions that combines the previous mechanisms for reducing the load-to-use latency. The new mechanism uses a small cache to keep the values of the registers recently used as the base component of effective addresses; this cache is called the Base Register Cache (BRC). In the original mechanism, the BRC is managed by the LRU (Least Recently Used) replacement policy. Most of the time LRU performs well, but it uses cache space inefficiently when the recencies of most references are greater than the cache size (such references are called far reuses). A solution is to retain some registers in the cache long enough for them to contribute cache hits; the LFU (Least Frequently Used) replacement policy has been shown to perform optimally under this reference characteristic. This thesis focuses on how to make the LRU and LFU policies cooperate to increase the hit rate of the BRC. We found an efficient technique used for memory caches, called CRFP, which combines the LRU and LFU policies by adaptively selecting between them, and tried to apply it to the BRC. However, we observed that on some benchmarks CRFP still achieves only the same hit rate as LRU even when LFU has a higher hit rate. We therefore propose an analysis method (mis-matched selection analysis) to find the reasons why CRFP is not well adapted to the BRC, and we also propose methods to adapt CRFP to the BRC. The final evolution of our proposed mechanisms (SCRFP-SC-TagAA) yields the highest average hit-rate improvement on the baseline 4-entry fully-associative BRC: 1.82% over LRU and 1.1% over CRFP. On individual benchmarks, SCRFP-SC-TagAA also achieves the highest hit-rate improvement among the evolutions of our proposed mechanisms. In particular, in the benchmark group where LRU has low hit rates, SCRFP-SC-TagAA improves the hit rate over the better of LRU and LFU by up to 11.48%, and by 5.45% on the average hit rate of this benchmark group. We also estimate the effectiveness of our proposed mechanisms in terms of the ratio of the average hit rate to the hardware overhead (the cost-performance ratio). The third evolution of our proposed mechanisms (SCRFP) achieves a cost-performance ratio 5.45% higher than that of CRFP.
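To make the mechanism in the abstract concrete, here is a minimal sketch (not code from the thesis) of a small fully-associative Base Register Cache managed by either LRU or LFU replacement, plus a simple adaptive wrapper in the spirit of CRFP that steers the cache toward whichever policy is currently hitting more often. The class names BaseRegisterCache and AdaptiveBRC, the 4-entry default size, and the shadow-directory scoring scheme are illustrative assumptions; the thesis's CRFP, SCRFP, and SCRFP-SC-TagAA designs differ in their selection and tag-handling details.

# Minimal sketch (illustrative only): a tiny fully-associative Base Register
# Cache (BRC) with pluggable LRU/LFU replacement, plus an adaptive wrapper
# that selects between the two policies based on shadow-directory hits.
from collections import OrderedDict, defaultdict

class BaseRegisterCache:
    """Tiny fully-associative cache keyed by base-register id."""

    def __init__(self, num_entries=4, policy="LRU"):
        self.num_entries = num_entries
        self.policy = policy                  # "LRU" or "LFU"
        self.entries = OrderedDict()          # reg_id -> value, kept in recency order
        self.freq = defaultdict(int)          # reg_id -> access count (used by LFU)

    def access(self, reg_id, value):
        """Return True on a hit; on a miss, install reg_id, evicting if needed."""
        self.freq[reg_id] += 1
        if reg_id in self.entries:
            self.entries.move_to_end(reg_id)  # refresh recency
            self.entries[reg_id] = value
            return True
        if len(self.entries) >= self.num_entries:
            self._evict()
        self.entries[reg_id] = value
        return False

    def _evict(self):
        if self.policy == "LRU":
            self.entries.popitem(last=False)  # drop the least recently used entry
        else:
            # LFU: drop the resident register with the smallest access count, so
            # far-reuse registers with high counts stay long enough to contribute hits.
            victim = min(self.entries, key=lambda r: self.freq[r])
            del self.entries[victim]

class AdaptiveBRC:
    """Adaptive selector in the spirit of CRFP: two shadow directories run LRU
    and LFU in parallel, and the real BRC follows whichever is hitting more."""

    def __init__(self, num_entries=4):
        self.real = BaseRegisterCache(num_entries, policy="LRU")
        self.shadow_lru = BaseRegisterCache(num_entries, policy="LRU")
        self.shadow_lfu = BaseRegisterCache(num_entries, policy="LFU")
        self.score = 0                        # > 0 favours LRU, < 0 favours LFU

    def access(self, reg_id, value):
        if self.shadow_lru.access(reg_id, value):
            self.score = min(self.score + 1, 8)
        if self.shadow_lfu.access(reg_id, value):
            self.score = max(self.score - 1, -8)
        self.real.policy = "LRU" if self.score >= 0 else "LFU"
        return self.real.access(reg_id, value)

# Example: replay a stream of (base register, value) pairs and count hits.
if __name__ == "__main__":
    brc = AdaptiveBRC(num_entries=4)
    stream = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50), (1, 11), (1, 12), (6, 60)]
    hits = sum(brc.access(reg, val) for reg, val in stream)
    print(f"hits: {hits} / {len(stream)}")

A hardware BRC would, of course, implement the recency ordering and frequency counts with a few bits per entry rather than Python dictionaries; the sketch is only meant to show how an adaptive selector can fall back to LFU when far reuses defeat LRU.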
author2 Chung, Ping-Chung
author_facet Chung, Ping-Chung
Hsieh, Kai-Chung
謝凱仲
author Hsieh, Kai-Chung
謝凱仲
spellingShingle Hsieh, Kai-Chung
謝凱仲
Adaptive Cache Replacement Policies to Increase Hit Rate of a Base Register Cache
author_sort Hsieh, Kai-Chung
title Adaptive Cache Replacement Policies to Increase Hit Rate of a Base Register Cache
title_short Adaptive Cache Replacement Policies to Increase Hit Rate of a Base Register Cache
title_full Adaptive Cache Replacement Policies to Increase Hit Rate of a Base Register Cache
title_fullStr Adaptive Cache Replacement Policies to Increase Hit Rate of a Base Register Cache
title_full_unstemmed Adaptive Cache Replacement Policies to Increase Hit Rate of a Base Register Cache
title_sort adaptive cache replacement policies to increase hit rate of a base register cache
publishDate 2013
url http://ndltd.ncl.edu.tw/handle/y2b8ky
work_keys_str_mv AT hsiehkaichung adaptivecachereplacementpoliciestoincreasehitrateof
AT xièkǎizhòng adaptivecachereplacementpoliciestoincreasehitrateof
AT hsiehkaichung lìyòngshìyīngxìngdeduōgèkuàiqǔzhìhuàncèlüèláitígāoyīgèjīzhǐzàncúnqìdekuàiqǔdemìngzhōnglǜ
AT xièkǎizhòng lìyòngshìyīngxìngdeduōgèkuàiqǔzhìhuàncèlüèláitígāoyīgèjīzhǐzàncúnqìdekuàiqǔdemìngzhōnglǜ
_version_ 1719119861908504576