Summary: | Master's === National Chiao Tung University === Institute of Computer Science and Engineering === 102 === Load-to-use latency is one of the key factors influencing microprocessor performance, because load instructions usually have long execution latency and account for a large portion of the instructions executed at run time. Executing load instructions early is one way to reduce load-to-use latency. To execute a load instruction early, the first problem is how to determine its effective address early. To address this, several mechanisms use a special storage structure to hold the values of registers recently used as the base or index component of an effective address, so that the effective address can be calculated from them. These calculation mechanisms can be speculative or non-speculative. To obtain both the low hardware requirement of the special storage and the advantage that non-speculative mechanisms need no recovery method for effective address calculation, we propose a new mechanism for executing load instructions early that combines the previous mechanisms for reducing load-to-use latency.
The new mechanism uses a small cache, called the Base Register Cache (BRC), to hold the values of registers recently used as the base component of an effective address. In the original mechanism, the BRC is managed by the LRU (Least Recently Used) replacement policy. LRU performs well most of the time, but it uses cache space inefficiently when the recency of most references is greater than the cache size (such references are called far reuses). One solution is to retain some registers in the cache long enough to contribute cache hits; the LFU (Least Frequently Used) replacement policy has been proven to perform optimally under this reference characteristic. This thesis focuses on how to make the LRU and LFU policies cooperate to increase the hit rate of the BRC.
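The far-reuse effect can be illustrated with a small simulation. The following is a minimal sketch, not the simulator used in the thesis: the synthetic trace, the register names, the function names, and the LFU tie-breaking rule (evict the least recently used among the least frequently used entries) are all assumptions made for illustration.

```python
from collections import OrderedDict


def build_far_reuse_trace(rounds=20):
    """Hypothetical trace: hot register r1 is reused only after 4 other registers."""
    cold = [f"r{i}" for i in range(2, 10)]  # r2..r9, assumed cold base registers
    trace = ["r1", "r1"]                    # r1 is used twice up front
    k = 0
    for _ in range(rounds):
        for _ in range(4):                  # 4 distinct registers between r1 uses
            trace.append(cold[k % len(cold)])
            k += 1
        trace.append("r1")                  # far reuse: distance exceeds 4 entries
    return trace


def hit_rate_lru(trace, capacity=4):
    """Hit rate of a fully associative cache with LRU replacement."""
    cache = OrderedDict()                   # key order = recency, oldest first
    hits = 0
    for reg in trace:
        if reg in cache:
            hits += 1
            cache.move_to_end(reg)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used entry
            cache[reg] = True
    return hits / len(trace)


def hit_rate_lfu(trace, capacity=4):
    """Hit rate of a fully associative cache with LFU replacement."""
    cache = OrderedDict()                   # reg -> use count, oldest first
    hits = 0
    for reg in trace:
        if reg in cache:
            hits += 1
            cache[reg] += 1
            cache.move_to_end(reg)
        else:
            if len(cache) >= capacity:
                # min() returns the first minimal key, i.e. the oldest
                # among the least frequently used entries (assumed tie-break).
                victim = min(cache, key=cache.get)
                del cache[victim]
            cache[reg] = 1
    return hits / len(trace)


if __name__ == "__main__":
    trace = build_far_reuse_trace()
    print("LRU hit rate:", hit_rate_lru(trace))
    print("LFU hit rate:", hit_rate_lfu(trace))
```

On this synthetic 102-reference trace with a 4-entry cache, LRU hits only once because r1 is always evicted just before it is reused, while LFU keeps r1 resident and hits 21 times. Real BRC reference streams are of course more varied than this trace.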
We found an efficient technique used for memory caches, called CRFP (Combined the LRU and LFU Policies), which adaptively selects between the LRU and LFU policies, and tried to apply it to the BRC. However, we observed that on some benchmarks CRFP still has the same hit rate as LRU even though LFU has a higher hit rate. We therefore propose an analysis method (mis-matched selection analysis) to find the reasons why CRFP does not adapt well to the BRC, and also propose methods to make CRFP adapt well to the BRC.
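The general idea of adaptively selecting between LRU and LFU can be sketched as follows. This is not the CRFP algorithm and not any of the mechanisms proposed in the thesis, whose details the abstract does not give. It is a generic selector, assumed for illustration, that runs two shadow tag directories (one per policy) and uses a saturating counter to choose the eviction policy of the real cache. All class names, the counter limit, and the trace are hypothetical.

```python
from collections import OrderedDict


class PolicyCache:
    """Fully associative cache; the eviction policy is chosen on each access."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()        # tag -> use count, oldest first

    def access(self, tag, policy):
        """Return True on a hit; on a miss, evict according to `policy`."""
        if tag in self.entries:
            self.entries[tag] += 1
            self.entries.move_to_end(tag)
            return True
        if len(self.entries) >= self.capacity:
            if policy == "LRU":
                victim = next(iter(self.entries))            # oldest entry
            else:                                            # "LFU"
                victim = min(self.entries, key=self.entries.get)
            del self.entries[victim]
        self.entries[tag] = 1
        return False


class AdaptiveCache:
    """Real cache plus two shadow directories that vote on the policy."""

    def __init__(self, capacity=4, limit=8):
        self.main = PolicyCache(capacity)
        self.shadow_lru = PolicyCache(capacity)
        self.shadow_lfu = PolicyCache(capacity)
        self.limit = limit                  # saturation bound (assumed value)
        self.score = 0                      # > 0 favours LFU, otherwise LRU

    def access(self, tag):
        lru_hit = self.shadow_lru.access(tag, "LRU")
        lfu_hit = self.shadow_lfu.access(tag, "LFU")
        if lfu_hit and not lru_hit:
            self.score = min(self.score + 1, self.limit)
        elif lru_hit and not lfu_hit:
            self.score = max(self.score - 1, -self.limit)
        policy = "LFU" if self.score > 0 else "LRU"
        return self.main.access(tag, policy)


def build_trace(rounds=20):
    """Hypothetical trace: 4 cold registers, then hot register r1 twice."""
    cold = [f"r{i}" for i in range(2, 10)]
    trace, k = [], 0
    for _ in range(rounds):
        for _ in range(4):
            trace.append(cold[k % len(cold)])
            k += 1
        trace += ["r1", "r1"]
    return trace


if __name__ == "__main__":
    trace = build_trace()
    adaptive = AdaptiveCache()
    print("adaptive hits:", sum(adaptive.access(t) for t in trace))
```

On this 120-reference trace the sketch obtains 38 hits, compared with 20 for plain LRU and 39 for plain LFU on the same 4-entry cache, because the selector switches to LFU after the first round in which only the LFU shadow directory hits. The shadow directories triple the tag storage, which is the kind of hardware overhead a cost-performance comparison has to account for.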
The final evolution of our proposed mechanisms (SCRFP-SC-TagAA) achieves the highest average hit rate improvement on the baseline 4-entry fully associative BRC: 1.82% over LRU and 1.1% over CRFP. On individual benchmarks, SCRFP-SC-TagAA also has the highest hit rate improvement compared with every other evolution of our proposed mechanisms. In particular, in the benchmark group where LRU has low hit rates, SCRFP-SC-TagAA improves on the hit rate of whichever policy (LRU or LFU) has the higher hit rate by up to 11.48%, and by 5.45% in the average hit rate of this benchmark group.
We also evaluate the effectiveness of our proposed mechanisms in terms of the ratio of the average hit rate to the hardware overhead (the cost-performance ratio). The third evolution of our proposed mechanisms (SCRFP) has a cost-performance ratio 5.45% higher than that of CRFP.
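One way to write the cost-performance ratio down is sketched here. The symbols are assumptions introduced for illustration: $\overline{H}(P)$ is the average hit rate of policy $P$ over the benchmarks and $C(P)$ is its hardware overhead. The abstract does not state the unit of the overhead, nor whether the reported 5.45% is a relative difference, so the second expression is only one plausible reading.

```latex
\[
  \mathrm{CPR}(P) = \frac{\overline{H}(P)}{C(P)},
  \qquad
  \Delta_{\mathrm{rel}} =
  \frac{\mathrm{CPR}(\mathrm{SCRFP}) - \mathrm{CPR}(\mathrm{CRFP})}
       {\mathrm{CPR}(\mathrm{CRFP})}
\]
```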
|