SALSC: Set-Associative Load/Store Caches

Bibliographic Details
Main Authors: Dong-Hua Wu, 吳東樺
Other Authors: Yuan-Shin Hwang
Format: Others
Language: zh-TW
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/72199186710102839427
Description
Summary: Master's thesis === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === Academic year 98 === The conventional load/store queue (LSQ) is a CAM structure in which a dynamically scheduled processor holds all in-flight memory instructions and performs fully associative searches with ordering logic to maintain dependencies and forward data. The LSQ is not efficient, since previous studies have shown that dependency violations are infrequent, and it is not scalable, owing to the complexity of the CAM. This paper presents an efficient and scalable alternative to the LSQ, called the set-associative load/store cache (SALSC), that replaces the CAM with a set-associative tag array. The change is analogous to substituting a set-associative cache for a fully associative cache, since the tag bit cells of a fully associative array are CAM cells. Set-associative caches have been observed to reduce tag comparisons significantly while approximating the miss rates of fully associative caches; likewise, an SALSC can substantially lessen the search bandwidth demand without noticeable performance degradation from stalls caused by set conflicts. Furthermore, an SALSC can be viewed as a set-associative cache integrated with age logic, so it is a natural and straightforward extension to treat an SALSC as an L0 cache by buffering the data of memory references in its entries. Experimental results on the SPECint2000 benchmarks show that both a 32-entry and a 128-entry 4-way SALSC significantly reduce the search bandwidth demand with no visible performance penalty, while a 128-entry L0 SALSC improves average execution time by 0.22%.
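
The idea summarized above can be illustrated with a minimal Python sketch. It is not the thesis's design: the names `Entry`, `SetAssocLSQ`, `allocate`, `forward`, and `retire` are hypothetical, indexing by address modulo the number of sets is an assumption, and addresses are assumed to be word-aligned with exact-match comparison. The sketch only shows that a search touches one set (at most `ways` tag comparisons) instead of every entry, and that a full set forces a stall.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    age: int                    # program-order sequence number (smaller = older)
    addr: int                   # word address (assumed granularity)
    is_store: bool
    data: Optional[int] = None  # store data, if known


class SetAssocLSQ:
    """Hypothetical set-associative load/store structure (illustration only)."""

    def __init__(self, num_entries: int, ways: int):
        assert num_entries % ways == 0
        self.ways = ways
        self.num_sets = num_entries // ways
        self.sets: List[List[Entry]] = [[] for _ in range(self.num_sets)]
        self.tag_comparisons = 0  # counts tag compares done by searches

    def _index(self, addr: int) -> int:
        # Assumption: the set is chosen by the low-order address bits.
        return addr % self.num_sets

    def allocate(self, entry: Entry) -> bool:
        """Insert an in-flight access; False models a stall on a set conflict."""
        target = self.sets[self._index(entry.addr)]
        if len(target) >= self.ways:
            return False
        target.append(entry)
        return True

    def forward(self, load_age: int, addr: int) -> Optional[int]:
        """Return data of the youngest older store to addr, searching one set."""
        best: Optional[Entry] = None
        for e in self.sets[self._index(addr)]:
            self.tag_comparisons += 1
            if e.is_store and e.addr == addr and e.age < load_age:
                if best is None or e.age > best.age:  # age logic: youngest older store
                    best = e
        return best.data if best is not None else None

    def retire(self, age: int) -> None:
        """Remove the entry with the given age once it commits."""
        for s in self.sets:
            s[:] = [e for e in s if e.age != age]


# Usage: a 32-entry, 4-way structure has 8 sets.
demo = SetAssocLSQ(num_entries=32, ways=4)
demo.allocate(Entry(age=1, addr=0x10, is_store=True, data=7))
demo.allocate(Entry(age=2, addr=0x18, is_store=True, data=9))   # same set, other address
demo.allocate(Entry(age=3, addr=0x10, is_store=True, data=11))
print(demo.forward(load_age=4, addr=0x10))  # 11: youngest older store to 0x10
print(demo.tag_comparisons)                 # 3: only the entries in one set were compared
```

In this sketch a 32-entry, 4-way structure compares at most 4 tags per search, where a fully associative search would compare against up to 32 entries; the cost is that a fifth access mapping to a full set cannot be allocated until an entry retires.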