Visible synchronization based cache coherence
| Main Author: | Kumar, Krishna |
|---|---|
| Format: | Others |
| Published: | 1997 |
| Online Access: | http://spectrum.library.concordia.ca/273/1/MQ44885.pdf |
| Citation: | Kumar, Krishna (1997) Visible synchronization based cache coherence. Masters thesis, Concordia University. |
Summary: In large-scale machines, thousands of processor cycles (that is, missed opportunities to issue floating-point instructions) may be lost while waiting for a high-latency synchronization or memory operation to complete, or for a pipeline stall to be resolved. Latency is avoided by bringing data to a nearby locale for future reference (e.g., caching); latency is tolerated by overlapping data movement with useful work. Cache coherence becomes an issue whenever multiple copies of a shared datum reside in different caches of a shared-memory multiprocessor system, and cache coherence protocols are employed to keep these copies consistent. The efficiency of latency-avoidance methods depends largely on minimizing the coherence traffic generated by the protocol used to maintain coherence. Cache coherence protocols fall broadly into two classes: hardware-implemented and compiler-implemented. Hardware-implemented protocols generate heavy coherence traffic and require large state storage; conventional compiler-implemented protocols perform indiscriminate, wasteful invalidation, and they duplicate work between synchronization operations and coherence operations. We seek to eliminate both weaknesses by letting visible synchronization directly coordinate changes in the writability of shared data. We propose to add scalable, compiler-managed caches to a TERA-like multithreaded multiprocessor architecture, using user/compiler knowledge (i.e., alias analysis, dependence analysis, and user directives) to eliminate essentially all coherence traffic. To preserve scalability, we aim to use latency-tolerance methods such as switch-on-every-cycle multithreading, augmented with simple, low-latency cache coherence protocols such as our visible synchronization based one.
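Illustration: the scheme described in the summary can be sketched in C. The cache-control primitives and barrier below are hypothetical placeholders, not taken from the thesis; they stand in for the compiler-inserted coherence actions that a visible-synchronization protocol would fuse with synchronization points, so that only data known (via alias and dependence analysis) to cross the synchronization is written back or invalidated.

```c
#include <stddef.h>

/* Hypothetical compiler-managed cache primitives. Names and signatures
 * are illustrative only; on a real machine these would be cache-control
 * instructions emitted by the compiler, not library calls. */
static void cache_writeback(void *addr, size_t len)  { (void)addr; (void)len; }
static void cache_invalidate(void *addr, size_t len) { (void)addr; (void)len; }
static void barrier_wait(void) { /* visible synchronization point */ }

static double shared_x[1024]; /* shared datum that crosses the barrier */

/* Producer: writes shared_x, then synchronizes. Dependence analysis
 * tells the compiler that shared_x crosses this barrier, so the
 * write-back is attached to the synchronization operation itself. */
void producer(void)
{
    for (int i = 0; i < 1024; i++)
        shared_x[i] = i * 0.5;
    cache_writeback(shared_x, sizeof shared_x); /* coherence fused with sync */
    barrier_wait();
}

/* Consumer: synchronizes, then reads. Only shared_x is invalidated;
 * data the compiler can prove is private generates no coherence
 * traffic at all, unlike indiscriminate whole-cache invalidation. */
double consumer(void)
{
    barrier_wait();
    cache_invalidate(shared_x, sizeof shared_x);
    double sum = 0.0;
    for (int i = 0; i < 1024; i++)
        sum += shared_x[i];
    return sum;
}
```

The point of the sketch is the contrast with the two conventional protocol classes: no hardware directory or snoop traffic is required, and nothing beyond the identified shared data is invalidated at the synchronization point.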