Scaling Distributed Cache Hierarchies through Computation and Data Co-Scheduling
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers (IEEE), 2015
Summary: Cache hierarchies are increasingly non-uniform, so for systems to scale efficiently, data must be close to the threads that use it. Moreover, cache capacity is limited and contended among threads, introducing complex capacity/latency tradeoffs. Prior NUCA schemes have focused on managing data to reduce access latency, but have ignored thread placement; and applying prior NUMA thread placement schemes to NUCA is inefficient, as capacity, not bandwidth, is the main constraint. We present CDCS, a technique to jointly place threads and data in multicores with distributed shared caches. We develop novel monitoring hardware that enables fine-grained space allocation on large caches, and data movement support to allow frequent full-chip reconfigurations. On a 64-core system, CDCS outperforms an S-NUCA LLC by 46% on average (up to 76%) in weighted speedup and saves 36% of system energy. CDCS also outperforms state-of-the-art NUCA schemes under different thread scheduling policies.
Funding: National Science Foundation (U.S.) (Grant CCF-1318384); Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science (Jacobs Presidential Fellowship); United States Defense Advanced Research Projects Agency (PERFECT Contract HR0011-13-2-0005)
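The summary's core idea, placing each thread near the cache banks that hold its data while banks' limited capacity is contended among threads, can be illustrated with a toy greedy heuristic. The sketch below is an assumption-laden illustration, not the CDCS algorithm: CDCS drives placement with hardware miss-curve monitors and supports incremental full-chip reconfigurations, none of which is modeled here. All names (`Tile`, `Thread`, `co_place`, `working_set`) and parameters are hypothetical.

```python
# Toy greedy thread/data co-placement on a 2D mesh of cache banks.
# Illustrative only -- NOT the CDCS algorithm. Each thread is pinned to
# a "home" tile, and its working set spills to the nearest tiles with
# free bank capacity, trading capacity against access distance.

from dataclasses import dataclass
from itertools import product

@dataclass
class Thread:
    tid: int
    working_set: int      # cache lines this thread wants resident

@dataclass
class Tile:
    x: int
    y: int
    bank_capacity: int    # cache lines in this tile's bank
    free: int = 0         # lines not yet allocated to any thread

def manhattan(a, b):
    return abs(a.x - b.x) + abs(a.y - b.y)

def co_place(threads, mesh_dim, bank_capacity):
    """Greedily choose a home tile per thread, then allocate its
    working set to banks in order of increasing distance from home."""
    tiles = [Tile(x, y, bank_capacity, bank_capacity)
             for x, y in product(range(mesh_dim), repeat=2)]
    placement = {}
    # Place capacity-hungry threads first: they are hardest to keep
    # near their data once nearby banks fill up. (This toy also lets
    # several threads share a home tile, unlike a real one-thread-per-
    # core schedule.)
    for t in sorted(threads, key=lambda t: t.working_set, reverse=True):
        # Home tile = the one with the most distance-discounted free
        # capacity in its neighborhood.
        home = max(tiles, key=lambda h: sum(
            tl.free / (1 + manhattan(h, tl)) for tl in tiles))
        need, data = t.working_set, []
        for tl in sorted(tiles, key=lambda tl: manhattan(home, tl)):
            if need == 0:
                break
            take = min(need, tl.free)
            if take:
                tl.free -= take
                data.append(((tl.x, tl.y), take))
                need -= take
        placement[t.tid] = {"home": (home.x, home.y), "data": data}
    return placement

if __name__ == "__main__":
    threads = [Thread(i, ws) for i, ws in enumerate([900, 400, 400, 150])]
    for tid, p in co_place(threads, mesh_dim=4, bank_capacity=256).items():
        print(tid, p)
```

Ordering threads by descending working set is a common greedy heuristic for bin-packing-like problems; it stands in here for the paper's jointly optimized placement, which this sketch does not attempt to reproduce.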