Summary: Modern computer systems are power or energy limited. While the number of transistors per chip continues to increase, classic Dennard voltage scaling has come to an end. Therefore, to keep increasing performance at historical rates while staying within a system's power limit, architects must improve a design's energy efficiency. Throughput processors, which use a large number of threads to tolerate
memory latency, have emerged as an energy-efficient platform for
achieving high performance on diverse workloads and are found in
systems ranging from cell phones to supercomputers. This work focuses
on graphics processing units (GPUs), which contain thousands of
threads per chip.
In this dissertation, I redesign the on-chip storage system of a
modern GPU to improve energy efficiency. Modern GPUs contain very large register files that consume between 15% and 20% of the
processor's dynamic energy. Most values written into the register
file are read only once, often within a few instructions of
being produced. To optimize for these patterns, I explore several
designs for register file hierarchies. I study both a
hardware-managed register file cache and a software-managed operand register file, and I evaluate the energy tradeoffs of varying the number of levels and the capacity of each level in the hierarchy. My most efficient design reduces register file energy by 54%.
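The access pattern described above (values read once, shortly after they are produced) is what lets a small register file cache absorb most operand reads. The following is a minimal Python sketch, not the dissertation's design: it assumes a single thread's instruction trace in a hypothetical `(dest, srcs)` tuple format, a fully associative LRU cache that allocates on writes only, write-back of evicted values to the main register file (MRF), and a flush of all cached values at the end of the trace. It only counts accesses at each level; weighting those counts by per-access energies, which are not given here, would turn them into an energy estimate.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class AccessCounts:
    rfc_reads: int = 0   # operand reads served by the register file cache
    rfc_writes: int = 0  # results written into the register file cache
    mrf_reads: int = 0   # operand reads that missed and went to the main register file
    mrf_writes: int = 0  # values written back to the MRF on eviction or final flush


def simulate_rfc(trace, capacity):
    """Count accesses for a fully associative LRU register file cache (sketch).

    trace: list of (dest, srcs) per instruction (hypothetical format);
    dest is a register name or None, srcs is a tuple of register names.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least one entry")
    cache = OrderedDict()  # resident registers, least recently used first
    counts = AccessCounts()
    for dest, srcs in trace:
        for reg in srcs:
            if reg in cache:
                counts.rfc_reads += 1
                cache.move_to_end(reg)  # mark as most recently used
            else:
                counts.mrf_reads += 1   # miss: read the MRF; no allocation on reads
        if dest is None:
            continue
        if dest in cache:
            cache.move_to_end(dest)     # overwrite in place; the old value is dead
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the LRU value...
                counts.mrf_writes += 1     # ...and write it back to the MRF
            cache[dest] = None
        counts.rfc_writes += 1
    # Assumption: all values still cached are flushed to the MRF at the end.
    counts.mrf_writes += len(cache)
    return counts


trace = [
    ("r1", ("r0",)),       # r1 <- op(r0)
    ("r2", ("r1",)),       # r2 <- op(r1)
    ("r3", ("r2", "r0")),  # r3 <- op(r2, r0)
    (None, ("r3",)),       # store r3
]
print(simulate_rfc(trace, capacity=2))
# AccessCounts(rfc_reads=3, rfc_writes=3, mrf_reads=2, mrf_writes=3)
```

A flat register file would serve all five operand reads in this trace from the MRF; the sketch sends only two there. Every produced value still reaches the MRF eventually, because nothing in this hardware-only model knows when a value is dead. That limitation is one reason to also consider a software-managed level, where the compiler decides what to keep.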
Beyond the register file, GPUs also contain on-chip scratchpad
memories and caches. Traditional systems fix the partitioning
among these three structures, yet applications have diverse
requirements, and often a single resource is most critical to
performance. I propose unifying the register file, primary data
cache, and scratchpad memory into a single structure that is
dynamically partitioned on a per-kernel basis to match the
application's needs.
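One way to picture per-kernel partitioning of a unified store is the following Python sketch. The policy is a hypothetical one chosen for illustration, not the dissertation's: given a kernel's registers per thread, threads per cooperative thread array (CTA, i.e. a CUDA thread block), and scratchpad bytes per CTA, it runs as many CTAs as fit (up to a cap) and gives all leftover capacity to the primary data cache. The 4-byte register size, the 384 KB total capacity in the example, and the names `KernelNeeds`, `Partition`, and `partition` are assumptions, and allocation granularity is ignored.

```python
from dataclasses import dataclass

REG_BYTES = 4  # assumption: 32-bit registers


@dataclass(frozen=True)
class KernelNeeds:
    regs_per_thread: int        # registers each thread requires
    threads_per_cta: int        # threads per CTA (thread block)
    scratch_bytes_per_cta: int  # scratchpad (shared memory) bytes per CTA


@dataclass(frozen=True)
class Partition:
    ctas: int            # CTAs resident at once
    register_bytes: int  # capacity assigned to registers
    scratch_bytes: int   # capacity assigned to scratchpad
    cache_bytes: int     # leftover capacity used as primary data cache


def partition(total_bytes: int, needs: KernelNeeds, max_ctas: int) -> Partition:
    """Split a unified on-chip store for one kernel (hypothetical policy)."""
    reg_per_cta = needs.regs_per_thread * REG_BYTES * needs.threads_per_cta
    per_cta = reg_per_cta + needs.scratch_bytes_per_cta
    if per_cta > total_bytes:
        raise ValueError("one CTA does not fit in the unified store")
    # Maximize resident CTAs, capped by max_ctas (a scheduler limit).
    ctas = max_ctas if per_cta == 0 else min(max_ctas, total_bytes // per_cta)
    register_bytes = ctas * reg_per_cta
    scratch_bytes = ctas * needs.scratch_bytes_per_cta
    # Everything not claimed by registers or scratchpad becomes cache.
    cache_bytes = total_bytes - register_bytes - scratch_bytes
    return Partition(ctas, register_bytes, scratch_bytes, cache_bytes)


TOTAL = 384 * 1024  # hypothetical unified capacity of one core
print(partition(TOTAL, KernelNeeds(32, 256, 16 * 1024), max_ctas=6))
# Partition(ctas=6, register_bytes=196608, scratch_bytes=98304, cache_bytes=98304)
print(partition(TOTAL, KernelNeeds(20, 128, 0), max_ctas=8))
# Partition(ctas=8, register_bytes=81920, scratch_bytes=0, cache_bytes=311296)
```

Under this policy, a register- and scratchpad-hungry kernel leaves 96 KB for the cache, while a kernel with few registers and no scratchpad use leaves 304 KB. A fixed partition would give both kernels the same split regardless of which resource limits their performance.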
The techniques proposed in this dissertation improve the utilization of on-chip memory, a scarce resource for systems with a large number of hardware threads. Making more efficient use of on-chip memory both improves performance and reduces energy. Future efficient systems will be achieved by combining several such techniques, each of which
improves energy efficiency.