Weighted LLC Latency-Based Run-Time Cache Partitioning for Heterogeneous CPU-GPU Architecture
Master's thesis, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, Academic Year 102 (2014)

Language: en_US
Published: 2014
Record id: ndltd-TW-102NTU05392069
Online Access: http://ndltd.ncl.edu.tw/handle/33311478280299879988
Chinese title: 基於最後一層快取記憶體加權存取延遲值之中央處理器與繪圖處理器異質性架構的快取記憶體分割機制
Author: Cheng-Hsuan Li (李承軒)
Advisor: Chia-Lin Yang (楊佳玲)
Degree: Master's thesis (碩士), National Taiwan University, Graduate Institute of Computer Science and Information Engineering, Academic Year 102; 37 pages, 2014

Abstract:

Integrating the CPU and GPU on the same chip has become the development trend in microprocessor design. In an integrated CPU-GPU architecture, utilizing the shared last-level cache (LLC) is a critical design issue because of the pressure on shared resources and the different characteristics of CPU and GPU applications. Because of the latency-hiding capability of the GPU and the huge discrepancy in the number of concurrently executing threads between the CPU and the GPU, LLC partitioning can no longer be achieved by simply minimizing overall cache misses, as in homogeneous CPUs. The state-of-the-art cache partitioning mechanism distinguishes cache-insensitive GPU applications from cache-sensitive ones and optimizes only the cache misses of CPU applications when the GPU is cache-insensitive. However, optimizing only the cache hit rate of CPU applications generates more cache misses from the GPU and leads to longer queuing delays in the underlying DRAM system. In terms of memory access latency, the loss due to longer queuing delay may outweigh the benefit of a higher cache hit ratio. Therefore, we find that even though the performance of a GPU application may not be sensitive to cache resources, the CPU applications' cache hit rate is not the only factor that should be considered in partitioning the LLC. The cache miss penalty, i.e., off-chip latency, is also an important factor in designing an LLC partitioning mechanism for an integrated CPU-GPU architecture.

In this thesis, we propose a weighted LLC latency-based run-time cache partitioning mechanism for integrated CPU-GPU architectures. To correlate cache partitions with overall performance more accurately, we develop a mechanism that predicts the off-chip latency from the total number of cache misses, and a GPU cache-sensitivity monitor that quantitatively profiles the GPU's performance sensitivity to memory access latency. The experimental results show that the proposed mechanism improves overall throughput by 9.7% over TLP-aware cache partitioning (TAP), 6.2% over Utility-based Cache Partitioning (UCP), and 10.9% over LRU on 30 heterogeneous workloads.
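To make the idea in the abstract concrete, the following minimal Python sketch illustrates a way-based partition search that minimizes a weighted average memory access latency instead of raw miss counts. It is an illustration only: the linear queuing model for off-chip latency, the latency constants, and the scalar `gpu_weight` (standing in for the GPU cache-sensitivity monitor) are all assumed for this example and are not the thesis's actual parameters.

```python
# Hypothetical sketch of weighted LLC-latency-based partitioning.
# All constants and the linear queuing model are assumptions.

HIT_LATENCY = 30          # LLC hit latency in cycles (assumed)
BASE_MISS_LATENCY = 200   # unloaded off-chip access latency in cycles (assumed)
QUEUE_SLOPE = 0.001       # extra queuing delay per outstanding miss (assumed linear)

def off_chip_latency(total_misses):
    """Predict the average miss penalty from the combined CPU+GPU miss
    count: more misses means more pressure on DRAM and longer queues."""
    return BASE_MISS_LATENCY + QUEUE_SLOPE * total_misses

def avg_latency(accesses, misses, miss_penalty):
    """Average memory access latency for one client's traffic."""
    return HIT_LATENCY + (misses / accesses) * miss_penalty

def best_partition(cpu_miss_curve, gpu_miss_curve,
                   cpu_accesses, gpu_accesses, gpu_weight, ways=16):
    """Pick the CPU/GPU way split minimizing the weighted latency sum.

    cpu_miss_curve[w] / gpu_miss_curve[w] give the predicted miss count
    when that client holds w ways (UCP-style per-way monitors).
    gpu_weight in [0, 1] stands in for the cache-sensitivity monitor:
    near 0 when the GPU hides memory latency with thread-level parallelism.
    """
    best = None
    for cpu_ways in range(1, ways):
        gpu_ways = ways - cpu_ways
        cpu_m = cpu_miss_curve[cpu_ways]
        gpu_m = gpu_miss_curve[gpu_ways]
        # Miss penalty is shared: it depends on the *total* miss traffic.
        penalty = off_chip_latency(cpu_m + gpu_m)
        cost = (avg_latency(cpu_accesses, cpu_m, penalty)
                + gpu_weight * avg_latency(gpu_accesses, gpu_m, penalty))
        if best is None or cost < best[0]:
            best = (cost, cpu_ways, gpu_ways)
    return best[1], best[2]

if __name__ == "__main__":
    cpu_curve = [120000 - 7000 * w for w in range(17)]  # cache-sensitive CPU mix
    gpu_curve = [400000] * 17                           # cache-insensitive GPU kernel
    print(best_partition(cpu_curve, gpu_curve,
                         cpu_accesses=1_000_000, gpu_accesses=2_000_000,
                         gpu_weight=0.1))               # -> (15, 1)
```

Note how the shared `penalty` term captures the abstract's key observation: even a latency-insensitive GPU's misses lengthen the DRAM queue for everyone, so giving the CPU more ways can pay off beyond its own hit-rate gain.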
Collection: NDLTD