Weighted LLC Latency-Based Run-Time Cache Partitioning for Heterogeneous CPU-GPU Architecture

Bibliographic Details
Main Authors: Cheng-Hsuan Li, 李承軒
Other Authors: Chia-Lin Yang
Format: Others
Language: en_US
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/33311478280299879988
id ndltd-TW-102NTU05392069
record_format oai_dc
spelling ndltd-TW-102NTU053920692016-03-09T04:24:19Z http://ndltd.ncl.edu.tw/handle/33311478280299879988 Weighted LLC Latency-Based Run-Time Cache Partitioning for Heterogeneous CPU-GPU Architecture 基於最後一層快取記憶體加權存取延遲值之中央處理器與繪圖處理器異質性架構的快取記憶體分割機制 Cheng-Hsuan Li 李承軒 Master's, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, 102. Integrating the CPU and GPU on the same chip has become the development trend in microprocessor design. In an integrated CPU-GPU architecture, utilizing the shared last-level cache (LLC) is a critical design issue due to the pressure on shared resources and the different characteristics of CPU and GPU applications. Because of the latency-hiding capability of the GPU and the huge discrepancy in the number of concurrently executing threads between the CPU and GPU, LLC partitioning can no longer be achieved by simply minimizing overall cache misses, as in homogeneous CPUs. The state-of-the-art cache partitioning mechanism distinguishes cache-insensitive GPU applications from cache-sensitive ones and optimizes only the cache misses of CPU applications when the GPU is cache-insensitive. However, optimizing only the cache hit rate of CPU applications generates more cache misses from the GPU and leads to longer queuing delay in the underlying DRAM system. In terms of memory access latency, the loss due to longer queuing delay may outweigh the benefit of a higher cache hit ratio. Therefore, we find that even though the performance of a GPU application may not be sensitive to cache resources, the CPU applications' cache hit rate is not the only factor that should be considered in partitioning the LLC. The cache miss penalty, i.e., the off-chip latency, is also an important factor in designing an LLC partitioning mechanism for integrated CPU-GPU architectures. In this paper, we propose Weighted LLC Latency-Based Run-Time Cache Partitioning for integrated CPU-GPU architectures.
To correlate cache partitioning with overall performance more accurately, we develop a mechanism that predicts the off-chip latency from the total number of cache misses, and a GPU cache-sensitivity monitor that quantitatively profiles the GPU's performance sensitivity to memory access latency. The experimental results show that the proposed mechanism improves overall throughput by 9.7% over TLP-aware cache partitioning (TAP), 6.2% over Utility-based Cache Partitioning (UCP), and 10.9% over LRU on 30 heterogeneous workloads. Chia-Lin Yang 楊佳玲 2014 學位論文 ; thesis 37 en_US
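The core idea in the abstract, choosing an LLC partition by a latency-weighted cost rather than by raw miss counts, can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual algorithm: the miss curves, latency constants, and GPU sensitivity weight are all hypothetical values chosen for the example.

```python
# Illustrative sketch only: miss curves, latency constants, and the GPU
# sensitivity weight are hypothetical, not taken from the thesis.
# Idea: pick the CPU/GPU way split that minimizes a latency-weighted cost,
# where the miss penalty grows with *total* misses (modeling DRAM queuing
# delay) and GPU misses are discounted by a latency-sensitivity weight.

TOTAL_WAYS = 16
BASE_MISS_PENALTY = 200        # cycles: assumed unloaded off-chip latency
QUEUE_FACTOR = 0.002           # extra cycles per total miss: queuing-delay model
GPU_LATENCY_SENSITIVITY = 0.1  # 0 = fully latency-tolerant GPU, 1 = CPU-like

# Hypothetical misses per million instructions, indexed by allocated ways - 1.
CPU_MISSES = [900, 700, 550, 430, 340, 280, 240, 210,
              190, 175, 165, 158, 153, 150, 148, 147]
GPU_MISSES = [8000, 6000, 5200, 5050] + [5000] * 12  # mostly cache-insensitive

def weighted_cost(cpu_ways: int) -> float:
    """Latency-weighted cost of giving `cpu_ways` LLC ways to the CPU."""
    gpu_ways = TOTAL_WAYS - cpu_ways
    cm = CPU_MISSES[cpu_ways - 1]
    gm = GPU_MISSES[gpu_ways - 1]
    # Off-chip latency rises with the total misses from BOTH core types.
    miss_penalty = BASE_MISS_PENALTY + QUEUE_FACTOR * (cm + gm)
    return (cm + GPU_LATENCY_SENSITIVITY * gm) * miss_penalty

# Each side keeps at least one way; exhaustively search the remaining splits.
best = min(range(1, TOTAL_WAYS), key=weighted_cost)
print(f"CPU: {best} ways, GPU: {TOTAL_WAYS - best} ways")
```

With these made-up numbers, minimizing CPU misses alone would hand the CPU 15 ways, but the weighted cost stops earlier: squeezing the GPU below a few ways inflates total misses and hence the modeled queuing delay, which hurts the CPU's own miss penalty too.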
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 102 === Integrating the CPU and GPU on the same chip has become the development trend in microprocessor design. In an integrated CPU-GPU architecture, utilizing the shared last-level cache (LLC) is a critical design issue due to the pressure on shared resources and the different characteristics of CPU and GPU applications. Because of the latency-hiding capability of the GPU and the huge discrepancy in the number of concurrently executing threads between the CPU and GPU, LLC partitioning can no longer be achieved by simply minimizing overall cache misses, as in homogeneous CPUs. The state-of-the-art cache partitioning mechanism distinguishes cache-insensitive GPU applications from cache-sensitive ones and optimizes only the cache misses of CPU applications when the GPU is cache-insensitive. However, optimizing only the cache hit rate of CPU applications generates more cache misses from the GPU and leads to longer queuing delay in the underlying DRAM system. In terms of memory access latency, the loss due to longer queuing delay may outweigh the benefit of a higher cache hit ratio. Therefore, we find that even though the performance of a GPU application may not be sensitive to cache resources, the CPU applications' cache hit rate is not the only factor that should be considered in partitioning the LLC. The cache miss penalty, i.e., the off-chip latency, is also an important factor in designing an LLC partitioning mechanism for integrated CPU-GPU architectures. In this paper, we propose Weighted LLC Latency-Based Run-Time Cache Partitioning for integrated CPU-GPU architectures. To correlate cache partitioning with overall performance more accurately, we develop a mechanism that predicts the off-chip latency from the total number of cache misses, and a GPU cache-sensitivity monitor that quantitatively profiles the GPU's performance sensitivity to memory access latency.
The experimental results show that the proposed mechanism improves overall throughput by 9.7% over TLP-aware cache partitioning (TAP), 6.2% over Utility-based Cache Partitioning (UCP), and 10.9% over LRU on 30 heterogeneous workloads.
author2 Chia-Lin Yang
author_facet Chia-Lin Yang
Cheng-Hsuan Li
李承軒
author Cheng-Hsuan Li
李承軒
spellingShingle Cheng-Hsuan Li
李承軒
Weighted LLC Latency-Based Run-Time Cache Partitioning for Heterogeneous CPU-GPU Architecture
author_sort Cheng-Hsuan Li
title Weighted LLC Latency-Based Run-Time Cache Partitioning for Heterogeneous CPU-GPU Architecture
title_short Weighted LLC Latency-Based Run-Time Cache Partitioning for Heterogeneous CPU-GPU Architecture
title_full Weighted LLC Latency-Based Run-Time Cache Partitioning for Heterogeneous CPU-GPU Architecture
title_fullStr Weighted LLC Latency-Based Run-Time Cache Partitioning for Heterogeneous CPU-GPU Architecture
title_full_unstemmed Weighted LLC Latency-Based Run-Time Cache Partitioning for Heterogeneous CPU-GPU Architecture
title_sort weighted llc latency-based run-time cache partitioning for heterogeneous cpu-gpu architecture
publishDate 2014
url http://ndltd.ncl.edu.tw/handle/33311478280299879988
work_keys_str_mv AT chenghsuanli weightedllclatencybasedruntimecachepartitioningforheterogeneouscpugpuarchitecture
AT lǐchéngxuān weightedllclatencybasedruntimecachepartitioningforheterogeneouscpugpuarchitecture
AT chenghsuanli jīyúzuìhòuyīcéngkuàiqǔjìyìtǐjiāquáncúnqǔyánchízhízhīzhōngyāngchùlǐqìyǔhuìtúchùlǐqìyìzhìxìngjiàgòudekuàiqǔjìyìtǐfēngējīzhì
AT lǐchéngxuān jīyúzuìhòuyīcéngkuàiqǔjìyìtǐjiāquáncúnqǔyánchízhízhīzhōngyāngchùlǐqìyǔhuìtúchùlǐqìyìzhìxìngjiàgòudekuàiqǔjìyìtǐfēngējīzhì
_version_ 1718200682793664512