Weighted LLC Latency-Based Run-Time Cache Partitioning for Heterogeneous CPU-GPU Architecture

Bibliographic Details
Main Authors: Cheng-Hsuan Li, 李承軒
Other Authors: Chia-Lin Yang
Format: Others
Language: en_US
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/33311478280299879988
id ndltd-TW-102NTU05392069
record_format oai_dc
spelling ndltd-TW-102NTU053920692016-03-09T04:24:19Z http://ndltd.ncl.edu.tw/handle/33311478280299879988 Weighted LLC Latency-Based Run-Time Cache Partitioning for Heterogeneous CPU-GPU Architecture 基於最後一層快取記憶體加權存取延遲值之中央處理器與繪圖處理器異質性架構的快取記憶體分割機制 Cheng-Hsuan Li 李承軒 Master's, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, 102. Integrating the CPU and GPU on the same chip has become the development trend in microprocessor design. In an integrated CPU-GPU architecture, utilizing the shared last-level cache (LLC) is a critical design issue due to the pressure on shared resources and the different characteristics of CPU and GPU applications. Because of the latency-hiding capability of the GPU and the huge discrepancy in the number of concurrently executing threads between the CPU and GPU, LLC partitioning can no longer be achieved by simply minimizing overall cache misses, as in homogeneous CPUs. The state-of-the-art cache partitioning mechanism distinguishes cache-insensitive GPU applications from cache-sensitive ones and optimizes only the cache misses of CPU applications when the GPU is cache-insensitive. However, optimizing only the cache hit rate of CPU applications generates more cache misses from the GPU and leads to longer queuing delay in the underlying DRAM system. In terms of memory access latency, the loss due to longer queuing delay may outweigh the benefit of a higher cache hit ratio. Therefore, we find that even though the performance of a GPU application may not be sensitive to cache resources, the CPU applications' cache hit rate is not the only factor that should be considered in partitioning the LLC. The cache miss penalty, i.e., the off-chip latency, is also an important factor in designing an LLC partitioning mechanism for integrated CPU-GPU architectures. In this paper, we propose Weighted LLC Latency-Based Run-Time Cache Partitioning for integrated CPU-GPU architectures.
To correlate cache partitioning with overall performance more accurately, we develop a mechanism that predicts the off-chip latency from the total number of cache misses, and a GPU cache-sensitivity monitor that quantitatively profiles the GPU's performance sensitivity to memory access latency. The experimental results show that the proposed mechanism improves overall throughput by 9.7% over TLP-aware cache partitioning (TAP), 6.2% over Utility-based Cache Partitioning (UCP), and 10.9% over LRU on 30 heterogeneous workloads. Chia-Lin Yang 楊佳玲 2014 學位論文 ; thesis 37 en_US
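The core idea in the abstract, choosing an LLC partition by a latency-weighted cost rather than by raw miss counts, can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual algorithm: the miss curves, latency constants, and GPU sensitivity weight are all hypothetical values chosen for the example.

```python
# Illustrative sketch only: miss curves, latency constants, and the GPU
# sensitivity weight are hypothetical, not taken from the thesis.
# Idea: pick the CPU/GPU way split that minimizes a latency-weighted cost,
# where the miss penalty grows with *total* misses (modeling DRAM queuing
# delay) and GPU misses are discounted by a latency-sensitivity weight.

TOTAL_WAYS = 16
BASE_MISS_PENALTY = 200        # cycles: assumed unloaded off-chip latency
QUEUE_FACTOR = 0.002           # extra cycles per total miss: queuing-delay model
GPU_LATENCY_SENSITIVITY = 0.1  # 0 = fully latency-tolerant GPU, 1 = CPU-like

# Hypothetical misses per million instructions, indexed by allocated ways - 1.
CPU_MISSES = [900, 700, 550, 430, 340, 280, 240, 210,
              190, 175, 165, 158, 153, 150, 148, 147]
GPU_MISSES = [8000, 6000, 5200, 5050] + [5000] * 12  # mostly cache-insensitive

def weighted_cost(cpu_ways: int) -> float:
    """Latency-weighted cost of giving `cpu_ways` LLC ways to the CPU."""
    gpu_ways = TOTAL_WAYS - cpu_ways
    cm = CPU_MISSES[cpu_ways - 1]
    gm = GPU_MISSES[gpu_ways - 1]
    # Off-chip latency rises with the total misses from BOTH core types.
    miss_penalty = BASE_MISS_PENALTY + QUEUE_FACTOR * (cm + gm)
    return (cm + GPU_LATENCY_SENSITIVITY * gm) * miss_penalty

# Each side keeps at least one way; exhaustively search the remaining splits.
best = min(range(1, TOTAL_WAYS), key=weighted_cost)
print(f"CPU: {best} ways, GPU: {TOTAL_WAYS - best} ways")
```

With these made-up numbers, minimizing CPU misses alone would hand the CPU 15 ways, but the weighted cost stops earlier: squeezing the GPU below a few ways inflates total misses and hence the modeled queuing delay, which hurts the CPU's own miss penalty too.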
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 102 === Integrating the CPU and GPU on the same chip has become the development trend in microprocessor design. In an integrated CPU-GPU architecture, utilizing the shared last-level cache (LLC) is a critical design issue due to the pressure on shared resources and the different characteristics of CPU and GPU applications. Because of the latency-hiding capability of the GPU and the huge discrepancy in the number of concurrently executing threads between the CPU and GPU, LLC partitioning can no longer be achieved by simply minimizing overall cache misses, as in homogeneous CPUs. The state-of-the-art cache partitioning mechanism distinguishes cache-insensitive GPU applications from cache-sensitive ones and optimizes only the cache misses of CPU applications when the GPU is cache-insensitive. However, optimizing only the cache hit rate of CPU applications generates more cache misses from the GPU and leads to longer queuing delay in the underlying DRAM system. In terms of memory access latency, the loss due to longer queuing delay may outweigh the benefit of a higher cache hit ratio. Therefore, we find that even though the performance of a GPU application may not be sensitive to cache resources, the CPU applications' cache hit rate is not the only factor that should be considered in partitioning the LLC. The cache miss penalty, i.e., the off-chip latency, is also an important factor in designing an LLC partitioning mechanism for integrated CPU-GPU architectures. In this paper, we propose Weighted LLC Latency-Based Run-Time Cache Partitioning for integrated CPU-GPU architectures. To correlate cache partitioning with overall performance more accurately, we develop a mechanism that predicts the off-chip latency from the total number of cache misses, and a GPU cache-sensitivity monitor that quantitatively profiles the GPU's performance sensitivity to memory access latency.
The experimental results show that the proposed mechanism improves overall throughput by 9.7% over TLP-aware cache partitioning (TAP), 6.2% over Utility-based Cache Partitioning (UCP), and 10.9% over LRU on 30 heterogeneous workloads.
author2 Chia-Lin Yang
author_facet Chia-Lin Yang
Cheng-Hsuan Li
李承軒
author Cheng-Hsuan Li
李承軒
spellingShingle Cheng-Hsuan Li
李承軒
Weighted LLC Latency-Based Run-Time Cache Partitioning for Heterogeneous CPU-GPU Architecture
author_sort Cheng-Hsuan Li
title Weighted LLC Latency-Based Run-Time Cache Partitioning for Heterogeneous CPU-GPU Architecture
title_short Weighted LLC Latency-Based Run-Time Cache Partitioning for Heterogeneous CPU-GPU Architecture
title_full Weighted LLC Latency-Based Run-Time Cache Partitioning for Heterogeneous CPU-GPU Architecture
title_fullStr Weighted LLC Latency-Based Run-Time Cache Partitioning for Heterogeneous CPU-GPU Architecture
title_full_unstemmed Weighted LLC Latency-Based Run-Time Cache Partitioning for Heterogeneous CPU-GPU Architecture
title_sort weighted llc latency-based run-time cache partitioning for heterogeneous cpu-gpu architecture
publishDate 2014
url http://ndltd.ncl.edu.tw/handle/33311478280299879988
work_keys_str_mv AT chenghsuanli weightedllclatencybasedruntimecachepartitioningforheterogeneouscpugpuarchitecture
AT lǐchéngxuān weightedllclatencybasedruntimecachepartitioningforheterogeneouscpugpuarchitecture
AT chenghsuanli jīyúzuìhòuyīcéngkuàiqǔjìyìtǐjiāquáncúnqǔyánchízhízhīzhōngyāngchùlǐqìyǔhuìtúchùlǐqìyìzhìxìngjiàgòudekuàiqǔjìyìtǐfēngējīzhì
AT lǐchéngxuān jīyúzuìhòuyīcéngkuàiqǔjìyìtǐjiāquáncúnqǔyánchízhízhīzhōngyāngchùlǐqìyǔhuìtúchùlǐqìyìzhìxìngjiàgòudekuàiqǔjìyìtǐfēngējīzhì
_version_ 1718200682793664512