Interference-aware Batch Memory Scheduling in Heterogeneous Multicore Architecture


Bibliographic Details
Main Authors: Yi-Chien Song, 宋羿謙
Other Authors: Pao-Ann Hsiung
Format: Others
Language: en_US
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/58662903211578321211
Description
Summary: Master's Thesis === National Chung Cheng University === Graduate Institute of Computer Science and Information Engineering === 102 === In recent years, integrating Central Processing Units (CPUs) and Graphics Processing Units (GPUs) on the same chip has become a major trend. For instance, the Heterogeneous System Architecture (HSA) has been proposed as an integrated design that combines CPUs and GPUs on the same chip and shares off-chip memory between them. As the number of processors increases, a large number of memory requests are generated, causing memory access conflicts; due to the heterogeneity of the processors, this issue is even more serious in HSA. To address it, we propose a memory scheduling algorithm called the \textit{Interference-aware Batch Memory Scheduling Algorithm (IBM)}, which decouples memory scheduling into three stages. The first stage groups memory requests into batches based on row-buffer locality (RBL) or bank-level parallelism (BLP). The second stage uses a simpler, higher-level policy to schedule batches instead of individual memory requests. The third stage processes the memory command queue to enforce DRAM timing constraints. We compare IBM against \textit{First-Come First-Serve (FCFS)} and \textit{Staged Memory Scheduling (SMS)}. Our evaluations show that IBM reduces memory access latency by up to $6.61\%$ and CPU access latency by up to $44.96\%$ in the general case, and reduces memory access latency by up to $4.2\%$ and CPU access latency by up to $24.37\%$ in real-world benchmarks.
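The three-stage decoupling described above can be illustrated with a minimal sketch. The request fields, the batch-ranking policy, and the command expansion below are illustrative assumptions for exposition, not the thesis's actual implementation; a real scheduler would also model DRAM timing parameters in the third stage.

```python
from collections import defaultdict

class Request:
    """A memory request; the (source, bank, row) fields are an assumed
    minimal model of what the scheduler would inspect."""
    def __init__(self, source, bank, row):
        self.source = source  # "CPU" or "GPU"
        self.bank = bank
        self.row = row

def form_batches(requests):
    """Stage 1: group requests that target the same (bank, row) so a batch
    exploits row-buffer locality (RBL); batches for distinct banks expose
    bank-level parallelism (BLP)."""
    groups = defaultdict(list)
    for r in requests:
        groups[(r.bank, r.row)].append(r)
    return list(groups.values())

def schedule_batches(batches):
    """Stage 2: a simple, high-level policy over whole batches rather than
    individual requests -- here, CPU-heavy batches first, then shorter
    batches (a hypothetical stand-in for the thesis policy)."""
    def rank(batch):
        cpu_count = sum(1 for r in batch if r.source == "CPU")
        return (-cpu_count, len(batch))
    return sorted(batches, key=rank)

def issue_commands(ordered_batches):
    """Stage 3: expand batches into DRAM commands, activating a row only
    when it is not already open in that bank. Timing constraints
    (tRCD, tRP, ...) are omitted from this sketch."""
    commands = []
    open_row = {}  # bank -> currently open row
    for batch in ordered_batches:
        for r in batch:
            if open_row.get(r.bank) != r.row:
                commands.append(("ACTIVATE", r.bank, r.row))
                open_row[r.bank] = r.row
            commands.append(("READ", r.bank, r.row))
    return commands
```

Because stage 2 reorders whole batches, requests that share an open row are serviced back-to-back, which is the mechanism by which batching reduces row-buffer conflicts between CPU and GPU traffic.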