Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs

Master's === 臺灣大學 === 資訊工程學研究所 === 98 === The memory wall is a well-known obstacle to processor performance improvement. The popularity of multi-core architectures further exacerbates the problem, since memory resources are shared by all cores. Interference among requests from different cores may prolong...
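
The abstract describes decoupling each application into computation and memory tasks and capping how many threads may execute memory tasks concurrently. The sketch below is only a minimal illustration of that idea in C++, not the thesis's implementation; the class name MemoryTaskGate and the initial cap of two concurrent memory tasks are assumptions made for the example.

    // Hypothetical sketch (not the thesis's code): a gate that caps how many
    // threads may execute their memory task at once; the cap can be retuned
    // at run time, e.g. at a detected phase boundary.
    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <thread>
    #include <vector>

    class MemoryTaskGate {                       // name assumed for this example
    public:
        explicit MemoryTaskGate(int limit) : limit_(limit) {}

        // Block until a memory-task slot is free, then occupy it.
        void acquire() {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [&] { return active_ < limit_; });
            ++active_;
        }

        // Free the slot when the memory task finishes.
        void release() {
            { std::lock_guard<std::mutex> lk(m_); --active_; }
            cv_.notify_all();
        }

        // Retune the allowed number of concurrent memory tasks.
        void set_limit(int limit) {
            { std::lock_guard<std::mutex> lk(m_); limit_ = limit; }
            cv_.notify_all();
        }

    private:
        std::mutex m_;
        std::condition_variable cv_;
        int limit_;
        int active_ = 0;
    };

    int main() {
        MemoryTaskGate gate(2);                  // assumed constraint: 2 memory threads
        std::vector<std::thread> workers;
        for (int i = 0; i < 8; ++i) {
            workers.emplace_back([&gate, i] {
                // ... computation task runs unrestricted ...
                gate.acquire();                  // enter the bandwidth-heavy memory task
                std::printf("thread %d running its memory task\n", i);
                gate.release();
                // ... further computation ...
            });
        }
        for (auto& t : workers) t.join();
        return 0;
    }

At a detected phase change, the run-time mechanism described in the abstract would retune the constraint, here via set_limit(); a second sketch after the record fields below illustrates how such a constraint might be chosen from a measured memory ratio.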


Bibliographic Details
Main Authors: Hsiang-Yun Cheng, 鄭湘筠
Other Authors: Chia-Lin Yang
Format: Others
Language: en_US
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/56102928507045398258
id ndltd-TW-098NTU05392046
record_format oai_dc
spelling ndltd-TW-098NTU05392046 2015-10-13T18:49:39Z http://ndltd.ncl.edu.tw/handle/56102928507045398258 Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs 多核處理器架構下有效利用記憶體頻寬之排程策略 Hsiang-Yun Cheng 鄭湘筠 Master's 臺灣大學 資訊工程學研究所 98 The memory wall is a well-known obstacle to processor performance improvement. The popularity of multi-core architectures further exacerbates the problem, since memory resources are shared by all cores. Interference among requests from different cores may prolong the latency of memory accesses, thereby degrading system performance. To tackle this problem, this thesis proposes decoupling applications into computation and memory tasks and restricting the number of concurrent memory threads to reduce contention. With this scheduling restriction, however, a CPU core may spend time acquiring permission to execute memory tasks, which can adversely impact overall performance. Therefore, we develop a memory-thread throttling mechanism that dynamically tunes the number of allowable memory threads as the workload varies, in order to improve system performance. The proposed run-time mechanism monitors a program's memory and computation ratios to detect phases. It then selects the memory-thread constraint for the next program phase based on an analytical model that estimates system performance under different constraint values. To prove the concept, we prototype the mechanism in several real-world applications as well as synthetic workloads and evaluate their performance on real hardware. The experimental results demonstrate up to a 20% speedup for a pool of synthetic workloads on an Intel i7 (Nehalem) machine, matching the speedup estimated by the proposed analytical model. Furthermore, the run-time scheduling mechanism yields a geometric-mean performance improvement of 12% for realistic applications on the same hardware. Chia-Lin Yang 楊佳玲 2010 學位論文 ; thesis 47 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === 臺灣大學 === 資訊工程學研究所 === 98 === The memory wall is a well-known obstacle to processor performance improvement. The popularity of multi-core architectures further exacerbates the problem, since memory resources are shared by all cores. Interference among requests from different cores may prolong the latency of memory accesses, thereby degrading system performance. To tackle this problem, this thesis proposes decoupling applications into computation and memory tasks and restricting the number of concurrent memory threads to reduce contention. With this scheduling restriction, however, a CPU core may spend time acquiring permission to execute memory tasks, which can adversely impact overall performance. Therefore, we develop a memory-thread throttling mechanism that dynamically tunes the number of allowable memory threads as the workload varies, in order to improve system performance. The proposed run-time mechanism monitors a program's memory and computation ratios to detect phases. It then selects the memory-thread constraint for the next program phase based on an analytical model that estimates system performance under different constraint values. To prove the concept, we prototype the mechanism in several real-world applications as well as synthetic workloads and evaluate their performance on real hardware. The experimental results demonstrate up to a 20% speedup for a pool of synthetic workloads on an Intel i7 (Nehalem) machine, matching the speedup estimated by the proposed analytical model. Furthermore, the run-time scheduling mechanism yields a geometric-mean performance improvement of 12% for realistic applications on the same hardware.
author2 Chia-Lin Yang
author_facet Chia-Lin Yang
Hsiang-Yun Cheng
鄭湘筠
author Hsiang-Yun Cheng
鄭湘筠
spellingShingle Hsiang-Yun Cheng
鄭湘筠
Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs
author_sort Hsiang-Yun Cheng
title Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs
title_short Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs
title_full Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs
title_fullStr Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs
title_full_unstemmed Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs
title_sort task scheduling for efficient memory bandwidth utilization on cmps
publishDate 2010
url http://ndltd.ncl.edu.tw/handle/56102928507045398258
work_keys_str_mv AT hsiangyuncheng taskschedulingforefficientmemorybandwidthutilizationoncmps
AT zhèngxiāngyún taskschedulingforefficientmemorybandwidthutilizationoncmps
AT hsiangyuncheng duōhéchùlǐqìjiàgòuxiàyǒuxiàolìyòngjìyìtǐpínkuānzhīpáichéngcèlüè
AT zhèngxiāngyún duōhéchùlǐqìjiàgòuxiàyǒuxiàolìyòngjìyìtǐpínkuānzhīpáichéngcèlüè
_version_ 1718037585423499264
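
The description field above also outlines a run-time throttling loop: monitor a phase's memory and computation ratios, then pick the memory-thread constraint for the next phase from an analytical model. The sketch below illustrates only the selection step with a toy cost model; the assumed saturation point of three concurrent memory threads and the estimate_time() formula are placeholders for illustration, not the thesis's analytical model.

    // Hypothetical sketch of the constraint-selection step.  The cost model is
    // a toy placeholder, NOT the thesis's analytical model: it assumes roughly
    // three concurrent memory threads saturate bandwidth and that memory time
    // stretches linearly beyond that point.
    #include <algorithm>
    #include <cstdio>
    #include <initializer_list>

    // Estimate a phase's execution time (arbitrary units) when `limit` of the
    // `threads` worker threads may run memory tasks concurrently.  `mem_ratio`
    // is the measured fraction of the phase spent in memory tasks.
    double estimate_time(double mem_ratio, int threads, int limit) {
        const double saturation = 3.0;                  // assumed bandwidth saturation point
        double comp    = 1.0 - mem_ratio;               // computation share per thread
        double stretch = std::max(1.0, limit / saturation);           // contention slowdown
        double critical_path = comp + mem_ratio * stretch;            // one thread, no waiting
        double memory_bound  = threads * mem_ratio * stretch / limit; // all memory tasks through `limit` slots
        return std::max(critical_path, memory_bound);
    }

    // Pick the memory-thread constraint that minimizes estimated phase time.
    int choose_limit(double mem_ratio, int threads) {
        int best = 1;
        double best_time = estimate_time(mem_ratio, threads, 1);
        for (int limit = 2; limit <= threads; ++limit) {
            double t = estimate_time(mem_ratio, threads, limit);
            if (t < best_time) { best_time = t; best = limit; }
        }
        return best;
    }

    int main() {
        const int threads = 8;
        for (double ratio : {0.1, 0.2, 0.5, 0.8}) {     // sample phase profiles
            std::printf("memory ratio %.1f -> constraint %d\n",
                        ratio, choose_limit(ratio, threads));
        }
        return 0;
    }

In the mechanism described in the abstract, the constraint chosen this way would be applied at the next phase boundary, for example through the gate sketched earlier in this record.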