Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs
多核處理器架構下有效利用記憶體頻寬之排程策略 (Chinese title)
Master's thesis, Graduate Institute of Computer Science and Information Engineering, National Taiwan University, academic year 98 (ROC calendar)
Main Authors: Hsiang-Yun Cheng (鄭湘筠)
Other Authors: Chia-Lin Yang (楊佳玲)
Format: Others (學位論文 / thesis; 47 pages)
Language: en_US
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/56102928507045398258
Record ID: ndltd-TW-098NTU05392046
Description:
The memory wall is a well-known obstacle to processor performance improvement, and the spread of multi-core architectures aggravates it further because the memory system is shared by all cores: interference among requests from different cores can prolong memory access times and degrade system performance. To tackle this problem, this thesis proposes to decouple applications into computation tasks and memory tasks, and to restrict the number of concurrently running memory threads so as to reduce contention. With this scheduling restriction, however, a CPU core may spend time waiting for permission to execute its memory tasks, which can hurt overall performance. We therefore develop a memory-thread throttling mechanism that dynamically tunes the allowed number of memory threads as the workload varies. The proposed run-time mechanism monitors a program's memory and computation ratios to detect phase changes, and then chooses the memory-thread constraint for the next program phase using an analytical model that estimates system performance under different constraint values. To prove the concept, we prototype the mechanism in several real-world applications as well as synthetic workloads and evaluate their performance on real hardware. The experiments show up to 20% speedup for a pool of synthetic workloads on an Intel i7 (Nehalem) machine, matching the speedup estimated by the proposed analytical model, and the run-time scheduling yields a geometric-mean improvement of 12% for the realistic applications on the same hardware.
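The record contains only the abstract, but the scheduling idea it describes lends itself to a small illustration. The C++20 sketch below is not the thesis's implementation; it only shows, using assumed names such as kMemoryThreadLimit and memory_tokens, how a counting semaphore can cap the number of threads that execute memory-intensive tasks at the same time, which is the kind of memory-thread constraint the abstract describes.

```cpp
// Minimal sketch (not the thesis's actual mechanism): cap how many threads
// may run memory-intensive tasks concurrently. kMemoryThreadLimit stands in
// for the memory-thread constraint that the thesis tunes dynamically; here
// it is a fixed, hypothetical value.
#include <cstdio>
#include <semaphore>
#include <thread>
#include <vector>

constexpr int kMemoryThreadLimit = 2;              // hypothetical constraint value
std::counting_semaphore<8> memory_tokens(kMemoryThreadLimit);

void compute_task() {
    // Computation phase: runs unthrottled and does not compete for memory bandwidth.
    volatile double x = 0.0;
    for (int i = 0; i < 1'000'000; ++i) x = x + i * 0.5;
}

void memory_task() {
    // Memory phase: at most kMemoryThreadLimit threads hold a token at once,
    // bounding contention on the shared memory system.
    memory_tokens.acquire();
    std::vector<char> buf(64 << 20);               // stand-in for a bandwidth-heavy access pattern
    for (std::size_t i = 0; i < buf.size(); i += 64) buf[i] = static_cast<char>(i);
    memory_tokens.release();
}

int main() {
    std::vector<std::thread> workers;
    for (int i = 0; i < 8; ++i)
        workers.emplace_back([] { compute_task(); memory_task(); });
    for (auto& t : workers) t.join();
    std::printf("all workers finished\n");
}
```

In the thesis, this limit is not a fixed constant: a run-time monitor tracks the program's memory and computation ratios to detect phases and resets the constraint for each phase using an analytical performance model; the constant in the sketch merely stands in for that dynamically chosen value.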