Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs

Master's === 臺灣大學 === 資訊工程學研究所 === 98 === The memory wall is a well-known obstacle to processor performance improvement. The popularity of multi-core architectures further exacerbates the problem, since memory resources are shared by all cores. Interference among requests from different cores may prolong...
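
The abstract describes decoupling each application into computation and memory tasks and capping how many threads may execute memory tasks concurrently. The sketch below is only a minimal illustration of that idea in C++, not the thesis's implementation; the class name MemoryTaskGate and the initial cap of two concurrent memory tasks are assumptions made for the example.

    // Hypothetical sketch (not the thesis's code): a gate that caps how many
    // threads may execute their memory task at once; the cap can be retuned
    // at run time, e.g. at a detected phase boundary.
    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <thread>
    #include <vector>

    class MemoryTaskGate {                       // name assumed for this example
    public:
        explicit MemoryTaskGate(int limit) : limit_(limit) {}

        // Block until a memory-task slot is free, then occupy it.
        void acquire() {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [&] { return active_ < limit_; });
            ++active_;
        }

        // Free the slot when the memory task finishes.
        void release() {
            { std::lock_guard<std::mutex> lk(m_); --active_; }
            cv_.notify_all();
        }

        // Retune the allowed number of concurrent memory tasks.
        void set_limit(int limit) {
            { std::lock_guard<std::mutex> lk(m_); limit_ = limit; }
            cv_.notify_all();
        }

    private:
        std::mutex m_;
        std::condition_variable cv_;
        int limit_;
        int active_ = 0;
    };

    int main() {
        MemoryTaskGate gate(2);                  // assumed constraint: 2 memory threads
        std::vector<std::thread> workers;
        for (int i = 0; i < 8; ++i) {
            workers.emplace_back([&gate, i] {
                // ... computation task runs unrestricted ...
                gate.acquire();                  // enter the bandwidth-heavy memory task
                std::printf("thread %d running its memory task\n", i);
                gate.release();
                // ... further computation ...
            });
        }
        for (auto& t : workers) t.join();
        return 0;
    }

At a detected phase change, the run-time mechanism described in the abstract would retune the constraint, here via set_limit(); a second sketch after the record fields below illustrates how such a constraint might be chosen from a measured memory ratio.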


Bibliographic Details
Main Authors: Hsiang-Yun Cheng, 鄭湘筠
Other Authors: Chia-Lin Yang
Format: Others
Language: en_US
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/56102928507045398258
id ndltd-TW-098NTU05392046
record_format oai_dc
spelling ndltd-TW-098NTU05392046 2015-10-13T18:49:39Z http://ndltd.ncl.edu.tw/handle/56102928507045398258 Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs 多核處理器架構下有效利用記憶體頻寬之排程策略 Hsiang-Yun Cheng 鄭湘筠 Master's 臺灣大學 資訊工程學研究所 98 The memory wall is a well-known obstacle to processor performance improvement. The popularity of multi-core architectures further exacerbates the problem, since memory resources are shared by all cores. Interference among requests from different cores may prolong the latency of memory accesses, thereby degrading system performance. To tackle this problem, this thesis proposes decoupling applications into computation and memory tasks and restricting the number of concurrent memory threads to reduce contention. With this scheduling restriction, however, a CPU core may spend time acquiring permission to execute memory tasks, which can adversely impact overall performance. Therefore, we develop a memory-thread throttling mechanism that dynamically tunes the number of allowable memory threads as the workload varies, in order to improve system performance. The proposed run-time mechanism monitors a program's memory and computation ratios to detect phases. It then selects the memory-thread constraint for the next program phase based on an analytical model that estimates system performance under different constraint values. To prove the concept, we prototype the mechanism in several real-world applications as well as synthetic workloads and evaluate their performance on real hardware. The experimental results demonstrate up to a 20% speedup for a pool of synthetic workloads on an Intel i7 (Nehalem) machine, matching the speedup estimated by the proposed analytical model. Furthermore, the run-time scheduling mechanism yields a geometric-mean performance improvement of 12% for realistic applications on the same hardware. Chia-Lin Yang 楊佳玲 2010 學位論文 ; thesis 47 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === 臺灣大學 === 資訊工程學研究所 === 98 === The memory wall is a well-known obstacle to processor performance improvement. The popularity of multi-core architectures further exacerbates the problem, since memory resources are shared by all cores. Interference among requests from different cores may prolong the latency of memory accesses, thereby degrading system performance. To tackle this problem, this thesis proposes decoupling applications into computation and memory tasks and restricting the number of concurrent memory threads to reduce contention. With this scheduling restriction, however, a CPU core may spend time acquiring permission to execute memory tasks, which can adversely impact overall performance. Therefore, we develop a memory-thread throttling mechanism that dynamically tunes the number of allowable memory threads as the workload varies, in order to improve system performance. The proposed run-time mechanism monitors a program's memory and computation ratios to detect phases. It then selects the memory-thread constraint for the next program phase based on an analytical model that estimates system performance under different constraint values. To prove the concept, we prototype the mechanism in several real-world applications as well as synthetic workloads and evaluate their performance on real hardware. The experimental results demonstrate up to a 20% speedup for a pool of synthetic workloads on an Intel i7 (Nehalem) machine, matching the speedup estimated by the proposed analytical model. Furthermore, the run-time scheduling mechanism yields a geometric-mean performance improvement of 12% for realistic applications on the same hardware.
author2 Chia-Lin Yang
author_facet Chia-Lin Yang
Hsiang-Yun Cheng
鄭湘筠
author Hsiang-Yun Cheng
鄭湘筠
spellingShingle Hsiang-Yun Cheng
鄭湘筠
Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs
author_sort Hsiang-Yun Cheng
title Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs
title_short Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs
title_full Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs
title_fullStr Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs
title_full_unstemmed Task Scheduling for Efficient Memory Bandwidth Utilization on CMPs
title_sort task scheduling for efficient memory bandwidth utilization on cmps
publishDate 2010
url http://ndltd.ncl.edu.tw/handle/56102928507045398258
work_keys_str_mv AT hsiangyuncheng taskschedulingforefficientmemorybandwidthutilizationoncmps
AT zhèngxiāngyún taskschedulingforefficientmemorybandwidthutilizationoncmps
AT hsiangyuncheng duōhéchùlǐqìjiàgòuxiàyǒuxiàolìyòngjìyìtǐpínkuānzhīpáichéngcèlüè
AT zhèngxiāngyún duōhéchùlǐqìjiàgòuxiàyǒuxiàolìyòngjìyìtǐpínkuānzhīpáichéngcèlüè
_version_ 1718037585423499264
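
The description field above also outlines a run-time throttling loop: monitor a phase's memory and computation ratios, then pick the memory-thread constraint for the next phase from an analytical model. The sketch below illustrates only the selection step with a toy cost model; the assumed saturation point of three concurrent memory threads and the estimate_time() formula are placeholders for illustration, not the thesis's analytical model.

    // Hypothetical sketch of the constraint-selection step.  The cost model is
    // a toy placeholder, NOT the thesis's analytical model: it assumes roughly
    // three concurrent memory threads saturate bandwidth and that memory time
    // stretches linearly beyond that point.
    #include <algorithm>
    #include <cstdio>
    #include <initializer_list>

    // Estimate a phase's execution time (arbitrary units) when `limit` of the
    // `threads` worker threads may run memory tasks concurrently.  `mem_ratio`
    // is the measured fraction of the phase spent in memory tasks.
    double estimate_time(double mem_ratio, int threads, int limit) {
        const double saturation = 3.0;                  // assumed bandwidth saturation point
        double comp    = 1.0 - mem_ratio;               // computation share per thread
        double stretch = std::max(1.0, limit / saturation);           // contention slowdown
        double critical_path = comp + mem_ratio * stretch;            // one thread, no waiting
        double memory_bound  = threads * mem_ratio * stretch / limit; // all memory tasks through `limit` slots
        return std::max(critical_path, memory_bound);
    }

    // Pick the memory-thread constraint that minimizes estimated phase time.
    int choose_limit(double mem_ratio, int threads) {
        int best = 1;
        double best_time = estimate_time(mem_ratio, threads, 1);
        for (int limit = 2; limit <= threads; ++limit) {
            double t = estimate_time(mem_ratio, threads, limit);
            if (t < best_time) { best_time = t; best = limit; }
        }
        return best;
    }

    int main() {
        const int threads = 8;
        for (double ratio : {0.1, 0.2, 0.5, 0.8}) {     // sample phase profiles
            std::printf("memory ratio %.1f -> constraint %d\n",
                        ratio, choose_limit(ratio, threads));
        }
        return 0;
    }

In the mechanism described in the abstract, the constraint chosen this way would be applied at the next phase boundary, for example through the gate sketched earlier in this record.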