Using Multi-Task Queues to Improve Data Locality in Hadoop
Master's thesis === National Cheng Kung University === Department of Computer Science and Information Engineering (graduate program) === Academic year 101 (ROC calendar)
Main Authors: | Jhong-YiChen, 陳仲毅 |
---|---|
Other Authors: | Sun-Yuan Hsieh, 謝孫源 |
Format: | Others |
Language: | en_US |
Published: | 2013 |
Online Access: | http://ndltd.ncl.edu.tw/handle/60951468672167042635 |
id | ndltd-TW-101NCKU5392044 |
---|---|
record_format | oai_dc |
collection | NDLTD |
language | en_US |
format | Others |
sources | NDLTD |
description | As cloud computing becomes more popular, distributed computing systems are increasingly common, and Hadoop is one of the best-known such systems. Hadoop's default task scheduler is first-come, first-served (FCFS), which is inefficient: the master may assign slaves tasks whose input data is not stored locally. This causes a large amount of unnecessary data transfer between slaves, which directly increases job execution times and makes the racks a network bandwidth bottleneck. In this thesis, we present a scheduling algorithm that considers data locality and load balance globally when assigning tasks. First, we create multiple task queues, one per slave, and place each task in the queue that gives it the best data locality. Next, we compute a time-balancing limit and shift tasks between queues according to this limit to obtain a better task assignment. Compared with the default scheduler, the proposed method consistently incurs less data transfer and improves overall system performance. |
author2 | Sun-Yuan Hsieh |
author | Jhong-YiChen 陳仲毅 |
author_sort | Jhong-YiChen |
title | Using Multi-Task Queues to Improve Data Locality in Hadoop |
publishDate | 2013 |
url | http://ndltd.ncl.edu.tw/handle/60951468672167042635 |
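The two-phase procedure described in the abstract (one task queue per slave filled by data locality, then shifting tasks against a time-balancing limit) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the record does not define the limit or the shifting rule, so the cost model (`REMOTE_PENALTY`), the limit (the average load if every task ran locally), the tie-breaks, and every name below are hypothetical assumptions.

```python
from dataclasses import dataclass

REMOTE_PENALTY = 2.0  # hypothetical extra time to fetch a non-local input block


@dataclass
class Task:
    task_id: str
    replicas: set[str]  # slaves that store a replica of this task's input block
    local_time: float   # hypothetical estimated run time when the input is local


def cost(task: Task, slave: str) -> float:
    """Estimated run time of `task` on `slave`: local time, plus a penalty if remote."""
    return task.local_time + (0.0 if slave in task.replicas else REMOTE_PENALTY)


def queue_load(queue: list[Task], slave: str) -> float:
    """Total estimated run time of all tasks queued on `slave`."""
    return sum(cost(t, slave) for t in queue)


def build_queues(tasks: list[Task], slaves: list[str]) -> dict[str, list[Task]]:
    """Phase 1: one queue per slave; each task joins a queue on a slave holding its data.

    Among several replica holders, the currently least-loaded one wins (assumed tie-break).
    """
    queues: dict[str, list[Task]] = {s: [] for s in slaves}
    for t in sorted(tasks, key=lambda t: -t.local_time):  # longest tasks first
        candidates = [s for s in slaves if s in t.replicas] or slaves
        best = min(candidates, key=lambda s: queue_load(queues[s], s))
        queues[best].append(t)
    return queues


def balance(queues: dict[str, list[Task]], slaves: list[str]) -> dict[str, list[Task]]:
    """Phase 2: shift tasks off the most-loaded slave while it exceeds a balancing limit."""
    # Assumed limit: the average load if every task could run on a local slave.
    limit = sum(t.local_time for q in queues.values() for t in q) / len(slaves)
    while True:
        load = {s: queue_load(queues[s], s) for s in slaves}
        src = max(slaves, key=lambda s: load[s])
        if load[src] <= limit:
            break
        best = None
        for t in queues[src]:
            for dst in slaves:
                if dst == src:
                    continue
                new_load = load[dst] + cost(t, dst)
                if new_load >= load[src]:
                    continue  # the move would not lower the peak load
                # Prefer destinations holding a replica, then the lightest resulting load.
                key = (dst not in t.replicas, new_load)
                if best is None or key < best[0]:
                    best = (key, t, dst)
        if best is None:
            break  # no single move reduces the peak; stop
        _, t, dst = best
        queues[src].remove(t)
        queues[dst].append(t)
    return queues
```

Ranking candidate moves by whether the destination holds a replica keeps the balancing phase from reintroducing the remote reads that the first phase avoided; a remote move is chosen only when no local one lowers the peak load. Each accepted move strictly lowers the sorted load profile, so the loop terminates.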