Using Multi-Task Queues to Improve Data Locality in Hadoop

Bibliographic Details
Main Author: Jhong-Yi Chen (陳仲毅)
Other Authors: Sun-Yuan Hsieh
Format: Others
Language: en_US
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/60951468672167042635
id ndltd-TW-101NCKU5392044
record_format oai_dc
spelling ndltd-TW-101NCKU5392044 2015-10-13T22:51:34Z http://ndltd.ncl.edu.tw/handle/60951468672167042635 Using Multi-Task Queues to Improve Data Locality in Hadoop 使用多重任務佇列以改善Hadoop中之資料地域性 Jhong-Yi Chen 陳仲毅 Master's National Cheng Kung University Department of Computer Science and Information Engineering (Master's and Doctoral Program) 101 Distributed computing systems such as cloud computing are becoming increasingly popular, and Hadoop is one of the best-known among them. Because Hadoop's default task scheduling is first-come, first-served (FCFS), which is inefficient, the master may assign slaves tasks that lack data locality. This causes many unnecessary data transfers between slaves, which directly increases job execution times and turns the racks into network bandwidth bottlenecks. In this paper, we present a scheduling algorithm that globally considers both data locality and load balance when assigning tasks. First, we create multiple queues for every slave and place each task in the queue that offers the best data locality. Next, we compute a time-balancing limit and shift tasks between queues according to that limit to obtain a better task assignment. Compared with the default scheduling, the proposed method always incurs less data transfer and improves the performance of the computing system. Sun-Yuan Hsieh 謝孫源 2013 degree thesis ; thesis 50 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === 101 (ROC academic year) === Distributed computing systems such as cloud computing are becoming increasingly popular, and Hadoop is one of the best-known among them. Because Hadoop's default task scheduling is first-come, first-served (FCFS), which is inefficient, the master may assign slaves tasks that lack data locality. This causes many unnecessary data transfers between slaves, which directly increases job execution times and turns the racks into network bandwidth bottlenecks. In this paper, we present a scheduling algorithm that globally considers both data locality and load balance when assigning tasks. First, we create multiple queues for every slave and place each task in the queue that offers the best data locality. Next, we compute a time-balancing limit and shift tasks between queues according to that limit to obtain a better task assignment. Compared with the default scheduling, the proposed method always incurs less data transfer and improves the performance of the computing system.
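The abstract names the two scheduling steps but not their exact definitions, so the following Python sketch is only an illustrative interpretation, not the thesis's algorithm. It assumes each task carries an estimated cost and the set of slaves holding a replica of its input, takes the "time-balancing limit" to be the ceiling of the average per-slave load, and shifts tasks out of over-limit queues into queues that stay within the limit, preferring slaves that hold a replica. The FCFS baseline is a deliberate simplification of Hadoop's default scheduler (slaves ask for work in turn and get the next task in arrival order). All names (`Task`, `fcfs_assign`, `multi_queue_assign`, `non_local`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    replicas: frozenset  # slaves that store a replica of this task's input block
    cost: float = 1.0    # assumed (hypothetical) estimated run time


def load(queue):
    """Total estimated run time waiting in one slave's queue."""
    return sum(t.cost for t in queue)


def non_local(queues):
    """Count tasks queued on a slave that holds no replica of their input."""
    return sum(1 for s, q in queues.items() for t in q if s not in t.replicas)


def fcfs_assign(tasks, slaves):
    """Simplified locality-blind baseline: slaves request work in turn and
    receive the next task in arrival order."""
    queues = {s: [] for s in slaves}
    for i, t in enumerate(tasks):
        queues[slaves[i % len(slaves)]].append(t)
    return queues


def multi_queue_assign(tasks, slaves):
    """Sketch of a locality-first, then load-balancing, assignment."""
    queues = {s: [] for s in slaves}
    # Step 1: put each task in the queue of the least-loaded slave that holds
    # a replica of its input (fall back to any slave if none does).
    for t in tasks:
        holders = [s for s in slaves if s in t.replicas] or list(slaves)
        queues[min(holders, key=lambda s: load(queues[s]))].append(t)

    # Step 2: assumed balancing limit = ceil(average load per slave).
    limit = math.ceil(sum(t.cost for t in tasks) / len(slaves))
    for src in slaves:
        while load(queues[src]) > limit:
            # Only consider moves that keep the destination within the limit.
            moves = [(t, dst) for t in queues[src] for dst in slaves
                     if dst != src and load(queues[dst]) + t.cost <= limit]
            if not moves:
                break  # nothing fits anywhere; leave this queue as is
            # Prefer a replica-holding destination, then the lightest one.
            t, dst = min(moves, key=lambda m: (m[1] not in m[0].replicas,
                                               load(queues[m[1]])))
            queues[src].remove(t)
            queues[dst].append(t)
    return queues


# Hypothetical example: most input blocks live on s1.
slaves = ["s1", "s2", "s3"]
tasks = [
    Task("t0", frozenset({"s1", "s3"})),
    Task("t1", frozenset({"s1"})),
    Task("t2", frozenset({"s1"})),
    Task("t3", frozenset({"s2"})),
]
print(non_local(fcfs_assign(tasks, slaves)))         # prints 3
print(non_local(multi_queue_assign(tasks, slaves)))  # prints 0
```

In this toy case, step 1 piles three tasks onto `s1`; step 2 then moves `t0` to `s3`, which also holds its data, so every queue stays within the limit of 2 without any remote reads. Because a destination never exceeds the limit, a moved task is never moved again, which keeps the shifting loop bounded by the number of tasks.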
author2 Sun-Yuan Hsieh
author Jhong-Yi Chen
陳仲毅
title Using Multi-Task Queues to Improve Data Locality in Hadoop
publishDate 2013
url http://ndltd.ncl.edu.tw/handle/60951468672167042635