Maintaining Strong Consistency Semantics in a Horizontally Scalable and Highly Available Implementation of HDFS

The Hadoop Distributed Filesystem (HDFS) is the storage layer of Hadoop, scaling to support tens of petabytes of data at companies such as Facebook and Yahoo. One wellknown limitation of HDFS is that its metadata has been stored inmemory on a single node, called the NameNode. To overcome NameNode’s...

Full description

Bibliographic Details
Main Authors:	Peiro Sajjad, Hooman, Hakimzadeh Harirbaf, Mahmoud
Format:	Others
Language:	English
Published:	KTH, Skolan för informations- och kommunikationsteknik (ICT) 2013
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-127464

id	ndltd-UPSALLA1-oai-DiVA.org-kth-127464
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-kth-1274642013-08-31T05:00:21ZMaintaining Strong Consistency Semantics in a Horizontally Scalable and Highly Available Implementation of HDFSengPeiro Sajjad, HoomanHakimzadeh Harirbaf, MahmoudKTH, Skolan för informations- och kommunikationsteknik (ICT)KTH, Skolan för informations- och kommunikationsteknik (ICT)2013The Hadoop Distributed Filesystem (HDFS) is the storage layer of Hadoop, scaling to support tens of petabytes of data at companies such as Facebook and Yahoo. One wellknown limitation of HDFS is that its metadata has been stored inmemory on a single node, called the NameNode. To overcome NameNode’s limitations, a distributed file system concept based on HDFS, called KTHFS, was proposed in which NameNode’s metadata are stored on an inmemory replicated distributed database, MySQL Cluster. In this thesis, we show how to store the metadata of HDFS NameNode in an external distributed database while maintaining strong consistency semantics of HDFS for both filesystem operations and primitive HDFS operations. Our implementation supports MySQL Cluster, to store the metadata, although it only supports a readcommitted transaction isolation model. As a readcommitted isolation model cannot guarantee strong consistency, we needed to carefully design how metadata is read and written in MySQL Cluster to ensure our system preserves HDFS’s consistency model and is both deadlock free and highly performant. We developed a transaction model based on taking metadata snapshotting and the careful ordering of database operations. Our model is general enough to support any database providing at least readcommitted isolation level. We evaluate our model and show how HDFS can scale, while maintaining strong consistency, to terabytes of metadata. Student thesisinfo:eu-repo/semantics/bachelorThesistexthttp://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-127464Trita-ICT-EX ; 2013:79application/pdfinfo:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Others
sources	NDLTD
description	The Hadoop Distributed Filesystem (HDFS) is the storage layer of Hadoop, scaling to support tens of petabytes of data at companies such as Facebook and Yahoo. One wellknown limitation of HDFS is that its metadata has been stored inmemory on a single node, called the NameNode. To overcome NameNode’s limitations, a distributed file system concept based on HDFS, called KTHFS, was proposed in which NameNode’s metadata are stored on an inmemory replicated distributed database, MySQL Cluster. In this thesis, we show how to store the metadata of HDFS NameNode in an external distributed database while maintaining strong consistency semantics of HDFS for both filesystem operations and primitive HDFS operations. Our implementation supports MySQL Cluster, to store the metadata, although it only supports a readcommitted transaction isolation model. As a readcommitted isolation model cannot guarantee strong consistency, we needed to carefully design how metadata is read and written in MySQL Cluster to ensure our system preserves HDFS’s consistency model and is both deadlock free and highly performant. We developed a transaction model based on taking metadata snapshotting and the careful ordering of database operations. Our model is general enough to support any database providing at least readcommitted isolation level. We evaluate our model and show how HDFS can scale, while maintaining strong consistency, to terabytes of metadata.
author	Peiro Sajjad, Hooman Hakimzadeh Harirbaf, Mahmoud
spellingShingle	Peiro Sajjad, Hooman Hakimzadeh Harirbaf, Mahmoud Maintaining Strong Consistency Semantics in a Horizontally Scalable and Highly Available Implementation of HDFS
author_facet	Peiro Sajjad, Hooman Hakimzadeh Harirbaf, Mahmoud
author_sort	Peiro Sajjad, Hooman
title	Maintaining Strong Consistency Semantics in a Horizontally Scalable and Highly Available Implementation of HDFS
title_short	Maintaining Strong Consistency Semantics in a Horizontally Scalable and Highly Available Implementation of HDFS
title_full	Maintaining Strong Consistency Semantics in a Horizontally Scalable and Highly Available Implementation of HDFS
title_fullStr	Maintaining Strong Consistency Semantics in a Horizontally Scalable and Highly Available Implementation of HDFS
title_full_unstemmed	Maintaining Strong Consistency Semantics in a Horizontally Scalable and Highly Available Implementation of HDFS
title_sort	maintaining strong consistency semantics in a horizontally scalable and highly available implementation of hdfs
publisher	KTH, Skolan för informations- och kommunikationsteknik (ICT)
publishDate	2013
url	http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-127464
work_keys_str_mv	AT peirosajjadhooman maintainingstrongconsistencysemanticsinahorizontallyscalableandhighlyavailableimplementationofhdfs AT hakimzadehharirbafmahmoud maintainingstrongconsistencysemanticsinahorizontallyscalableandhighlyavailableimplementationofhdfs
_version_	1716596824502960128

Maintaining Strong Consistency Semantics in a Horizontally Scalable and Highly Available Implementation of HDFS

Similar Items