BlobSeer: Towards efficient data storage management for large-scale, distributed systems

With data volumes increasing at a high rate and the emergence of highly scalable infrastructures (cloud computing, petascale computing), distributed management of data becomes a crucial issue that faces many challenges. This thesis brings several contributions in order to address such challenges. Fi...


Bibliographic Details
Main Author:	Nicolae, Bogdan
Language:	ENG
Published:	Université Rennes 1 2010
Subjects:	[INFO] Computer Science large scale data storage cloud storage versioning decentralized metadata management high throughput heavy access concurrency
Online Access:	http://tel.archives-ouvertes.fr/tel-00552271 http://tel.archives-ouvertes.fr/docs/00/55/22/71/PDF/thesis.pdf

id	ndltd-CCSD-oai-tel.archives-ouvertes.fr-tel-00552271
record_format	oai_dc
spelling	ndltd-CCSD-oai-tel.archives-ouvertes.fr-tel-00552271 2013-01-07T17:46:02Z http://tel.archives-ouvertes.fr/tel-00552271 http://tel.archives-ouvertes.fr/docs/00/55/22/71/PDF/thesis.pdf BlobSeer: Towards efficient data storage management for large-scale, distributed systems Nicolae, Bogdan [INFO] Computer Science large scale data storage cloud storage versioning decentralized metadata management high throughput heavy access concurrency With data volumes increasing at a high rate and the emergence of highly scalable infrastructures (cloud computing, petascale computing), distributed management of data becomes a crucial issue that faces many challenges. This thesis brings several contributions in order to address such challenges. First, it proposes a set of principles for designing highly scalable distributed storage systems that are optimized for heavy data access concurrency. In particular, it highlights the potentially large benefits of using versioning in this context. Second, based on these principles, it introduces a series of distributed data and metadata management algorithms that enable a high throughput under concurrency. Third, it shows how to efficiently implement these algorithms in practice, dealing with key issues such as high-performance parallel transfers, efficient maintenance of distributed data structures, fault tolerance, etc. These results are used to build BlobSeer, an experimental prototype that is used to demonstrate both the theoretical benefits of the approach in synthetic benchmarks and the practical benefits in real-life application scenarios: as a storage backend for MapReduce applications, as a storage backend for deployment and snapshotting of virtual machine images in clouds, and as a quality-of-service enabled data storage service for cloud applications. 
Extensive experiments on the Grid'5000 testbed show that BlobSeer remains scalable and sustains a high throughput even under heavy access concurrency, outperforming by a large margin several state-of-the-art approaches. 2010-11-30 ENG PhD thesis Université Rennes 1
collection	NDLTD
language	ENG
sources	NDLTD
topic	[INFO] Computer Science large scale data storage cloud storage versioning decentralized metadata management high throughput heavy access concurrency
spellingShingle	[INFO] Computer Science large scale data storage cloud storage versioning decentralized metadata management high throughput heavy access concurrency Nicolae, Bogdan BlobSeer: Towards efficient data storage management for large-scale, distributed systems
description	With data volumes increasing at a high rate and the emergence of highly scalable infrastructures (cloud computing, petascale computing), distributed management of data becomes a crucial issue that faces many challenges. This thesis brings several contributions in order to address such challenges. First, it proposes a set of principles for designing highly scalable distributed storage systems that are optimized for heavy data access concurrency. In particular, it highlights the potentially large benefits of using versioning in this context. Second, based on these principles, it introduces a series of distributed data and metadata management algorithms that enable a high throughput under concurrency. Third, it shows how to efficiently implement these algorithms in practice, dealing with key issues such as high-performance parallel transfers, efficient maintenance of distributed data structures, fault tolerance, etc. These results are used to build BlobSeer, an experimental prototype that is used to demonstrate both the theoretical benefits of the approach in synthetic benchmarks and the practical benefits in real-life application scenarios: as a storage backend for MapReduce applications, as a storage backend for deployment and snapshotting of virtual machine images in clouds, and as a quality-of-service enabled data storage service for cloud applications. Extensive experiments on the Grid'5000 testbed show that BlobSeer remains scalable and sustains a high throughput even under heavy access concurrency, outperforming by a large margin several state-of-the-art approaches.
author	Nicolae, Bogdan
author_facet	Nicolae, Bogdan
author_sort	Nicolae, Bogdan
title	BlobSeer: Towards efficient data storage management for large-scale, distributed systems
title_short	BlobSeer: Towards efficient data storage management for large-scale, distributed systems
title_full	BlobSeer: Towards efficient data storage management for large-scale, distributed systems
title_fullStr	BlobSeer: Towards efficient data storage management for large-scale, distributed systems
title_full_unstemmed	BlobSeer: Towards efficient data storage management for large-scale, distributed systems
title_sort	blobseer: towards efficient data storage management for large-scale, distributed systems
publisher	Université Rennes 1
publishDate	2010
url	http://tel.archives-ouvertes.fr/tel-00552271 http://tel.archives-ouvertes.fr/docs/00/55/22/71/PDF/thesis.pdf
work_keys_str_mv	AT nicolaebogdan blobseertowardsefficientdatastoragemanagementforlargescaledistributedsystems
_version_	1716396844545736704


Similar Items