Storing and managing data in a distributed hash table
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. === Includes bibliographical references (p. 83-90). === Distributed hash tables (DHTs) have been proposed as a generic, robust storage infrastructure for simplifying the construction o...
Main Author: | |
---|---|
Other Authors: | |
Format: | Others |
Language: | English |
Published: |
Massachusetts Institute of Technology
2009
|
Subjects: | |
Online Access: | http://hdl.handle.net/1721.1/44711 |
id |
ndltd-MIT-oai-dspace.mit.edu-1721.1-44711 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-MIT-oai-dspace.mit.edu-1721.1-447112019-05-02T16:16:19Z Storing and managing data in a distributed hash table Sit, Emil, 1977- M. Frans Kaashoek and Robert T. Morris. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 83-90). Distributed hash tables (DHTs) have been proposed as a generic, robust storage infrastructure for simplifying the construction of large-scale, wide-area applications. For example, UsenetDHT is a new design for Usenet News developed in this thesis that uses a DHT to cooperatively deliver Usenet articles: the DHT allows a set of N hosts to share storage of Usenet articles, reducing their combined storage requirements by a factor of O(N). Usenet generates a continuous stream of writes that exceeds 1 Tbyte/day in volume, comprising over ten million writes. Supporting this and the associated read workload requires a DHT engineered for durability and efficiency. Recovering from network and machine failures efficiently poses a challenge for DHT replication maintenance algorithms that provide durability. To avoid losing the last replica, replica maintenance must create additional replicas when failures are detected. However, creating replicas after every failure stresses network and storage resources unnecessarily. Tracking the location of every replica of every object would allow a replica maintenance algorithm to create replicas only when necessary, but when storing terabytes of data, such tracking is difficult to perform accurately and efficiently. This thesis describes a new algorithm, Passing Tone, that maintains durability efficiently, in a completely decentralized manner, despite transient and permanent failures. Passing Tone nodes make replication decisions with just basic DHT routing state, without maintaining state about the number or location of extant replicas and without responding to every transient failure with a new replica. Passing Tone is implemented in a revised version of DHash, optimized for both disk and network performance. (cont.) A sample 12 node deployment of Passing Tone and UsenetDHT supports a partial Usenet feed of 2.5 Mbyte/s (processing over 80 Tbyte of data per year), while providing 30 Mbyte/s of read throughput, limited currently by disk seeks. This deployment is the first public DHT to store terabytes of data. These results indicate that DHT-based designs can successfully simplify the construction of large-scale, wide-area systems. by Emil Sit. Ph.D. 2009-03-16T19:32:47Z 2009-03-16T19:32:47Z 2008 2008 Thesis http://hdl.handle.net/1721.1/44711 297432535 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 90 p. application/pdf Massachusetts Institute of Technology |
collection |
NDLTD |
language |
English |
format |
Others
|
sources |
NDLTD |
topic |
Electrical Engineering and Computer Science. |
spellingShingle |
Electrical Engineering and Computer Science. Sit, Emil, 1977- Storing and managing data in a distributed hash table |
description |
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. === Includes bibliographical references (p. 83-90). === Distributed hash tables (DHTs) have been proposed as a generic, robust storage infrastructure for simplifying the construction of large-scale, wide-area applications. For example, UsenetDHT is a new design for Usenet News developed in this thesis that uses a DHT to cooperatively deliver Usenet articles: the DHT allows a set of N hosts to share storage of Usenet articles, reducing their combined storage requirements by a factor of O(N). Usenet generates a continuous stream of writes that exceeds 1 Tbyte/day in volume, comprising over ten million writes. Supporting this and the associated read workload requires a DHT engineered for durability and efficiency. Recovering from network and machine failures efficiently poses a challenge for DHT replication maintenance algorithms that provide durability. To avoid losing the last replica, replica maintenance must create additional replicas when failures are detected. However, creating replicas after every failure stresses network and storage resources unnecessarily. Tracking the location of every replica of every object would allow a replica maintenance algorithm to create replicas only when necessary, but when storing terabytes of data, such tracking is difficult to perform accurately and efficiently. This thesis describes a new algorithm, Passing Tone, that maintains durability efficiently, in a completely decentralized manner, despite transient and permanent failures. Passing Tone nodes make replication decisions with just basic DHT routing state, without maintaining state about the number or location of extant replicas and without responding to every transient failure with a new replica. Passing Tone is implemented in a revised version of DHash, optimized for both disk and network performance. === (cont.) A sample 12 node deployment of Passing Tone and UsenetDHT supports a partial Usenet feed of 2.5 Mbyte/s (processing over 80 Tbyte of data per year), while providing 30 Mbyte/s of read throughput, limited currently by disk seeks. This deployment is the first public DHT to store terabytes of data. These results indicate that DHT-based designs can successfully simplify the construction of large-scale, wide-area systems. === by Emil Sit. === Ph.D. |
author2 |
M. Frans Kaashoek and Robert T. Morris. |
author_facet |
M. Frans Kaashoek and Robert T. Morris. Sit, Emil, 1977- |
author |
Sit, Emil, 1977- |
author_sort |
Sit, Emil, 1977- |
title |
Storing and managing data in a distributed hash table |
title_short |
Storing and managing data in a distributed hash table |
title_full |
Storing and managing data in a distributed hash table |
title_fullStr |
Storing and managing data in a distributed hash table |
title_full_unstemmed |
Storing and managing data in a distributed hash table |
title_sort |
storing and managing data in a distributed hash table |
publisher |
Massachusetts Institute of Technology |
publishDate |
2009 |
url |
http://hdl.handle.net/1721.1/44711 |
work_keys_str_mv |
AT sitemil1977 storingandmanagingdatainadistributedhashtable |
_version_ |
1719037414856458240 |