Set Reconciliation and File Synchronization Using Invertible Bloom Lookup Tables

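As more and more data migrate to the cloud, and the same files become accessible from multiple different machines, finding effective ways to ensure data consistency is becoming increasingly important. In this thesis, we cover current methods for efficiently maintaining sets of objects without the use of logs or other prior context, which is better known as the set reconciliation problem. We also discuss the state of the art for file synchronization, including methods that use set reconciliation techniques as an intermediate step. We explain the design and implementation of a novel file synchronization protocol tailored to minimize transmission complexity and targeted for files with relatively few changes. We also propose an extension of our file synchronization protocol for more general file directory synchronization. We describe IBLTsync, our implementation of the aforementioned file synchronization protocol, and benchmark it against a naïve file transmission protocol and rsync, a popular file synchronization library. We find that for files with relatively few changes, IBLTsync transmits significantly less data than the naïve protocol, and moderately less data than rsync. In addition, we provide the first (to our knowledge) implementation of multi-party set reconciliation using Invertible Bloom Lookup Tables, a hash-based data structure, and evaluate its performance for message propagation in large networks.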

Bibliographic Details
Main Author: Gentili, Marco
Format: Others
Language: en
Published: Harvard University 2015
Subjects: Computer Science
Online Access: http://nrs.harvard.edu/urn-3:HUL.InstRepos:14398536
id ndltd-harvard.edu-oai-dash.harvard.edu-1-14398536
record_format oai_dc
spelling ndltd-harvard.edu-oai-dash.harvard.edu-1-14398536 | 2017-07-27T15:51:33Z | Set Reconciliation and File Synchronization Using Invertible Bloom Lookup Tables | Gentili, Marco | Computer Science | 2015-04-09T13:56:00Z | 2015-05 | 2015-04-08 | 2015 | 2015-04-09T13:56:00Z | Thesis or Dissertation | text | application/pdf | Gentili, Marco. 2015. Set Reconciliation and File Synchronization Using Invertible Bloom Lookup Tables. Bachelor's thesis, Harvard College. | http://nrs.harvard.edu/urn-3:HUL.InstRepos:14398536 | en | open | http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA | Harvard University
collection NDLTD
language en
format Others
sources NDLTD
topic Computer Science
description As more and more data migrate to the cloud, and the same files become accessible from multiple different machines, finding effective ways to ensure data consistency is becoming increasingly important. In this thesis, we cover current methods for efficiently maintaining sets of objects without the use of logs or other prior context, which is better known as the set reconciliation problem. We also discuss the state of the art for file synchronization, including methods that use set reconciliation techniques as an intermediate step. We explain the design and implementation of a novel file synchronization protocol tailored to minimize transmission complexity and targeted for files with relatively few changes. We also propose an extension of our file synchronization protocol for more general file directory synchronization. We describe IBLTsync, our implementation of the aforementioned file synchronization protocol, and benchmark it against a naïve file transmission protocol and rsync, a popular file synchronization library. We find that for files with relatively few changes, IBLTsync transmits significantly less data than the naïve protocol, and moderately less data than rsync. In addition, we provide the first (to our knowledge) implementation of multi-party set reconciliation using Invertible Bloom Lookup Tables, a hash-based data structure, and evaluate its performance for message propagation in large networks.
author Gentili, Marco
title Set Reconciliation and File Synchronization Using Invertible Bloom Lookup Tables
publisher Harvard University
publishDate 2015
url http://nrs.harvard.edu/urn-3:HUL.InstRepos:14398536