BlendDB : blending table layouts to support efficient browsing of relational databases

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. === Includes bibliographical references (p. 63-65). === The physical implementation of most relational databases follows their logical description, where each relation is stored in its o...

Full description

Bibliographic Details
Main Author: Marcus, Adam, Ph. D. Massachusetts Institute of Technology
Other Authors: Samuel R. Madden and David R. Karger.
Format: Others
Language:English
Published: Massachusetts Institute of Technology 2009
Subjects:
Online Access:http://hdl.handle.net/1721.1/45890
Description
Summary:Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. === Includes bibliographical references (p. 63-65). === The physical implementation of most relational databases follows their logical description, where each relation is stored in its own file or collection of files on disk. Such an implementation is good for queries that filter or aggregate large portions of a single table, and provides reasonable performance for queries that join many records from one table to another. It is much less ideal, however, for join queries that follow paths from a small number of tuples in one table to small collections of tuples in other tables to accumulate facts about a related collection of objects (e.g., co-authors of a particular author in a publications database), since answering such queries involves one or more random I/Os per table involved in the path. If the primary workload of a database consists of many such path queries, as is likely to be the case when supporting browsing-oriented applications, performance will be quite poor. This thesis focuses on optimizing the performance of these kinds of path queries in a system called BlendDB, a relational database that supports on-disk co-location of tuples from different relations. To make BlendDB efficient, the thesis will propose a clustering algorithm that, given knowledge of the database workload, co-locates the tuples of multiple relations if they join along common paths. To support the claim of improved performance, the thesis will include experiments in which BlendDB provides better performance than traditional relational databases on queries against the IMDB movie dataset. Additionally, this thesis will show that BlendDB provides commensurate performance to materialized views while using less disk space, and can achieve better performance than materialized views in exchange for more disk space when users navigate between related items in the database. === by Adam Marcus. === S.M.