Summary: | Increased scaling costs and lack of desired features is leading to the evolution of high-performance storage systems from centralized architectures and specialized hardware to
decentralized, commodity storage clusters. Existing systems try to address storage cost and management issues at the filesystem level. Besides dictating the use of a specific filesystem, however, this approach leads to increased complexity and load imbalance towards the file-server side,
which in turn increase costs to scale.
In this thesis, we examine these problems at the block-level. This approach has several advantages, such as transparency, cost-efficiency, better resource utilization,
simplicity and easier management.
First of all, we explore the mechanisms, the merits, and the overheads associated with advanced metadata-intensive functionality at the block level, by providing versioning at
the block level. We find that block-level versioning has low overhead and offers transparency and simplicity advantages over filesystem-based approaches.
Secondly, we study the problem of providing extensibility required by diverse and changing application needs that may
use a single storage system. We provide support for (i)adding desired functions as block-level extensions, and (ii)flexibly combining them to create modular I/O
hierarchies. In this direction, we design, implement and evaluate an extensible block-level storage virtualization framework, Violin, with support for metadata-intensive
functions. Extending Violin we build Orchestra, an extensible framework for cluster storage virtualization and scalable storage sharing at the block-level. We show that Orchestra's enhanced block interface can substantially simplify the design of higher-level storage services, such
as cluster filesystems, while being scalable.
Finally, we consider the problem of consistency and availability in decentralized commodity clusters. We propose
RIBD, a novel storage system that provides support for handling both data and metadata consistency issues at the block layer. RIBD uses the notion of consistency intervals
(CIs) to provide fine-grain consistency semantics on sequences of block level operations by means of a lightweight transactional mechanism. RIBD relies on
Orchestra's virtualization mechanisms and uses a roll-back recovery mechanism based on low-overhead block-level versioning. We evaluate RIBD on a cluster of 24 nodes, and
find that it performs comparably to two popular cluster filesystems, PVFS and GFS, while offering stronger consistency guarantees.
|