Big data analytics made affordable using hardware-accelerated flash storage

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 175-192).

Bibliographic Details
Main Author: Jun, Sang-Woo
Other Authors: Arvind.
Format: Others
Language: English
Published: Massachusetts Institute of Technology 2018
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/118088
id ndltd-MIT-oai-dspace.mit.edu-1721.1-118088
record_format oai_dc
spelling ndltd-MIT-oai-dspace.mit.edu-1721.1-118088 2019-05-02T15:37:37Z Big data analytics made affordable using hardware-accelerated flash storage Jun, Sang-Woo Arvind. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 175-192). Vast amounts of data are continuously being collected from sources including social networks, web pages, and sensor networks, and their economic value depends on our ability to analyze them in a timely and affordable manner. High-performance analytics have traditionally required a machine or a cluster of machines with enough DRAM to accommodate the entire working set, due to their need for random accesses. However, datasets of interest now regularly exceed terabytes in size, and the cost of purchasing and operating a cluster with hundreds of machines is becoming a significant overhead. Furthermore, the performance of many random-access-intensive applications plummets even when only a fraction of the data does not fit in memory. On the other hand, such datasets could be stored easily in the flash-based secondary storage of a rack-scale cluster, or even a single machine, for a fraction of the capital and operating costs. While flash storage performs much better than hard disks, there are many hurdles to overcome in order to reach the performance of DRAM-based clusters. This thesis presents a new system architecture as well as operational methods that enable flash-based systems to achieve performance comparable to much costlier DRAM-based clusters for many important applications. 
We describe a highly customizable architecture called BlueDBM, which includes flash storage devices augmented with in-storage hardware accelerators, networked using a separate storage-area network. Using a prototype BlueDBM cluster with custom-designed accelerated storage devices, as well as novel accelerator designs and storage management algorithms, we have demonstrated high performance at low cost for applications including graph analytics, sorting, and database operations. We believe this approach is an attractive solution to the cost-performance issue of Big Data analytics. by Sang-Woo Jun. Ph. D. 2018-09-17T15:57:00Z 2018-09-17T15:57:00Z 2018 2018 Thesis http://hdl.handle.net/1721.1/118088 1052124029 eng MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582 192 pages application/pdf Massachusetts Institute of Technology
collection NDLTD
language English
format Others
sources NDLTD
topic Electrical Engineering and Computer Science.
spellingShingle Electrical Engineering and Computer Science.
Jun, Sang-Woo
Big data analytics made affordable using hardware-accelerated flash storage
description Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. === Cataloged from PDF version of thesis. === Includes bibliographical references (pages 175-192). === Vast amounts of data are continuously being collected from sources including social networks, web pages, and sensor networks, and their economic value depends on our ability to analyze them in a timely and affordable manner. High-performance analytics have traditionally required a machine or a cluster of machines with enough DRAM to accommodate the entire working set, due to their need for random accesses. However, datasets of interest now regularly exceed terabytes in size, and the cost of purchasing and operating a cluster with hundreds of machines is becoming a significant overhead. Furthermore, the performance of many random-access-intensive applications plummets even when only a fraction of the data does not fit in memory. On the other hand, such datasets could be stored easily in the flash-based secondary storage of a rack-scale cluster, or even a single machine, for a fraction of the capital and operating costs. While flash storage performs much better than hard disks, there are many hurdles to overcome in order to reach the performance of DRAM-based clusters. This thesis presents a new system architecture as well as operational methods that enable flash-based systems to achieve performance comparable to much costlier DRAM-based clusters for many important applications. We describe a highly customizable architecture called BlueDBM, which includes flash storage devices augmented with in-storage hardware accelerators, networked using a separate storage-area network. 
Using a prototype BlueDBM cluster with custom-designed accelerated storage devices, as well as novel accelerator designs and storage management algorithms, we have demonstrated high performance at low cost for applications including graph analytics, sorting, and database operations. We believe this approach is an attractive solution to the cost-performance issue of Big Data analytics. === by Sang-Woo Jun. === Ph. D.
author2 Arvind.
author_facet Arvind.
Jun, Sang-Woo
author Jun, Sang-Woo
author_sort Jun, Sang-Woo
title Big data analytics made affordable using hardware-accelerated flash storage
title_short Big data analytics made affordable using hardware-accelerated flash storage
title_full Big data analytics made affordable using hardware-accelerated flash storage
title_fullStr Big data analytics made affordable using hardware-accelerated flash storage
title_full_unstemmed Big data analytics made affordable using hardware-accelerated flash storage
title_sort big data analytics made affordable using hardware-accelerated flash storage
publisher Massachusetts Institute of Technology
publishDate 2018
url http://hdl.handle.net/1721.1/118088
work_keys_str_mv AT junsangwoo bigdataanalyticsmadeaffordableusinghardwareacceleratedflashstorage
_version_ 1719024711947517952