BlueDBM: An Appliance for Big Data Analytics


Bibliographic Details
Main Authors:	Jun, SangWoo (Contributor), Liu, Ming Gang (Contributor), Lee, Sungjin (Contributor), Hicks, Jamey (Author), Ankcorn, John (Author), King, Myron Decker (Author), Xu, Shuotao (Contributor), Arvind, Arvind (Contributor)
Other Authors:	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (Contributor), Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format:	Article
Language:	English
Published:	Association for Computing Machinery (ACM), 2015-07-16T12:19:12Z.
Subjects:	Article
Online Access:	Get fulltext


LEADER	02583 am a22003613u 4500
001	97746
042			|a dc
100	1	0	|a Jun, SangWoo |e author
100	1	0	|a Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |e contributor
100	1	0	|a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |e contributor
100	1	0	|a Jun, SangWoo |e contributor
100	1	0	|a Liu, Ming Gang |e contributor
100	1	0	|a Lee, Sungjin |e contributor
100	1	0	|a Xu, Shuotao |e contributor
100	1	0	|a Arvind, Arvind |e contributor
700	1	0	|a Liu, Ming Gang |e author
700	1	0	|a Lee, Sungjin |e author
700	1	0	|a Hicks, Jamey |e author
700	1	0	|a Ankcorn, John |e author
700	1	0	|a King, Myron Decker |e author
700	1	0	|a Xu, Shuotao |e author
700	1	0	|a Arvind, Arvind |e author
245	0	0	|a BlueDBM: An Appliance for Big Data Analytics
260			|b Association for Computing Machinery (ACM), |c 2015-07-16T12:19:12Z.
856			|z Get fulltext |u http://hdl.handle.net/1721.1/97746
520			|a Complex data queries, because of their need for random accesses, have proven to be slow unless all the data can be accommodated in DRAM. There are many domains, such as genomics, geological data and daily Twitter feeds, where the datasets of interest are 5 TB to 20 TB. For such a dataset, one would need a cluster of 100 servers, each with 128 GB to 256 GB of DRAM, to accommodate all the data in DRAM. On the other hand, such datasets could easily be stored in the flash memory of a rack-sized cluster. Flash storage has much better random access performance than hard disks, which makes it desirable for analytics workloads. In this paper we present BlueDBM, a new system architecture that combines flash-based storage with in-store processing capability and a low-latency, high-throughput inter-controller network. We show that BlueDBM outperforms a flash-based system without these features by a factor of 10 for some important applications. While the performance of a RAM-cloud system falls sharply even if only 5% to 10% of the references are to secondary storage, this sharp performance degradation is not an issue in BlueDBM. BlueDBM presents an attractive point in the cost-performance trade-off for Big Data analytics.
520			|a Quanta Computer (Firm)
520			|a Samsung (Firm)
520			|a Lincoln Laboratory (PO7000261350)
520			|a Intel Corporation
546			|a en_US
655	7		|a Article
773			|t Proceedings of the 42nd International Symposium on Computer Architecture (ISCA 2015)
