Knowing when you're wrong: Building fast and reliable approximate query processing systems

Modern data analytics applications typically process massive amounts of data on clusters of tens, hundreds, or thousands of machines to support near-real-time decisions. The quantity of data and limitations of disk and memory bandwidth often make it infeasible to deliver answers at interactive speeds...

Bibliographic Details
Main Authors:	Agarwal, Sameer (Author), Milner, Henry (Author), Kleiner, Ariel (Author), Talwalkar, Ameet (Author), Jordan, Michael (Author), Mozafari, Barzan (Author), Stoica, Ion (Author), Madden, Samuel R. (Contributor)
Other Authors:	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (Contributor), Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format:	Article
Language:	English
Published:	Association for Computing Machinery (ACM), 2014-09-26T13:32:27Z.
Subjects:	Article
Online Access:	Get fulltext


LEADER	03142 am a22004213u 4500
001	90383
042			\|a dc
100	1	0	\|a Agarwal, Sameer \|e author
100	1	0	\|a Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory \|e contributor
100	1	0	\|a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science \|e contributor
100	1	0	\|a Madden, Samuel R. \|e contributor
700	1	0	\|a Milner, Henry \|e author
700	1	0	\|a Kleiner, Ariel \|e author
700	1	0	\|a Talwalkar, Ameet \|e author
700	1	0	\|a Jordan, Michael \|e author
700	1	0	\|a Mozafari, Barzan \|e author
700	1	0	\|a Stoica, Ion \|e author
700	1	0	\|a Madden, Samuel R. \|e author
245	0	0	\|a Knowing when you're wrong: Building fast and reliable approximate query processing systems
260			\|b Association for Computing Machinery (ACM), \|c 2014-09-26T13:32:27Z.
856			\|z Get fulltext \|u http://hdl.handle.net/1721.1/90383
520			\|a Modern data analytics applications typically process massive amounts of data on clusters of tens, hundreds, or thousands of machines to support near-real-time decisions. The quantity of data and limitations of disk and memory bandwidth often make it infeasible to deliver answers at interactive speeds. However, it has been widely observed that many applications can tolerate some degree of inaccuracy. This is especially true for exploratory queries on data, where users are satisfied with "close-enough" answers if they can come quickly. A popular technique for speeding up queries at the cost of accuracy is to execute each query on a sample of data, rather than the whole dataset. To ensure that the returned result is not too inaccurate, past work on approximate query processing has used statistical techniques to estimate "error bars" on returned results. However, existing work in the sampling-based approximate query processing (S-AQP) community has not validated whether these techniques actually generate accurate error bars for real query workloads. In fact, we find that error bar estimation often fails on real-world production workloads. Fortunately, it is possible to quickly and accurately diagnose the failure of error estimation for a query. In this paper, we show that it is possible to implement a query approximation pipeline that produces approximate answers and reliable error bars at interactive speeds.
520			\|a National Science Foundation (U.S.) (CISE Expeditions Award CCF-1139158)
520			\|a Lawrence Berkeley National Laboratory (Award 7076018)
520			\|a United States. Defense Advanced Research Projects Agency (XData Award FA8750-12-2-0331)
520			\|a Amazon.com (Firm)
520			\|a Google (Firm)
520			\|a SAP Corporation
520			\|a Thomas and Stacey Siebel Foundation
520			\|a Apple Computer, Inc.
520			\|a Cisco Systems, Inc.
520			\|a Cloudera, Inc.
520			\|a EMC Corporation
520			\|a Ericsson, Inc.
520			\|a Facebook (Firm)
546			\|a en_US
655	7		\|a Article
773			\|t Proceedings of the 2014 ACM SIGMOD international conference on Management of data (SIGMOD '14)
