MacroBase: Prioritizing Attention in Fast Data

As data volumes continue to rise, manual inspection is becoming increasingly untenable. In response, we present MacroBase, a data analytics engine that prioritizes end-user attention in high-volume fast data streams. MacroBase enables efficient, accurate, and modular analyses that highlight and aggr...

Full description

Bibliographic Details
Main Authors: Bailis, Peter (Author), Gan, Edward (Author), Madden, Samuel (Author), Narayanan, Deepak (Author), Rong, Kexin (Author), Suri, Sahaana (Author)
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (Contributor)
Format: Article
Language:English
Published: Association for Computing Machinery (ACM), 2021-11-08T20:13:09Z.
Subjects:
Online Access:Get fulltext
Description
Summary:As data volumes continue to rise, manual inspection is becoming increasingly untenable. In response, we present MacroBase, a data analytics engine that prioritizes end-user attention in high-volume fast data streams. MacroBase enables efficient, accurate, and modular analyses that highlight and aggregate important and unusual behavior, acting as a search engine for fast data. MacroBase is able to deliver order-of-magnitude speedups over alternatives by optimizing the combination of explanation and classification tasks and by leveraging a new reservoir sampler and heavy-hitters sketch specialized for fast data streams. As a result, MacroBase delivers accurate results at speeds of up to 2M events per second per query on a single core. The system has delivered meaningful results in production, including at a telematics company monitoring hundreds of thousands of vehicles,