LEADER 02645 am a22002413u 4500
001    121346
042    |a dc
100 10 |a Vartak, Manasi |e author
100 10 |a Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |e contributor
100 10 |a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |e contributor
700 10 |a Trindade, Joana M. F. da |e author
700 10 |a Madden, Samuel R |e author
700 10 |a Zaharia, Matei A |e author
245 00 |a MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis
260    |b Association for Computing Machinery (ACM), |c 2019-06-18T18:10:52Z.
856    |z Get fulltext |u https://hdl.handle.net/1721.1/121346
520    |a Model diagnosis is the process of analyzing machine learning (ML) model performance to identify where the model works well and where it doesn't. It is a key part of the modeling process and helps ML developers iteratively improve model accuracy. Often, model diagnosis is performed by analyzing different datasets or intermediates associated with the model, such as the input data and hidden representations learned by the model (e.g., [4, 24, 39]). The bottleneck in fast model diagnosis is the creation and storage of model intermediates: storing these intermediates requires tens to hundreds of GB of storage, whereas re-running the model for each diagnostic query slows down model diagnosis. To address this bottleneck, we propose a system called MISTIQUE that can work with traditional ML pipelines as well as deep neural networks to efficiently capture, store, and query model intermediates for diagnosis. For each diagnostic query, MISTIQUE intelligently chooses whether to re-run the model or read a previously stored intermediate. For intermediates that are stored in MISTIQUE, we propose a range of optimizations to reduce storage footprint, including quantization, summarization, and data deduplication. We evaluate our techniques on a range of real-world ML models in scikit-learn and TensorFlow. We demonstrate that our optimizations reduce storage by up to 110X for traditional ML pipelines and up to 6X for deep neural networks. Furthermore, by using MISTIQUE, we can speed up diagnostic queries by up to 390X on traditional ML pipelines and up to 210X on deep neural networks.
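Note: the abstract names quantization as one of the storage optimizations applied to stored intermediates. The NumPy sketch below is only a rough illustration of that idea, not MISTIQUE's actual interface; the function names, the 8-bit uniform scheme, and the activation shape are assumptions for the example.

import numpy as np

def quantize_intermediate(x: np.ndarray):
    """Map a float32 intermediate to uint8 codes plus (scale, offset) for reconstruction."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 or 1.0            # guard against constant arrays
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_intermediate(q: np.ndarray, scale: float, lo: float) -> np.ndarray:
    """Approximately reconstruct the original float32 intermediate."""
    return q.astype(np.float32) * scale + lo

# Hypothetical hidden-layer activation matrix (batch x units).
acts = np.random.rand(1024, 512).astype(np.float32)
q, scale, lo = quantize_intermediate(acts)
approx = dequantize_intermediate(q, scale, lo)
print(acts.nbytes / q.nbytes)                   # ~4x smaller, before any summarization or deduplication

Storing the uint8 codes in place of float32 values cuts the footprint of one intermediate by roughly 4x; the paper's larger reductions come from combining such lossy encoding with summarization and data deduplication.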
520    |a Facebook PhD Fellowship
520    |a Alfred P. Sloan Foundation. University Centers for Exemplary Mentoring (UCEM) fellowship
546    |a en
655 7  |a Article
773    |t 10.1145/3183713.3196934
773    |t SIGMOD'18: 2018 International Conference on Management of Data