LEADER 02645 am a22002413u 4500
001    121346
042    |a dc
100 10 |a Vartak, Manasi |e author
100 10 |a Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |e contributor
100 10 |a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |e contributor
700 10 |a Trindade, Joana M. F. da |e author
700 10 |a Madden, Samuel R |e author
700 10 |a Zaharia, Matei A |e author
245 00 |a MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis
260    |b Association for Computing Machinery (ACM), |c 2019-06-18T18:10:52Z.
856    |z Get fulltext |u https://hdl.handle.net/1721.1/121346
520    |a Model diagnosis is the process of analyzing machine learning (ML) model performance to identify where the model works well and where it doesn't. It is a key part of the modeling process and helps ML developers iteratively improve model accuracy. Often, model diagnosis is performed by analyzing different datasets or intermediates associated with the model, such as the input data and hidden representations learned by the model (e.g., [4, 24, 39]). The bottleneck in fast model diagnosis is the creation and storage of model intermediates: storing these intermediates requires tens to hundreds of GB of storage, whereas re-running the model for each diagnostic query slows down model diagnosis. To address this bottleneck, we propose a system called MISTIQUE that can work with traditional ML pipelines as well as deep neural networks to efficiently capture, store, and query model intermediates for diagnosis. For each diagnostic query, MISTIQUE intelligently chooses whether to re-run the model or read a previously stored intermediate. For intermediates that are stored in MISTIQUE, we propose a range of optimizations to reduce storage footprint, including quantization, summarization, and data deduplication. We evaluate our techniques on a range of real-world ML models in scikit-learn and TensorFlow. We demonstrate that our optimizations reduce storage by up to 110X for traditional ML pipelines and up to 6X for deep neural networks. Furthermore, by using MISTIQUE, we can speed up diagnostic queries by up to 390X on traditional ML pipelines and up to 210X on deep neural networks.
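Note: the abstract names quantization as one of the storage optimizations applied to stored intermediates. The NumPy sketch below is only a rough illustration of that idea, not MISTIQUE's actual interface; the function names, the 8-bit uniform scheme, and the activation shape are assumptions for the example.

import numpy as np

def quantize_intermediate(x: np.ndarray):
    """Map a float32 intermediate to uint8 codes plus (scale, offset) for reconstruction."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 or 1.0            # guard against constant arrays
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_intermediate(q: np.ndarray, scale: float, lo: float) -> np.ndarray:
    """Approximately reconstruct the original float32 intermediate."""
    return q.astype(np.float32) * scale + lo

# Hypothetical hidden-layer activation matrix (batch x units).
acts = np.random.rand(1024, 512).astype(np.float32)
q, scale, lo = quantize_intermediate(acts)
approx = dequantize_intermediate(q, scale, lo)
print(acts.nbytes / q.nbytes)                   # ~4x smaller, before any summarization or deduplication

Storing the uint8 codes in place of float32 values cuts the footprint of one intermediate by roughly 4x; the paper's larger reductions come from combining such lossy encoding with summarization and data deduplication.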
520    |a Facebook PhD Fellowship
520    |a Alfred P. Sloan Foundation. University Centers for Exemplary Mentoring (UCEM) fellowship
546    |a en
655 7  |a Article
773    |t 10.1145/3183713.3196934
773    |t SIGMOD'18: 2018 International Conference on Management of Data