Evaluating discrete choice prediction models when the evaluation data is corrupted: analytic results and bias corrections for the area under the ROC

There has been a growing recognition that issues of data quality, which are routine in practice, can materially affect the assessment of learned model performance. In this paper, we develop some analytic results that are useful in sizing the biases associated with tests of discriminatory model power when these are performed using corrupt ("noisy") data. As it is sometimes unavoidable to test models with data that are known to be corrupt, we also provide some guidance on interpreting results of such tests. In some cases, with appropriate knowledge of the corruption mechanism, the true values of the performance statistics such as the area under the ROC curve may be recovered (in expectation), even when the underlying data have been corrupted. We also provide estimators of the standard errors of such recovered performance statistics. An analysis of the estimators reveals interesting behavior including the observation that "noisy" data does not "cancel out" across models even when the same corrupt data set is used to test multiple candidate models. Because our results are analytic, they may be applied in a broad range of settings and this can be done without the need for simulation.
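To illustrate the kind of recovery the abstract describes, consider the simplest corruption mechanism: class-conditional label flipping with known rates. Under that model (assumed here for illustration; the paper treats the analytic case in generality, and its estimators may differ), the expected AUC measured on noisy labels is a linear function of the true AUC and can be inverted in closed form. The sketch below, including the class prior pi, the flip rates rho0 and rho1, and the simulated scores, is illustrative rather than the paper's own method.

```python
import numpy as np

rng = np.random.default_rng(0)

def auc(scores, labels):
    """Mann-Whitney estimate of AUC = P(score of a positive > score of a negative)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)  # ranks 1..n (scores here are continuous)
    n1 = labels.sum()
    n0 = len(labels) - n1
    return (ranks[labels == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

def corrected_auc(auc_noisy, pi, rho0, rho1):
    """Invert the bias of AUC under class-conditional label flipping.

    rho1 = P(observe 0 | true 1), rho0 = P(observe 1 | true 0),
    pi = P(true label = 1); flips are assumed independent of the score.
    """
    a = pi * (1 - rho1) / (pi * (1 - rho1) + (1 - pi) * rho0)  # P(true 1 | observed 1)
    b = pi * rho1 / (pi * rho1 + (1 - pi) * (1 - rho0))        # P(true 1 | observed 0)
    # For continuous scores: E[AUC_noisy] = intercept + (a - b) * AUC_true
    intercept = (a * b + (1 - a) * (1 - b)) / 2 + (1 - a) * b
    return (auc_noisy - intercept) / (a - b)

# Illustrative simulation; every parameter below is an assumption.
n, pi, rho0, rho1 = 20000, 0.3, 0.10, 0.25
y = (rng.random(n) < pi).astype(int)
scores = rng.normal(loc=1.2 * y, scale=1.0)         # a model with genuine power
flip = rng.random(n) < np.where(y == 1, rho1, rho0)
y_noisy = np.where(flip, 1 - y, y)                  # corrupted evaluation labels

print(f"AUC on true labels : {auc(scores, y):.4f}")
print(f"AUC on noisy labels: {auc(scores, y_noisy):.4f}  (pulled toward 0.5)")
print(f"corrected estimate : {corrected_auc(auc(scores, y_noisy), pi, rho0, rho1):.4f}")
```

With these settings the noisy-label AUC lands near 0.70 against a true value near 0.80, and the inversion recovers the latter up to sampling error. In practice pi, rho0, and rho1 would themselves have to be known or estimated, which is where the paper's standard-error results become relevant.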

Bibliographic Details
Main Author: Stein, Roger Mark
Other Authors: Sloan School of Management (Contributor)
Format: Article
Language: English
Published: Springer US, 2017-02-16.
Published in: Data Mining and Knowledge Discovery
Online Access: http://hdl.handle.net/1721.1/106979