Evaluating the Quality of Machine Learning Explanations: A Survey on Methods and Metrics
The most successful Machine Learning (ML) systems remain complex black boxes to end-users, and even experts are often unable to understand the rationale behind their decisions. The lack of transparency of such systems can have severe consequences, or lead to poor use of limited, valuable resources, in medical diagnosis, financial decision-making, and other high-stakes domains. Therefore, the issue of ML explanation has seen a surge in interest, from the research community through to application domains. While numerous explanation methods have been explored, evaluations are needed to quantify the quality of explanation methods: to determine whether, and to what extent, the offered explainability achieves the defined objective, and to compare available explanation methods and suggest the best-suited one for a specific task. This survey paper presents a comprehensive overview of methods proposed in the current literature for the evaluation of ML explanations. We identify properties of explainability from a review of definitions of explainability, and these properties are used as the objectives that evaluation metrics should achieve. The survey found that quantitative metrics for both model-based and example-based explanations are primarily used to evaluate the parsimony/simplicity of interpretability, while quantitative metrics for attribution-based explanations are primarily used to evaluate the soundness of the fidelity of explainability. The survey also showed that subjective measures, such as trust and confidence, have been embraced as the focal point for human-centered evaluation of explainable systems. The paper concludes that the evaluation of ML explanations is a multidisciplinary research topic, and that it is not possible to define a single implementation of evaluation metrics that can be applied to all explanation methods.
Main Authors: | Jianlong Zhou, Amir H. Gandomi, Fang Chen, Andreas Holzinger |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-03-01 |
Series: | Electronics |
Source: | DOAJ |
Subjects: | explainable machine learning; evaluation of explainability; application-grounded evaluation; human-grounded evaluation; functionality-grounded evaluation; evaluation metrics |
Online Access: | https://www.mdpi.com/2079-9292/10/5/593 |
id | doaj-9c4f6fd7ab2941259f05ab5dd3174bb0 |
---|---|
record_format | Article |
Citation: | Electronics, vol. 10, no. 5, p. 593, 2021-03-01. ISSN 2079-9292. DOI: 10.3390/electronics10050593 |
Author affiliations:
Jianlong Zhou, Data Science Institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
Amir H. Gandomi, Data Science Institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
Fang Chen, Data Science Institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
Andreas Holzinger, Human-Centered AI Lab, Institute of Medical Informatics/Statistics, Medical University of Graz, 8036 Graz, Austria