Evaluating the Quality of Machine Learning Explanations: A Survey on Methods and Metrics

The most successful Machine Learning (ML) systems remain complex black boxes to end-users, and even experts are often unable to understand the rationale behind their decisions. The lack of transparency of such systems can have severe consequences, or lead to the poor use of limited and valuable resources, in medical diagnosis, financial decision-making, and other high-stakes domains. The issue of ML explanation has therefore experienced a surge in interest, from the research community through to application domains. While numerous explanation methods have been explored, there is a need for evaluations that quantify the quality of explanation methods: to determine whether, and to what extent, the offered explainability achieves its defined objective, and to compare the available explanation methods so that the best one can be recommended for a specific task. This survey paper presents a comprehensive overview of methods proposed in the current literature for the evaluation of ML explanations. We identify properties of explainability from a review of definitions of explainability, and use these properties as the objectives that evaluation metrics should meet. The survey found that quantitative metrics for both model-based and example-based explanations are primarily used to evaluate the parsimony/simplicity of interpretability, while quantitative metrics for attribution-based explanations are primarily used to evaluate the soundness of the fidelity of explainability. The survey also showed that subjective measures, such as trust and confidence, have been embraced as the focal point of the human-centered evaluation of explainable systems. The paper concludes that the evaluation of ML explanations is a multidisciplinary research topic, and that it is not possible to define a single implementation of evaluation metrics that can be applied to all explanation methods.
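
To make the kind of quantitative fidelity metric discussed in the abstract concrete: a common functionality-grounded test for attribution-based explanations deletes features in decreasing order of attributed importance and checks how quickly the model's prediction degrades. The Python sketch below is a minimal illustration under that assumption and is not taken from the paper; model_predict, x, and attribution are hypothetical placeholders.

    import numpy as np

    def deletion_fidelity(model_predict, x, attribution, baseline=0.0):
        """Deletion-style fidelity score for an attribution-based explanation.

        Features of x are replaced by `baseline` from most to least important
        (by |attribution|); a faithful attribution makes the prediction drop
        quickly, giving a low average score. Illustrative sketch only.
        """
        order = np.argsort(-np.abs(attribution))   # most important first
        x_masked = np.asarray(x, dtype=float).copy()
        scores = [model_predict(x_masked)]         # score before any deletion
        for i in order:
            x_masked[i] = baseline                 # "delete" one feature
            scores.append(model_predict(x_masked))
        return float(np.mean(scores))              # lower = more faithful

Comparing such deletion curves across explanation methods is one functionality-grounded way to rank the soundness/fidelity of attributions without a human in the loop.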

Bibliographic Details
Main Authors: Jianlong Zhou, Amir H. Gandomi, Fang Chen, Andreas Holzinger
Affiliations: Data Science Institute, University of Technology Sydney, Ultimo, NSW 2007, Australia (Zhou, Gandomi, Chen); Human-Centered AI Lab, Institute of Medical Informatics/Statistics, Medical University of Graz, 8036 Graz, Austria (Holzinger)
Format: Article
Language: English
Published: MDPI AG, 2021-03-01
Series: Electronics, Vol. 10, No. 5, Article 593 (ISSN 2079-9292)
DOI: 10.3390/electronics10050593
Subjects: explainable machine learning; evaluation of explainability; application-grounded evaluation; human-grounded evaluation; functionality-grounded evaluation; evaluation metrics
Online Access: https://www.mdpi.com/2079-9292/10/5/593