Evaluating the Quality of Machine Learning Explanations: A Survey on Methods and Metrics

The most successful Machine Learning (ML) systems remain complex black boxes to end-users, and even experts are often unable to understand the rationale behind their decisions. The lack of transparency of such systems can have severe consequences, or lead to the poor use of limited and valuable resources, in medical diagnosis, financial decision-making, and other high-stakes domains. The issue of ML explanation has therefore experienced a surge in interest, from the research community through to application domains. While numerous explanation methods have been explored, there is a need for evaluations that quantify the quality of explanation methods: to determine whether, and to what extent, the offered explainability achieves its defined objective, and to compare the available explanation methods so that the best one can be recommended for a specific task. This survey paper presents a comprehensive overview of methods proposed in the current literature for the evaluation of ML explanations. We identify properties of explainability from a review of definitions of explainability, and use these properties as the objectives that evaluation metrics should meet. The survey found that quantitative metrics for both model-based and example-based explanations are primarily used to evaluate the parsimony/simplicity of interpretability, while quantitative metrics for attribution-based explanations are primarily used to evaluate the soundness of the fidelity of explainability. The survey also showed that subjective measures, such as trust and confidence, have been embraced as the focal point of the human-centered evaluation of explainable systems. The paper concludes that the evaluation of ML explanations is a multidisciplinary research topic, and that it is not possible to define a single implementation of evaluation metrics that can be applied to all explanation methods.
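
To make the kind of quantitative fidelity metric discussed in the abstract concrete: a common functionality-grounded test for attribution-based explanations deletes features in decreasing order of attributed importance and checks how quickly the model's prediction degrades. The Python sketch below is a minimal illustration under that assumption and is not taken from the paper; model_predict, x, and attribution are hypothetical placeholders.

    import numpy as np

    def deletion_fidelity(model_predict, x, attribution, baseline=0.0):
        """Deletion-style fidelity score for an attribution-based explanation.

        Features of x are replaced by `baseline` from most to least important
        (by |attribution|); a faithful attribution makes the prediction drop
        quickly, giving a low average score. Illustrative sketch only.
        """
        order = np.argsort(-np.abs(attribution))   # most important first
        x_masked = np.asarray(x, dtype=float).copy()
        scores = [model_predict(x_masked)]         # score before any deletion
        for i in order:
            x_masked[i] = baseline                 # "delete" one feature
            scores.append(model_predict(x_masked))
        return float(np.mean(scores))              # lower = more faithful

Comparing such deletion curves across explanation methods is one functionality-grounded way to rank the soundness/fidelity of attributions without a human in the loop.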

Bibliographic Details
Main Authors: Jianlong Zhou, Amir H. Gandomi, Fang Chen, Andreas Holzinger
Affiliations: Data Science Institute, University of Technology Sydney, Ultimo, NSW 2007, Australia (Zhou, Gandomi, Chen); Human-Centered AI Lab, Institute of Medical Informatics/Statistics, Medical University of Graz, 8036 Graz, Austria (Holzinger)
Format: Article
Language: English
Published: MDPI AG, 2021-03-01
Series: Electronics, Vol. 10, No. 5, Article 593 (ISSN 2079-9292)
DOI: 10.3390/electronics10050593
Subjects: explainable machine learning; evaluation of explainability; application-grounded evaluation; human-grounded evaluation; functionality-grounded evaluation; evaluation metrics
Online Access: https://www.mdpi.com/2079-9292/10/5/593