Predicting Risks of Machine Translations of Public Health Resources by Developing Interpretable Machine Learning Classifiers

We aimed to develop machine learning classifiers as a risk-prevention mechanism that helps medical professionals with little or no knowledge of their patients’ languages predict the likelihood of clinically significant mistakes or incomprehensible machine translation (MT) outputs from features of the English source information fed to the MT systems.

Bibliographic Details
Main Authors:	Wenxiu Xie, Meng Ji, Riliu Huang, Tianyong Hao, Chi-Yin Chow
Format:	Article
Language:	English
Published:	MDPI AG 2021-08-01
Series:	International Journal of Environmental Research and Public Health
Subjects:	multinomial naïve Bayes classifier; public health education and promotion; machine learning; digital vulnerability
Online Access:	https://www.mdpi.com/1660-4601/18/16/8789

id	doaj-688d26ca39c04d2b94e8c528a899843e
record_format	Article
spelling	doaj-688d26ca39c04d2b94e8c528a899843e; 2021-08-26T13:50:23Z; eng; MDPI AG; International Journal of Environmental Research and Public Health; ISSN 1661-7827, 1660-4601; 2021-08-01; volume 18, article 8789; doi:10.3390/ijerph18168789; Predicting Risks of Machine Translations of Public Health Resources by Developing Interpretable Machine Learning Classifiers; Wenxiu Xie (Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong 518057, China); Meng Ji (School of Languages and Cultures, University of Sydney, Sydney 2006, Australia); Riliu Huang (School of Languages and Cultures, University of Sydney, Sydney 2006, Australia); Tianyong Hao (School of Computer Science, South China Normal University, Guangzhou 510631, China); Chi-Yin Chow (Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong 518057, China); https://www.mdpi.com/1660-4601/18/16/8789; multinomial naïve Bayes classifier; public health education and promotion; machine learning; digital vulnerability
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Wenxiu Xie, Meng Ji, Riliu Huang, Tianyong Hao, Chi-Yin Chow
title	Predicting Risks of Machine Translations of Public Health Resources by Developing Interpretable Machine Learning Classifiers
publisher	MDPI AG
series	International Journal of Environmental Research and Public Health
issn	1661-7827, 1660-4601
publishDate	2021-08-01
description	We aimed to develop machine learning classifiers as a risk-prevention mechanism that helps medical professionals with little or no knowledge of their patients’ languages predict the likelihood of clinically significant mistakes or incomprehensible machine translation (MT) outputs from features of the English source information fed to the MT systems. A multinomial naïve Bayes (MNB) classifier was developed to provide intuitive probabilistic predictions of erroneous health translation outputs by modelling a small number of optimised features of the original English source texts. The best-performing MNB classifier, which used 8 optimised features, achieved a statistically significantly higher AUC (M = 0.760, SD = 0.03) than the classifier using all 135 high-dimensional natural features (M = 0.631, SD = 0.006, p < 0.0001, SE = 0.004) and the automatically optimised classifier with 22 features (M = 0.7231, SD = 0.0084, p < 0.0001, SE = 0.004). MNB (8) also had significantly higher sensitivity (M = 0.885, SD = 0.100) than the full-feature classifier (135) (M = 0.577, SD = 0.155, p < 0.0001, SE = 0.005) and the automatically optimised classifier (22) (M = 0.731, SD = 0.139, p < 0.0001, SE = 0.0023). Finally, MNB (8) reached significantly higher specificity (M = 0.667, SD = 0.138) than the full-feature classifier (135) (M = 0.567, SD = 0.139, p = 0.0002, SE = 0.026) and the automatically optimised classifier (22) (M = 0.633, SD = 0.141, p = 0.0133, SE = 0.026).
topic	multinomial naïve Bayes classifier; public health education and promotion; machine learning; digital vulnerability
url	https://www.mdpi.com/1660-4601/18/16/8789

Similar Items