A study of real-world micrograph data quality and machine learning model robustness

Abstract: Machine-learning (ML) techniques hold the potential of enabling efficient quantitative micrograph analysis, but the robustness of ML models with respect to real-world micrograph quality variations has not been carefully evaluated. We collected thousands of scanning electron microscopy (SEM)...

Bibliographic Details
Main Authors: Xiaoting Zhong, Brian Gallagher, Keenan Eves, Emily Robertson, T. Nathan Mundhenk, T. Yong-Jin Han
Format: Article
Language: English
Published: Nature Publishing Group 2021-10-01
Series: npj Computational Materials
Online Access: https://doi.org/10.1038/s41524-021-00616-3
id doaj-15813301f7d54bfcb75f7018545b366d
record_format Article
spelling doaj-15813301f7d54bfcb75f7018545b366d 2021-10-10T11:18:02Z eng Nature Publishing Group npj Computational Materials 2057-3960 2021-10-01 71111 10.1038/s41524-021-00616-3
A study of real-world micrograph data quality and machine learning model robustness
Xiaoting Zhong 0
Brian Gallagher 1
Keenan Eves 2
Emily Robertson 3
T. Nathan Mundhenk 4
T. Yong-Jin Han 5
Materials Science Division, Lawrence Livermore National Laboratory
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
Defense Technologies Engineering Division, Lawrence Livermore National Laboratory
Materials Science Division, Lawrence Livermore National Laboratory
Computational Engineering Division, Lawrence Livermore National Laboratory
Materials Science Division, Lawrence Livermore National Laboratory
collection DOAJ
language English
format Article
sources DOAJ
author Xiaoting Zhong
Brian Gallagher
Keenan Eves
Emily Robertson
T. Nathan Mundhenk
T. Yong-Jin Han
title A study of real-world micrograph data quality and machine learning model robustness
publisher Nature Publishing Group
series npj Computational Materials
issn 2057-3960
publishDate 2021-10-01
description Abstract: Machine-learning (ML) techniques hold the potential of enabling efficient quantitative micrograph analysis, but the robustness of ML models with respect to real-world micrograph quality variations has not been carefully evaluated. We collected thousands of scanning electron microscopy (SEM) micrographs for molecular solid materials, in which image pixel intensities vary due to both the microstructure content and microscope instrument conditions. We then built ML models to predict the ultimate compressive strength (UCS) of consolidated molecular solids, by encoding micrographs with different image feature descriptors and training a random forest regressor, and by training an end-to-end deep-learning (DL) model. Results show that instrument-induced pixel intensity signals can affect ML model predictions in a consistently negative way. As a remedy, we explored intensity normalization techniques. It is seen that intensity normalization helps to improve micrograph data quality and ML model robustness, but microscope-induced intensity variations can be difficult to eliminate.
url https://doi.org/10.1038/s41524-021-00616-3