A study of real-world micrograph data quality and machine learning model robustness
Abstract: Machine-learning (ML) techniques hold the potential of enabling efficient quantitative micrograph analysis, but the robustness of ML models with respect to real-world micrograph quality variations has not been carefully evaluated. We collected thousands of scanning electron microscopy (SEM)...
Main Authors: | Xiaoting Zhong, Brian Gallagher, Keenan Eves, Emily Robertson, T. Nathan Mundhenk, T. Yong-Jin Han |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Publishing Group, 2021-10-01 |
Series: | npj Computational Materials |
Online Access: | https://doi.org/10.1038/s41524-021-00616-3 |
id |
doaj-15813301f7d54bfcb75f7018545b366d |
---|---|
record_format |
Article |
spelling |
doaj-15813301f7d54bfcb75f7018545b366d; 2021-10-10T11:18:02Z; eng; Nature Publishing Group; npj Computational Materials; 2057-3960; 2021-10-01; 7; 1; 1; 11; 10.1038/s41524-021-00616-3; A study of real-world micrograph data quality and machine learning model robustness; Xiaoting Zhong (0), Brian Gallagher (1), Keenan Eves (2), Emily Robertson (3), T. Nathan Mundhenk (4), T. Yong-Jin Han (5); Materials Science Division, Lawrence Livermore National Laboratory; Center for Applied Scientific Computing, Lawrence Livermore National Laboratory; Defense Technologies Engineering Division, Lawrence Livermore National Laboratory; Materials Science Division, Lawrence Livermore National Laboratory; Computational Engineering Division, Lawrence Livermore National Laboratory; Materials Science Division, Lawrence Livermore National Laboratory; Abstract: Machine-learning (ML) techniques hold the potential of enabling efficient quantitative micrograph analysis, but the robustness of ML models with respect to real-world micrograph quality variations has not been carefully evaluated. We collected thousands of scanning electron microscopy (SEM) micrographs for molecular solid materials, in which image pixel intensities vary due to both the microstructure content and microscope instrument conditions. We then built ML models to predict the ultimate compressive strength (UCS) of consolidated molecular solids, by encoding micrographs with different image feature descriptors and training a random forest regressor, and by training an end-to-end deep-learning (DL) model. Results show that instrument-induced pixel intensity signals can affect ML model predictions in a consistently negative way. As a remedy, we explored intensity normalization techniques. It is seen that intensity normalization helps to improve micrograph data quality and ML model robustness, but microscope-induced intensity variations can be difficult to eliminate.; https://doi.org/10.1038/s41524-021-00616-3 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Xiaoting Zhong Brian Gallagher Keenan Eves Emily Robertson T. Nathan Mundhenk T. Yong-Jin Han |
author_sort |
Xiaoting Zhong |
title |
A study of real-world micrograph data quality and machine learning model robustness |
title_sort |
study of real-world micrograph data quality and machine learning model robustness |
publisher |
Nature Publishing Group |
series |
npj Computational Materials |
issn |
2057-3960 |
publishDate |
2021-10-01 |
description |
Abstract: Machine-learning (ML) techniques hold the potential of enabling efficient quantitative micrograph analysis, but the robustness of ML models with respect to real-world micrograph quality variations has not been carefully evaluated. We collected thousands of scanning electron microscopy (SEM) micrographs for molecular solid materials, in which image pixel intensities vary due to both the microstructure content and microscope instrument conditions. We then built ML models to predict the ultimate compressive strength (UCS) of consolidated molecular solids, by encoding micrographs with different image feature descriptors and training a random forest regressor, and by training an end-to-end deep-learning (DL) model. Results show that instrument-induced pixel intensity signals can affect ML model predictions in a consistently negative way. As a remedy, we explored intensity normalization techniques. It is seen that intensity normalization helps to improve micrograph data quality and ML model robustness, but microscope-induced intensity variations can be difficult to eliminate. |
url |
https://doi.org/10.1038/s41524-021-00616-3 |