Gas Chromatographic Retention Index Prediction Using Multimodal Machine Learning

Gas chromatography is a widely used method in analytical chemistry and metabolomics. Using gas chromatography, vaporizable compounds can be separated for further identification. Retention indices are standardized values that depend only on the chemical structure of a compound and on the stationary...

Bibliographic Details
Main Authors:	Dmitriy D. Matyushin, Aleksey K. Buryak
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Analytical chemistry; convolutional neural network; deep learning; gas chromatography; gradient boosting; residual neural network
Online Access:	https://ieeexplore.ieee.org/document/9294096/

id	doaj-7a68ed446bac40b29e4ed5597f98ecb7
record_format	Article
spelling	doaj-7a68ed446bac40b29e4ed5597f98ecb7; 2021-03-30T03:48:59Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; 8; 223140; 223155; 10.1109/ACCESS.2020.3045047; 9294096; Gas Chromatographic Retention Index Prediction Using Multimodal Machine Learning; Dmitriy D. Matyushin; 0; https://orcid.org/0000-0003-0978-7666; Aleksey K. Buryak; 1; A. N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow, Russia; A. N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow, Russia; Gas chromatography is a widely used method in analytical chemistry and metabolomics. Using gas chromatography, vaporizable compounds can be separated for further identification. Retention indices are standardized values that depend only on the chemical structure of a compound and on the stationary phase, and they characterize the retention of a compound in a chromatographic system. Retention index prediction is an important task because databases contain experimental values for only a small fraction of all possible molecules, while this information is usable for untargeted analysis. In this work, we consider four machine learning models for retention index prediction: 1D and 2D convolutional neural networks, a deep residual multilayer perceptron, and gradient boosting. A string representation of the molecule, a 2D representation of the chemical structure, molecular descriptors and fingerprints, and molecular descriptors are used as inputs to these four models, respectively, along with information about the stationary phase. The first and third models show the best performance, while the other two perform slightly worse. The models predict retention index values for various standard and semi-standard non-polar stationary phases. Further improvement in performance was achieved using a linear model that takes the results of the four previous models as inputs (model stacking). The models were tested using diverse data sets: flavor compounds, essential oils, and metabolomics-related compounds. Achieved accuracy: median absolute and percentage errors of 6-40 units and 0.8-2.2%, respectively. Accuracy depends on the test data set. The stacking model outperforms previously reported approaches for all test data sets. Parameters of a pre-trained model and some source code are provided.; https://ieeexplore.ieee.org/document/9294096/; Analytical chemistry; convolutional neural network; deep learning; gas chromatography; gradient boosting; residual neural network
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Dmitriy D. Matyushin, Aleksey K. Buryak
title	Gas Chromatographic Retention Index Prediction Using Multimodal Machine Learning
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2020-01-01
description	Gas chromatography is a widely used method in analytical chemistry and metabolomics. Using gas chromatography, vaporizable compounds can be separated for further identification. Retention indices are standardized values that depend only on the chemical structure of a compound and on the stationary phase, and they characterize the retention of a compound in a chromatographic system. Retention index prediction is an important task because databases contain experimental values for only a small fraction of all possible molecules, while this information is usable for untargeted analysis. In this work, we consider four machine learning models for retention index prediction: 1D and 2D convolutional neural networks, a deep residual multilayer perceptron, and gradient boosting. A string representation of the molecule, a 2D representation of the chemical structure, molecular descriptors and fingerprints, and molecular descriptors are used as inputs to these four models, respectively, along with information about the stationary phase. The first and third models show the best performance, while the other two perform slightly worse. The models predict retention index values for various standard and semi-standard non-polar stationary phases. Further improvement in performance was achieved using a linear model that takes the results of the four previous models as inputs (model stacking). The models were tested using diverse data sets: flavor compounds, essential oils, and metabolomics-related compounds. Achieved accuracy: median absolute and percentage errors of 6-40 units and 0.8-2.2%, respectively. Accuracy depends on the test data set. The stacking model outperforms previously reported approaches for all test data sets. Parameters of a pre-trained model and some source code are provided.
topic	Analytical chemistry; convolutional neural network; deep learning; gas chromatography; gradient boosting; residual neural network
url	https://ieeexplore.ieee.org/document/9294096/

Similar Items