Prediction of Chromatography Conditions for Purification in Organic Synthesis Using Deep Learning

In this research, a process for developing normal-phase liquid chromatography solvent systems has been proposed. In contrast to the development of conditions via thin-layer chromatography (TLC), this process is based on the architecture of two hierarchically connected neural network-based components...
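
The full abstract in this record says the first network predicts solvent labels and, in the case of two solvents, the second predicts the ratio between them. The control flow of such a two-stage arrangement might be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `SolventSystem`, `predict_conditions`, `toy_classifier` and `toy_regressor` are hypothetical, the solvent names are placeholders, and passing the predicted labels into the second stage is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical representation: a molecule is a fixed-length numeric vector
# (for example a fingerprint or an auto-encoder embedding).
Vector = Sequence[float]


@dataclass
class SolventSystem:
    solvents: Tuple[str, ...]      # predicted solvent labels
    ratio: Optional[float] = None  # fraction of the first solvent, two-solvent case only


def predict_conditions(
    x: Vector,
    classify_solvents: Callable[[Vector], Tuple[str, ...]],
    regress_ratio: Callable[[Vector, Tuple[str, ...]], float],
) -> SolventSystem:
    """Run the two components in sequence: labels first, then the ratio."""
    solvents = classify_solvents(x)
    if len(solvents) == 2:
        # The second component is consulted only for two-solvent systems.
        ratio = regress_ratio(x, solvents)
        # Clamp to a valid fraction; this guard is an assumption of the sketch.
        ratio = min(1.0, max(0.0, ratio))
        return SolventSystem(solvents, ratio)
    return SolventSystem(solvents)


# Stand-in models so the sketch runs; real components would be trained networks.
def toy_classifier(x: Vector) -> Tuple[str, ...]:
    # Placeholder rule and placeholder solvent names, not taken from the paper.
    return ("hexane", "ethyl acetate") if sum(x) > 1.0 else ("dichloromethane",)


def toy_regressor(x: Vector, solvents: Tuple[str, ...]) -> float:
    # Placeholder arithmetic standing in for a learned regression.
    return sum(x) / (len(x) * 2.0)
```

Calling `predict_conditions([1.0, 1.0], toy_classifier, toy_regressor)` exercises both stages, while an input that the stand-in classifier maps to a single solvent skips the ratio step entirely.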

Bibliographic Details
Main Authors: Mantas Vaškevičius, Jurgita Kapočiūtė-Dzikienė, Liudas Šlepikas
Format: Article
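Language: English
Published: MDPI AG 2021-04-01
Series: Molecules
Subjects: deep learning, chromatography, neural networks, machine learning, solvent prediction, organic synthesis
Online Access: https://www.mdpi.com/1420-3049/26/9/2474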
id doaj-ee0408718f4e45a29865908a359620e0
record_format Article
spelling doaj-ee0408718f4e45a29865908a359620e0
2021-04-23T23:05:28Z
eng
MDPI AG
Molecules
1420-3049
2021-04-01
26
2474
2474
10.3390/molecules26092474
Prediction of Chromatography Conditions for Purification in Organic Synthesis Using Deep Learning
Mantas Vaškevičius 0
Jurgita Kapočiūtė-Dzikienė 1
Liudas Šlepikas 2
Department of Applied Informatics, Vytautas Magnus University, LT-44404 Kaunas, Lithuania
Department of Applied Informatics, Vytautas Magnus University, LT-44404 Kaunas, Lithuania
JSC Synhet, Biržų Str. 6, LT-44139 Kaunas, Lithuania
In this research, a process for developing normal-phase liquid chromatography solvent systems has been proposed. In contrast to the development of conditions via thin-layer chromatography (TLC), this process is based on the architecture of two hierarchically connected neural network-based components. Using a large database of reaction procedures allows those two components to perform an essential role in the machine-learning-based prediction of chromatographic purification conditions, i.e., solvents and the ratio between solvents. In our paper, we build two datasets and test various molecular vectorization approaches, such as extended-connectivity fingerprints, learned embedding, and auto-encoders along with different types of deep neural networks to demonstrate a novel method for modeling chromatographic solvent systems employing two neural networks in sequence. Afterward, we present our findings and provide insights on the most effective methods for solving prediction tasks. Our approach results in a system of two neural networks with long short-term memory (LSTM)-based auto-encoders, where the first predicts solvent labels (by reaching the classification accuracy of 0.950 ± 0.001) and in the case of two solvents, the second one predicts the ratio between two solvents (R² metric equal to 0.982 ± 0.001). Our approach can be used as a guidance instrument in laboratories to accelerate scouting for suitable chromatography conditions.
https://www.mdpi.com/1420-3049/26/9/2474
deep learning
chromatography
neural networks
machine learning
solvent prediction
organic synthesis
collection DOAJ
language English
format Article
sources DOAJ
author Mantas Vaškevičius
Jurgita Kapočiūtė-Dzikienė
Liudas Šlepikas
title Prediction of Chromatography Conditions for Purification in Organic Synthesis Using Deep Learning
publisher MDPI AG
series Molecules
issn 1420-3049
publishDate 2021-04-01
description In this research, a process for developing normal-phase liquid chromatography solvent systems has been proposed. In contrast to the development of conditions via thin-layer chromatography (TLC), this process is based on the architecture of two hierarchically connected neural network-based components. Using a large database of reaction procedures allows those two components to perform an essential role in the machine-learning-based prediction of chromatographic purification conditions, i.e., solvents and the ratio between solvents. In our paper, we build two datasets and test various molecular vectorization approaches, such as extended-connectivity fingerprints, learned embedding, and auto-encoders along with different types of deep neural networks to demonstrate a novel method for modeling chromatographic solvent systems employing two neural networks in sequence. Afterward, we present our findings and provide insights on the most effective methods for solving prediction tasks. Our approach results in a system of two neural networks with long short-term memory (LSTM)-based auto-encoders, where the first predicts solvent labels (by reaching the classification accuracy of 0.950 ± 0.001) and in the case of two solvents, the second one predicts the ratio between two solvents (R² metric equal to 0.982 ± 0.001). Our approach can be used as a guidance instrument in laboratories to accelerate scouting for suitable chromatography conditions.
topic deep learning
chromatography
neural networks
machine learning
solvent prediction
organic synthesis
url https://www.mdpi.com/1420-3049/26/9/2474
_version_ 1721512112632627200
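
The description field reports a classification accuracy for the solvent-label model and an R² value for the ratio model. For readers unfamiliar with these metrics, the sketch below computes both from scratch. It uses the standard definitions (fraction of exact matches, and the coefficient of determination 1 − SS_res/SS_tot); the record does not say exactly how the authors computed them, so treat this as an assumption. The function names `accuracy` and `r_squared` are illustrative.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the reference labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    return 1.0 - ss_res / ss_tot
```

An R² of 1 means the predicted ratios match the reference ratios exactly, and 0 means the model does no better than always predicting the mean ratio.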