Convolutional Neural Network-Based Compound Fingerprint Prediction for Metabolite Annotation


Bibliographic Details
Main Authors: Ao, H. (Author), Chau, H.Y.K. (Author), Gao, S. (Author), Ressom, H.W. (Author), Varghese, R.S. (Author), Wang, K. (Author)
Format: Article
Language: English
Published: MDPI 2022
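Subjects: article, convolutional neural network, deep learning, library, machine learning, metabolite identification, metabolomics, molecular fingerprint, molecular fingerprinting, prediction, tandem mass spectrometry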
Online Access: View Fulltext in Publisher (https://doi.org/10.3390/metabo12070605)
LEADER 02617nam a2200349Ia 4500
001 10.3390-metabo12070605
008 220718s2022 CNT 000 0 und d
020 |a 22181989 (ISSN) 
245 1 0 |a Convolutional Neural Network-Based Compound Fingerprint Prediction for Metabolite Annotation 
260 0 |b MDPI  |c 2022 
856 |z View Fulltext in Publisher  |u https://doi.org/10.3390/metabo12070605 
520 3 |a Metabolite annotation has been a challenging issue, especially in untargeted metabolomics studies by liquid chromatography coupled with mass spectrometry (LC-MS). This is in part due to the limitations of publicly available spectral libraries, which consist of tandem mass spectrometry (MS/MS) data acquired from just a fraction of known metabolites. Machine learning provides the opportunity to predict molecular fingerprints based on MS/MS data. The predicted molecular fingerprints can then be used to help rank putative metabolite IDs obtained by using either the precursor mass or the formula of the unknown metabolite. This method is particularly useful for annotating metabolites whose corresponding MS/MS spectra are missing or cannot be matched with those in accessible spectral libraries. We investigated a convolutional neural network (CNN) for molecular fingerprint prediction based on data acquired by MS/MS. We used more than 680,000 MS/MS spectra obtained from the MoNA repository and NIST 20, representing about 36,000 compounds, for training and testing our CNN model. The trained CNN model is implemented as a Python package, MetFID. The package is available on GitHub for users to enter their MS/MS spectra and corresponding putative metabolite IDs to obtain ranked lists of metabolites. MetFID achieves better performance in ranking putative metabolite IDs on the CASMI 2016 benchmark dataset than two other machine learning-based tools (CSI:FingerID and ChemDistiller). © 2022 by the authors. Licensee MDPI, Basel, Switzerland. 
650 0 4 |a article 
650 0 4 |a convolutional neural network 
650 0 4 |a deep learning 
650 0 4 |a library 
650 0 4 |a machine learning 
650 0 4 |a metabolite identification 
650 0 4 |a metabolomics 
650 0 4 |a molecular fingerprint 
650 0 4 |a molecular fingerprinting 
650 0 4 |a prediction 
650 0 4 |a tandem mass spectrometry 
700 1 |a Ao, H.  |e author 
700 1 |a Chau, H.Y.K.  |e author 
700 1 |a Gao, S.  |e author 
700 1 |a Ressom, H.W.  |e author 
700 1 |a Varghese, R.S.  |e author 
700 1 |a Wang, K.  |e author 
773 |t Metabolites
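
Illustrative note on the abstract (field 520): the abstract says that a predicted molecular fingerprint is used to rank putative metabolite IDs, but it does not state how candidates are scored. The sketch below assumes a Tanimoto similarity between the predicted fingerprint (per-bit probabilities) and each candidate's binary fingerprint. The function names, the four-bit fingerprints, and the candidate labels are hypothetical and are not taken from MetFID or its API.

```python
from typing import Dict, List, Sequence, Tuple


def tanimoto(predicted: Sequence[float], candidate: Sequence[int]) -> float:
    """Tanimoto similarity, generalized to real-valued vectors.

    Assumption: this scoring function is chosen for illustration only;
    the abstract does not name the measure the authors use.
    """
    if len(predicted) != len(candidate):
        raise ValueError("fingerprints must have the same length")
    dot = sum(p * c for p, c in zip(predicted, candidate))
    denom = sum(p * p for p in predicted) + sum(c * c for c in candidate) - dot
    return dot / denom if denom else 0.0


def rank_candidates(
    predicted: Sequence[float],
    candidates: Dict[str, Sequence[int]],
) -> List[Tuple[str, float]]:
    """Return (candidate ID, score) pairs sorted from best to worst match."""
    scored = [(name, tanimoto(predicted, fp)) for name, fp in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical model output: probability that each fingerprint bit is set.
predicted_fp = [0.9, 0.1, 0.8, 0.0]

# Hypothetical candidates retrieved by precursor mass or formula.
candidate_fps = {
    "candidate_A": [1, 0, 1, 0],
    "candidate_B": [0, 1, 0, 1],
    "candidate_C": [1, 1, 1, 0],
}

ranking = rank_candidates(predicted_fp, candidate_fps)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

In this toy input, candidate_A shares both strongly predicted bits and no others, so it ranks first; candidate_B shares none of them and ranks last.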