Deriving Lipid Classification Based on Molecular Formulas

Despite instrument and algorithmic improvements, the untargeted and accurate assignment of metabolites remains an unsolved problem in metabolomics. New assignment methods such as our SMIRFE algorithm can assign elemental molecular formulas to observed spectral features in a highly untargeted manner...


Bibliographic Details
Main Authors: Joshua M. Mitchell, Robert M. Flight, Hunter N.B. Moseley
Format: Article
Language: English
Published: MDPI AG 2020-03-01
Series: Metabolites
Subjects: smirfe, lipidomics, metabolomics, lipid category, machine learning, random forest
Online Access: https://www.mdpi.com/2218-1989/10/3/122
id doaj-a1a512dee1d04d3caa4ebd5b051cf4ab
record_format Article
spelling doaj-a1a512dee1d04d3caa4ebd5b051cf4ab
2020-11-25T01:37:46Z
eng
MDPI AG
Metabolites
2218-1989
2020-03-01
10
3
122
10.3390/metabo10030122
metabo10030122
Deriving Lipid Classification Based on Molecular Formulas
Joshua M. Mitchell (0)
Robert M. Flight (1)
Hunter N.B. Moseley (2)
Department of Molecular & Cellular Biochemistry, University of Kentucky, Lexington, KY 40536, USA
Department of Molecular & Cellular Biochemistry, University of Kentucky, Lexington, KY 40536, USA
Department of Molecular & Cellular Biochemistry, University of Kentucky, Lexington, KY 40536, USA
https://www.mdpi.com/2218-1989/10/3/122
smirfe
lipidomics
metabolomics
lipid category
machine learning
random forest
collection DOAJ
language English
format Article
sources DOAJ
author Joshua M. Mitchell
Robert M. Flight
Hunter N.B. Moseley
spellingShingle Joshua M. Mitchell
Robert M. Flight
Hunter N.B. Moseley
Deriving Lipid Classification Based on Molecular Formulas
Metabolites
smirfe
lipidomics
metabolomics
lipid category
machine learning
random forest
author_facet Joshua M. Mitchell
Robert M. Flight
Hunter N.B. Moseley
author_sort Joshua M. Mitchell
title Deriving Lipid Classification Based on Molecular Formulas
title_short Deriving Lipid Classification Based on Molecular Formulas
title_full Deriving Lipid Classification Based on Molecular Formulas
title_fullStr Deriving Lipid Classification Based on Molecular Formulas
title_full_unstemmed Deriving Lipid Classification Based on Molecular Formulas
title_sort deriving lipid classification based on molecular formulas
publisher MDPI AG
series Metabolites
issn 2218-1989
publishDate 2020-03-01
description Despite instrument and algorithmic improvements, the untargeted and accurate assignment of metabolites remains an unsolved problem in metabolomics. New assignment methods such as our SMIRFE algorithm can assign elemental molecular formulas to observed spectral features in a highly untargeted manner without orthogonal information from tandem MS or chromatography. However, for many lipidomics applications, it is necessary to know at least the lipid category or class that is associated with a detected spectral feature to derive a biochemical interpretation. Our goal is to develop a method for robustly classifying elemental molecular formula assignments into lipid categories for an application to SMIRFE-generated assignments. Using a Random Forest machine learning approach, we developed a method that can predict lipid category and class from SMIRFE non-adducted molecular formula assignments. Our methods achieve high average predictive accuracy (>90%) and precision (>83%) across all eight of the lipid categories in the LIPIDMAPS database. Classification performance was evaluated using sets of theoretical, data-derived, and artifactual molecular formulas. Our methods enable the lipid classification of non-adducted molecular formula assignments generated by SMIRFE without orthogonal information, facilitating the biochemical interpretation of untargeted lipidomics experiments. This lipid classification appears insufficient for validating single-spectrum assignments, but could be useful in cross-spectrum assignment validation.
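The description above does not say which input features the Random Forest classifier uses. As a minimal sketch, assuming that per-element atom counts serve as features (an assumption for illustration, not the authors' documented feature set), a non-adducted molecular formula could be turned into a fixed-length vector as follows. The names `ELEMENTS`, `parse_formula`, and `feature_vector` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical element order for the feature vector (an assumption, not from the paper).
ELEMENTS = ("C", "H", "N", "O", "P", "S")

# One element symbol (capital letter, optional lowercase letter) and an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Count atoms in a flat formula such as 'C16H32O2' (no brackets, charges, or adducts)."""
    counts = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:
            # Something between two tokens did not look like an element symbol.
            raise ValueError(f"unexpected text in formula: {formula!r}")
        element, number = match.groups()
        counts[element] += int(number) if number else 1
        pos = match.end()
    if not formula or pos != len(formula):
        raise ValueError(f"could not parse formula: {formula!r}")
    return counts


def feature_vector(formula: str) -> list:
    """Return atom counts in ELEMENTS order; elements outside ELEMENTS are ignored."""
    counts = parse_formula(formula)
    return [counts[element] for element in ELEMENTS]


if __name__ == "__main__":
    # Palmitic acid, a fatty acid with formula C16H32O2.
    print(feature_vector("C16H32O2"))
    # A formula containing nitrogen and phosphorus.
    print(feature_vector("C42H82NO8P"))
```

Vectors of this kind could then be passed, together with known lipid category labels, to any standard Random Forest implementation. A fixed element order keeps every formula the same length, which tree-based classifiers require.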
topic smirfe
lipidomics
metabolomics
lipid category
machine learning
random forest
url https://www.mdpi.com/2218-1989/10/3/122
work_keys_str_mv AT joshuammitchell derivinglipidclassificationbasedonmolecularformulas
AT robertmflight derivinglipidclassificationbasedonmolecularformulas
AT hunternbmoseley derivinglipidclassificationbasedonmolecularformulas
_version_ 1725057524223180800