Automated supervised learning pipeline for non-targeted GC-MS data analysis

Bibliographic Details
Main Authors: Kimmo Sirén, Ulrich Fischer, Jochen Vestner
Format: Article
Language: English
Published: Elsevier 2019-03-01
Series: Analytica Chimica Acta: X
Online Access: http://www.sciencedirect.com/science/article/pii/S2590134619300015
Volume: 1
Author affiliations:
Kimmo Sirén: Institute for Viticulture and Oenology, DLR Rheinpfalz, Breitenweg 71, D-67435, Neustadt, Germany; Department of Chemistry, University of Kaiserslautern, Erwin-Schroedinger-Strasse 52, D-67663, Kaiserslautern, Germany
Ulrich Fischer: Institute for Viticulture and Oenology, DLR Rheinpfalz, Breitenweg 71, D-67435, Neustadt, Germany
Jochen Vestner (corresponding author): Institute for Viticulture and Oenology, DLR Rheinpfalz, Breitenweg 71, D-67435, Neustadt, Germany
Collection: DOAJ
ISSN: 2590-1346
Description: Non-targeted analysis is nowadays applied in many different domains of analytical chemistry such as metabolomics, environmental and food analysis. Conventional processing strategies for GC-MS data include baseline correction, feature detection, and retention time alignment before multivariate modeling. These techniques can be prone to errors and therefore time-consuming manual corrections are generally necessary. We introduce here a novel fully automated approach to non-targeted GC-MS data processing. This new approach avoids feature extraction and retention time alignment. Supervised machine learning on decomposed tensors of segmented chromatographic raw data signal is used to rank regions in the chromatograms contributing to differentiation between sample classes. The performance of this novel data analysis approach is demonstrated on three published datasets.
Keywords: Metabolomics, Chemometrics, Tensor decomposition, Machine learning, Classification, Exploratory data analysis
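
Illustrative sketch (not the authors' implementation): the description outlines a workflow of segmenting the raw chromatographic signal, decomposing each segment, and ranking segments by how well they separate sample classes. The record does not name the decomposition or the classifier, so the Python sketch below makes loud assumptions: a truncated SVD of the sample-mode unfolding stands in for the tensor decomposition, and a leave-one-out nearest-centroid classifier stands in for the supervised learner. All function names, the segment width, and the synthetic data are hypothetical.

```python
import numpy as np


def segment_scores(segment, n_components=2):
    """Sample-mode scores for one segment (samples x scans x m/z).

    Assumption: truncated SVD of the unfolded, column-centered segment is
    used as a simple stand-in for a proper tensor decomposition.
    """
    n_samples = segment.shape[0]
    unfolded = segment.reshape(n_samples, -1)
    unfolded = unfolded - unfolded.mean(axis=0)
    u, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def loo_nearest_centroid_accuracy(scores, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Assumption: every class has at least two samples, so a centroid can
    still be computed when one sample is held out.
    """
    labels = np.asarray(labels)
    classes = np.unique(labels)
    hits = 0
    for i in range(len(labels)):
        train = np.arange(len(labels)) != i
        centroids = np.array(
            [scores[train & (labels == c)].mean(axis=0) for c in classes]
        )
        distances = np.linalg.norm(centroids - scores[i], axis=1)
        hits += int(classes[np.argmin(distances)] == labels[i])
    return hits / len(labels)


def rank_segments(tensor, labels, width=50, n_components=2):
    """Split the scan axis into fixed-width segments and rank them by accuracy."""
    n_scans = tensor.shape[1]
    results = []
    for start in range(0, n_scans, width):
        stop = min(start + width, n_scans)
        scores = segment_scores(tensor[:, start:stop, :], n_components)
        results.append((start, stop, loo_nearest_centroid_accuracy(scores, labels)))
    # Highest accuracy first; ties keep their original (retention time) order.
    return sorted(results, key=lambda r: r[2], reverse=True)


def make_demo_data(seed=0):
    """Synthetic data: 20 samples x 200 scans x 30 m/z channels.

    Class 1 carries one extra peak eluting around scan 125; everything
    else is uniform noise. Purely illustrative, not real GC-MS data.
    """
    rng = np.random.default_rng(seed)
    tensor = rng.random((20, 200, 30))
    labels = np.array([0] * 10 + [1] * 10)
    elution = 50.0 * np.exp(-0.5 * ((np.arange(200) - 125) / 5.0) ** 2)
    spectrum = rng.random(30)
    tensor[labels == 1] += np.outer(elution, spectrum)
    return tensor, labels


tensor, labels = make_demo_data()
for start, stop, accuracy in rank_segments(tensor, labels, width=50):
    print(f"scans {start}-{stop}: leave-one-out accuracy {accuracy:.2f}")
```

On this synthetic example the segment containing the class-specific peak should rank first, which is the sense in which regions of the chromatogram are ranked without peak detection or retention time alignment. Fixed-width segments and the two-component setting are arbitrary choices made for the sketch.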