Automated supervised learning pipeline for non-targeted GC-MS data analysis

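Non-targeted analysis is nowadays applied in many different domains of analytical chemistry such as metabolomics, environmental and food analysis. Conventional processing strategies for GC-MS data include baseline correction, feature detection, and retention time alignment before multivariate modeling. These techniques can be prone to errors and therefore time-consuming manual corrections are generally necessary. We introduce here a novel fully automated approach to non-targeted GC-MS data processing. This new approach avoids feature extraction and retention time alignment. Supervised machine learning on decomposed tensors of segmented chromatographic raw data signal is used to rank regions in the chromatograms contributing to differentiation between sample classes. The performance of this novel data analysis approach is demonstrated on three published datasets.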

Bibliographic Details
Main Authors:	Kimmo Sirén, Ulrich Fischer, Jochen Vestner
Format:	Article
Language:	English
Published:	Elsevier 2019-03-01
Series:	Analytica Chimica Acta: X
Online Access:	http://www.sciencedirect.com/science/article/pii/S2590134619300015

id	doaj-298a96c2f93a441a89f70a55b6e38ca1
record_format	Article
spelling	doaj-298a96c2f93a441a89f70a55b6e38ca1 | 2020-11-24T21:32:20Z | eng | Elsevier | Analytica Chimica Acta: X | 2590-1346 | 2019-03-01 | 1 | Automated supervised learning pipeline for non-targeted GC-MS data analysis | Kimmo Sirén 0, Ulrich Fischer 1, Jochen Vestner 2 | Institute for Viticulture and Oenology, DLR Rheinpfalz, Breitenweg 71, D-67435, Neustadt, Germany; Department of Chemistry, University of Kaiserslautern, Erwin-Schroedinger-Strasse 52, D-67663, Kaiserslautern, Germany | Institute for Viticulture and Oenology, DLR Rheinpfalz, Breitenweg 71, D-67435, Neustadt, Germany | Institute for Viticulture and Oenology, DLR Rheinpfalz, Breitenweg 71, D-67435, Neustadt, Germany; Corresponding author. | Non-targeted analysis is nowadays applied in many different domains of analytical chemistry such as metabolomics, environmental and food analysis. Conventional processing strategies for GC-MS data include baseline correction, feature detection, and retention time alignment before multivariate modeling. These techniques can be prone to errors and therefore time-consuming manual corrections are generally necessary. We introduce here a novel fully automated approach to non-targeted GC-MS data processing. This new approach avoids feature extraction and retention time alignment. Supervised machine learning on decomposed tensors of segmented chromatographic raw data signal is used to rank regions in the chromatograms contributing to differentiation between sample classes. The performance of this novel data analysis approach is demonstrated on three published datasets. Keywords: Metabolomics, Chemometrics, Tensor decomposition, Machine learning, Classification, Exploratory data analysis | http://www.sciencedirect.com/science/article/pii/S2590134619300015
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Kimmo Sirén, Ulrich Fischer, Jochen Vestner
title	Automated supervised learning pipeline for non-targeted GC-MS data analysis
publisher	Elsevier
series	Analytica Chimica Acta: X
issn	2590-1346
publishDate	2019-03-01
description	Non-targeted analysis is nowadays applied in many different domains of analytical chemistry such as metabolomics, environmental and food analysis. Conventional processing strategies for GC-MS data include baseline correction, feature detection, and retention time alignment before multivariate modeling. These techniques can be prone to errors and therefore time-consuming manual corrections are generally necessary. We introduce here a novel fully automated approach to non-targeted GC-MS data processing. This new approach avoids feature extraction and retention time alignment. Supervised machine learning on decomposed tensors of segmented chromatographic raw data signal is used to rank regions in the chromatograms contributing to differentiation between sample classes. The performance of this novel data analysis approach is demonstrated on three published datasets. Keywords: Metabolomics, Chemometrics, Tensor decomposition, Machine learning, Classification, Exploratory data analysis
url	http://www.sciencedirect.com/science/article/pii/S2590134619300015

Similar Items