Application of chemometrics for the robust analysis of chemical and biochemical data

In the last two decades chemometrics has become an essential tool for the experimental biologist and chemist. The level of contribution varies strongly depending on the type of research performed. Therefore, chemometrics may be used to interpret and explain results, to compare experimental data with...

Bibliographic Details
Main Author:	Gromski, Piotr Sebastian
Published:	University of Manchester 2015
Subjects:	543 Chemometrics
Online Access:	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.654801

id	ndltd-bl.uk-oai-ethos.bl.uk-654801
record_format	oai_dc
spelling	ndltd-bl.uk-oai-ethos.bl.uk-654801 | 2017-07-25T03:25:45Z | Application of chemometrics for the robust analysis of chemical and biochemical data | Gromski, Piotr Sebastian | 2015 | 543 | Chemometrics | University of Manchester | http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.654801 | https://www.research.manchester.ac.uk/portal/en/theses/application-of-chemometrics-for-the-robust-analysis-of-chemical-and-biochemical-data(3049006f-e218-4286-83a8-e1fd85004366).html | Electronic Thesis or Dissertation
collection	NDLTD
sources	NDLTD
topic	543 Chemometrics
description	In the last two decades chemometrics has become an essential tool for the experimental biologist and chemist. The level of contribution varies strongly depending on the type of research performed. Therefore, chemometrics may be used to interpret and explain results, to compare experimental data with real-world ‘unseen’ data, to accurately detect certain chemical vapours, to identify cancer-related metabolites, to identify and rank potentially relevant/important variables, or simply for a pictorial interpretation and understanding of the results. Whilst many chemometrics methods are well established in chemistry and metabolomics, many scientists still use them with what is often referred to as a ‘black-box’ approach, that is, without prior knowledge of the methods and their well-recognised statistical properties. This lack of knowledge is attributable to the wide availability of powerful computers and, perhaps more notably, up-to-date, easy-to-use and reliable software. The main aim of this study is to reduce this gap by providing an extensive demonstration of several approaches applied at different stages of the data analysis pipeline, highlighting the importance of appropriate method selection. The comparisons are based on both chemical and biochemical (metabolomics) data and provide a firm basis for researchers in terms of understanding chemometric methods and the influence of parameter selection. Consequently, in this thesis different approaches employed for various statistical steps are explored and compared. These include pre-treatment steps such as dealing with missing data and scaling. First, different methods for substituting missing values and their influence on unsupervised and supervised learning were compared, and it was shown that metabolites that display skewness in distribution can have a significant impact on the replacement approach. The scaling approaches were compared in terms of their effect on classification accuracy for a variety of metabolomics data sets. It was shown that the most standard option, autoscaling, is not always the best. In the next step, various variable selection methods commonly used for the analysis of chemical data were compared. The results revealed that random forests, with their variable selection techniques, and support vector machines, combined with recursive feature elimination as a variable selection method, displayed the best results in comparison to other approaches. Moreover, in this study a double cross-validation procedure was applied to minimize the consequences of over-fitting. Finally, seven different algorithms and two model validation procedures, based on either 10-fold cross-validation or bootstrapping, were investigated in order to allow direct comparison between different classification approaches.
author	Gromski, Piotr Sebastian
title	Application of chemometrics for the robust analysis of chemical and biochemical data
publisher	University of Manchester
publishDate	2015
url	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.654801
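
Note on the two validation procedures named in the abstract: the record does not say how the thesis implemented 10-fold cross-validation or bootstrapping, so the following is only a minimal sketch of how the two resampling schemes are commonly set up. The function names `kfold_splits` and `bootstrap_split`, the seeds, and the sample count are illustrative assumptions, not taken from the thesis.

```python
import random


def kfold_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Every sample lands in exactly one test fold.
    Hypothetical helper, not from the thesis.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def bootstrap_split(n_samples, seed=0):
    """Return (train, test) index lists for one bootstrap resample.

    The training set is drawn with replacement, so it can contain repeats;
    samples that were never drawn (out-of-bag) form the test set.
    Hypothetical helper, not from the thesis.
    """
    rng = random.Random(seed)
    train = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(train)
    test = [i for i in range(n_samples) if i not in drawn]
    return train, test


if __name__ == "__main__":
    n = 25  # illustrative number of samples
    for train, test in kfold_splits(n, k=10):
        print("k-fold:", len(train), "train /", len(test), "test")
    train, test = bootstrap_split(n, seed=1)
    print("bootstrap:", len(set(train)), "unique train /", len(test), "out-of-bag test")
```

In this sketch, cross-validation tests each sample exactly once across the folds, whereas a bootstrap resample is usually repeated many times with different seeds and the out-of-bag results averaged.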

Similar Items