OnPLS : Orthogonal projections to latent structures in multiblock and path model data analysis

Bibliographic Details
Main Author: Löfstedt, Tommy
Format: Doctoral Thesis
Language: English
Published: Umeå universitet, Kemiska institutionen 2012
Subjects:
PLS
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-55438
http://nbn-resolving.de/urn:isbn:978-91-7459-442-3
id ndltd-UPSALLA1-oai-DiVA.org-umu-55438
record_format oai_dc
collection NDLTD
language English
format Doctoral Thesis
sources NDLTD
topic OnPLS
OPLS
O2PLS
PLS
Multivariate analysis
Multiblock and path modelling
Chemometrics
description The amounts of data collected from each sample of, e.g., chemical or biological materials have increased by orders of magnitude since the beginning of the 20th century. The number of ways to collect data from observations is also increasing. Such configurations with several massive data sets increase the demands on the methods used to analyse them. Methods that handle such data are called multiblock methods, and they are the topic of this thesis. Data collected from advanced analytical instruments often contain variation from diverse, mutually independent sources, which may confound observed patterns and hinder interpretation of latent variable models. For this reason, new methods have been developed that decompose the data matrices, placing variation from different sources into separate parts. Such procedures are no longer merely pre-processing filters, as they initially were, but have become integral elements of model building and interpretation. One strain of such methods, called OPLS, has been particularly successful since it is easy to use, understand and interpret. This thesis describes the development of a new multiblock data analysis method called OnPLS, which extends the OPLS framework to the analysis of multiblock and path models with very general relationships between blocks in both rows and columns. OnPLS utilises OPLS to decompose sets of matrices, dividing each matrix into a globally joint part (a part shared with all the matrices it is connected to), several locally joint parts (parts shared with some, but not all, of the connected matrices) and a unique part that no other matrix shares.
The OnPLS method was applied to several synthetic data sets and data sets of “real” measurements. For the synthetic data sets, where the results could be compared to known, true parameters, the method generated global multiblock (and path) models that were more similar to the true underlying structures than models without such decompositions; that is, the globally joint, locally joint and unique models more closely resembled the corresponding true data. When applied to the real data sets, the OnPLS models revealed chemically or biologically relevant information in all kinds of variation, which increases interpretability because different kinds of variation are distinguished and analysed separately. OnPLS thus improves the quality of the models and facilitates better understanding of the data: each kind of variation is purer and less tainted by other kinds. OnPLS is therefore highly recommended to anyone engaged in multiblock or path model data analysis.
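The three kinds of parts named in the description can be illustrated with a small synthetic example. The sketch below is not the OnPLS algorithm and is not taken from the thesis; it only builds three noise-free blocks whose true globally joint, locally joint and unique parts are known by construction, which is the kind of ground truth a decomposition could be compared against. All names (`part`, `blocks`, `t_global`, and so on) and all sizes are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50  # observations (rows), assumed common to all blocks

# Five mutually orthonormal score vectors: one shared by all blocks,
# one shared by X1 and X2 only, and one unique to each block.
Q, _ = np.linalg.qr(rng.standard_normal((n, 5)))
t_global, t_local, t_u1, t_u2, t_u3 = Q.T


def part(t, n_vars):
    """Rank-one part: a score vector times a random loading vector."""
    return np.outer(t, rng.standard_normal(n_vars))


# Hypothetical blocks with 8, 6 and 5 variables.
blocks = {
    "X1": {"global": part(t_global, 8), "local": part(t_local, 8), "unique": part(t_u1, 8)},
    "X2": {"global": part(t_global, 6), "local": part(t_local, 6), "unique": part(t_u2, 6)},
    "X3": {"global": part(t_global, 5), "unique": part(t_u3, 5)},
}

# Each observed block is the sum of its parts (no noise term in this sketch).
X = {name: sum(parts.values()) for name, parts in blocks.items()}

# The unique part of X1 shares no variation with the other blocks.
for other in ("X2", "X3"):
    assert np.allclose(blocks["X1"]["unique"].T @ X[other], 0.0)
```

Here X3 has no locally joint part because the local score vector is shared only between X1 and X2; a real data set would also contain noise, which this sketch leaves out.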