On linear dimension reduction based on diagonalization of scatter matrices for bioinformatics downstream analyses

Dimension reduction is often a preliminary step in the analysis of data sets with a large number of variables. Most classical dimension reduction methods, both supervised and unsupervised, such as principal component analysis (PCA), independent component analysis (ICA) or sliced inverse regression (...

Bibliographic Details
Main Authors:	Daniel Fischer, Klaus Nordhausen, Hannu Oja
Format:	Article
Language:	English
Published:	Elsevier 2020-12-01
Series:	Heliyon
Subjects:	Computer science; Mathematics; Statistics; Bioinformatics; Microbial genomics; Genomics
Online Access:	http://www.sciencedirect.com/science/article/pii/S2405844020325755

id	doaj-1a3f02c4f1104f759c70f37959ca3c71
record_format	Article
spelling	doaj-1a3f02c4f1104f759c70f37959ca3c71 | 2021-01-05T09:33:18Z | eng | Elsevier | Heliyon | 2405-8440 | 2020-12-01 | vol. 6, no. 12, e05732 | On linear dimension reduction based on diagonalization of scatter matrices for bioinformatics downstream analyses | Daniel Fischer (Natural Resources Institute Finland (Luke), Applied Statistical Methods, Myllytie 1, 31600 Jokionen, Finland; corresponding author) | Klaus Nordhausen (CSTAT - Computational Statistics, Institute of Statistics & Mathematical Methods in Economics, Vienna University of Technology, Wiedner Hauptstraße 7, A-1040 Vienna, Austria) | Hannu Oja (Department of Mathematics and Statistics, University of Turku, 20014 Turku, Finland) | http://www.sciencedirect.com/science/article/pii/S2405844020325755 | Computer science; Mathematics; Statistics; Bioinformatics; Microbial genomics; Genomics
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Daniel Fischer, Klaus Nordhausen, Hannu Oja
author_sort	Daniel Fischer
title	On linear dimension reduction based on diagonalization of scatter matrices for bioinformatics downstream analyses
publisher	Elsevier
series	Heliyon
issn	2405-8440
publishDate	2020-12-01
description	Dimension reduction is often a preliminary step in the analysis of data sets with a large number of variables. Most classical dimension reduction methods, both supervised and unsupervised, such as principal component analysis (PCA), independent component analysis (ICA) or sliced inverse regression (SIR), can be formulated using one, two or several different scatter matrix functionals. Scatter matrices can be seen as different measures of multivariate dispersion; they may highlight different features of the data and, when compared, may reveal interesting structures. Such an analysis then searches for a projection onto an interesting (signal) part of the data, and it is also important to know the correct dimension of the signal subspace. These approaches usually either make no model assumptions or work in wide classes of semiparametric models. Theoretical results in the literature are, however, limited to the case where the sample size exceeds the number of variables, which is hardly ever true for data sets encountered in bioinformatics. In this paper, we briefly review the relevant literature and explore whether the dimension reduction tools can be used to find relevant and interesting subspaces for small-n-large-p data sets. We illustrate the methods with a microarray dataset of prostate cancer patients and healthy controls.
topic	Computer science; Mathematics; Statistics; Bioinformatics; Microbial genomics; Genomics
url	http://www.sciencedirect.com/science/article/pii/S2405844020325755

Similar Items