Multivariate classification of neuroimaging data with nested subclasses: Biased accuracy and implications for hypothesis testing.

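Biological data sets are typically characterized by high dimensionality and low effect sizes. A powerful method for detecting systematic differences between experimental conditions in such multivariate data sets is multivariate pattern analysis (MVPA), particularly pattern classification. However, in virtually all applications, data from the classes that correspond to the conditions of interest are not homogeneous but contain subclasses. Such subclasses can arise, for example, from individual subjects who contribute multiple data points, or from correlations of items within classes. We show here that in multivariate data that have subclasses nested within their class structure, these subclasses introduce systematic information that improves classifiability beyond what is expected from the size of the class difference. We analytically prove that this subclass bias systematically inflates correct classification rates (CCRs) of linear classifiers, depending on the number of subclasses as well as on the portion of variance induced by the subclasses. In simulations, we demonstrate that subclass bias is highest when between-class effect size is low and subclass variance is high. This bias can be reduced by increasing the total number of subclasses. However, we can account for the subclass bias by using permutation tests that explicitly consider the subclass structure of the data.

We illustrate our result in several experiments that recorded human EEG activity, demonstrating that parametric statistical tests as well as typical trial-wise permutation fail to determine the significance of classification outcomes correctly.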
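The abstract's core argument, that subclasses nested within classes inflate cross-validated accuracy and that a permutation test has to respect the subclass structure, can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' code: it assumes Gaussian data, uses a nearest-class-mean rule as a simple stand-in for a linear classifier, and implements one plausible subclass-aware test by shuffling class labels between whole subclasses instead of between single trials. All function and parameter names (simulate_nested, cv_accuracy, permutation_pvalue, subclass_sd, and so on) are hypothetical.

```python
import numpy as np


def simulate_nested(n_sub_per_class=4, trials_per_sub=20, n_features=10,
                    class_effect=0.0, subclass_sd=1.0, noise_sd=1.0, seed=None):
    """Hypothetical generator: two classes, each with its own subclasses.

    trial = class mean + subclass offset + trial noise. class_effect=0.0
    gives data with no true class difference (the null hypothesis).
    """
    rng = np.random.default_rng(seed)
    X, y, groups = [], [], []
    for label in (0, 1):
        class_mean = np.full(n_features, class_effect * label)
        for s in range(n_sub_per_class):
            # One offset per subclass (e.g. per subject), shared by its trials
            offset = rng.normal(0.0, subclass_sd, n_features)
            noise = rng.normal(0.0, noise_sd, (trials_per_sub, n_features))
            X.append(class_mean + offset + noise)
            y.extend([label] * trials_per_sub)
            groups.extend([label * n_sub_per_class + s] * trials_per_sub)
    return np.vstack(X), np.array(y), np.array(groups)


def cv_accuracy(X, y, n_folds=5, seed=0):
    """Trial-wise k-fold CCR of a nearest-class-mean (linear) classifier."""
    folds = np.random.default_rng(seed).permutation(len(y)) % n_folds
    correct = 0
    for k in range(n_folds):
        train, test = folds != k, folds == k
        m0 = X[train & (y == 0)].mean(axis=0)
        m1 = X[train & (y == 1)].mean(axis=0)
        d0 = ((X[test] - m0) ** 2).sum(axis=1)
        d1 = ((X[test] - m1) ** 2).sum(axis=1)
        # Predict class 1 when the trial is closer to the class-1 mean
        correct += np.sum((d1 < d0).astype(int) == y[test])
    return correct / len(y)


def permutation_pvalue(X, y, groups, n_perm=100, by_subclass=True, seed=0):
    """p-value of the observed CCR against a label-permutation null.

    by_subclass=True shuffles class labels between whole subclasses, so the
    nesting is kept; by_subclass=False shuffles labels trial by trial.
    """
    rng = np.random.default_rng(seed)
    observed = cv_accuracy(X, y, seed=seed)
    sub_ids = np.unique(groups)
    sub_labels = np.array([y[groups == g][0] for g in sub_ids])
    null = np.empty(n_perm)
    for i in range(n_perm):
        if by_subclass:
            # Map each trial to its subclass's (shuffled) label
            y_perm = rng.permutation(sub_labels)[np.searchsorted(sub_ids, groups)]
        else:
            y_perm = rng.permutation(y)
        null[i] = cv_accuracy(X, y_perm, seed=seed)  # same folds every time
    return observed, (1 + np.sum(null >= observed)) / (1 + n_perm)


if __name__ == "__main__":
    X, y, groups = simulate_nested(seed=1)  # no true class difference
    for by_sub in (False, True):
        acc, p = permutation_pvalue(X, y, groups, by_subclass=by_sub)
        print(f"by_subclass={by_sub}: CCR={acc:.2f}, p={p:.3f}")
```

With class_effect=0.0 there is no real class difference, yet the shared subclass offsets leak into the class means, so the trial-wise cross-validated CCR tends to sit well above chance. A trial-wise permutation null is centred near chance and would then tend to flag this as significant, whereas the subclass-wise null keeps the same nesting and inflation. Raising n_sub_per_class or lowering subclass_sd should shrink the inflation, in line with the abstract.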

Bibliographic Details
Main Authors:	Hamidreza Jamalabadi, Sarah Alizadeh, Monika Schönauer, Christian Leibold, Steffen Gais
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2018-09-01
Series:	PLoS Computational Biology, vol. 14, no. 9, e1006486
ISSN:	1553-734X, 1553-7358
DOI:	10.1371/journal.pcbi.1006486
Online Access:	http://europepmc.org/articles/PMC6177201?pdf=render
