Cavalier Use of Inferential Statistics Is a Major Source of False and Irreproducible Scientific Findings

I uncover previously underappreciated systematic sources of false and irreproducible results in natural, biomedical and social sciences that are rooted in statistical methodology. They include the inevitably occurring deviations from basic assumptions behind statistical analyses and the use of vario...


Bibliographic Details
Main Author: Leonid Hanin
Format: Article
Language: English
Published: MDPI AG 2021-03-01
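Series: Mathematics
Subjects: central limit theorem; distributional homogeneity; law of large numbers; probability metric; p-value; random sample size
Online Access: https://www.mdpi.com/2227-7390/9/6/603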
id doaj-0dff556f4b9948209d685eb08d491f7f
record_format Article
spelling doaj-0dff556f4b9948209d685eb08d491f7f
2021-03-12T00:03:42Z
eng
MDPI AG
Mathematics
2227-7390
2021-03-01
9
6
603
603
10.3390/math9060603
Cavalier Use of Inferential Statistics Is a Major Source of False and Irreproducible Scientific Findings
Leonid Hanin
Department of Mathematics and Statistics, Idaho State University, 921 S. 8th Avenue, Stop 8085, Pocatello, ID 83209-8085, USA
I uncover previously underappreciated systematic sources of false and irreproducible results in natural, biomedical and social sciences that are rooted in statistical methodology. They include the inevitably occurring deviations from basic assumptions behind statistical analyses and the use of various approximations. I show through a number of examples that (a) arbitrarily small deviations from distributional homogeneity can lead to arbitrarily large deviations in the outcomes of statistical analyses; (b) samples of random size may violate the Law of Large Numbers and thus are generally unsuitable for conventional statistical inference; (c) the same is true, in particular, when random sample size and observations are stochastically dependent; and (d) the use of the Gaussian approximation based on the Central Limit Theorem has dramatic implications for p-values and statistical significance essentially making pursuit of small significance levels and p-values for a fixed sample size meaningless. The latter is proven rigorously in the case of one-sided Z test. This article could serve as a cautionary guidance to scientists and practitioners employing statistical methods in their work.
https://www.mdpi.com/2227-7390/9/6/603
central limit theorem
distributional homogeneity
law of large numbers
probability metric
p-value
random sample size
collection DOAJ
language English
format Article
sources DOAJ
author Leonid Hanin
spellingShingle Leonid Hanin
Cavalier Use of Inferential Statistics Is a Major Source of False and Irreproducible Scientific Findings
Mathematics
central limit theorem
distributional homogeneity
law of large numbers
probability metric
p-value
random sample size
author_facet Leonid Hanin
author_sort Leonid Hanin
title Cavalier Use of Inferential Statistics Is a Major Source of False and Irreproducible Scientific Findings
title_sort cavalier use of inferential statistics is a major source of false and irreproducible scientific findings
publisher MDPI AG
series Mathematics
issn 2227-7390
publishDate 2021-03-01
description I uncover previously underappreciated systematic sources of false and irreproducible results in natural, biomedical and social sciences that are rooted in statistical methodology. They include the inevitably occurring deviations from basic assumptions behind statistical analyses and the use of various approximations. I show through a number of examples that (a) arbitrarily small deviations from distributional homogeneity can lead to arbitrarily large deviations in the outcomes of statistical analyses; (b) samples of random size may violate the Law of Large Numbers and thus are generally unsuitable for conventional statistical inference; (c) the same is true, in particular, when random sample size and observations are stochastically dependent; and (d) the use of the Gaussian approximation based on the Central Limit Theorem has dramatic implications for p-values and statistical significance essentially making pursuit of small significance levels and p-values for a fixed sample size meaningless. The latter is proven rigorously in the case of one-sided Z test. This article could serve as a cautionary guidance to scientists and practitioners employing statistical methods in their work.
topic central limit theorem
distributional homogeneity
law of large numbers
probability metric
p-value
random sample size
url https://www.mdpi.com/2227-7390/9/6/603
work_keys_str_mv AT leonidhanin cavalieruseofinferentialstatisticsisamajorsourceoffalseandirreproduciblescientificfindings
_version_ 1724223347688996864