Calculating the true level of predictors significance when carrying out the procedure of regression equation specification


Bibliographic Details
Main Author: Nikita A. Moiseev
Format: Article
Language: Russian
Published: Plekhanov Russian University of Economics 2017-07-01
Series: Statistika i Èkonomika
Subjects: regression models, p-value adjustment, significance of predictors, randomization method, Wishart distribution, variance-covariance matrix, Cholesky decomposition
Online Access: https://statecon.rea.ru/jour/article/view/1092
id doaj-afe444c1b71e4f69adbebbeae06e78dc
record_format Article
spelling doaj-afe444c1b71e4f69adbebbeae06e78dc (last updated 2021-07-28T21:20:03Z)
Language: rus
Publisher: Plekhanov Russian University of Economics
Journal: Statistika i Èkonomika (ISSN 2500-3925)
Published: 2017-07-01, no. 3, pp. 10-20
DOI: 10.21686/2500-3925-2017-3-10-20
Title: Calculating the true level of predictors significance when carrying out the procedure of regression equation specification
Author: Nikita A. Moiseev (Plekhanov Russian University of Economics)
Online Access: https://statecon.rea.ru/jour/article/view/1092
Keywords: regression models; p-value adjustment; significance of predictors; randomization method; Wishart distribution; variance-covariance matrix; Cholesky decomposition
collection DOAJ
language Russian
format Article
sources DOAJ
author Nikita A. Moiseev
spellingShingle Nikita A. Moiseev
Calculating the true level of predictors significance when carrying out the procedure of regression equation specification
Statistika i Èkonomika
regression models
p-value adjustment
significance of predictors
randomization method
Wishart distribution
variance-covariance matrix
Cholesky decomposition
author_facet Nikita A. Moiseev
author_sort Nikita A. Moiseev
title Calculating the true level of predictors significance when carrying out the procedure of regression equation specification
title_short Calculating the true level of predictors significance when carrying out the procedure of regression equation specification
title_full Calculating the true level of predictors significance when carrying out the procedure of regression equation specification
title_fullStr Calculating the true level of predictors significance when carrying out the procedure of regression equation specification
title_full_unstemmed Calculating the true level of predictors significance when carrying out the procedure of regression equation specification
title_sort calculating the true level of predictors significance when carrying out the procedure of regression equation specification
publisher Plekhanov Russian University of Economics
series Statistika i Èkonomika
issn 2500-3925
publishDate 2017-07-01
description The paper is devoted to a new randomization method that yields unbiased adjustments of p-values for the predictors of linear regression models by incorporating the number of potential explanatory variables, their variance-covariance matrix and its uncertainty, based on the number of observations. This adjustment helps to control type I errors in scientific studies, significantly decreasing the number of publications that report spurious relations as authentic ones. Comparative analysis with existing methods such as the Bonferroni correction and the Shehata and White adjustments clearly shows their imperfections, especially when the number of observations and the number of potential explanatory variables are approximately equal. The comparative analysis also showed that when the variance-covariance matrix of the set of potential predictors is diagonal, i.e. the data are independent, the proposed simple correction is the best and easiest-to-implement way to obtain unbiased corrections of traditional p-values. However, in the presence of strongly correlated data, the simple correction overestimates the true p-values, which can lead to type II errors. It was also found that the corrected p-values depend on the number of observations, the number of potential explanatory variables and the sample variance-covariance matrix. For example, if only two potential explanatory variables compete for one position in the regression model and they are weakly correlated, the corrected p-value is lower when the number of observations is smaller, and vice versa: if the data are highly correlated, the case with the larger number of observations shows the lower corrected p-value. With increasing correlation, all corrections, regardless of the number of observations, tend to the original p-value.
This phenomenon is easy to explain: as the correlation coefficient tends to one, the two variables become almost linearly dependent on each other, and if one of them is significant, the other will almost certainly show the same significance. On the other hand, if the sample variance-covariance matrix tends to be diagonal and the number of observations tends to infinity, the proposed numerical method returns corrections close to the simple correction. When the number of observations is much greater than the number of potential predictors, the Shehata and White corrections approximately coincide with those of the proposed numerical method. However, in the much more common case when the number of observations is comparable to the number of potential predictors, the existing methods demonstrate significant inaccuracies. When the number of potential predictors exceeds the available number of observations, calculating the true p-values appears impossible. It is therefore recommended not to use such datasets when constructing regression models, since only fulfillment of the above condition ensures calculation of unbiased p-value corrections. The proposed method is easy to program and can be integrated into any statistical software package.
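The randomization idea described above is indeed straightforward to program. The following Python sketch is a minimal illustration under stated assumptions, not the paper's exact algorithm: the function names are hypothetical, single-predictor significance is proxied by the Pearson-correlation test p-value, and the independent-case "simple correction" is assumed to be 1 − (1 − p)^m, the null probability that the minimum of m independent p-values falls below p. It estimates, by Monte Carlo simulation under the null of no relation, how often the minimum p-value among m correlated candidate predictors falls below the observed one, imposing the candidates' correlation structure via its Cholesky factor.

```python
import numpy as np
from scipy import stats

def adjusted_p_value(p_obs, corr, n_obs, n_sims=2000, seed=0):
    """Monte Carlo adjustment of the best single-predictor p-value.

    Illustrative sketch: under the null (y unrelated to all candidates),
    estimate how often the *minimum* p-value among m correlated candidate
    predictors is at most the observed p-value.
    """
    rng = np.random.default_rng(seed)
    m = corr.shape[0]
    L = np.linalg.cholesky(corr)  # imposes the predictor correlations
    min_pvals = np.empty(n_sims)
    for s in range(n_sims):
        X = rng.standard_normal((n_obs, m)) @ L.T   # correlated candidates
        y = rng.standard_normal(n_obs)              # null: y unrelated to X
        pvals = [stats.pearsonr(X[:, j], y)[1] for j in range(m)]
        min_pvals[s] = min(pvals)
    return float(np.mean(min_pvals <= p_obs))       # adjusted p-value

# Reference corrections for m candidate predictors:
def bonferroni(p, m):
    return min(1.0, m * p)

def simple_correction(p, m):       # assumed form for independent candidates
    return 1.0 - (1.0 - p) ** m
```

With an identity correlation matrix the Monte Carlo estimate approaches the simple correction, while Bonferroni's m·p is slightly more conservative; with strongly correlated candidates the estimate falls back toward the original p-value, matching the limiting behaviour described in the abstract.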
topic regression models
p-value adjustment
significance of predictors
randomization method
Wishart distribution
variance-covariance matrix
Cholesky decomposition
url https://statecon.rea.ru/jour/article/view/1092
work_keys_str_mv AT nikitaamoiseev calculatingthetruelevelofpredictorssignificancewhencarryingouttheprocedureofregressionequationspecification
_version_ 1721260157671833600