Estimating R² Shrinkage in Multiple Regression: A Comparison of Different Analytical Methods
Main Author: | |
---|---|
Format: | Others |
Published: | DigitalCommons@USU, 1999 |
Subjects: | |
Online Access: | https://digitalcommons.usu.edu/etd/6147 https://digitalcommons.usu.edu/cgi/viewcontent.cgi?article=7222&context=etd |
Summary: | This study investigated the effectiveness of various analytical methods for estimating R² shrinkage in multiple regression analysis. Two categories of analytical formulae were identified: estimators of the population squared multiple correlation coefficient (ρ²) and estimators of the population cross-validity coefficient (ρc²). Real data sets can carry confounding factors such as nonnormality, imprecisely known population parameters, and varying degrees of multicollinearity among the predictor variables. To avoid them, the Monte Carlo method was used to simulate multivariate normal sample data under prespecified conditions: the population squared multiple correlation coefficient (ρ²), the number of predictors, the sample size, a known degree of multicollinearity, and controlled data normality. Five hundred replicates were simulated within each cell of the sampling conditions. The analytical formulae were applied to the simulated data in each cell, and the resulting "adjusted" coefficients were compared with their corresponding population parameters (ρ² and ρc²).
Analysis of the results indicates that the "Wherry" formula, currently the most widely used (it is the one in both SAS and SPSS), is probably not the most effective analytical formula for estimating ρ². Instead, the Pratt formula appeared to outperform the other analytical formulae across most of the sampling conditions. Among the formulae designed to estimate ρc², the Browne formula appeared to be the most effective and stable in minimizing statistical bias across different sampling conditions. The study also concludes that the n/p ratio (sample size / number of predictor variables) affects the performance of these formulae the most, whereas different degrees of multicollinearity among the predictor variables do not have a dramatic influence on their performance. Further replications on both real and simulated data are still needed to investigate the effectiveness of these analytical formulae. |
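The summary names the Wherry, Pratt and Browne formulae but does not print them. The sketch below writes out the forms commonly reported in the R² shrinkage literature so the comparison is concrete. It is an illustration under assumptions, not the study's code: the function names are hypothetical, the formulae should be checked against the original sources, and inside `browne` the clipped Wherry value is plugged in for ρ² and its square for ρ⁴, a simplification made only for this example.

```python
"""Analytical R^2 shrinkage estimators (illustrative sketch).

Formulae as commonly reported in the shrinkage literature; verify them
against the original sources before relying on them.
"""


def wherry(r2: float, n: int, p: int) -> float:
    """Wherry-type adjusted R^2 (the 'adjusted R-square' in SAS and SPSS).

    Estimates the population squared multiple correlation rho^2 from a
    sample R^2 based on n cases and p predictors.
    """
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def pratt(r2: float, n: int, p: int) -> float:
    """Pratt's approximation to the Olkin-Pratt estimator of rho^2."""
    shrink = (n - 3) * (1.0 - r2) / (n - p - 1)
    return 1.0 - shrink * (1.0 + 2.0 * (1.0 - r2) / (n - p - 2.3))


def browne(r2: float, n: int, p: int) -> float:
    """Browne-type estimate of the cross-validity coefficient rho_c^2.

    ASSUMPTION (simplification for this sketch): rho^2 is estimated by the
    Wherry value clipped at 0, and rho^4 by the square of that estimate.
    """
    rho2 = max(wherry(r2, n, p), 0.0)
    rho4 = rho2 ** 2
    return ((n - p - 3) * rho4 + rho2) / ((n - 2 * p - 2) * rho2 + p)


if __name__ == "__main__":
    # Hypothetical sample: R^2 = 0.50 from n = 50 cases and p = 5 predictors.
    for name, f in [("Wherry", wherry), ("Pratt", pratt), ("Browne", browne)]:
        print(f"{name:>6}: {f(0.50, 50, 5):.4f}")
```

For that hypothetical sample the script prints about 0.4432 (Wherry), 0.4534 (Pratt) and 0.3980 (Browne). The two ρ² estimators shrink R² modestly, while the cross-validity estimate shrinks it further, since ρc² cannot exceed ρ².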
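The Monte Carlo design described in the summary can be illustrated with a small, standard-library-only sketch. It draws multivariate normal samples with a known ρ², a chosen number of predictors and a common predictor intercorrelation, computes the sample R², applies two ρ² estimators, and averages their error over 500 replicates. The equal-weight, equicorrelated population, the function names (`simulate_sample`, `sample_r2`, `mean_bias`) and the parameter values are all illustrative assumptions; the study's actual cells and its nonnormal conditions are not reproduced.

```python
import math
import random


def simulate_sample(n, p, rho2, r, rng):
    """Draw n cases of p equicorrelated standard-normal predictors plus a
    criterion whose population squared multiple correlation is rho2.

    ASSUMPTION: equal population weights; r (0 <= r < 1) is the common
    predictor intercorrelation, standing in for 'multicollinearity'.
    """
    # Var(sum of X) = p + p(p-1)r, so equal weights c give beta'Sigma beta = rho2.
    c = math.sqrt(rho2 / (p + p * (p - 1) * r))
    xs, ys = [], []
    for _ in range(n):
        z0 = rng.gauss(0.0, 1.0)  # shared factor induces correlation r
        x = [math.sqrt(r) * z0 + math.sqrt(1.0 - r) * rng.gauss(0.0, 1.0)
             for _ in range(p)]
        # Error variance 1 - rho2 keeps Var(y) = 1.
        y = c * sum(x) + math.sqrt(1.0 - rho2) * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    return xs, ys


def sample_r2(xs, ys):
    """OLS R^2 with an intercept, via the centred normal equations."""
    n, p = len(xs), len(xs[0])
    mx = [sum(row[j] for row in xs) / n for j in range(p)]
    my = sum(ys) / n
    xc = [[row[j] - mx[j] for j in range(p)] for row in xs]
    yc = [y - my for y in ys]
    xty = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    # Augmented matrix [X'X | X'y].
    a = [[sum(xc[i][j] * xc[i][k] for i in range(n)) for k in range(p)] + [xty[j]]
         for j in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        for row in range(col + 1, p):
            f = a[row][col] / a[col][col]
            for k in range(col, p + 1):
                a[row][k] -= f * a[col][k]
    # Back substitution for the slopes b.
    b = [0.0] * p
    for row in range(p - 1, -1, -1):
        s = a[row][p] - sum(a[row][k] * b[k] for k in range(row + 1, p))
        b[row] = s / a[row][row]
    ss_reg = sum(b[j] * xty[j] for j in range(p))
    return ss_reg / sum(v * v for v in yc)


def wherry(r2, n, p):
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def pratt(r2, n, p):
    shrink = (n - 3) * (1.0 - r2) / (n - p - 1)
    return 1.0 - shrink * (1.0 + 2.0 * (1.0 - r2) / (n - p - 2.3))


def mean_bias(n, p, rho2, r, reps=500, seed=1):
    """Average (estimate - rho2) over `reps` replicates of one cell."""
    rng = random.Random(seed)
    totals = {"R2": 0.0, "Wherry": 0.0, "Pratt": 0.0}
    for _ in range(reps):
        r2 = sample_r2(*simulate_sample(n, p, rho2, r, rng))
        totals["R2"] += r2 - rho2
        totals["Wherry"] += wherry(r2, n, p) - rho2
        totals["Pratt"] += pratt(r2, n, p) - rho2
    return {k: v / reps for k, v in totals.items()}


if __name__ == "__main__":
    # One hypothetical cell: n/p = 40/8 = 5, rho^2 = .30, predictor r = .3.
    print(mean_bias(n=40, p=8, rho2=0.30, r=0.3))
```

Changing `n` while holding `p` fixed is one way to vary the n/p ratio that the summary identifies as the dominant factor, and changing `r` corresponds to the multicollinearity factor; a fuller replication would loop over a grid of such cells.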