Estimating R2 Shrinkage in Multiple Regression: A Comparison of Different Analytical Methods

Bibliographic Details
Main Author: Yin, Ping
Format: Others
Published: DigitalCommons@USU 1999
Subjects: Estimating; shrinkage; multiple regression; comparison; analytical methods; Psychology
Online Access: https://digitalcommons.usu.edu/etd/6147
https://digitalcommons.usu.edu/cgi/viewcontent.cgi?article=7222&context=etd
Collection: All Graduate Theses and Dissertations, DigitalCommons@USU
Date: 1999-05-01
Rights: Copyright for this work is held by the author. Transmission or reproduction of materials protected by copyright beyond that allowed by fair use requires the written permission of the copyright owners. Works not in the public domain cannot be commercially exploited without permission of the copyright owner. Responsibility for any use rests exclusively with the user. For more information contact digitalcommons@usu.edu.

Abstract

This study investigated the effectiveness of various analytical methods for estimating R² shrinkage in multiple regression analysis. Two categories of analytical formulae were identified: estimators of the population squared multiple correlation coefficient (ρ²) and estimators of the population cross-validity coefficient (ρc²).

To avoid confounding factors that may be associated with a real data set, such as data nonnormality, lack of precise population parameters, and differing degrees of multicollinearity among the predictor variables, the Monte Carlo method was used to simulate multivariate normal sample data under prespecified conditions: the population squared multiple correlation coefficient (ρ²), the number of predictors, the sample size, a known degree of multicollinearity, and controlled data normality. Five hundred replicates were simulated within each cell of the sampling conditions. The analytical formulae were applied to the simulated data in each sampling condition, and the resulting "adjusted" coefficients were compared with their corresponding population parameters (ρ² and ρc²).

The results indicate that the Wherry formula, currently the most widely used formula (in both SAS and SPSS), is probably not the most effective analytical formula for estimating ρ². Instead, the Pratt formula appeared to outperform the other analytical formulae across most of the sampling conditions. Among the formulae designed to estimate ρc², the Browne formula appeared to be the most effective and stable in minimizing statistical bias across sampling conditions.

The study also concludes that the n/p ratio (sample size divided by the number of predictor variables) has the greatest effect on the performance of these analytical formulae, whereas the degree of multicollinearity among the predictor variables does not dramatically influence their performance. Further replications with both real and simulated data are still needed to investigate the effectiveness of these analytical formulae.
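
The abstract names the Wherry and Pratt estimators of ρ² and describes a Monte Carlo comparison against a known population value, but gives no formulas. The following is a minimal sketch under stated assumptions, not the thesis's own code. It uses the Wherry adjustment in its familiar adjusted-R² form, 1 − (1 − R²)(n − 1)/(n − p − 1), and Pratt's approximation in the form commonly cited in the shrinkage literature (verify it against the thesis before relying on it). It draws independent standard-normal predictors (so no multicollinearity), sets equal regression weights so that the population ρ² equals a chosen value, and reports mean bias over 500 replicates. The function names (wherry_adjusted_r2, pratt_adjusted_r2, sample_r2, simulate) and the single cell n = 40, p = 4, ρ² = 0.30 are hypothetical choices for illustration; the Browne cross-validity estimator is not shown.

```python
import math
import random
import statistics


def wherry_adjusted_r2(r2, n, p):
    """Wherry adjustment: 1 - (1 - R^2)(n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def pratt_adjusted_r2(r2, n, p):
    """Pratt approximation, in the form commonly cited (an assumption here)."""
    return 1.0 - ((n - 3) * (1.0 - r2) / (n - p - 1)) * (
        1.0 + 2.0 * (1.0 - r2) / (n - p - 2.3)
    )


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for i in range(k - 1, -1, -1):  # back substitution
        x[i] = (m[i][k] - sum(m[i][j] * x[j] for j in range(i + 1, k))) / m[i][i]
    return x


def sample_r2(xs, ys):
    """OLS R^2 with an intercept: Sxy' Sxx^-1 Sxy / Syy, using centered sums."""
    n, p = len(ys), len(xs[0])
    xbar = [sum(row[j] for row in xs) / n for j in range(p)]
    ybar = sum(ys) / n
    xc = [[row[j] - xbar[j] for j in range(p)] for row in xs]
    yc = [y - ybar for y in ys]
    sxx = [[sum(r[i] * r[j] for r in xc) for j in range(p)] for i in range(p)]
    sxy = [sum(r[i] * y for r, y in zip(xc, yc)) for i in range(p)]
    syy = sum(y * y for y in yc)
    beta = solve(sxx, sxy)
    return sum(bi * si for bi, si in zip(beta, sxy)) / syy


def simulate(rho2, n, p, reps=500, seed=1):
    """Mean bias (estimate minus rho^2) of raw R^2 and two adjustments.

    Assumes p independent N(0, 1) predictors, equal weights b, and N(0, 1)
    error, so the population rho^2 = p*b^2 / (p*b^2 + 1).
    """
    rng = random.Random(seed)
    b = math.sqrt(rho2 / (p * (1.0 - rho2)))
    est = {"R2": [], "Wherry": [], "Pratt": []}
    for _ in range(reps):
        xs = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
        ys = [b * sum(row) + rng.gauss(0.0, 1.0) for row in xs]
        r2 = sample_r2(xs, ys)
        est["R2"].append(r2)
        est["Wherry"].append(wherry_adjusted_r2(r2, n, p))
        est["Pratt"].append(pratt_adjusted_r2(r2, n, p))
    return {name: statistics.mean(vals) - rho2 for name, vals in est.items()}


if __name__ == "__main__":
    for name, bias in simulate(rho2=0.30, n=40, p=4).items():
        print(f"{name:>6}: mean bias = {bias:+.4f}")
```

Run as a script, it prints the mean bias of raw R² and of each adjusted estimate for that one hypothetical cell; raw R² should come out positively biased. Because the Wherry correction factor (n − 1)/(n − p − 1) grows as n/p shrinks, rerunning with smaller n relative to p offers a quick way to explore the n/p effect the abstract reports. Studying multicollinearity would require correlated predictors, which this sketch does not generate.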