Summary: | Master's === National Cheng Kung University === Department of Statistics === 107 === It is well known that the accuracy of the maximum likelihood estimator (MLE) of the regression coefficients in a logistic regression model is seriously affected by rare events. Less attention has been given to the performance of variable selection in logistic regression with rare events. Therefore, this thesis studies the performance of three variable selection methods, LASSO (Least Absolute Shrinkage and Selection Operator), SCAD (Smoothly Clipped Absolute Deviation), and Adaptive LASSO, when the event rate is low and the number of explanatory variables is much larger than the sample size.
A simulation study is conducted to compare the accuracy of these methods in selecting the important explanatory variables of a logistic regression model. Based on a limited set of simulation scenarios, when the event rate is as low as 0.05, the simulation results favor Adaptive LASSO for selecting important explanatory variables. Consequently, Adaptive LASSO is recommended for variable selection and prediction with rare-event data.
|