Abstract

Background
Standard methods for analysing data from large-scale assessments (LSAs) cannot simply be adopted when hierarchical (or multilevel) regression modelling is to be applied. Various approaches currently exist; they all generally follow a design-based estimation model using the pseudo maximum likelihood method with weights adjusted for the corresponding hierarchy levels. In particular, several different approaches to using and scaling sampling weights in hierarchical models have been promoted, yet no study has compared them to provide evidence of which method performs best and should therefore be preferred. Furthermore, different software programs implement different estimation algorithms, leading to different results.

Objective and method
In this study, we use a simulation to determine the estimation procedure that shows the smallest distortion of the true population features. We consider different estimation, optimization and acceleration methods, as well as different approaches to using sampling weights. Three scenarios were simulated using the statistical program R. The analyses were performed with two software packages for hierarchical modelling of LSA data, namely Mplus and SAS.

Results and conclusions
The simulation results revealed three weighting approaches that performed best in retrieving the true population parameters. One of them uses only level-two weights (here: final school weights) and, because of its simple implementation, is the most favourable one. This finding provides a clear recommendation to researchers on how to use weights in multilevel modelling (MLM) when analysing LSA data, or data with a similar structure. Further, we found only small differences in the performance and default settings of the software programs used, with Mplus providing slightly more precise estimates. Different starting settings for the algorithms or different acceleration methods for optimization may explain these distinctions. It should be emphasized, however, that with the recommended weighting approach both software packages perform equally well. Finally, two scaling techniques for the student weights were investigated; both provide nearly identical results. We use data from the Programme for International Student Assessment (PISA) 2015 to illustrate the practical importance and relevance of weighting when analysing large-scale assessment data with hierarchical models.
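For orientation, the following is a minimal sketch in base R of the two scaling techniques most commonly applied to level-one (student) weights within schools: scaling to the cluster sample size and scaling to the effective cluster sample size. The function name scale_weights and the example weights are hypothetical illustrations, not the study's actual code, and the sketch assumes these are the two scaling techniques referred to in the abstract.

    ## Sketch: two common ways of scaling student (level-1) weights within schools.
    ## 'w' are raw student weights, 'school' identifies the cluster (school).
    scale_weights <- function(w, school, method = c("cluster", "effective")) {
      method <- match.arg(method)
      unsplit(lapply(split(w, school), function(wj) {
        if (method == "cluster") {
          # scale so the weights sum to the cluster sample size n_j
          wj * length(wj) / sum(wj)
        } else {
          # scale so the weights sum to the "effective" cluster sample size
          wj * sum(wj) / sum(wj^2)
        }
      }), school)
    }

    ## Illustrative use with made-up weights for two schools
    w      <- c(1.2, 0.8, 1.5, 2.0, 0.5, 1.0)
    school <- c(1, 1, 1, 2, 2, 2)
    cbind(cluster   = scale_weights(w, school, "cluster"),
          effective = scale_weights(w, school, "effective"))

Both scalings rescale the weights only within each school, which is why, as reported above, they can lead to very similar parameter estimates when the raw weights vary little within clusters.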