Summary: | Master's === National Chung Cheng University === Graduate Institute of Psychology === 101 === Differential item functioning (DIF) occurs when subgroups of test takers have equal trait levels but differ in their probabilities of a correct response. Many simulation studies have examined how well DIF detection methods flag DIF items. However, these studies have paid little attention to how a difference in ability variance between two groups affects DIF detection. Thus, the aim of this study is to examine how different combinations of ability variance and mean between reference and focal groups affect four multiple indicators–multiple causes (MIMIC) methods, namely, the standard MIMIC method (M-ST), the MIMIC method with scale purification (M-SP), the MIMIC method with a pure anchor (M-PA), and the standard MIMIC method with a pure anchor (M-STPA). A series of simulations showed that (1) under a mean difference in ability, all four methods yielded a well-controlled Type I error rate when tests did not contain any DIF items; M-ST and M-SP began to yield an inflated Type I error rate and deflated power when tests contained 20% and 40% DIF items, respectively, whereas M-PA and M-STPA maintained an expected Type I error rate and high power even when tests contained as many as 40% DIF items; (2) a difference in ability variance inflated the Type I error rate of all the DIF detection methods; (3) when both a mean difference in ability and a difference in ability variance existed, (i) M-STPA maintained an expected Type I error rate when focal groups had smaller ability variance, and (ii) all the DIF detection methods yielded an inflated Type I error rate when focal groups had larger ability variance. Test length appeared to have an effect on M-STPA.
|
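The definition of DIF and the simulated conditions described in the summary can be illustrated with a minimal sketch. It assumes a Rasch-type response model with a uniform DIF shift added to item difficulty for the focal group. The model choice, the function names (`p_correct`, `simulate_group`), and every parameter value are hypothetical illustrations, not the design or settings used in the study.

```python
import math
import random


def p_correct(theta, difficulty, dif_shift=0.0):
    """Probability of a correct response under an assumed Rasch-type model.

    dif_shift is a hypothetical uniform-DIF effect: it makes the item
    harder (positive) or easier (negative) for the focal group only.
    """
    return 1.0 / (1.0 + math.exp(-(theta - (difficulty + dif_shift))))


def simulate_group(n, mean, sd, difficulties, dif_shifts, rng):
    """Simulate 0/1 responses for one group with abilities ~ N(mean, sd^2)."""
    data = []
    for _ in range(n):
        theta = rng.gauss(mean, sd)
        row = [
            1 if rng.random() < p_correct(theta, b, d) else 0
            for b, d in zip(difficulties, dif_shifts)
        ]
        data.append(row)
    return data


# Same trait level, different success probability: the item shows DIF.
theta = 0.5
p_ref = p_correct(theta, difficulty=0.0)                 # reference group
p_foc = p_correct(theta, difficulty=0.0, dif_shift=0.5)  # focal group

# Illustrative (assumed) condition: the focal group has a lower mean and a
# larger ability variance, and the first of five items carries DIF.
rng = random.Random(0)
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
no_dif = [0.0] * 5
with_dif = [0.5, 0.0, 0.0, 0.0, 0.0]
reference = simulate_group(200, 0.0, 1.0, difficulties, no_dif, rng)
focal = simulate_group(200, -0.5, 1.5, difficulties, with_dif, rng)
```

In this sketch `p_ref` exceeds `p_foc` even though both test takers share the same trait level, which is what a DIF detection method is meant to flag. Group differences in ability mean or variance alone are not DIF, but the summary reports that variance differences can still inflate Type I error.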