Impact of differential item functioning on statistical conclusions
Differential item functioning (DIF), sometimes called item bias, has been widely studied in educational and psychological measurement; however, to date, research has focused on the definitions of, and the methods for, detecting DIF. It is well accepted that the presence of DIF may degrade the validi...
Main Author: | Li, Zhen |
---|---|
Language: | English |
Published: | University of British Columbia, 2009 |
Online Access: | http://hdl.handle.net/2429/14680 |
id | ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.2429-14680 |
---|---|
record_format | oai_dc |
spelling | ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.2429-14680 2014-03-26T03:36:40Z Impact of differential item functioning on statistical conclusions Li, Zhen 2009-11-05T21:04:48Z 2009-11-05T21:04:48Z 2009 2009-11-05T21:04:48Z 2010-05 Electronic Thesis or Dissertation http://hdl.handle.net/2429/14680 eng University of British Columbia |
collection | NDLTD |
language | English |
sources | NDLTD |
description | Differential item functioning (DIF), sometimes called item bias, has been widely studied in educational and psychological measurement; however, to date, research has focused on the definitions of, and the methods for, detecting DIF. It is well accepted that the presence of DIF may degrade the validity of a test. Relatively little is known, however, about the impact of DIF on later statistical decisions when one uses the observed test scores in data analyses and corresponding statistical hypothesis tests. This dissertation investigated the impact of DIF on later statistical decisions based on the observed total test (or scale) score. In particular, very little is known about the impact of DIF on the Type I error rate and effect size of, for instance, the independent samples t-test on the observed total test scores. Five studies were conducted: studies one to three investigated the impact of unidirectional DIF (i.e., DIF amplification) on the Type I error rate and effect size of the independent samples t-test; studies four and five investigated the DIF cancellation effects on the Type I error rate and effect size of the independent samples t-test. The Type I error rate and effect size were defined in terms of latent population means rather than observed sample means. The results showed that the amplification and cancellation effects among uniform DIF items did transfer to the test level. Both the Type I error rate and effect size were inflated. The degree of inflation depends on the number of DIF items, the magnitude of DIF, the sample sizes, and interactions among these factors. These findings highlight the importance of screening for DIF before conducting any further statistical analysis. The dissertation offers advice to practicing researchers about when and how much the presence of DIF will affect their statistical conclusions based on the total observed test scores. |
author | Li, Zhen |
spellingShingle | Li, Zhen Impact of differential item functioning on statistical conclusions |
author_facet | Li, Zhen |
author_sort | Li, Zhen |
title | Impact of differential item functioning on statistical conclusions |
title_short | Impact of differential item functioning on statistical conclusions |
title_full | Impact of differential item functioning on statistical conclusions |
title_fullStr | Impact of differential item functioning on statistical conclusions |
title_full_unstemmed | Impact of differential item functioning on statistical conclusions |
title_sort | impact of differential item functioning on statistical conclusions |
publisher | University of British Columbia |
publishDate | 2009 |
url | http://hdl.handle.net/2429/14680 |
work_keys_str_mv | AT lizhen impactofdifferentialitemfunctioningonstatisticalconclusions |
_version_ | 1716655192845320192 |
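
The question raised in the description, whether DIF in individual items changes the rejection rate of a two-group test on total scores when the latent group means are equal, can be illustrated with a small Monte Carlo sketch. Everything specific below is an assumption made for illustration and is not taken from the dissertation: the Rasch (1PL) response model, 20 items with difficulties spread over [-1, 1], 200 examinees per group, a 1.0-logit uniform DIF shift against the focal group, and a large-sample normal approximation standing in for the exact t reference distribution. The function names `total_score` and `rejection_rate` are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance


def total_score(theta, difficulties, rng):
    """Observed total score: sum of Rasch (1PL) item responses for one examinee."""
    score = 0
    for b in difficulties:
        p_correct = 1.0 / (1.0 + math.exp(-(theta - b)))
        score += rng.random() < p_correct  # Bernoulli draw, counted as 0 or 1
    return score


def rejection_rate(n_dif_items, dif_shift, n_items=20, n_per_group=200,
                   n_reps=100, alpha=0.05, seed=1):
    """Share of replications in which the two-group test on total scores rejects.

    Both groups draw latent ability from N(0, 1), so the latent means are
    equal and every rejection is a Type I error at the latent level.
    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Assumed item difficulties, evenly spaced from -1 to 1.
    base = [-1.0 + 2.0 * i / (n_items - 1) for i in range(n_items)]
    # Uniform, unidirectional DIF: the first n_dif_items items are harder
    # for the focal group by dif_shift logits.
    focal = [b + dif_shift if i < n_dif_items else b
             for i, b in enumerate(base)]
    # Normal critical value; a large-sample stand-in for the t critical value.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    rejections = 0
    for _ in range(n_reps):
        ref = [total_score(rng.gauss(0, 1), base, rng)
               for _ in range(n_per_group)]
        foc = [total_score(rng.gauss(0, 1), focal, rng)
               for _ in range(n_per_group)]
        std_err = math.sqrt(variance(ref) / n_per_group
                            + variance(foc) / n_per_group)
        z = (mean(ref) - mean(foc)) / std_err
        rejections += abs(z) > z_crit
    return rejections / n_reps


if __name__ == "__main__":
    print("no DIF items:  ", rejection_rate(n_dif_items=0, dif_shift=0.0))
    print("six DIF items: ", rejection_rate(n_dif_items=6, dif_shift=1.0))
```

Under these assumptions, the rejection rate without DIF items should stay near the nominal level, while several unidirectional DIF items should push it well above that level. Varying `n_dif_items`, `dif_shift`, and `n_per_group` gives a rough feel for the factors the description names.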