Impact of differential item functioning on statistical conclusions

Differential item functioning (DIF), sometimes called item bias, has been widely studied in educational and psychological measurement; however, to date, research has focused on the definitions of, and the methods for, detecting DIF. It is well accepted that the presence of DIF may degrade the validi...

Bibliographic Details
Main Author: Li, Zhen
Format: Others
Language: English
Published: University of British Columbia 2009
Online Access: http://hdl.handle.net/2429/14680
id ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.-14680
record_format oai_dc
spelling ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.-14680 2013-06-05T04:18:11Z Impact of differential item functioning on statistical conclusions Li, Zhen University of British Columbia 2009-11-05T21:04:48Z 2009-11-05T21:04:48Z 2009 2009-11-05T21:04:48Z 2010-05 Electronic Thesis or Dissertation 735614 bytes application/pdf http://hdl.handle.net/2429/14680 eng
collection NDLTD
language English
format Others
sources NDLTD
description Differential item functioning (DIF), sometimes called item bias, has been widely studied in educational and psychological measurement; to date, however, research has focused on defining DIF and on methods for detecting it. It is well accepted that the presence of DIF may degrade the validity of a test. Relatively little is known, however, about the impact of DIF on later statistical decisions when the observed test scores are used in data analyses and the corresponding statistical hypothesis tests. This dissertation investigated the impact of DIF on later statistical decisions based on the observed total test (or scale) score. In particular, very little is known in the literature about the impact of DIF on the Type I error rate and effect size of, for instance, the independent samples t-test on the observed total test scores. Five studies were conducted: studies one to three investigated the impact of unidirectional DIF (i.e., DIF amplification) on the Type I error rate and effect size of the independent samples t-test; studies four and five investigated the effects of DIF cancellation on the same two quantities. The Type I error rate and effect size were defined in terms of latent population means rather than observed sample means. The results showed that the amplification and cancellation effects among uniform DIF items did transfer to the test level. Both the Type I error rate and effect size were inflated. The degree of inflation depends on the number of DIF items, the magnitude of DIF, the sample sizes, and the interactions among these factors. These findings highlight the importance of screening for DIF before conducting any further statistical analysis. The dissertation offers advice to practicing researchers about when and how much the presence of DIF will affect their statistical conclusions based on the total observed test scores.
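The kind of simulation described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the dissertation's actual design: the Rasch response model, the item difficulties, the sample sizes, and the function names (simulate_total_score, pooled_t, rejection_rate) are all assumptions introduced here. Both groups are drawn with the same latent mean, so every rejection of the t-test on total scores counts as a Type I error in the latent sense the abstract uses. The critical value comes from a large-sample normal approximation to the t distribution, which is reasonable only for the fairly large groups assumed here.

```python
import math
import random
from statistics import NormalDist, mean, variance


def simulate_total_score(theta, difficulties, dif_shift, rng):
    """Total score on Rasch items (assumed model); dif_shift[j] is added to item j's difficulty."""
    score = 0
    for b, shift in zip(difficulties, dif_shift):
        p_correct = 1.0 / (1.0 + math.exp(-(theta - (b + shift))))
        score += rng.random() < p_correct
    return score


def pooled_t(x, y):
    """Independent samples t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def rejection_rate(n_dif_items, dif_size, n_items=20, n_per_group=200,
                   n_reps=300, alpha=0.05, seed=1):
    """Share of replications in which the t-test on total scores rejects,
    although both groups have the same latent mean (0)."""
    rng = random.Random(seed)
    # Normal approximation to the two-sided t critical value (assumes large groups).
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Hypothetical item difficulties, evenly spaced from -2 to 2.
    difficulties = [-2 + 4 * j / (n_items - 1) for j in range(n_items)]
    no_dif = [0.0] * n_items
    # Unidirectional uniform DIF: the first n_dif_items are harder for the focal group.
    focal_dif = [dif_size] * n_dif_items + [0.0] * (n_items - n_dif_items)
    rejections = 0
    for _ in range(n_reps):
        ref = [simulate_total_score(rng.gauss(0, 1), difficulties, no_dif, rng)
               for _ in range(n_per_group)]
        foc = [simulate_total_score(rng.gauss(0, 1), difficulties, focal_dif, rng)
               for _ in range(n_per_group)]
        if abs(pooled_t(ref, foc)) > crit:
            rejections += 1
    return rejections / n_reps
```

Calling rejection_rate(0, 0.0) gives the no-DIF baseline, which should sit near the nominal alpha; calling it with, say, rejection_rate(6, 0.8) shows how the rejection rate changes when several items carry DIF in the same direction. Giving some items a positive shift and others a negative one would be one way to explore cancellation under the same assumptions.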
author Li, Zhen
title Impact of differential item functioning on statistical conclusions
publisher University of British Columbia
publishDate 2009
url http://hdl.handle.net/2429/14680