Comparison of two Bayesian methods to detect mode effects between paper-based and computerized adaptive assessments: a preliminary Monte Carlo study

Abstract

Background

Computerized adaptive testing (CAT) is being applied to health outcome measures originally developed as paper-and-pencil (P&P) instruments. Differences in how respondents answer items administered by CAT vs. P&P can increase error in CAT-estimated measures if not identified and corrected.

Method

Two methods for detecting item-level mode effects are proposed using Bayesian estimation of the posterior distributions of item parameters: (1) a modified robust Z (RZ) test, and (2) 95% credible intervals (CrI) for the CAT-P&P difference in item difficulty; both are illustrated in the sketch following the abstract. A simulation study crossed four factors: (1) data-generating model (one- vs. two-parameter IRT model); (2) moderate vs. large DIF sizes; (3) percentage of DIF items (10% vs. 30%); and (4) mean difference in θ estimates across modes of 0 vs. 1 logits. This yielded 2 × 2 × 2 × 2 = 16 conditions, with 10 generated datasets per condition.

Results

Both methods showed good to excellent false positive control, with RZ providing better control of false positives and CrI slightly higher power, irrespective of measurement model. False positives increased when items were very easy to endorse and when there were mode differences in mean trait level. True positives were predicted by CAT item usage, absolute item difficulty, and item discrimination. Overall, RZ outperformed CrI due to its better control of false positive DIF.

Conclusions

Whereas false positives were well controlled, particularly for RZ, power to detect DIF was suboptimal. Research is needed to examine the robustness of these methods under varying prior assumptions concerning the distribution of item and person parameters, and when data fail to conform to those priors. False identification of DIF when items are very easy to endorse is a problem warranting additional investigation.
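The abstract does not give the exact computations, so the following is a minimal sketch of how the two flags could be formed from posterior draws, not the authors' implementation. The array names (draws_pp, draws_cat) are hypothetical, and the RZ step uses the classic robust Z of Huynh and Meyer (differences scaled by their median and 0.74 × IQR); the paper's specific modification of RZ is not described in the abstract.

import numpy as np

def flag_mode_effects(draws_pp, draws_cat, z_crit=1.96):
    """Flag item-level mode effects from posterior draws of item difficulty.

    draws_pp, draws_cat: arrays of shape (n_samples, n_items) holding
    posterior samples of each item's difficulty under the P&P and CAT
    calibrations (hypothetical inputs; names are illustrative).
    """
    # Per-draw CAT - P&P difficulty differences, one column per item.
    diff = draws_cat - draws_pp

    # Method 2 (CrI): flag items whose 95% credible interval for the
    # difficulty difference excludes zero.
    lo, hi = np.percentile(diff, [2.5, 97.5], axis=0)
    cri_flag = (lo > 0) | (hi < 0)

    # Method 1 (RZ): robust Z on the posterior-mean differences across
    # items; 0.74 * IQR approximates the SD under normality.
    d = diff.mean(axis=0)
    q1, q3 = np.percentile(d, [25, 75])
    rz = (d - np.median(d)) / (0.74 * (q3 - q1))
    rz_flag = np.abs(rz) > z_crit

    return rz_flag, cri_flag

Under this reading, CrI flags an item when the 95% interval for its CAT - P&P difficulty difference excludes zero, while RZ flags items whose difference is extreme relative to the robust spread of differences across all items.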


Bibliographic Details
Main Authors: Riley, Barth B.; Carle, Adam C.
Format: Article
Language: English
Published: BMC, 2012-08-01
Series: BMC Medical Research Methodology
ISSN: 1471-2288
DOI: 10.1186/1471-2288-12-124
Online Access: http://www.biomedcentral.com/1471-2288/12/124