A comparison study of IRT calibration methods for mixed-format tests in vertical scaling

The purpose of this dissertation is to investigate how using different IRT calibration methods may affect student achievement growth pattern recovery. In this study, 96 vertical scales (4 × 2 × 2 × 2 × 3) are constructed using different combinations of IRT calibration methods (separate, pair-wise con...

Bibliographic Details
Main Author:	Meng, Huijuan
Other Authors:	Vispoel, Walter P.
Format:	Others
Language:	English
Published:	University of Iowa 2007
Subjects:	Education
Online Access:	https://ir.uiowa.edu/etd/338 https://ir.uiowa.edu/cgi/viewcontent.cgi?article=1523&context=etd

id	ndltd-uiowa.edu-oai-ir.uiowa.edu-etd-1523
record_format	oai_dc
spelling	ndltd-uiowa.edu-oai-ir.uiowa.edu-etd-1523 2019-10-13T04:55:24Z A comparison study of IRT calibration methods for mixed-format tests in vertical scaling Meng, Huijuan 2007-12-01T08:00:00Z dissertation application/pdf https://ir.uiowa.edu/etd/338 https://ir.uiowa.edu/cgi/viewcontent.cgi?article=1523&context=etd Copyright 2007 Huijuan Meng Theses and Dissertations eng University of Iowa Vispoel, Walter P. Lee, Won-Chan Education
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Education
spellingShingle	Education Meng, Huijuan A comparison study of IRT calibration methods for mixed-format tests in vertical scaling
description	The purpose of this dissertation is to investigate how the choice of IRT calibration method affects the recovery of student achievement growth patterns. In this study, 96 vertical scales (4 × 2 × 2 × 2 × 3) are constructed using different combinations of IRT calibration method (separate, pair-wise concurrent, semi-concurrent, and concurrent), length of common-item set (10 vs. 20 common items), type of common-item set (dichotomous only vs. dichotomous and polytomous), and number of polytomous items (6 vs. 12) for 3 simulated datasets that differ in sample size (500, 1000, or 5000 per grade). Three criteria (RMSE, SE, and bias) are used to evaluate how well these calibration methods recover the proficiency score distribution over 40 replications. The results suggest that, for the data used in this study, when the parameters of interest are related to measuring students' growth (i.e., proficiency score mean and effect size), pair-wise concurrent calibration overall produced the most accurate results. When the parameters of interest are related to performance variability (i.e., standard deviation), concurrent calibration in general produced the most stable and accurate estimates. When the emphasis is on classifying students' performance accurately, pair-wise concurrent and semi-concurrent calibration, taken collectively, outperformed concurrent and separate calibration as sample size increased. Overall, pair-wise concurrent calibration was more effective than the other methods in constructing a vertical scale, and the use of either separate or concurrent calibration to create a vertical scale seems least warranted.
In addition, it is observed that (1) larger sample sizes stabilized estimation results and reduced error; (2) errors and biases were in general smaller for tests with 20 common items than for tests with 10 common items; (3) errors and biases were usually smaller for tests with a dichotomous-only common-item set than for tests with a mixed-format common-item set; (4) for tests with a mixed-format common-item set, errors and biases were in general smaller when the test contained more polytomous items; and (5) for tests with a dichotomous-only common-item set, increasing the number of polytomous items did not necessarily reduce or increase errors and biases.
author2	Vispoel, Walter P.
author_facet	Vispoel, Walter P. Meng, Huijuan
author	Meng, Huijuan
author_sort	Meng, Huijuan
title	A comparison study of IRT calibration methods for mixed-format tests in vertical scaling
title_sort	comparison study of irt calibration methods for mixed-format tests in vertical scaling
publisher	University of Iowa
publishDate	2007
url	https://ir.uiowa.edu/etd/338 https://ir.uiowa.edu/cgi/viewcontent.cgi?article=1523&context=etd
_version_	1719265024803864576
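
As an illustration of the design summarized in the description field of this record, the sketch below enumerates the 4 × 2 × 2 × 2 × 3 conditions and computes bias, SE, and RMSE for one parameter across replications. The factor levels come from the abstract; the function name, the true value, and the simulated estimates are hypothetical, and the formulas are the conventional definitions, which the record itself does not spell out, so they may differ from those used in the dissertation.

```python
import random
from itertools import product
from math import sqrt
from statistics import mean, pstdev

# Factor levels as listed in the abstract.
calibration_methods = ["separate", "pair-wise concurrent",
                       "semi-concurrent", "concurrent"]
common_item_lengths = [10, 20]
common_item_types = ["dichotomous only", "dichotomous and polytomous"]
polytomous_item_counts = [6, 12]
sample_sizes = [500, 1000, 5000]  # examinees per grade

# Full crossing of the five factors: one vertical scale per condition.
conditions = list(product(calibration_methods, common_item_lengths,
                          common_item_types, polytomous_item_counts,
                          sample_sizes))


def recovery_criteria(estimates, true_value):
    """Return (bias, se, rmse) for replicated estimates of one parameter.

    Conventional definitions (an assumption, not taken from the record):
    bias is the mean estimate minus the true value, SE is the standard
    deviation of the estimates around their own mean, and RMSE is the
    root mean squared deviation from the true value.
    """
    bias = mean(estimates) - true_value
    se = pstdev(estimates)
    rmse = sqrt(mean((e - true_value) ** 2 for e in estimates))
    return bias, se, rmse


# Hypothetical example: 40 replicated estimates of a grade-level mean.
rng = random.Random(0)
true_mean = 0.5  # made-up generating value
estimates = [true_mean + 0.05 + rng.gauss(0, 0.1) for _ in range(40)]
bias, se, rmse = recovery_criteria(estimates, true_mean)

print(len(conditions))  # 96
```

Under these definitions RMSE² equals bias² plus SE², so a condition can show a small SE (stable estimates) and still have a large RMSE if its estimates are systematically off.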

Similar Items