Evaluating IRT- and CTT- based methods of estimating classification consistency and accuracy indices from single administrations
Three decision consistency and accuracy (DC/DA) methods, the Livingston and Lewis (LL) method, LEE method, and the Hambleton and Han (HH) method, were evaluated. The purposes of the study were: (1) to evaluate the accuracy and robustness of these methods, especially when their assumptions were not w...
Main Author: | Deng, Nina |
---|---|
Language: | ENG |
Published: | ScholarWorks@UMass Amherst, 2011 |
Subjects: | Educational tests & measurements; Statistics; Quantitative psychology |
Online Access: | https://scholarworks.umass.edu/dissertations/AAI3482610 |
id |
ndltd-UMASS-oai-scholarworks.umass.edu-dissertations-6467 |
---|---|
record_format |
oai_dc |
collection |
NDLTD |
language |
ENG |
sources |
NDLTD |
topic |
Educational tests & measurements; Statistics; Quantitative psychology |
description |
Three decision consistency and accuracy (DC/DA) methods, the Livingston and Lewis (LL) method, the LEE method, and the Hambleton and Han (HH) method, were evaluated. The purposes of the study were (1) to evaluate the accuracy and robustness of these methods, especially when their assumptions were not well satisfied; (2) to investigate the “true” DC/DA indices under various conditions; and (3) to assess the impact of the choice of reliability estimate on the LL method. Four simulation studies were conducted: Study 1 examined various test lengths, Study 2 focused on local item dependency (LID), Study 3 examined the consequences of IRT model-data misfit, and Study 4 examined the impact of using different scoring metrics. Finally, a real-data study was conducted in which no advantage was given to any model or assumption. The results showed that LID and model misfit had a negative impact on the “true” DA index and caused all of the selected methods to overestimate it. In contrast, the DC estimates were minimally affected by these factors, although the LL method produced poorer estimates for short tests, and the LEE and HH methods were less robust for tests with a high level of LID. Across all conditions, the LEE and HH methods produced nearly identical results, while the HH method offered more flexibility with complex scoring metrics. The LL method was found to be sensitive to the choice of test reliability estimate: with Cronbach’s alpha it consistently underestimated DC, whereas with stratified alpha it performed noticeably better, showing smaller bias and greater robustness across conditions. Lastly, it is hoped that software will be made available soon to permit wider use of the HH method. The other methods in the study are already well supported by easy-to-use software. |
author |
Deng, Nina |
title |
Evaluating IRT- and CTT- based methods of estimating classification consistency and accuracy indices from single administrations |
publisher |
ScholarWorks@UMass Amherst |
publishDate |
2011 |
url |
https://scholarworks.umass.edu/dissertations/AAI3482610 |
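
The abstract reports that the LL method is sensitive to which reliability estimate it is given (Cronbach's alpha versus stratified alpha). As a rough illustration of how the two coefficients differ, the sketch below computes both from an examinee-by-item score matrix using the standard textbook formulas. It is not the dissertation's code: the function names, the grouping of items into strata, and the toy data are all hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of examinee rows (each a list of item scores)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) and of the total score.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def stratified_alpha(scores, strata):
    """Stratified alpha; `strata` is a list of item-index lists, one per stratum."""
    total_var = pvariance([sum(row) for row in scores])
    error_var = 0.0
    for cols in strata:
        sub = [[row[i] for i in cols] for row in scores]
        sub_var = pvariance([sum(r) for r in sub])
        # Each stratum contributes its subscore variance times its unreliability.
        error_var += sub_var * (1 - cronbach_alpha(sub))
    return 1 - error_var / total_var


# Hypothetical toy data: 5 examinees, 4 dichotomous items, 2 assumed strata.
data = [
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
alpha = cronbach_alpha(data)
strat = stratified_alpha(data, [[0, 1], [2, 3]])
print(f"Cronbach's alpha: {alpha:.3f}, stratified alpha: {strat:.3f}")
```

When all items are placed in a single stratum, the stratified coefficient reduces to Cronbach's alpha, so the two differ only to the extent that the strata (for example, item formats or content areas) behave differently.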