An investigation of alternative approaches to scoring multiple response items on a certification exam

Bibliographic Details
Main Author: Ma, Xiaoying
Language: English
Published: ScholarWorks@UMass Amherst 2004
Online Access: https://scholarworks.umass.edu/dissertations/AAI3118315
Description
Summary: Multiple-response (MR) items are items that have more than one correct answer. This item type is often used in licensure and achievement tests for situations where identifying a single correct answer does not suffice or where solving a problem requires multiple steps. MR items can be scored either dichotomously or polytomously. Polytomous scoring of MR items often employs some type of option weighting, which assigns a differential point value to each response option. The weight for each option is either defined a priori by expert judgment or derived empirically from item analysis. Studies examining the reliability and validity of differential option weighting methods have been based on classical test theory. Little or no research has examined the usefulness of item response theory (IRT) models for deriving empirical weights or compared the effectiveness of different option weighting methods. The purposes of this study, therefore, were to investigate polytomous scoring methods for MR items and to evaluate the impact that different scoring methods may have on the reliability of test scores, on item and test information functions, and on measurement efficiency and classification accuracy. Results indicate that polytomous scoring of the MR items did not significantly increase the reliability of the test, nor did it drastically increase the test information functions, probably because two-thirds of the items were multiple-choice items that were scored the same way across comparisons. However, a substantial increase in the test information function at the lower end of the score scale was observed under the polytomous scoring schemes. With respect to classification accuracy, the results were inconsistent across samples; therefore, further study is needed.
In summary, findings from this study suggest that polytomous scoring of MR items has the potential to increase the efficiency of measurement (as shown by the increase in test information functions) and the accuracy of classification. Realizing these advantages, however, will depend on the quality and quantity of the MR items on the test. Further research is needed to evaluate the quality of MR items and how it affects the effectiveness of polytomous scoring.
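
To make the distinction between dichotomous and polytomous scoring concrete, the following is a minimal sketch in Python. It is illustrative only and is not taken from the dissertation: the item, its option labels, the a priori weights, and the choice to floor negative totals at zero are all assumptions, and the function names are hypothetical.

```python
from typing import Dict, Set


def score_dichotomous(selected: Set[str], key: Set[str]) -> int:
    """All-or-nothing scoring: 1 only if the selected options match the key exactly."""
    return int(selected == key)


def score_polytomous(selected: Set[str], weights: Dict[str, float]) -> float:
    """Option-weighted scoring: sum the weights of the selected options.

    Flooring the total at zero is an assumption made for this sketch,
    not a rule stated in the summary above.
    """
    raw = sum(weights[option] for option in selected)
    return max(0.0, raw)


# Hypothetical MR item with options A-E, of which A, C and D are correct.
key = {"A", "C", "D"}

# Hypothetical a priori weights: +1 for each correct option, -1 for each distractor.
weights = {"A": 1.0, "B": -1.0, "C": 1.0, "D": 1.0, "E": -1.0}

# An examinee who finds two of the three correct options and no distractors.
response = {"A", "C"}

dichotomous_score = score_dichotomous(response, key)      # no credit
polytomous_score = score_polytomous(response, weights)    # partial credit
```

Under these assumed weights, the partially correct response earns no credit when scored dichotomously but partial credit when scored polytomously. Empirically derived weights, such as those obtained from item analysis or an IRT model, would replace the hand-set `weights` dictionary.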