Summary: | Various option weighting methods have been investigated as a way to glean more information from examinees' incorrect responses. Option weighting gives partial credit for incorrect responses, and the amount of credit for selecting a particular option may be determined in a number of ways. This study focuses on empirical option weighting, in which each option's weight is based on the statistical relationship between that option and some criterion. Total scores are calculated by summing the points for each response, a method referred to as polytomous scoring. Previous research on empirical option weighting and polytomous scoring has shown that option-weighted scoring produces only slight gains in reliability, and that these gains are often not accompanied by gains in predictive validity. The rationale for this study was that reliability and predictive validity would be differentially affected by the choice of criterion on which option weights are based. The hypotheses advanced in this study were: (1) Polytomous scoring using empirical option weights based on an external criterion (the dependent variable) would maximize predictive validity relative to other methods of scoring. (2) Polytomous scoring using empirical option weights based on an internal criterion (total score) would maximize reliability. (3) Psychometrically poor items would be improved in their contribution to reliability and predictive validity through the use of empirical option weighting. Although the results were inconsistent with respect to increasing predictive validity by basing weights on the variable to be predicted, this was attributed to inconsistency in the criterion across cross-validations. With respect to maximizing internal consistency by basing weights on an internal criterion (total score), the results were more consistent.
The results suggested that use of an internal criterion tended to maximize internal consistency relative to other methods of weighting. Finally, a comparison of psychometrically good and poor items clearly demonstrated that poor items were significantly improved, in terms of both reliability and predictive validity, when scoring used weights based on the variable to be predicted.
|
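As an illustration of the scoring idea described in the summary, the following is a minimal sketch, not the study's actual procedure. The summary does not state how weights were computed, so this sketch assumes one common choice: an option's weight is the mean standardized criterion score of the examinees who selected it. The function names `option_weights` and `polytomous_scores` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def option_weights(responses, criterion):
    """Return one {option: weight} dict per item.

    responses: one list per examinee, holding the option chosen on each item.
    criterion: one score per examinee. Use total (number-right) score for an
               internal criterion, or the variable to be predicted for an
               external criterion.
    Assumption: weight = mean criterion z-score of examinees choosing the option.
    """
    mu, sd = mean(criterion), pstdev(criterion)
    if sd == 0:
        raise ValueError("criterion has no variance; weights are undefined")
    z = [(c - mu) / sd for c in criterion]

    weights = []
    for item in range(len(responses[0])):
        z_by_option = defaultdict(list)
        for chosen, z_score in zip(responses, z):
            z_by_option[chosen[item]].append(z_score)
        weights.append({opt: mean(zs) for opt, zs in z_by_option.items()})
    return weights


def polytomous_scores(responses, weights):
    """Total score per examinee: the sum of the weights of the chosen options.

    Options never observed when the weights were estimated get weight 0.0
    (an assumption of this sketch).
    """
    return [
        sum(weights[item].get(opt, 0.0) for item, opt in enumerate(chosen))
        for chosen in responses
    ]


# Toy data: four examinees, two items, answer key A and C.
responses = [["A", "C"], ["A", "B"], ["B", "C"], ["C", "B"]]
number_right = [2, 1, 1, 0]  # internal criterion (total score)

weights = option_weights(responses, number_right)
scores = polytomous_scores(responses, weights)
```

Passing an external variable as `criterion` instead of `number_right` gives weights based on the variable to be predicted. In practice the weights would be estimated on one sample and applied to another, as the cross-validations mentioned in the summary imply.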