Summary: | PhD === 國立臺灣大學 === 流行病學與預防醫學研究所 === 99 === Board certification examinations for medical specialists aim to evaluate whether an examinee meets the minimum requirements for competent clinical practice. Although board certification examinations are of paramount importance to the quality of medical care, there is still a lack of thorough investigations focusing on item response analyses of board certification examinations in a medical specialty. Item responses in a test are jointly influenced by examinee ability and item difficulty, which calls for in-depth statistical analysis. Therefore, the major goal of this thesis was to conduct comprehensive item response analyses on the written tests of the Taiwanese board certification examinations in anesthesiology from 2007 to 2010 using a series of item response theory models.
Data were derived from the one hundred multiple-choice items with a single best answer included in each certification examination. The number of examinees ranged from 34 to 37 per year over these four years. Two analytical strategies were applied to the item response analyses of the written tests of the Taiwanese board certification examinations in anesthesiology. The maximum likelihood estimation (MLE) method was first used to estimate examinee ability and item difficulty parameters and to evaluate test reliability based on the one-parameter logistic (1-PL) model, also known as the Rasch model. Bayesian item response analyses were then applied to handle more complicated item response models, including the two-parameter logistic (2-PL, incorporating item discrimination) and three-parameter logistic (3-PL, incorporating a guessing parameter) models. The Bayesian approach was also used to assess the effects of covariates such as age, gender, and geographic area on examinee ability. A Bayesian multilevel model was further adopted to account for the hierarchical data structure arising from the correlation of item responses among examinees trained at the same training center.
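For reference, the three item response models can be written in their standard forms (the notation here is generic rather than copied from the thesis): the probability that examinee i answers item j correctly is
P(Y_{ij}=1 \mid \theta_i) = \frac{1}{1+\exp[-(\theta_i-b_j)]} (1-PL / Rasch),
P(Y_{ij}=1 \mid \theta_i) = \frac{1}{1+\exp[-a_j(\theta_i-b_j)]} (2-PL),
P(Y_{ij}=1 \mid \theta_i) = c_j + (1-c_j)\,\frac{1}{1+\exp[-a_j(\theta_i-b_j)]} (3-PL),
where \theta_i denotes the ability of examinee i, b_j the difficulty, a_j the discrimination, and c_j the guessing (lower-asymptote) parameter of item j. The restricted 3-PL model discussed below constrains the c_j to a single common value.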
The test reliability of the written tests of the board certification examination in Taiwan ranged between 0.71 and 0.75 over these four years. Both analytical approaches could estimate examinee ability and item difficulty parameters in the one-parameter logistic item response model, but the MLE method encountered convergence problems when estimating the parameters of the 2-PL and 3-PL item response models. Under the Bayesian approach, the 3-PL model without restriction on the guessing parameters could lead to overparameterization. The common guessing parameter in the restricted 3-PL model fitted with the Bayesian approach was close to 0 in all of the certification examinations in anesthesiology held during the four-year study period. Model comparisons based on the deviance information criterion (DIC) provided evidence in favor of the 1-PL model. The effects of examinee characteristics such as gender, age, and location of training center on examinee ability were not statistically significant. The application of the multilevel Bayesian model to the hierarchical data revealed correlation among the ability levels of examinees from the same training center, but the training-center effect on examinee ability was not salient.
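As a sketch of the multilevel specification (the particular priors shown here are illustrative assumptions, not necessarily those used in the thesis), the Rasch likelihood can be combined with a training-center layer:
Y_{ij} \mid \theta_i, b_j \sim \text{Bernoulli}\big(\text{logit}^{-1}(\theta_i-b_j)\big),
\theta_i \sim N(\mu_{k(i)}, \sigma_\theta^2), \qquad \mu_k \sim N(0, \sigma_\mu^2), \qquad b_j \sim N(0, \sigma_b^2),
where k(i) indexes the training center of examinee i and the shared center means \mu_k induce the within-center correlation of abilities. Competing models are compared through DIC = \bar{D} + p_D, the posterior mean deviance plus the effective number of parameters, with smaller values indicating better expected fit.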
This thesis demonstrates that item response analyses of the written tests of the Taiwanese board certification examinations can provide useful information for future test development. The flexibility and versatility of Bayesian item response analyses proved to be of great value for analyzing the written tests of the Taiwanese board certification examinations in anesthesiology.
|