Psychometrics of OSCE Standardized Patient Measurements

This study examined the reliability and validity of scores taken from a series of four task simulations used to evaluate medical students. The four role-play exercises represented two different cases or scripts, yielding two pairs of exercises that can be considered alternate forms. The design made it possible to examine what is essentially the ceiling for the reliability and validity of ratings taken in such role plays. A multitrait-multimethod (MTMM) matrix was computed with exercises as methods and competencies (history taking, clinical skills, and communication) as traits. The results within alternate forms (within cases) were then used as a baseline for evaluating the reliability and validity of scores between the alternate forms (between cases). There was much less of an exercise effect (method variance, monomethod bias) in this study than is typically found in MTMM matrices for performance measurement. However, the convergent validity of the dimensions across exercises was weak both within and between cases. The study also examined the reliability of ratings by training raters who watched video recordings of the same four exercises and then completed the same forms used by the standardized patients. Generalizability analysis was used to compute variance components for case, station, rater, and ratee (medical student), which allowed the computation of reliability estimates for multiple designs. Both the generalizability analysis and the MTMM analysis indicated that rather long examinations (approximately 20 to 40 exercises) would be needed to produce reliable examination scores for this population of examinees. Additionally, interjudge agreement was better for the more objective dimensions (history taking, physical examination) than for the more subjective dimension (communication).
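
As a concrete illustration of the generalizability reasoning summarized above, the Python sketch below estimates variance components for a fully crossed persons-by-exercises design and projects how many exercises an examination would need to reach a conventional reliability level. The data, the crossed design, the effect sizes, and the .80 target are this note's assumptions for illustration, not the dissertation's actual data or code.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic ratings: 100 students (ratees) x 4 exercises. The effect
    # sizes below are invented for illustration, not taken from the study.
    n_p, n_i = 100, 4
    person = rng.normal(0.0, 0.4, size=(n_p, 1))      # true-ability effects
    exercise = rng.normal(0.0, 0.3, size=(1, n_i))    # exercise-difficulty effects
    residual = rng.normal(0.0, 1.0, size=(n_p, n_i))  # residual (incl. person x exercise)
    scores = 5.0 + person + exercise + residual

    # Mean squares for a fully crossed two-way design, one observation per cell.
    grand = scores.mean()
    ms_p = n_i * ((scores.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
    ms_e = n_p * ((scores.mean(axis=0) - grand) ** 2).sum() / (n_i - 1)
    resid = (scores - scores.mean(axis=1, keepdims=True)
             - scores.mean(axis=0, keepdims=True) + grand)
    ms_res = (resid ** 2).sum() / ((n_p - 1) * (n_i - 1))

    # Expected-mean-square solutions for the variance components.
    var_res = ms_res                         # residual component
    var_p = max((ms_p - ms_res) / n_i, 0.0)  # person (ratee) component
    var_e = max((ms_e - ms_res) / n_p, 0.0)  # exercise component

    def g_coef(n_prime):
        # Relative G coefficient for an exam built from n_prime exercises.
        return var_p / (var_p + var_res / n_prime)

    print(f"person={var_p:.3f}  exercise={var_e:.3f}  residual={var_res:.3f}")
    print(f"G for a 4-exercise exam: {g_coef(4):.2f}")

    # Exercises needed to reach G = .80 (a Spearman-Brown-style projection).
    target = 0.80
    n_needed = (var_res / var_p) * target / (1.0 - target)
    print(f"exercises needed for G = .80: {int(np.ceil(n_needed))}")

With the synthetic components used here, the projection comes out in the tens of exercises, the same order of magnitude as the abstract's conclusion that roughly 20 to 40 exercises would be needed.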

Bibliographic Details
Main Author: Stilson, Frederick R. B.
Format: Others
Published: Scholar Commons 2008
Subjects: Reliability; Validity; Assessment centers; Interdisciplinary; Medicine; American Studies; Arts and Humanities
Online Access: https://scholarcommons.usf.edu/etd/36
https://scholarcommons.usf.edu/cgi/viewcontent.cgi?article=1035&context=etd