Psychometrics of OSCE Standardized Patient Measurements

This study examined the reliability and validity of scores taken from a series of four task simulations used to evaluate medical students. The four role-play exercises represented two different cases or scripts, yielding two pairs of exercises that can be considered alternate forms. The design made it possible to examine what is essentially the ceiling for the reliability and validity of ratings taken in such role plays. A multitrait-multimethod (MTMM) matrix was computed with exercises as methods and competencies (history taking, clinical skills, and communication) as traits. The results within alternate forms (within cases) were then used as a baseline for evaluating the reliability and validity of scores between the alternate forms (between cases). There was much less of an exercise effect (method variance, monomethod bias) in this study than is typically found in MTMM matrices for performance measurement. However, the convergent validity of the dimensions across exercises was weak both within and between cases. The study also examined the reliability of ratings by training raters who watched video recordings of the same four exercises and then completed the same forms used by the standardized patients. Generalizability analysis was used to compute variance components for case, station, rater, and ratee (medical student), which allowed the computation of reliability estimates for multiple designs. Both the generalizability analysis and the MTMM analysis indicated that rather long examinations (approximately 20 to 40 exercises) would be needed to produce reliable examination scores for this population of examinees. Additionally, interjudge agreement was better for the more objective dimensions (history taking, physical examination) than for the more subjective dimension (communication).
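
As a concrete illustration of the generalizability reasoning summarized above, the Python sketch below estimates variance components for a fully crossed persons-by-exercises design and projects how many exercises an examination would need to reach a conventional reliability level. The data, the crossed design, the effect sizes, and the .80 target are this note's assumptions for illustration, not the dissertation's actual data or code.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic ratings: 100 students (ratees) x 4 exercises. The effect
    # sizes below are invented for illustration, not taken from the study.
    n_p, n_i = 100, 4
    person = rng.normal(0.0, 0.4, size=(n_p, 1))      # true-ability effects
    exercise = rng.normal(0.0, 0.3, size=(1, n_i))    # exercise-difficulty effects
    residual = rng.normal(0.0, 1.0, size=(n_p, n_i))  # residual (incl. person x exercise)
    scores = 5.0 + person + exercise + residual

    # Mean squares for a fully crossed two-way design, one observation per cell.
    grand = scores.mean()
    ms_p = n_i * ((scores.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
    ms_e = n_p * ((scores.mean(axis=0) - grand) ** 2).sum() / (n_i - 1)
    resid = (scores - scores.mean(axis=1, keepdims=True)
             - scores.mean(axis=0, keepdims=True) + grand)
    ms_res = (resid ** 2).sum() / ((n_p - 1) * (n_i - 1))

    # Expected-mean-square solutions for the variance components.
    var_res = ms_res                         # residual component
    var_p = max((ms_p - ms_res) / n_i, 0.0)  # person (ratee) component
    var_e = max((ms_e - ms_res) / n_p, 0.0)  # exercise component

    def g_coef(n_prime):
        # Relative G coefficient for an exam built from n_prime exercises.
        return var_p / (var_p + var_res / n_prime)

    print(f"person={var_p:.3f}  exercise={var_e:.3f}  residual={var_res:.3f}")
    print(f"G for a 4-exercise exam: {g_coef(4):.2f}")

    # Exercises needed to reach G = .80 (a Spearman-Brown-style projection).
    target = 0.80
    n_needed = (var_res / var_p) * target / (1.0 - target)
    print(f"exercises needed for G = .80: {int(np.ceil(n_needed))}")

With the synthetic components used here, the projection comes out in the tens of exercises, the same order of magnitude as the abstract's conclusion that roughly 20 to 40 exercises would be needed.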

Bibliographic Details
Main Author: Stilson, Frederick R. B.
Format: Others
Published: Scholar Commons 2008
Subjects: Reliability; Validity; Assessment centers; Interdisciplinary; Medicine; American Studies; Arts and Humanities
Online Access: https://scholarcommons.usf.edu/etd/36
https://scholarcommons.usf.edu/cgi/viewcontent.cgi?article=1035&context=etd