The standard error of measurement is a more appropriate measure of quality for postgraduate medical assessments than is reliability: an analysis of MRCP(UK) examinations

Abstract. Background: Cronbach's alpha is widely used as the preferred index of reliability for medical postgraduate examinations. A value of 0.8-0.9 is seen by providers and regulators alike as an adequate demonstration of acceptable reliability for...

Bibliographic Details
Main Authors: Dewhurst Neil G, Chis Liliana, McManus IC, Tighe Jane, Mucklow John
Format: Article
Language: English
Published: BMC 2010-06-01
Series: BMC Medical Education
Online Access: http://www.biomedcentral.com/1472-6920/10/40
id doaj-b4324b8d00e746c386205b793889601e
record_format Article
spelling doaj-b4324b8d00e746c386205b793889601e; 2020-11-25T03:11:12Z; eng; BMC; BMC Medical Education; 1472-6920; 2010-06-01; 10; 1; 40; 10.1186/1472-6920-10-40; The standard error of measurement is a more appropriate measure of quality for postgraduate medical assessments than is reliability: an analysis of MRCP(UK) examinations; Dewhurst Neil G; Chis Liliana; McManus IC; Tighe Jane; Mucklow John; http://www.biomedcentral.com/1472-6920/10/40
collection DOAJ
language English
format Article
sources DOAJ
author Dewhurst Neil G
Chis Liliana
McManus IC
Tighe Jane
Mucklow John
title The standard error of measurement is a more appropriate measure of quality for postgraduate medical assessments than is reliability: an analysis of MRCP(UK) examinations
publisher BMC
series BMC Medical Education
issn 1472-6920
publishDate 2010-06-01
description Abstract
Background: Cronbach's alpha is widely used as the preferred index of reliability for medical postgraduate examinations. A value of 0.8-0.9 is seen by providers and regulators alike as an adequate demonstration of acceptable reliability for any assessment. Of the other statistical parameters, Standard Error of Measurement (SEM) is mainly seen as useful only in determining the accuracy of a pass mark. However, the alpha coefficient depends both on SEM and on the ability range (standard deviation, SD) of candidates taking an exam. This study investigated the extent to which the necessarily narrower ability range in candidates taking the second of the three-part MRCP(UK) diploma examinations biases assessment of reliability and SEM.
Methods: (a) The interrelationships of standard deviation (SD), SEM and reliability were investigated in a Monte Carlo simulation of 10,000 candidates taking a postgraduate examination. (b) Reliability and SEM were studied in the MRCP(UK) Part 1 and Part 2 Written Examinations from 2002 to 2008. (c) Reliability and SEM were studied in eight Specialty Certificate Examinations introduced in 2008-9.
Results: The Monte Carlo simulation showed, as expected, that restricting the range of an assessment only to those who had already passed it dramatically reduced the reliability but did not affect the SEM of a simulated assessment. The analysis of the MRCP(UK) Part 1 and Part 2 written examinations showed that the Part 2 written examination had a lower reliability than the Part 1 examination but, despite that lower reliability, also had a smaller SEM (indicating a more accurate assessment). The Specialty Certificate Examinations had small Ns and, as a result, wide variability in their reliabilities, but their SEMs were comparable with those of MRCP(UK) Part 2.
Conclusions: Assessing the quality of assessments primarily in terms of reliability can produce a paradoxical and distorted picture, particularly where a narrower range of candidate ability is an inevitable consequence of candidates being able to take a second-part examination only after passing the first part. Reliability also shows problems when the number of candidates in an examination is low and sampling error affects the range of candidate ability. SEM is not subject to such problems; it is therefore a better measure of the quality of an assessment and is recommended for routine use.
url http://www.biomedcentral.com/1472-6920/10/40
_version_ 1724655512628232192
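
The range-restriction effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' Monte Carlo code: it assumes a classical test theory model (observed score = true score + independent normal error), uses the correlation between two parallel forms as a stand-in for Cronbach's alpha, and applies the standard relation SEM = SD × sqrt(1 − reliability). The function names and all parameter values (`true_sd`, `error_sd`, `pass_mark`, `seed`) are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def reliability_and_sem(form_a, form_b):
    """Return (parallel-forms reliability, SEM = SD * sqrt(1 - reliability))."""
    r = pearson(form_a, form_b)
    sd = statistics.pstdev(form_a)
    return r, sd * math.sqrt(1.0 - r)


def simulate(n=10_000, true_sd=10.0, error_sd=5.0, pass_mark=0.0, seed=1):
    """Compare reliability and SEM for all candidates vs. earlier-exam passers.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(0.0, true_sd) for _ in range(n)]

    def sit_exam():
        # One sitting: true ability plus independent measurement error.
        return [t + rng.gauss(0.0, error_sd) for t in true_scores]

    first_part = sit_exam()          # earlier exam used only for selection
    form_a, form_b = sit_exam(), sit_exam()  # two parallel forms of the later exam

    passed = [i for i, score in enumerate(first_part) if score >= pass_mark]
    everyone = reliability_and_sem(form_a, form_b)
    passers = reliability_and_sem(
        [form_a[i] for i in passed], [form_b[i] for i in passed]
    )
    return everyone, passers


if __name__ == "__main__":
    (r_all, sem_all), (r_pass, sem_pass) = simulate()
    print(f"all candidates: reliability={r_all:.2f}, SEM={sem_all:.2f}")
    print(f"passers only:   reliability={r_pass:.2f}, SEM={sem_pass:.2f}")
```

Under these assumptions the reliability estimate should drop noticeably in the restricted group, because the spread of true ability shrinks, while the SEM estimate should stay close to the error SD in both groups, because measurement error is unaffected by selection.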