Evaluation of the Measurement Properties of the Short Form 36 Version 2 Health Survey in a Sample of Patients with Multiple Sclerosis

Background: In health status assessment, patient-reported outcome (PRO) measures are tools used to elicit important and measurable information from patients to better understand the impact of health conditions on their lives. Such impacts are considered latent constructs, or variables that cannot be...


Bibliographic Details
Main Author: Khalaf, Kristin Marie
Other Authors: Malone, Daniel
Language: en_US
Published: The University of Arizona. 2016
Subjects:
Online Access: http://hdl.handle.net/10150/612133
http://arizona.openrepository.com/arizona/handle/10150/612133
id ndltd-arizona.edu-oai-arizona.openrepository.com-10150-612133
record_format oai_dc
collection NDLTD
language en_US
sources NDLTD
topic Pharmaceutical sciences
Quantitative psychology
Mental health
description Background: In health status assessment, patient-reported outcome (PRO) measures are tools used to elicit important and measurable information from patients to better understand the impact of health conditions on their lives. Such impacts are considered latent constructs, or variables that cannot be observed or measured directly. Instruments intended to assess latent constructs must satisfy certain development, psychometric, and scaling standards through the generation of both qualitative and quantitative evidence to demonstrate the adequacy of their measurement properties. Health-related quality of life (HRQOL), or the subjective perception of health, is a core concept within the field of PROs. The Short Form 36 (SF-36) is one of the most commonly used PRO measures for assessing HRQOL. Objectives: To provide a better understanding of the performance and dimensionality of the SF-36 version 2 in a cross-sectional sample of patients with multiple sclerosis (MS) at the item, subscale, and higher-order factor structure levels using different measurement methods grounded in classical test theory (CTT), factor analysis, and item response theory (IRT). Methods: This was a post hoc analysis of a cross-sectional dataset. Patients with MS were recruited to participate in an online survey asking a variety of questions related to their health and treatment-seeking behaviors. The SF-36 was one of the questionnaires included in the survey. Items and individual subscales were evaluated using a multi-trait/multi-item correlation matrix to assess item-to-subscale relationships, including item discriminant validity with other subscales. Unidimensionality for select SF-36 subscales was assessed through confirmatory factor analysis (CFA). Internal consistency reliability (Cronbach's alpha) was evaluated for each subscale.
Patient-reported disability, depression, and current symptom exacerbation status were evaluated relative to SF-36 subscale scores to assess convergent validity, discriminant validity, and known-groups validity. Higher-order factor models of the SF-36 were tested to evaluate dimensionality of the instrument, including a two-factor second-order factor model, a bifactor model, and a statistical comparison between the bifactor model and its corresponding nested model. Unidimensionality was further evaluated through the use of graded response IRT models. The relative fit of traditional versus discrimination-constrained models was tested using a -2 log-likelihood ratio test, followed by an evaluation of item-level properties for fit (S-X² statistics), local dependence, and further assessment of model parameters (discrimination parameters, location parameters, option response functions, and test information curves). Person location parameters were also estimated to compare scale information to the location of patients along the latent construct. Results: A total of 1,052 respondents completed the survey. The CFA models testing unidimensionality of individual subscales all had comparative fit index (CFI) values > 0.90, but root mean square error of approximation (RMSEA) values all exceeded 0.08. All IRT graded response models showed a statistically significant improvement in model fit when item discrimination was freely estimated. Every unidimensional scale tested with the IRT models had at least one mis-fitting item (S-X² p-value > 0.05), and nearly all subscales tested showed item pairs with signs of local dependence. Cronbach's alpha was > 0.80 for all subscales except for General Health [GH] (alpha = 0.78). SF-36 subscales most closely related to physical aspects of health status had the strongest relationship to disability status (physical functioning [PF], r = -0.82, and role physical [RP], r = -0.57).
Subscales more closely related to mental health had the largest effect sizes between patients with versus without depression (0.88 for mental health [MH] subscale) and the smallest effect sizes between patients reporting currently experiencing versus not experiencing an exacerbation of their symptoms (0.48 for role emotional [RE] subscale). Both CFA and IRT analyses showed a lack of compelling evidence supporting unidimensionality upon combining items from the PF, RP, bodily pain [BP], and GH subscales to form the Physical-21, and upon combining items from the vitality [VT], RE, social functioning [SF], and MH subscales to form the Mental-14. Higher-order factor models showed good model fit, with CFI > 0.90 in all cases and lower RMSEA values than seen for the individual subscales (0.077 to 0.107). The bifactor model fit significantly better than its nested second-order version; however, the best-fitting (i.e., highest CFI and lowest RMSEA) higher-order factor model was the preliminary first-order model with eight first-order factors consistent with the eight subscales of the SF-36 (CFI = 0.996, RMSEA = 0.077, χ² = 3872.14, p < 0.001). Conclusions: The SF-36 version 2 performed well when evaluated within the CTT framework, but both CFA and IRT methods revealed several limitations at the item and factor level across all subscales, due to item wording (i.e., positive versus negative), items not being sufficiently related to their latent construct, and local dependence of items within and across subscales. The appropriateness of equal weighting of responses to produce a single summary score for each subscale, as well as their further aggregation into the Physical Component Summary and Mental Component Summary scores, should be reevaluated.
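Illustrative note on the internal consistency statistic reported above: the following is a minimal sketch of how Cronbach's alpha can be computed for one subscale, assuming complete item-level responses with no missing data. It is not the dissertation's code; the function name `cronbach_alpha` and the small response matrix are hypothetical and are not SF-36 data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows.

    Each row holds one respondent's scores on the k items of a single
    subscale. Assumes no missing values (an assumption of this sketch).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer all k items")

    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of the summed subscale score.
    total_variance = variance([sum(row) for row in responses])

    # alpha = k/(k-1) * (1 - sum of item variances / variance of total)
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 4 respondents x 3 items (illustration only).
example = [
    [1, 2, 3],
    [2, 3, 4],
    [4, 4, 5],
    [5, 5, 5],
]
alpha = cronbach_alpha(example)
print(f"alpha = {alpha:.3f}")
```

Values near or above 0.80, as reported for most subscales in the abstract, are commonly read as adequate internal consistency. Alpha alone does not establish unidimensionality, which is why the abstract also reports CFA and IRT evidence.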
author2 Malone, Daniel
author Khalaf, Kristin Marie
title Evaluation of the Measurement Properties of the Short Form 36 Version 2 Health Survey in a Sample of Patients with Multiple Sclerosis
publisher The University of Arizona.
publishDate 2016
url http://hdl.handle.net/10150/612133
http://arizona.openrepository.com/arizona/handle/10150/612133
spelling ndltd-arizona.edu-oai-arizona.openrepository.com-10150-612133 2016-06-09T15:01:53Z Evaluation of the Measurement Properties of the Short Form 36 Version 2 Health Survey in a Sample of Patients with Multiple Sclerosis Khalaf, Kristin Marie Malone, Daniel Slack, Marion Warholak, Terri Coyne, Karin Reeve, Bryce Malone, Daniel Pharmaceutical sciences Quantitative psychology Pharmaceutical Sciences Mental health 2016 text Electronic Dissertation http://hdl.handle.net/10150/612133 http://arizona.openrepository.com/arizona/handle/10150/612133 en_US Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. The University of Arizona.