Interrater Reliability of mHealth App Rating Measures: Analysis of Top Depression and Smoking Cessation Apps

Background: There are over 165,000 mHealth apps currently available to patients, but few have undergone an external quality review. Furthermore, no standardized review method exists, and little has been done to examine the consistency of the evaluation systems themselves....

Bibliographic Details
Main Authors:	Powell, Adam C, Torous, John, Chan, Steven, Raynor, Geoffrey Stephen, Shwarts, Erik, Shanahan, Meghan, Landman, Adam B
Format:	Article
Language:	English
Published:	JMIR Publications 2016-02-01
Series:	JMIR mHealth and uHealth
Online Access:	http://mhealth.jmir.org/2016/1/e15/

id	doaj-9ddb059489174280bdbaa3e8af495e4c
record_format	Article
spelling	doaj-9ddb059489174280bdbaa3e8af495e4c | 2021-05-03T01:40:28Z | eng | JMIR Publications | JMIR mHealth and uHealth | 2291-5222 | 2016-02-01 | 4 | 1 | e15 | 10.2196/mhealth.5176 | Interrater Reliability of mHealth App Rating Measures: Analysis of Top Depression and Smoking Cessation Apps | Powell, Adam C; Torous, John; Chan, Steven; Raynor, Geoffrey Stephen; Shwarts, Erik; Shanahan, Meghan; Landman, Adam B | Background: There are over 165,000 mHealth apps currently available to patients, but few have undergone an external quality review. Furthermore, no standardized review method exists, and little has been done to examine the consistency of the evaluation systems themselves. Objective: We sought to determine which measures for evaluating the quality of mHealth apps have the greatest interrater reliability. Methods: We identified 22 measures for evaluating the quality of apps from the literature. A panel of 6 reviewers reviewed the top 10 depression apps and 10 smoking cessation apps from the Apple iTunes App Store on these measures. Krippendorff’s alpha was calculated for each of the measures and reported by app category and in aggregate. Results: The measure for interactiveness and feedback was found to have the greatest overall interrater reliability (alpha=.69). Presence of password protection (alpha=.65), whether the app was uploaded by a health care agency (alpha=.63), the number of consumer ratings (alpha=.59), and several other measures had moderate interrater reliability (alphas>.5). There was the least agreement over whether apps had errors or performance issues (alpha=.15), stated advertising policies (alpha=.16), and were easy to use (alpha=.18). There were substantial differences in the interrater reliabilities of a number of measures when they were applied to depression versus smoking apps. Conclusions: We found wide variation in the interrater reliability of measures used to evaluate apps, and some measures are more robust across categories of apps than others. The measures with the highest degree of interrater reliability tended to be those that involved the least rater discretion. Clinical quality measures such as effectiveness, ease of use, and performance had relatively poor interrater reliability. Subsequent research is needed to determine consistent means for evaluating the performance of apps. Patients and clinicians should consider conducting their own assessments of apps, in conjunction with evaluating information from reviews. | http://mhealth.jmir.org/2016/1/e15/
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Powell, Adam C; Torous, John; Chan, Steven; Raynor, Geoffrey Stephen; Shwarts, Erik; Shanahan, Meghan; Landman, Adam B
spellingShingle	Powell, Adam C; Torous, John; Chan, Steven; Raynor, Geoffrey Stephen; Shwarts, Erik; Shanahan, Meghan; Landman, Adam B | Interrater Reliability of mHealth App Rating Measures: Analysis of Top Depression and Smoking Cessation Apps | JMIR mHealth and uHealth
author_facet	Powell, Adam C; Torous, John; Chan, Steven; Raynor, Geoffrey Stephen; Shwarts, Erik; Shanahan, Meghan; Landman, Adam B
author_sort	Powell, Adam C
title	Interrater Reliability of mHealth App Rating Measures: Analysis of Top Depression and Smoking Cessation Apps
title_short	Interrater Reliability of mHealth App Rating Measures: Analysis of Top Depression and Smoking Cessation Apps
title_full	Interrater Reliability of mHealth App Rating Measures: Analysis of Top Depression and Smoking Cessation Apps
title_fullStr	Interrater Reliability of mHealth App Rating Measures: Analysis of Top Depression and Smoking Cessation Apps
title_full_unstemmed	Interrater Reliability of mHealth App Rating Measures: Analysis of Top Depression and Smoking Cessation Apps
title_sort	interrater reliability of mhealth app rating measures: analysis of top depression and smoking cessation apps
publisher	JMIR Publications
series	JMIR mHealth and uHealth
issn	2291-5222
publishDate	2016-02-01
description	Background: There are over 165,000 mHealth apps currently available to patients, but few have undergone an external quality review. Furthermore, no standardized review method exists, and little has been done to examine the consistency of the evaluation systems themselves. Objective: We sought to determine which measures for evaluating the quality of mHealth apps have the greatest interrater reliability. Methods: We identified 22 measures for evaluating the quality of apps from the literature. A panel of 6 reviewers reviewed the top 10 depression apps and 10 smoking cessation apps from the Apple iTunes App Store on these measures. Krippendorff’s alpha was calculated for each of the measures and reported by app category and in aggregate. Results: The measure for interactiveness and feedback was found to have the greatest overall interrater reliability (alpha=.69). Presence of password protection (alpha=.65), whether the app was uploaded by a health care agency (alpha=.63), the number of consumer ratings (alpha=.59), and several other measures had moderate interrater reliability (alphas>.5). There was the least agreement over whether apps had errors or performance issues (alpha=.15), stated advertising policies (alpha=.16), and were easy to use (alpha=.18). There were substantial differences in the interrater reliabilities of a number of measures when they were applied to depression versus smoking apps. Conclusions: We found wide variation in the interrater reliability of measures used to evaluate apps, and some measures are more robust across categories of apps than others. The measures with the highest degree of interrater reliability tended to be those that involved the least rater discretion. Clinical quality measures such as effectiveness, ease of use, and performance had relatively poor interrater reliability. Subsequent research is needed to determine consistent means for evaluating the performance of apps. Patients and clinicians should consider conducting their own assessments of apps, in conjunction with evaluating information from reviews.
url	http://mhealth.jmir.org/2016/1/e15/
work_keys_str_mv	AT powelladamc interraterreliabilityofmhealthappratingmeasuresanalysisoftopdepressionandsmokingcessationapps AT torousjohn interraterreliabilityofmhealthappratingmeasuresanalysisoftopdepressionandsmokingcessationapps AT chansteven interraterreliabilityofmhealthappratingmeasuresanalysisoftopdepressionandsmokingcessationapps AT raynorgeoffreystephen interraterreliabilityofmhealthappratingmeasuresanalysisoftopdepressionandsmokingcessationapps AT shwartserik interraterreliabilityofmhealthappratingmeasuresanalysisoftopdepressionandsmokingcessationapps AT shanahanmeghan interraterreliabilityofmhealthappratingmeasuresanalysisoftopdepressionandsmokingcessationapps AT landmanadamb interraterreliabilityofmhealthappratingmeasuresanalysisoftopdepressionandsmokingcessationapps
_version_	1721485974952738816


Similar Items