Human Computation as a New Method for Evidence-Based Knowledge Transfer in Web-Based Guideline Development Groups: Proof of Concept Randomized Controlled Trial


Bibliographic Details
Main Authors: Heselmans, Annemie; Aertgeerts, Bert; Donceel, Peter; Van de Velde, Stijn; Vanbrabant, Peter; Ramaekers, Dirk
Format: Article
Language: English
Published: JMIR Publications 2013-01-01
Series: Journal of Medical Internet Research
Online Access: http://www.jmir.org/2013/1/e8/
Description
Summary:
Background: Guideline developers use different consensus methods to develop evidence-based clinical practice guidelines. Previous research suggests that existing guideline development techniques are subject to methodological problems and are logistically demanding. Guideline developers welcome new methods that facilitate a methodologically sound decision-making process. Systems that aggregate knowledge while participants play a game are one class of human computation applications. Researchers have already shown that these games with a purpose are effective in building common sense knowledge databases.
Objective: We aimed to evaluate the feasibility of a new consensus method based on human computation techniques compared to an informal face-to-face consensus method.
Methods: We set up a randomized design to study 2 different methods for guideline development within a group of advanced students completing a master of nursing and obstetrics. Students who participated in the trial were enrolled in an evidence-based health care course. We compared the Web-based method of human-based computation (HC) with an informal face-to-face consensus method (IC). We used 4 clinical scenarios of lower back pain as the subject of the consensus process. These scenarios concerned the following topics: (1) medical imaging, (2) therapeutic options, (3) drug use, and (4) sick leave. Outcomes were expressed as the amount of group (dis)agreement and the concordance of answers with clinical evidence. We estimated within-group and between-group effect sizes by calculating Cohen’s d. We calculated within-group effect sizes as the absolute difference between the outcome value at round 3 and the baseline outcome value, divided by the pooled standard deviation. We calculated between-group effect sizes as the absolute difference between the mean change in outcome value across rounds in HC and the mean change in outcome value across rounds in IC, divided by the pooled standard deviation.
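The two effect-size definitions above can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not say how the pooled standard deviation was computed, so the sketch assumes the common two-sample formula based on sample variances, and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(a, b):
    # Assumed formula: square root of the weighted average of the two
    # sample variances (n - 1 denominators). The abstract does not
    # specify which pooling formula was used.
    n_a, n_b = len(a), len(b)
    return sqrt(((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2))


def cohens_d(a, b):
    # Absolute difference between the two means, divided by the pooled SD.
    return abs(mean(a) - mean(b)) / pooled_sd(a, b)


def within_group_d(baseline, round3):
    # Within-group effect size: round 3 outcome versus baseline outcome.
    return cohens_d(round3, baseline)


def between_group_d(hc_changes, ic_changes):
    # Between-group effect size: mean change across rounds in HC versus IC.
    return cohens_d(hc_changes, ic_changes)
```

Both helpers reduce to the same calculation; they differ only in what is passed in (raw outcome values per round for the within-group case, per-group change scores for the between-group case).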
We analyzed statistical significance of within-group changes between round 1 and round 3 using the Wilcoxon signed rank test. We assessed the differences between the HC and IC groups using Mann-Whitney U tests. We used a Bonferroni adjusted alpha level of .025 in all statistical tests. We performed a thematic analysis to explore participants’ arguments during group discussion. Participants completed a satisfaction survey at the end of the consensus process.
Results: Of the 135 students completing a master of nursing and obstetrics, 120 participated in the experiment. We formed 8 HC groups (n=64) and 7 IC groups (n=56). The between-group comparison demonstrated that the HC groups obtained a greater improvement in evidence scores compared to the IC groups, although the difference was not statistically significant. The between-group effect size was 0.56 (P=.30) for the medical imaging scenario, 0.07 (P=.97) for the therapeutic options scenario, and 0.89 (P=.11) for the drug use scenario. We found no significant differences in improvement in the degree of agreement between HC and IC groups. Between-group comparisons revealed that the HC groups showed greater improvement in degree of agreement for the medical imaging scenario (d=0.46, P=.37) and the drug use scenario (d=0.31, P=.59). Very few evidence arguments (6%) were quoted during informal group discussions.
Conclusions: Overall, the use of the IC method was appropriate as long as the evidence supported participants’ beliefs or usual practice, or when the available evidence was sparse. However, when some controversy about the evidence existed, the HC method outperformed the IC method. The findings of our study illustrate the importance of the choice of the consensus method in guideline development. Human computation could be an acceptable methodology for guideline development specifically for scenarios in which the evidence shows no resonance with participants’ beliefs.
Future research is needed to confirm the results of this study and to establish practical significance in a controlled setting of multidisciplinary guideline panels during real-life guideline development.
ISSN:1438-8871