Statistical stopping criteria for automated screening in systematic reviews

Abstract: Active learning for systematic review screening promises to reduce the human effort required to identify relevant documents for a systematic review. Machines and humans work together, with humans providing training data and the machine optimising the documents that the humans screen. This enables the identification of all relevant documents after viewing only a fraction of the total documents. However, current approaches lack robust stopping criteria, so reviewers do not know when they have seen all, or a given proportion, of the relevant documents; this makes such systems hard to implement in live reviews. This paper introduces a workflow with flexible statistical stopping criteria, which offer real work reductions on the basis of rejecting a hypothesis of having missed a given recall target with a given level of confidence. On test datasets, the stopping criteria are shown to achieve a reliable level of recall while still providing work reductions of, on average, 17%. Previously proposed methods are shown to provide inconsistent recall and work reductions across datasets.
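
The kind of hypothesis test the abstract describes can be illustrated with a hypergeometric model. The sketch below is a simplified illustration under assumptions of our own (documents screened after the active-learning phase are drawn uniformly at random from the remaining pool), not the authors' exact procedure; the function name `stopping_p_value` and the example numbers are hypothetical.

```python
# Illustrative sketch of a hypothesis-test stopping criterion for
# screening (simplified; not necessarily the paper's exact procedure).
from scipy.stats import hypergeom


def stopping_p_value(r_found, n_unscreened, sample_size, k_in_sample,
                     recall_target=0.95):
    """p-value for H0: enough relevant documents remain unscreened that
    recall of the r_found documents identified so far is below
    recall_target.

    Assumes `sample_size` documents were drawn uniformly at random from
    the `n_unscreened` remaining documents *after* the r_found relevant
    documents had already been identified, and that `k_in_sample` of
    the sampled documents turned out to be relevant.
    """
    # Smallest number of missed relevant documents consistent with H0:
    # r_found / (r_found + missed) < recall_target
    #   <=>  missed > r_found * (1 - recall_target) / recall_target
    m_h0 = int(r_found * (1 - recall_target) / recall_target) + 1
    # Under the least favourable H0, the relevant count in the random
    # sample follows Hypergeom(population=n_unscreened, successes=m_h0,
    # draws=sample_size); a small P(X <= k_in_sample) is evidence that
    # fewer than m_h0 relevant documents remain.
    return hypergeom.cdf(k_in_sample, n_unscreened, m_h0, sample_size)


# Hypothetical example: 950 relevant documents found so far, 3,000 still
# unscreened, and a fresh random sample of 400 of those contains 2
# relevant documents.
p = stopping_p_value(r_found=950, n_unscreened=3000,
                     sample_size=400, k_in_sample=2, recall_target=0.95)
if p < 0.05:
    print(f"p = {p:.4f} -> stop: recall target met with the stated confidence")
else:
    print(f"p = {p:.4f} -> keep screening")
```

Rejecting the null at a small p-value corresponds to stopping with the stated confidence that the recall target has been met; the paper's workflow develops and evaluates criteria of this kind across multiple test datasets.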

Bibliographic Details
Main Authors: Max W Callaghan, Finn Müller-Hansen
Format: Article
Language: English
Published: BMC, 2020-11-01
Series: Systematic Reviews
Subjects: Systematic review; Machine learning; Active learning; Stopping criteria
Online Access: https://doi.org/10.1186/s13643-020-01521-4
Author Affiliations: Mercator Research Institute on Global Commons and Climate Change (both authors)
ISSN: 2046-4053