Efficient Identification of Patients Eligible for Clinical Studies Using Case-Based Reasoning on The Scottish Health Research Register (SHARE)

Bibliographic Details
Main Authors: Wen Shi, Tom Kelsey, Frank Sullivan
Format: Article
Language: English
Published: Swansea University 2020-12-01
Series: International Journal of Population Data Science
Online Access: https://ijpds.org/article/view/1509
ISSN: 2399-4908
Volume: 5, Issue: 5
Affiliations: Wen Shi, Tom Kelsey, Frank Sullivan (all University of St. Andrews)

Description

Introduction: Trials often struggle to achieve their target sample size, with only half doing so. Some researchers have turned to Electronic Health Records (EHRs), seeking a more efficient way of recruiting. The Scottish Health Research Register (SHARE) obtained patients’ consent for their EHRs to be used as a search base from which researchers can find potential participants. However, because EHR data are not complete, sufficient, or accurate, a database search strategy may not generate the best case-finding result.

Objectives and Approach: A retrospective study was conducted to evaluate the performance of a case-based reasoning method in identifying participants for population-based clinical studies that had recruited through SHARE. A case-based reasoning framework was applied to nine studies with 119 total participants using two-fold cross-validation. Records of 30,000 random individuals were also merged with each test set to simulate the real-world recruitment setting. A prediction score for study participation was generated for each individual in the test set by comparing their diagnosis, procedure, pharmaceutical prescription, and laboratory test result attributes with those of the participants of a particular study. Evaluation was conducted by calculating the Area Under the ROC Curve (ROC AUC) and information retrieval metrics for the test set ranked by prediction score. We also compared the most likely participants as identified by searching a database with those ranked highest by our model.

Results: The average ROC AUC for the nine projects was 81%, indicating strong predictive ability. However, the derived ranking lists showed lower predictive performance. Of the persons ranked within the top 50 positions, 21% were the same as those identified by searching databases.

Conclusion / Implications: Case-based reasoning may be more effective than a database search strategy for participant identification. This hypothesis requires a prospective study for further validation. The lower performance of the ranking lists suggests that improvements are needed in the collection and curation of EHRs.
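The evaluation described in the abstract (score each person in the test set against known participants, rank by score, then compute ROC AUC and the overlap of the top-ranked persons with a database search) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the similarity measure, so Jaccard similarity over sets of coded attributes is an assumption made here, and every identifier, code, and function name below is hypothetical toy data.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two sets of coded attributes (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def prediction_score(candidate: Set[str], cases: List[Set[str]]) -> float:
    """Mean similarity of a candidate to the known participants (the 'cases')."""
    return sum(jaccard(candidate, c) for c in cases) / len(cases)


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_k_overlap(ranked_ids: List[str], reference_ids: Set[str], k: int) -> float:
    """Fraction of the top-k ranked ids that also appear in a reference list."""
    top = ranked_ids[:k]
    return sum(1 for i in top if i in reference_ids) / len(top)


# Toy data: all codes and ids are invented for illustration only.
training_cases = [
    {"dx:E11", "rx:metformin", "lab:hba1c_high"},
    {"dx:E11", "rx:metformin"},
]
test_set: Dict[str, Set[str]] = {
    "p1": {"dx:E11", "rx:metformin", "lab:hba1c_high"},  # held-out participant
    "p2": {"dx:E11", "lab:hba1c_high"},                  # held-out participant
    "r1": {"dx:J45", "rx:salbutamol"},                   # random individual
    "r2": {"dx:E11"},                                    # random individual
    "r3": set(),                                         # random individual, no records
}
labels = {"p1": 1, "p2": 1, "r1": 0, "r2": 0, "r3": 0}

# Score everyone, rank by descending score, then evaluate the ranking.
scores = {pid: prediction_score(attrs, training_cases) for pid, attrs in test_set.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
auc = roc_auc([scores[p] for p in ranking], [labels[p] for p in ranking])

# Hypothetical database-search result used as the comparison list.
overlap = top_k_overlap(ranking, {"p1", "r2"}, k=2)
```

In the study's setting the test set would hold one fold of a study's participants merged with the 30,000 random records, and `k` would be 50; the toy sizes here only keep the example readable. AUC summarizes the whole ranking, while the top-k overlap looks only at the head of the list, which is one reason the two can diverge as the abstract reports.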