Predicting Onset of Dementia Using Clinical Notes and Machine Learning: Case-Control Study
Background: Clinical trials need efficient tools to assist in recruiting patients at risk of Alzheimer disease and related dementias (ADRD). Early detection can also assist patients with financial planning for long-term care. Clinical notes are an important, underutilized sourc...
Main Authors: | Hane, Christopher A; Nori, Vijay S; Crown, William H; Sanghavi, Darshak M; Bleicher, Paul |
---|---|
Format: | Article |
Language: | English |
Published: | JMIR Publications, 2020-06-01 |
Series: | JMIR Medical Informatics |
Online Access: | https://medinform.jmir.org/2020/6/e17819 |
id | doaj-f3f47b756d474980a359a9b56e5d7089 |
---|---|
record_format | Article |
spelling | doaj-f3f47b756d474980a359a9b56e5d7089; 2021-05-02T19:28:52Z; eng; JMIR Publications; JMIR Medical Informatics; ISSN 2291-9694; 2020-06-01; vol. 8, no. 6; e17819; doi:10.2196/17819 |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Hane, Christopher A; Nori, Vijay S; Crown, William H; Sanghavi, Darshak M; Bleicher, Paul |
title | Predicting Onset of Dementia Using Clinical Notes and Machine Learning: Case-Control Study |
publisher | JMIR Publications |
series | JMIR Medical Informatics |
issn | 2291-9694 |
publishDate | 2020-06-01 |
description | Background: Clinical trials need efficient tools to assist in recruiting patients at risk of Alzheimer disease and related dementias (ADRD). Early detection can also assist patients with financial planning for long-term care. Clinical notes are an important, underutilized source of information in machine learning models because of the cost of collection and complexity of analysis. Objective: This study aimed to investigate the use of deidentified clinical notes from multiple hospital systems collected over 10 years to augment retrospective machine learning models of the risk of developing ADRD. Methods: We used 2 years of data to predict the future outcome of ADRD onset. Clinical notes are provided in a deidentified format with specific terms and sentiments. Terms in clinical notes are embedded into a 100-dimensional vector space to identify clusters of related terms and abbreviations that differ across hospital systems and individual clinicians. Results: When using clinical notes, the area under the curve (AUC) improved from 0.85 to 0.94, and positive predictive value (PPV) increased from 45.07% (25,245/56,018) to 68.32% (14,153/20,717) in the model at disease onset. Models with clinical notes improved in both AUC and PPV in years 3-6 when notes’ volume was largest; results are mixed in years 7 and 8 with the smallest cohorts. Conclusions: Although clinical notes helped in the short term, the presence of ADRD symptomatic terms years earlier than onset adds evidence to other studies that clinicians undercode diagnoses of ADRD. De-identified clinical notes increase the accuracy of risk models. Clinical notes collected across multiple hospital systems via natural language processing can be merged using postprocessing techniques to aid model accuracy. |
url | https://medinform.jmir.org/2020/6/e17819 |
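
The PPV percentages in the description field are plain ratios of the reported counts. As a minimal sketch, not the authors' code, the Python snippet below recomputes them. It assumes that the numerator is the number of correctly flagged ADRD cases (true positives) and that the denominator is the number of patients each model flagged as positive (true positives plus false positives). The `ppv` helper and the `reported` dictionary are hypothetical names introduced for illustration.

```python
def ppv(true_positives: int, flagged_positive: int) -> float:
    """Positive predictive value in percent: TP / (TP + FP) * 100.

    Assumption: `flagged_positive` equals TP + FP, i.e. every patient
    the model predicted as an ADRD case.
    """
    if flagged_positive <= 0:
        raise ValueError("flagged_positive must be a positive count")
    if not 0 <= true_positives <= flagged_positive:
        raise ValueError("true_positives must lie in [0, flagged_positive]")
    return 100.0 * true_positives / flagged_positive


# Hypothetical container for the counts reported in the abstract
# (model at disease onset): (true positives, patients flagged positive).
reported = {
    "without clinical notes": (25_245, 56_018),
    "with clinical notes": (14_153, 20_717),
}

for label, (tp, flagged) in reported.items():
    false_positives = flagged - tp  # patients flagged but not true cases
    print(
        f"{label}: PPV = {ppv(tp, flagged):.2f}% "
        f"({tp:,}/{flagged:,}), false positives = {false_positives:,}"
    )
```

Running it should print PPVs of 45.07% and 68.32%, matching the abstract. The denominators differ between the two models (56,018 versus 20,717 flagged patients). The higher PPV with clinical notes therefore comes with far fewer false positives (6,564 versus 30,773) but also fewer true positives. The sketch only checks the arithmetic and says nothing about how the cohorts were defined.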