Improving early diagnosis of rare diseases using Natural Language Processing in unstructured medical records: an illustration from Dravet syndrome

Abstract Background The growing use of Electronic Health Records (EHRs) is promoting the application of data mining in health-care. A promising use of big data in this field is to develop models to support early diagnosis and to establish natural history. Dravet Syndrome (DS) is a rare developmental...


Bibliographic Details
Main Authors: Tommaso Lo Barco, Mathieu Kuchenbuch, Nicolas Garcelon, Antoine Neuraz, Rima Nabbout
Format: Article
Language: English
Published: BMC 2021-07-01
Series: Orphanet Journal of Rare Diseases
Subjects: Data mining; Natural Language Processing; Dravet syndrome; Rare Diseases; Early diagnosis
Online Access: https://doi.org/10.1186/s13023-021-01936-9
id doaj-91eaadfa38a74e19a5a66fbaef91f3f0
record_format Article
spelling doaj-91eaadfa38a74e19a5a66fbaef91f3f0; 2021-07-18T11:29:37Z; eng; BMC; Orphanet Journal of Rare Diseases; 1750-1172; 2021-07-01; 16(1):1-12; 10.1186/s13023-021-01936-9
Improving early diagnosis of rare diseases using Natural Language Processing in unstructured medical records: an illustration from Dravet syndrome
Tommaso Lo Barco (0), Mathieu Kuchenbuch (1), Nicolas Garcelon (2), Antoine Neuraz (3), Rima Nabbout (4)
(0) Department of Pediatric Neurology, Necker-Enfants Malades Hospital, APHP, Centre de Référence Épilepsies Rares, Member of ERN EPICARE, Université de Paris
(1) Department of Pediatric Neurology, Necker-Enfants Malades Hospital, APHP, Centre de Référence Épilepsies Rares, Member of ERN EPICARE, Université de Paris
(2) Imagine Institute, INSERM, UMR 1163, Université de Paris
(3) Université de Paris
(4) Department of Pediatric Neurology, Necker-Enfants Malades Hospital, APHP, Centre de Référence Épilepsies Rares, Member of ERN EPICARE, Université de Paris
https://doi.org/10.1186/s13023-021-01936-9
Data mining; Natural Language Processing; Dravet syndrome; Rare Diseases; Early diagnosis
collection DOAJ
language English
format Article
sources DOAJ
author Tommaso Lo Barco
Mathieu Kuchenbuch
Nicolas Garcelon
Antoine Neuraz
Rima Nabbout
publisher BMC
series Orphanet Journal of Rare Diseases
issn 1750-1172
publishDate 2021-07-01
description Abstract
Background: The growing use of Electronic Health Records (EHRs) is promoting the application of data mining in health care. A promising use of big data in this field is to develop models that support early diagnosis and establish natural history. Dravet Syndrome (DS) is a rare developmental and epileptic encephalopathy that commonly begins in the first year of life with febrile seizures (FS). Diagnosis is often delayed until after the age of 2 years, as it is difficult to differentiate DS at onset from FS. We aimed to explore whether some clinical terms (concepts) are used significantly more often in the electronic narrative medical reports of individuals with DS before the age of 2 years than in those of individuals with FS. These concepts would allow earlier detection of patients with DS, resulting in earlier referral to expert centers that can provide early diagnosis and care.
Methods: Data were collected from the Necker Enfants Malades Hospital using a document-based data warehouse, Dr Warehouse, which employs Natural Language Processing (NLP), a computer technology for processing written information. Using the Unified Medical Language System Metathesaurus, phenotype concepts can be recognized in medical reports. We selected individuals with DS (DS Cohort) and individuals with FS (FS Cohort) whose diagnosis was confirmed after the age of 4 years. A phenome-wide analysis was performed, evaluating the statistical associations between the phenotypes of DS and FS, based on concepts found in the reports produced before 2 years and using a series of logistic regressions.
Results: We found a significantly higher representation of concepts related to seizure phenotypes that distinguish DS from FS in the early phases, namely a greater recurrence of complex febrile convulsions (long-lasting and/or with focal signs) and other seizure types. Some typical early-onset non-seizure concepts also emerged, relating to neurodevelopment and gait disorders.
Conclusions: Narrative medical reports of individuals younger than 2 years with FS contain specific concepts linked to DS diagnosis, which can be automatically detected by software exploiting NLP. This approach could represent an innovative and sustainable methodology to shorten the time to diagnosis of DS and could be applied to other rare diseases.
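The Methods above describe a phenome-wide analysis run as a series of logistic regressions, one per concept, comparing the DS and FS cohorts. The following is a minimal sketch of that idea, not the authors' pipeline. It assumes an unadjusted model with a single binary predictor (concept present or absent in reports before 2 years), for which the fitted coefficient equals the log odds ratio of the 2x2 table and the Wald test has a closed form. The function names, the 0.5 correction for empty cells, and the Bonferroni threshold are illustrative assumptions; the record does not say how multiple testing was handled.

```python
import math


def concept_association(ds_with, ds_total, fs_with, fs_total):
    """Odds ratio and Wald p-value for one concept (DS cohort vs FS cohort).

    Equivalent to a logistic regression of cohort on a single binary
    predictor, whose slope is the log odds ratio of the 2x2 table.
    """
    a, b = ds_with, ds_total - ds_with   # DS: concept present / absent
    c, d = fs_with, fs_total - fs_with   # FS: concept present / absent
    if 0 in (a, b, c, d):
        # Assumption: add 0.5 to every cell so the odds ratio stays finite.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return math.exp(log_or), p_value


def phenome_wide_scan(counts, ds_total, fs_total, alpha=0.05):
    """Test every concept and flag those passing a Bonferroni threshold.

    counts maps concept name -> (patients with concept in DS, in FS).
    """
    threshold = alpha / len(counts)  # assumption: Bonferroni correction
    results = []
    for concept, (ds_with, fs_with) in counts.items():
        odds_ratio, p_value = concept_association(
            ds_with, ds_total, fs_with, fs_total
        )
        results.append((concept, odds_ratio, p_value, p_value < threshold))
    return sorted(results, key=lambda row: row[2])  # most significant first


if __name__ == "__main__":
    # Invented toy counts for illustration only; not data from the study.
    toy_counts = {"concept_A": (30, 5), "concept_B": (30, 60)}
    for row in phenome_wide_scan(toy_counts, ds_total=60, fs_total=120):
        print(row)
```

A real analysis would more likely fit the regressions with a statistics package so that covariates such as age or number of reports can be included; the closed form here only covers the single-predictor case.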
topic Data mining
Natural Language Processing
Dravet syndrome
Rare Diseases
Early diagnosis
url https://doi.org/10.1186/s13023-021-01936-9