Extracting Symptoms from Narrative Text using Artificial Intelligence

Indiana University-Purdue University Indianapolis (IUPUI) === Electronic health records collect an enormous amount of data about patients. However, the information about the patient’s illness is stored in progress notes that are in an unstructured format. It is difficult for humans to annotate sym...

Bibliographic Details
Main Author: Gandhi, Priyanka
Other Authors: Zou, Xukai
Language: en_US
Published: 2021
Subjects:
Online Access: http://hdl.handle.net/1805/24759
id ndltd-IUPUI-oai-scholarworks.iupui.edu-1805-24759
record_format oai_dc
spelling ndltd-IUPUI-oai-scholarworks.iupui.edu-1805-247592021-01-28T05:08:16Z Extracting Symptoms from Narrative Text using Artificial Intelligence Gandhi, Priyanka Zou, Xukai Luo, Xiao Xia, Yuni Artificial Intelligence Neural Network Machine Learning Medical Dataset Indiana University-Purdue University Indianapolis (IUPUI) Electronic health records collect an enormous amount of data about patients. However, the information about the patient’s illness is stored in progress notes that are in an unstructured format. It is difficult for humans to annotate symptoms listed in the free text. Recently, researchers have explored how advances in deep learning can be applied to process biomedical data. The information in the text can be extracted with the help of natural language processing. The research presented in this thesis aims at automating the process of symptom extraction. The proposed methods use pre-trained word embeddings such as BioWord2Vec, BERT, and BioBERT to generate vectors of the words based on the semantics and syntactic structure of sentences. BioWord2Vec embeddings are fed into a BiLSTM neural network with a CRF layer to capture the dependencies between the co-related terms in the sentence. The pre-trained BERT and BioBERT embeddings are fed into the BERT model with a CRF layer to analyze the output tags of neighboring tokens. The research shows that with the help of the CRF layer in neural network models, longer phrases of symptoms can be extracted from the text. The proposed models are compared with the UMLS MetaMap tool, which uses various sources to categorize the terms in the text into different semantic types, and with Stanford CoreNLP, a dependency parser that analyzes syntactic relations in the sentence to extract information. The performance of the models is analyzed using strict, relaxed, and n-gram evaluation schemes. The results show that BioBERT with a CRF layer can extract the majority of the human-labeled symptoms. Furthermore, the model is used to extract symptoms from COVID-19 tweets. The model was able to extract symptoms listed by the CDC as well as new symptoms. 2021-01-05T18:36:06Z 2021-01-05T18:36:06Z 2020-12 Thesis http://hdl.handle.net/1805/24759 en_US
collection NDLTD
language en_US
sources NDLTD
topic Artificial Intelligence
Neural Network
Machine Learning
Medical Dataset
description Indiana University-Purdue University Indianapolis (IUPUI) === Electronic health records collect an enormous amount of data about patients. However, the information about the patient’s illness is stored in progress notes that are in an unstructured format. It is difficult for humans to annotate symptoms listed in the free text. Recently, researchers have explored how advances in deep learning can be applied to process biomedical data. The information in the text can be extracted with the help of natural language processing. The research presented in this thesis aims at automating the process of symptom extraction. The proposed methods use pre-trained word embeddings such as BioWord2Vec, BERT, and BioBERT to generate vectors of the words based on the semantics and syntactic structure of sentences. BioWord2Vec embeddings are fed into a BiLSTM neural network with a CRF layer to capture the dependencies between the co-related terms in the sentence. The pre-trained BERT and BioBERT embeddings are fed into the BERT model with a CRF layer to analyze the output tags of neighboring tokens. The research shows that with the help of the CRF layer in neural network models, longer phrases of symptoms can be extracted from the text. The proposed models are compared with the UMLS MetaMap tool, which uses various sources to categorize the terms in the text into different semantic types, and with Stanford CoreNLP, a dependency parser that analyzes syntactic relations in the sentence to extract information. The performance of the models is analyzed using strict, relaxed, and n-gram evaluation schemes. The results show that BioBERT with a CRF layer can extract the majority of the human-labeled symptoms. Furthermore, the model is used to extract symptoms from COVID-19 tweets. The model was able to extract symptoms listed by the CDC as well as new symptoms.
author2 Zou, Xukai
author_facet Zou, Xukai
Gandhi, Priyanka
author Gandhi, Priyanka
author_sort Gandhi, Priyanka
title Extracting Symptoms from Narrative Text using Artificial Intelligence
publishDate 2021
url http://hdl.handle.net/1805/24759
work_keys_str_mv AT gandhipriyanka extractingsymptomsfromnarrativetextusingartificialintelligence
_version_ 1719374470389432320