Extracting Symptoms from Narrative Text using Artificial Intelligence
Indiana University-Purdue University Indianapolis (IUPUI). Electronic health records collect an enormous amount of data about patients. However, information about a patient’s illness is stored in progress notes in an unstructured format. It is difficult for humans to annotate sym...
Main Author: | Gandhi, Priyanka |
---|---|
Other Authors: | Zou, Xukai |
Language: | en_US |
Published: | 2021 |
Subjects: | Artificial Intelligence; Neural Network; Machine Learning; Medical Dataset |
Online Access: | http://hdl.handle.net/1805/24759 |
id | ndltd-IUPUI-oai-scholarworks.iupui.edu-1805-24759 |
---|---|
record_format | oai_dc |
spelling | ndltd-IUPUI-oai-scholarworks.iupui.edu-1805-24759 2021-01-28T05:08:16Z Extracting Symptoms from Narrative Text using Artificial Intelligence Gandhi, Priyanka; Zou, Xukai; Luo, Xiao; Xia, Yuni Artificial Intelligence; Neural Network; Machine Learning; Medical Dataset Indiana University-Purdue University Indianapolis (IUPUI). Electronic health records collect an enormous amount of data about patients. However, information about a patient’s illness is stored in progress notes in an unstructured format, and it is difficult for humans to annotate the symptoms listed in this free text. Recently, researchers have explored how advances in deep learning can be applied to process biomedical data, and natural language processing can help extract information from such text. The research presented in this thesis aims to automate symptom extraction. The proposed methods use pre-trained word embeddings such as BioWord2Vec, BERT, and BioBERT to generate word vectors based on the semantics and syntactic structure of sentences. BioWord2Vec embeddings are fed into a BiLSTM neural network with a CRF layer to capture dependencies between related terms in a sentence. The pre-trained BERT and BioBERT embeddings are fed into a BERT model with a CRF layer that takes the output tags of neighboring tokens into account. The research shows that adding a CRF layer to the neural network models allows longer symptom phrases to be extracted from the text. The proposed models are compared with two baselines: the UMLS MetaMap tool, which uses various sources to categorize terms in the text into semantic types, and Stanford CoreNLP, a dependency parser that analyzes syntactic relations in a sentence to extract information. Model performance is analyzed using strict, relaxed, and n-gram evaluation schemes. The results show that BioBERT with a CRF layer extracts the majority of the human-labeled symptoms. Furthermore, the model is used to extract symptoms from COVID-19 tweets, where it extracts the symptoms listed by the CDC as well as additional symptoms not on that list. 2021-01-05T18:36:06Z 2021-01-05T18:36:06Z 2020-12 Thesis http://hdl.handle.net/1805/24759 en_US |
collection | NDLTD |
language | en_US |
sources | NDLTD |
topic | Artificial Intelligence; Neural Network; Machine Learning; Medical Dataset |
description | Indiana University-Purdue University Indianapolis (IUPUI). Electronic health records collect an enormous amount of data about patients. However, information about a patient’s illness is stored in progress notes in an unstructured format, and it is difficult for humans to annotate the symptoms listed in this free text. Recently, researchers have explored how advances in deep learning can be applied to process biomedical data, and natural language processing can help extract information from such text. The research presented in this thesis aims to automate symptom extraction. The proposed methods use pre-trained word embeddings such as BioWord2Vec, BERT, and BioBERT to generate word vectors based on the semantics and syntactic structure of sentences. BioWord2Vec embeddings are fed into a BiLSTM neural network with a CRF layer to capture dependencies between related terms in a sentence. The pre-trained BERT and BioBERT embeddings are fed into a BERT model with a CRF layer that takes the output tags of neighboring tokens into account. The research shows that adding a CRF layer to the neural network models allows longer symptom phrases to be extracted from the text. The proposed models are compared with two baselines: the UMLS MetaMap tool, which uses various sources to categorize terms in the text into semantic types, and Stanford CoreNLP, a dependency parser that analyzes syntactic relations in a sentence to extract information. Model performance is analyzed using strict, relaxed, and n-gram evaluation schemes. The results show that BioBERT with a CRF layer extracts the majority of the human-labeled symptoms. Furthermore, the model is used to extract symptoms from COVID-19 tweets, where it extracts the symptoms listed by the CDC as well as additional symptoms not on that list. |
author2 | Zou, Xukai |
author_facet | Zou, Xukai; Gandhi, Priyanka |
author | Gandhi, Priyanka |
title | Extracting Symptoms from Narrative Text using Artificial Intelligence |
publishDate | 2021 |
url | http://hdl.handle.net/1805/24759 |
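
The abstract credits the CRF layer with extracting longer symptom phrases. The following is a minimal sketch of why joint decoding helps, not the thesis code. It assumes a BIO tagging scheme with one hypothetical entity type, SYM, and replaces the BiLSTM or BERT outputs with hand-picked emission scores and the learned CRF transitions with a hand-set table that forbids I-SYM after O. The function names `viterbi_decode` and `bio_to_spans`, the tag set, and all scores are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical BIO tag set with a single entity type ("SYM" = symptom).
TAGS = ["O", "B-SYM", "I-SYM"]
NEG_INF = float("-inf")


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Dict[Tuple[str, str], float],
    start: Dict[str, float],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][k]: score of tag TAGS[k] at token t (from an encoder in practice).
    transitions[(prev, cur)]: score of moving from tag prev to tag cur.
    start[tag]: score of beginning the sentence with that tag.
    """
    n_tags = len(TAGS)
    # score[k] = best total score of any tag path ending in tag k so far
    score = [start[TAGS[k]] + emissions[0][k] for k in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for k in range(n_tags):
            # Best previous tag j for reaching tag k at token t
            best_prev = max(
                range(n_tags),
                key=lambda j: score[j] + transitions[(TAGS[j], TAGS[k])],
            )
            new_score.append(
                score[best_prev]
                + transitions[(TAGS[best_prev], TAGS[k])]
                + emissions[t][k]
            )
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag
    best = max(range(n_tags), key=lambda k: score[k])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return [TAGS[k] for k in reversed(path)]


def bio_to_spans(tokens: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Join B-SYM/I-SYM runs into phrases; an I-SYM with no open span is ignored."""
    spans: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-SYM":
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-SYM" and current:
            current.append(token)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


if __name__ == "__main__":
    tokens = ["patient", "reports", "shortness", "of", "breath"]
    # Made-up emission scores; columns follow the order of TAGS
    emissions = [
        [2.0, 0.0, 0.0],
        [2.0, 0.0, 0.0],
        [0.0, 2.0, 0.5],
        [1.0, 0.0, 0.8],
        [0.0, 0.5, 2.0],
    ]
    transitions = {(prev, cur): 0.0 for prev in TAGS for cur in TAGS}
    transitions[("O", "I-SYM")] = NEG_INF  # I-SYM may not follow O
    start = {"O": 0.0, "B-SYM": 0.0, "I-SYM": NEG_INF}  # nor open a sentence
    tags = viterbi_decode(emissions, transitions, start)
    print(tags)                        # ['O', 'O', 'B-SYM', 'I-SYM', 'I-SYM']
    print(bio_to_spans(tokens, tags))  # ['shortness of breath']
```

With these scores, picking each token's best tag independently gives O, O, B-SYM, O, I-SYM, which breaks the phrase at "of" and recovers only "shortness". Because the transition table rules out I-SYM after O, joint decoding keeps "shortness of breath" as a single span.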
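
The abstract evaluates the models with strict and relaxed schemes but does not define them in this record. This sketch assumes a common reading: a strict match requires a predicted symptom span to have exactly the same boundaries as a human-labeled span, while a relaxed match only requires the two spans to overlap. The `(start, end)` token-offset representation and the function names `exact_match`, `overlaps`, and `precision_recall_f1` are hypothetical.

```python
from typing import Callable, List, Tuple

# A span is a (start, end) pair of token offsets; end is exclusive.
Span = Tuple[int, int]


def exact_match(a: Span, b: Span) -> bool:
    """Strict criterion: identical start and end offsets."""
    return a == b


def overlaps(a: Span, b: Span) -> bool:
    """Relaxed criterion: the two spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def precision_recall_f1(
    gold: List[Span], pred: List[Span], relaxed: bool = False
) -> Tuple[float, float, float]:
    match: Callable[[Span, Span], bool] = overlaps if relaxed else exact_match
    # A prediction counts as correct if it matches any gold span ...
    correct_pred = sum(any(match(p, g) for g in gold) for p in pred)
    # ... and a gold span counts as found if any prediction matches it.
    found_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = correct_pred / len(pred) if pred else 0.0
    recall = found_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(2, 5), (7, 8)]  # e.g. "shortness of breath", "fever"
    pred = [(2, 3), (7, 8)]  # "shortness" (a fragment), "fever"
    print(precision_recall_f1(gold, pred, relaxed=False))  # (0.5, 0.5, 0.5)
    print(precision_recall_f1(gold, pred, relaxed=True))   # (1.0, 1.0, 1.0)
```

The fragment (2, 3) fails the strict test but overlaps the gold span (2, 5), so it passes the relaxed one. Precision and recall are counted independently here, so under relaxed matching one long prediction may cover several gold spans; a stricter variant could enforce one-to-one matching instead.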