Classification of Biomedical Texts for Cardiovascular Diseases with Deep Neural Network Using a Weighted Feature Representation Method

This study aims to improve the performance of multiclass classification of biomedical texts for cardiovascular diseases by combining two different feature representation methods, i.e., bag-of-words (BoW) and word embeddings (WE). To hybridize the two feature representations, we investigated a set of...


Bibliographic Details
Main Authors: Nizar Ahmed, Fatih Dilmaç, Adil Alpkocak
Format: Article
Language: English
Published: MDPI AG 2020-10-01
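Series: Healthcare
Subjects: biomedical text classification; multiclass classification; cardiovascular diseases; deep neural network; feature representation; bidirectional long short-term memory
Online Access: https://www.mdpi.com/2227-9032/8/4/392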
id doaj-b0cc4560974445739bf52492929c1753
record_format Article
spelling doaj-b0cc4560974445739bf52492929c1753
2020-11-25T03:35:23Z
eng
MDPI AG
Healthcare
2227-9032
2020-10-01
8
392
392
10.3390/healthcare8040392
Classification of Biomedical Texts for Cardiovascular Diseases with Deep Neural Network Using a Weighted Feature Representation Method
Nizar Ahmed, Department of Computer Engineering, Dokuz Eylul University, Tinaztepe Kampusu, 35160 Izmir, Turkey
Fatih Dilmaç, Department of Computer Engineering, Dokuz Eylul University, Tinaztepe Kampusu, 35160 Izmir, Turkey
Adil Alpkocak, Department of Computer Engineering, Dokuz Eylul University, Tinaztepe Kampusu, 35160 Izmir, Turkey
https://www.mdpi.com/2227-9032/8/4/392
collection DOAJ
language English
format Article
sources DOAJ
author Nizar Ahmed
Fatih Dilmaç
Adil Alpkocak
title Classification of Biomedical Texts for Cardiovascular Diseases with Deep Neural Network Using a Weighted Feature Representation Method
publisher MDPI AG
series Healthcare
issn 2227-9032
publishDate 2020-10-01
description This study aims to improve multiclass classification of biomedical texts on cardiovascular diseases by combining two feature representation methods: bag-of-words (BoW) and word embeddings (WE). To hybridize the two representations, we investigated a set of statistical weighting schemes, namely term frequency (TF), inverse document frequency (IDF), and class probability (CP), each of which is combined with every element of the WE vectors. We then built a multiclass classification model using a bidirectional long short-term memory (BLSTM) network with deep neural networks for all investigated feature vector combinations. We used MIMIC III and the PubMed dataset to develop the language model. To evaluate the weighted feature representations, we ran a set of experiments comparing the multiclass classification performance of the deep neural network model with that of other state-of-the-art machine learning (ML) approaches. All experiments used the OHSUMED-400 dataset, which consists of PubMed abstracts, each assigned to exactly one of 23 cardiovascular disease categories. We then present the experimental results and compare them with related research in the literature. The experiments showed that our BLSTM model with the weighting techniques outperformed the baseline and the other machine learning approaches in validation accuracy, and it also exceeded the scores reported in related studies. This study shows that weighted feature representation improves the performance of multiclass classification.
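The weighting idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy documents, the two-dimensional embeddings, and the exact TF and IDF formulas (relative frequency and log(N / df)) are assumptions made for illustration. The class probability (CP) scheme is left out because the abstract does not define it.

```python
import math
from collections import Counter

def idf_weights(docs):
    """Assumed IDF definition: log(N / document frequency) per term."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}

def weighted_embeddings(doc, embeddings, idf, scheme="tf-idf"):
    """Multiply every element of each token's embedding by a scalar weight."""
    tf = Counter(doc)
    weighted = []
    for token in doc:
        if token not in embeddings:
            continue  # skip out-of-vocabulary tokens
        rel_tf = tf[token] / len(doc)  # assumed TF: relative frequency
        if scheme == "tf":
            w = rel_tf
        elif scheme == "idf":
            w = idf.get(token, 0.0)
        else:  # "tf-idf"
            w = rel_tf * idf.get(token, 0.0)
        weighted.append([w * x for x in embeddings[token]])
    return weighted

# Hypothetical toy corpus and embeddings, not taken from OHSUMED-400.
docs = [["heart", "failure", "heart"], ["valve", "failure"]]
embeddings = {"heart": [1.0, 2.0], "failure": [0.5, 0.5], "valve": [2.0, 0.0]}

idf = idf_weights(docs)
seq = weighted_embeddings(docs[0], embeddings, idf, scheme="tf")
```

Here `seq` is a sequence of weighted word vectors, one per token, of the kind that could be passed to a sequence model such as a BLSTM.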
topic biomedical text classification
multiclass classification
cardiovascular diseases
deep neural network
feature representation
bidirectional long short-term memory
url https://www.mdpi.com/2227-9032/8/4/392