AltibbiVec: A Word Embedding Model for Medical and Health Applications in the Arabic Language

In recent years, the use of natural language processing (NLP) and machine learning (ML) techniques in clinical decision support systems has shown its ability to improve and automate the diagnosis process and to reduce potential clinical errors. NLP in the Arabic language is more intri...

Bibliographic Details
Main Authors:	Maria Habib, Mohammad Faris, Alaa Alomari, Hossam Faris
Format:	Article
Language:	English
Published:	IEEE 2021-01-01
Series:	IEEE Access
Subjects:	Arabic; fastText; GloVe; healthcare; pre-trained; word embedding
Online Access:	https://ieeexplore.ieee.org/document/9548088/

id	doaj-edff818189f949c1b0c60aaa0c2d6c75
record_format	Article
spelling	doaj-edff818189f949c1b0c60aaa0c2d6c75; 2021-10-05T23:01:23Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2021-01-01; vol. 9, pp. 133875-133888; doi:10.1109/ACCESS.2021.3115617; 9548088; AltibbiVec: A Word Embedding Model for Medical and Health Applications in the Arabic Language; Maria Habib (https://orcid.org/0000-0001-9642-9597); Mohammad Faris; Alaa Alomari (https://orcid.org/0000-0001-9148-3543); Hossam Faris (https://orcid.org/0000-0003-4261-8127); all authors: Altibbi.com, Amman, Jordan; https://ieeexplore.ieee.org/document/9548088/; Arabic; fastText; GloVe; healthcare; pre-trained; word embedding
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Maria Habib, Mohammad Faris, Alaa Alomari, Hossam Faris
title	AltibbiVec: A Word Embedding Model for Medical and Health Applications in the Arabic Language
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2021-01-01
description	In recent years, the use of natural language processing (NLP) and machine learning (ML) techniques in clinical decision support systems has shown its ability to improve and automate the diagnosis process and to reduce potential clinical errors. NLP in the Arabic language is more intricate due to several limitations, such as the lack of datasets and analytical resources compared to other languages like English. Nevertheless, a clinical decision support system in the Arabic context is of significant importance. A fundamental process in NLP is extracting features from text-based data via text embedding. A word embedding is a numeric representation of words that encodes statistical, semantic, or contextual information. Building a neural word embedding model requires hundreds of thousands of data instances to find hidden patterns of relationships within sentences. Essentially, extracting relevant and informative features improves the performance of the learning algorithms. The objective of this paper is to propose an Arabic neural word embedding model for the medical and healthcare context, called “AltibbiVec”. Around 1.5 million medical consultations and questions written in different dialects were obtained from the Altibbi telemedicine company and used to train the embedding model. Three different embedding models were developed and compared: Word2Vec, fastText, and GloVe. The trained models were evaluated by different criteria, including word clustering, word similarity, and specialty-based question classification. The results show that Word2Vec and fastText capture the semantics of the text better than GloVe. Hence, they are recommended for healthcare NLP-based applications.
topic	Arabic; fastText; GloVe; healthcare; pre-trained; word embedding
url	https://ieeexplore.ieee.org/document/9548088/

Similar Items