Weighted Co-Occurrence Bio-term Graph for Unsupervised Word Sense Disambiguation in the Biomedical Domain

Word Sense Disambiguation (WSD) is a significant and challenging task for text understanding and processing. This paper presents an unsupervised approach based on weighted co-occurrence bio-term graph (WCOTG) for performing WSD in the biomedical domain. The graph is automatically created from biomed...

Bibliographic Details
Main Authors: Jia, Y. (Author), Papadopoulou, M. (Author), Roche, C. (Author), Zhang, X. (Author), Zhang, Z. (Author)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2023
Subjects:
Online Access: View Fulltext in Publisher
View in Scopus
LEADER 02629nam a2200349Ia 4500
001 10.1109-ACCESS.2023.3272056
008 230529s2023 CNT 000 0 und d
020 |a 21693536 (ISSN) 
245 1 0 |a Weighted Co-Occurrence Bio-term Graph for Unsupervised Word Sense Disambiguation in the Biomedical Domain 
260 0 |b Institute of Electrical and Electronics Engineers Inc.  |c 2023 
300 |a 1 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1109/ACCESS.2023.3272056 
856 |z View in Scopus  |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-85159688018&doi=10.1109%2fACCESS.2023.3272056&partnerID=40&md5=6cd4b24b26e357c3f67b268bde669752 
520 3 |a Word Sense Disambiguation (WSD) is a significant and challenging task for text understanding and processing. This paper presents an unsupervised approach based on a weighted co-occurrence bio-term graph (WCOTG) for performing WSD in the biomedical domain. The graph is automatically created from biomedical terms that are extracted from a corpus of downloaded scientific abstracts. Two kinds of weights are introduced on the links of the built bio-term graph and are taken as important factors in the process of disambiguation. A modified Personalised PageRank (PPR) algorithm is used for performing WSD. When evaluated on the NLM-WSD and MSH-WSD test datasets, and an acronym test set, the method outperforms the widely used unsupervised ones addressing the same problem, and the average result is almost equal to that of the BlueBERT_LE-based method. In contrast, our method has no additional enhancement or training for BERT-based models. Comparative experiments validate the positive effect of the links' weights on disambiguation efficiency. Last, the statistical experiments on the relation among system accuracy, numbers of medical abstracts in the corpus, and the corresponding extracted terms suggest an excellent minimum corpus scale when resources are limited. 
650 0 4 |a Biological system modeling 
650 0 4 |a Biomedical informatics 
650 0 4 |a Biomedical Natural language processing 
650 0 4 |a Bit error rate 
650 0 4 |a Natural language processing 
650 0 4 |a Neural networks 
650 0 4 |a Personalised PageRank algorithm 
650 0 4 |a Task analysis 
650 0 4 |a Transformers 
650 0 4 |a Unified medical language system 
650 0 4 |a Unified modeling language 
650 0 4 |a Word sense disambiguation 
700 1 0 |a Jia, Y.  |e author 
700 1 0 |a Papadopoulou, M.  |e author 
700 1 0 |a Roche, C.  |e author 
700 1 0 |a Zhang, X.  |e author 
700 1 0 |a Zhang, Z.  |e author 
773 |t IEEE Access
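
Illustrative note: the abstract in field 520 describes disambiguation by running Personalised PageRank over a weighted term co-occurrence graph. The sketch below shows that general idea only, under stated assumptions. It uses a single edge weight (the co-occurrence count) and the standard PPR update, whereas the paper's two kinds of weights and its modifications to PPR are not specified in this record. The toy term lists, the function names, and the damping value are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(documents):
    """Undirected graph; edge weight = number of documents where two terms co-occur.

    Assumption: one count-based weight stands in for the paper's two weights.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for terms in documents:
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Standard (unmodified) PPR by power iteration, restarting at the seed terms."""
    nodes = list(graph)
    seeds = [s for s in seeds if s in graph]
    if not seeds:
        raise ValueError("no context term occurs in the graph")
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            for m, w in graph[n].items():
                # Each node passes its score to neighbours in proportion to edge weight.
                new_rank[m] += damping * rank[n] * w / out_weight[n]
        rank = new_rank
    return rank


def disambiguate(graph, candidate_senses, context_terms):
    """Pick the candidate sense with the highest PPR score given the context."""
    rank = personalized_pagerank(graph, context_terms)
    return max(candidate_senses, key=lambda sense: rank.get(sense, 0.0))


# Hypothetical toy corpus: each list holds the terms extracted from one abstract.
documents = [
    ["common cold", "virus", "cough"],
    ["common cold", "virus", "fever"],
    ["cold temperature", "hypothermia", "exposure"],
    ["cold temperature", "exposure", "frostbite"],
    ["fever", "exposure"],
]
graph = build_cooccurrence_graph(documents)
best = disambiguate(graph, ["common cold", "cold temperature"], ["virus", "cough"])
print(best)
```

Restarting the walk at the context terms concentrates score on senses that are strongly connected to them, so heavier co-occurrence links pull more probability mass toward the matching sense.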