Capsule Network Improved Multi-Head Attention for Word Sense Disambiguation

Word sense disambiguation (WSD), one of the core problems in natural language processing (NLP), is the task of mapping an ambiguous word to its correct meaning in a specific context. Recent studies have shown lively interest in incorporating sense definitions (glosses) into neural networks, which m...
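The abstract (given in full in the description field of this record) ends with a selection step: the gloss representation closest to the context representation of the target word is chosen as the sense. The sketch below illustrates only that step and is not the authors' implementation. It assumes cosine similarity as the closeness measure, which the abstract does not specify, and the function names, sense labels, and toy vectors are hypothetical stand-ins for the outputs of the context and gloss encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_sense(context_vec, gloss_vecs):
    """Return the sense label whose gloss vector is closest to the context vector.

    Closeness is measured by cosine similarity here; this is an assumption.
    """
    return max(gloss_vecs, key=lambda sense: cosine(context_vec, gloss_vecs[sense]))


# Hypothetical encoder outputs for the target word "bank" in one sentence.
context = [0.9, 0.1, 0.2]
glosses = {
    "bank%finance": [1.0, 0.0, 0.1],
    "bank%river": [0.0, 1.0, 0.3],
}

best = pick_sense(context, glosses)
print(best)
```

In a real system the vectors would come from the two encoders described in the abstract, and the candidate glosses would be the sense definitions listed for the target word in the sense inventory.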


Bibliographic Details
Main Authors: Jinfeng Cheng, Weiqin Tong, Weian Yan
Format: Article
Language: English
Published: MDPI AG 2021-03-01
Series: Applied Sciences
Subjects: word sense disambiguation; multi-head attention; capsule network; capsule routing
Online Access: https://www.mdpi.com/2076-3417/11/6/2488
id doaj-89bb76b4eab34a0e99c841269efbeaaa
record_format Article
spelling doaj-89bb76b4eab34a0e99c841269efbeaaa; 2021-03-11T00:06:17Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2021-03-01; 11; 2488; 2488; 10.3390/app11062488; Capsule Network Improved Multi-Head Attention for Word Sense Disambiguation; Jinfeng Cheng (0), Weiqin Tong (1), Weian Yan (2); School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China (affiliation given for all three authors); https://www.mdpi.com/2076-3417/11/6/2488; word sense disambiguation; multi-head attention; capsule network; capsule routing
collection DOAJ
language English
format Article
sources DOAJ
author Jinfeng Cheng
Weiqin Tong
Weian Yan
author_facet Jinfeng Cheng
Weiqin Tong
Weian Yan
author_sort Jinfeng Cheng
title Capsule Network Improved Multi-Head Attention for Word Sense Disambiguation
title_sort capsule network improved multi-head attention for word sense disambiguation
publisher MDPI AG
series Applied Sciences
issn 2076-3417
publishDate 2021-03-01
description Word sense disambiguation (WSD), one of the core problems in natural language processing (NLP), is the task of mapping an ambiguous word to its correct meaning in a specific context. Recent studies have shown lively interest in incorporating sense definitions (glosses) into neural networks, which has contributed greatly to improving WSD performance. However, disambiguating polysemous words with rare senses is still hard. In this paper, while taking glosses into consideration, we further improve the performance of the WSD system from the perspective of semantic representation. We encode the context and the sense glosses of the target polysemous word independently, using encoders with the same structure. To obtain a better representation in each encoder, we leverage a capsule network to capture the different kinds of important information contained in multi-head attention. Finally, we choose the gloss representation closest to the context representation of the target word as its correct sense. We conduct experiments on the English all-words WSD task. Experimental results show that our method achieves good performance, with an especially encouraging effect on disambiguating words with rare senses.
topic word sense disambiguation
multi-head attention
capsule network
capsule routing
url https://www.mdpi.com/2076-3417/11/6/2488