Analyzing and Controlling Inter-Head Diversity in Multi-Head Attention
Multi-head attention, a powerful strategy for the Transformer, is assumed to utilize information from diverse representation subspaces. However, measuring the diversity between heads’ representations, or exploiting that diversity, has rarely been studied. In this paper, we quantitatively analyze inter-head div...
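The truncated abstract refers to quantifying diversity between heads' representations. As a minimal illustrative sketch only (the paper's actual metric is not shown in the excerpt above), the snippet below measures the mean pairwise cosine similarity between head outputs, where values near zero suggest more diverse heads; the function name and tensor shapes are assumptions.

```python
# Hypothetical sketch, not the paper's method: average pairwise cosine
# similarity between the heads' output representations for the same input.
import numpy as np

def mean_pairwise_head_similarity(head_outputs: np.ndarray) -> float:
    """head_outputs: array of shape (num_heads, seq_len, head_dim)."""
    num_heads = head_outputs.shape[0]
    # Flatten each head's per-token outputs into a single vector.
    flat = head_outputs.reshape(num_heads, -1)
    # L2-normalize so the dot product equals cosine similarity.
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    sims = flat @ flat.T
    # Average the off-diagonal entries (each head compared with every other head).
    off_diag = sims[~np.eye(num_heads, dtype=bool)]
    return float(off_diag.mean())

# Example: 8 heads, 16 tokens, 64-dimensional head outputs.
rng = np.random.default_rng(0)
outputs = rng.standard_normal((8, 16, 64))
print(mean_pairwise_head_similarity(outputs))  # values near 0 indicate diverse heads
```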
| Main Authors: | Hyeongu Yun, Taegwan Kang, Kyomin Jung |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2021-02-01 |
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/11/4/1548 |
Similar Items
- A Hierarchical Structured Multi-Head Attention Network for Multi-Turn Response Generation
  by: Fei Lin, et al.
  Published: (2020-01-01)
- Head as metaphor in Paul
  by: A. Wolters
  Published: (2011-06-01)
- Attention-Enhanced Graph Convolutional Networks for Aspect-Based Sentiment Classification with Multi-Head Attention
  by: Guangtao Xu, et al.
  Published: (2021-04-01)
- Capsule Network Improved Multi-Head Attention for Word Sense Disambiguation
  by: Jinfeng Cheng, et al.
  Published: (2021-03-01)
- Sentiment Analysis of Text Based on Bidirectional LSTM With Multi-Head Attention
  by: Fei Long, et al.
  Published: (2019-01-01)