Analyzing and Controlling Inter-Head Diversity in Multi-Head Attention

Multi-head attention, a powerful strategy for the Transformer, is assumed to utilize information from diverse representation subspaces. However, measuring the diversity between heads' representations, or exploiting that diversity, has rarely been studied. In this paper, we quantitatively analyze inter-head diversity...


Bibliographic Details
Main Authors: Hyeongu Yun, Taegwan Kang, Kyomin Jung
Format: Article
Language: English
Published: MDPI AG 2021-02-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/11/4/1548