SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning

Bibliographic Details
Main Authors: Wang, Hanrui (Author), Zhang, Zhekai (Author), Han, Song (Author)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers (IEEE), 2022-07-12T14:08:01Z.
Online Access: Get fulltext (https://hdl.handle.net/1721.1/143674)
LEADER 00626 am a22001693u 4500
001 143674
042 |a dc 
100 1 0 |a Wang, Hanrui  |e author 
700 1 0 |a Zhang, Zhekai  |e author 
700 1 0 |a Han, Song  |e author 
245 0 0 |a SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning 
260 |b Institute of Electrical and Electronics Engineers (IEEE),   |c 2022-07-12T14:08:01Z. 
856 |z Get fulltext  |u https://hdl.handle.net/1721.1/143674 
546 |a en 
655 7 |a Article 
773 |t 10.1109/HPCA51647.2021.00018 
773 |t 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)