A large dimensional matrix chain matrix multiplier for extremely low IO bandwidth requirements

Large-dimensional matrix multiplication is often implemented using the submatrix block method, in which the maximum submatrix size determines the speed of the overall multiplication. To address the problem that the matrix size directly processed by the classical systolic structure is severely limited b...

Bibliographic Details
Main Authors: Song Yukun, Zheng Qiangqiang, Wang Zezhong, Zhang Duoli
Format: Article
Language: zho
Published: National Computer System Engineering Research Institute of China 2019-09-01
Series: Dianzi Jishu Yingyong
Subjects:
Online Access: http://www.chinaaet.com/article/3000108356