A large dimensional matrix chain matrix multiplier for extremely low IO bandwidth requirements
Large-dimensional matrix multiplication is often implemented by submatrix block method. The maximum size of the submatrix determines the speed of the entire matrix multiplication. Concerning the problem that the matrix size directly processed by the classical systolic structure is severely limited b...
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | zho |
Published: |
National Computer System Engineering Research Institute of China
2019-09-01
|
Series: | Dianzi Jishu Yingyong |
Subjects: | |
Online Access: | http://www.chinaaet.com/article/3000108356 |