HAQ: Hardware-Aware Automated Quantization With Mixed Precision

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency, which raises a great challenge to find the optimal bitwidth for...

Bibliographic Details
Main Authors:	Wang, Kuan (Author), Liu, Zhijian (Author), Lin, Yujun (Author), Lin, Ji (Author), Han, Song (Author)
Other Authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format:	Article
Language:	English
Published:	Institute of Electrical and Electronics Engineers (IEEE), 2021-01-22.
Subjects:	Article
Online Access:	Get fulltext

Similar Items