HAQ: Hardware-Aware Automated Quantization With Mixed Precision
Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emerging DNN hardware accelerators have begun to support mixed precision (1-8 bits) to further improve computation efficiency, which makes it a great challenge to find the optimal bitwidth for...
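The abstract's notion of per-layer bitwidths can be illustrated with a minimal sketch of uniform symmetric quantization; this is a generic fixed-point scheme for illustration only, not the specific quantization or bitwidth-search method of the HAQ paper, and the function name `linear_quantize` is an assumption.

```python
import numpy as np

def linear_quantize(x, bits):
    """Quantize x to a uniform symmetric grid with the given bitwidth.

    A hypothetical sketch of the kind of fixed-point arithmetic that
    mixed-precision accelerators support; HAQ itself searches over such
    bitwidths per layer rather than defining this scheme.
    """
    qmax = 2 ** (bits - 1) - 1        # e.g. 127 at 8 bits, 1 at 2 bits
    scale = np.abs(x).max() / qmax    # map the largest magnitude onto qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale                  # dequantized (simulated-quantized) values

w = np.array([0.9, -0.45, 0.12, -0.01])
w8 = linear_quantize(w, 8)  # fine grid, small rounding error
w2 = linear_quantize(w, 2)  # coarse grid: only {-scale, 0, +scale}
```

Lower bitwidths shrink storage and multiply-accumulate cost but enlarge the rounding error, which is why the best bitwidth differs per layer and per hardware target.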
Main Authors: Wang, Kuan (Author); Liu, Zhijian (Author); Lin, Yujun (Author); Lin, Ji (Author); Han, Song (Author)
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers (IEEE), 2021-01-22T13:26:59Z
Similar Items
- APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
  by: Wang, Tianzhe, et al.
  Published: (2021)
- Al-Haq: A Global History of the First Palestinian Human Rights Organization
  Published: (2021)
- HAQ-DI Italian version in systemic sclerosis
  by: I. Chiarolanza, et al.
  Published: (2011-09-01)
- Using Quantization-Aware Training Technique with Post-Training Fine-Tuning Quantization to Implement a MobileNet Hardware Accelerator
  by: CHEN, WEI-TING, et al.
  Published: (2019)
- HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
  by: Wang, Hanrui, et al.
  Published: (2022)