HAQ: Hardware-Aware Automated Quantization With Mixed Precision
Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency, which raises a great challenge to find the optimal bitwidth for...
Main Authors: | , , , , |
---|---|
Other Authors: | |
Format: | Article |
Language: | English |
Published: |
Institute of Electrical and Electronics Engineers (IEEE),
2021-01-22T13:26:59Z.
|
Subjects: | |
Online Access: | Get fulltext |