Mixed-precision quantized neural networks with progressively decreasing bitwidth

Highlights:

• We address the trade-off between aggressive model compression and the performance of quantized neural networks.

• Based on observations of internal feature distributions, a mixed-precision QNN with progressively decreasing bitwidth is proposed.

• A heuristic for bitwidth assignment, based on the quantitative separability of feature representations, is given (see the sketch after these highlights).

• Several typical CNNs, including AlexNet, ResNet, and Faster R-CNN, are quantized with the proposed mixed-precision method.

• The experimental results demonstrate that the mixed-precision networks achieve favorable performance while using less memory.
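To make the idea concrete, the following is a minimal, self-contained sketch, not the paper's implementation, of the two ingredients the highlights describe: a separability score for per-layer features and a bitwidth assignment that gives fewer bits to more separable (typically deeper) layers, followed by uniform quantization at the assigned precision. The Fisher-style separability ratio, the candidate bitwidths (8, 6, 4, 2), and the synthetic data are all illustrative assumptions, not values from the paper.

    # A minimal sketch, not the paper's implementation. The separability
    # measure, candidate bitwidths, and layer shapes are assumptions.
    import numpy as np

    def quantize_uniform(w: np.ndarray, bits: int) -> np.ndarray:
        """Symmetric uniform (fake) quantization of a tensor to `bits` bits."""
        qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
        scale = max(float(np.abs(w).max()) / qmax, 1e-12)
        return np.clip(np.round(w / scale), -qmax, qmax) * scale

    def separability(feats: np.ndarray, labels: np.ndarray) -> float:
        """Between-class scatter over within-class scatter (higher = more separable)."""
        mu = feats.mean(axis=0)
        between = within = 0.0
        for c in np.unique(labels):
            fc = feats[labels == c]
            between += len(fc) * float(np.sum((fc.mean(axis=0) - mu) ** 2))
            within += float(np.sum((fc - fc.mean(axis=0)) ** 2))
        return between / max(within, 1e-12)

    def assign_bitwidths(scores, candidates=(8, 6, 4, 2)):
        """Rank layers by separability and hand out fewer bits as it rises:
        more separable features tolerate coarser quantization."""
        bits = np.empty(len(scores), dtype=int)
        chunks = np.array_split(np.argsort(scores), len(candidates))
        for b, idx in zip(sorted(candidates, reverse=True), chunks):
            bits[idx] = b
        return bits

    # Toy demo: 4 layers whose synthetic features grow more class-separable
    # with depth, mimicking the observation cited in the highlights.
    rng = np.random.default_rng(0)
    labels = np.repeat([0, 1], 50)
    feats = [np.concatenate([rng.normal(0, 1, (50, 16)),
                             rng.normal(gap, 1, (50, 16))])
             for gap in (0.5, 1, 2, 4)]
    scores = [separability(f, labels) for f in feats]
    bits = assign_bitwidths(scores)
    for i, (w, b) in enumerate(zip((rng.standard_normal((32, 32)) for _ in bits), bits)):
        err = np.abs(w - quantize_uniform(w, b)).mean()
        print(f"layer {i}: separability {scores[i]:.2f} -> {b}-bit, "
              f"mean |w - q(w)| = {err:.4f}")

On this toy data, separability increases with depth, so the assigned profile decreases monotonically from 8 to 2 bits, mirroring the progressively decreasing bitwidth named in the title.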

Keywords: Model compression, Quantized neural networks, Mixed-precision

Article history: Received 2 November 2019, Revised 3 May 2020, Accepted 6 September 2020, Available online 24 September 2020, Version of Record 1 October 2020.

DOI: https://doi.org/10.1016/j.patcog.2020.107647