Mixed-precision quantized neural networks with progressively decreasing bitwidth
Authors:
Highlights:
• We address the trade-off between aggressive model compression and preserving the accuracy of quantized neural networks.
• Based on observations of internal feature distributions, a mixed-precision QNN whose bitwidth decreases progressively with layer depth is proposed (see the illustrative sketch after this list).
• A bitwidth-assignment heuristic based on the quantitative separability of feature representations is given.
• Several typical CNNs, including AlexNet, ResNet, and Faster R-CNN, are quantized with the proposed mixed-precision method.
• The experimental results demonstrate that the mixed-precision networks achieve favorable performance while using less memory.
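To make the "progressively decreasing bitwidth" idea concrete, here is a minimal sketch that assigns deeper layers fewer bits and simulates quantization with a generic min-max uniform quantizer. The layer names, the bitwidth schedule, and the quantizer itself are illustrative assumptions, not the paper's algorithm, which instead derives bitwidths from the quantitative separability of feature representations.

```python
import numpy as np

def uniform_quantize(x, bits):
    """Generic min-max uniform quantizer to 2**bits levels.

    This is a stand-in for illustration only; the paper's
    quantization scheme may differ.
    """
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((x - lo) / scale)
    return q * scale + lo  # dequantized values, for simulated quantization

# Hypothetical 5-layer network: earlier layers keep more bits,
# later layers are quantized more aggressively.
layer_weights = {f"conv{i}": np.random.randn(64, 64) for i in range(1, 6)}
bit_schedule = [8, 6, 4, 3, 2]  # progressively decreasing bitwidth

quantized = {
    name: uniform_quantize(w, bits)
    for (name, w), bits in zip(layer_weights.items(), bit_schedule)
}

# Report the quantization error introduced at each layer.
for name, bits in zip(layer_weights, bit_schedule):
    err = np.abs(quantized[name] - layer_weights[name]).mean()
    print(f"{name}: {bits}-bit, mean abs error {err:.4f}")
```

In the paper's setting, the schedule would not be hand-picked as above but chosen per layer by the proposed separability heuristic, so that layers whose features tolerate coarser representation receive fewer bits.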
Keywords: Model compression, Quantized neural networks, Mixed-precision
Article history: Received 2 November 2019; Revised 3 May 2020; Accepted 6 September 2020; Available online 24 September 2020; Version of Record 1 October 2020.
DOI: https://doi.org/10.1016/j.patcog.2020.107647