DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING

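The Meng family appointed Yang Fu as chief judge, and he asked Zengzi for advice. Zengzi said: "Those above have lost the Way, and the people have long been adrift. When you uncover the truth of a case, feel sorrow and compassion, not joy."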

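This is another network compression method, but it does not come at the problem by redesigning the model. Instead it asks how to store the weights and how to optimize that storage, which reminds me of a lot of what I learned in the information retrieval course in my first year of grad school.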

=====================

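The overall pipeline has three stages: pruning, trained quantization, and Huffman coding.

Let's go through it step by step.

First, pruning: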

We start by learning the connectivity via normal network training. Next, we prune the small-weight connections: all connections with weights below a threshold are removed from the network. Finally, we retrain the network to learn the final weights for the remaining sparse connections.

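In other words: first train a large model, then drop the weights whose absolute value is small, and finally keep training on the weights that remain.

This step typically cuts the number of parameters by about 10x (the paper reports 9x to 13x).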
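
The prune-then-retrain idea above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the paper's implementation: the function names, the threshold of 0.1, the learning rate, and the toy weight matrix are all assumptions made up for the example.

```python
import numpy as np

def prune_by_magnitude(weights, threshold):
    """Zero out weights whose absolute value is below `threshold`.

    Returns the pruned weights and a boolean mask of surviving connections.
    """
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

def masked_sgd_step(weights, grad, mask, lr=0.1):
    """One retraining step: pruned connections stay at zero."""
    return (weights - lr * grad) * mask

# Toy 2x3 layer (hypothetical values).
w = np.array([[0.8, -0.05, 0.3],
              [0.02, -0.6, 0.01]])

pruned, mask = prune_by_magnitude(w, threshold=0.1)
kept = int(mask.sum())            # number of surviving connections

# Retrain with a dummy gradient of ones; the mask keeps pruned weights at zero.
updated = masked_sgd_step(pruned, np.ones_like(w), mask)
```

The mask is what makes the retraining "sparse": gradients still get computed for every position, but only the surviving connections are allowed to move.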

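The second step is quantization.

That is, group all the weights into bins, and let the weights within each bin share a single value, as shown in the figure below:

The top half shows how the weights are represented, and the bottom half shows how they are updated.

Note that in the top half the weights are split into four clusters, and each cluster has one shared value. During the update, the gradients of all weights in a cluster are summed, and that sum is the update applied to the cluster's shared value.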
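
Here is a rough NumPy sketch of the weight sharing and the per-cluster gradient sum described above. The centroid values, the toy weights, the constant gradient, and the learning rate are all made up for illustration; in the paper the centroids come from k-means clustering, which is skipped here by assuming they are already given.

```python
import numpy as np

def assign_to_centroids(weights, centroids):
    """For each weight, return the index of the nearest centroid."""
    dist = np.abs(weights.reshape(-1, 1) - centroids.reshape(1, -1))
    return dist.argmin(axis=1).reshape(weights.shape)

def centroid_gradients(grad, idx, k):
    """Sum the per-weight gradients inside each cluster."""
    return np.bincount(idx.ravel(), weights=grad.ravel(), minlength=k)

def update_centroids(centroids, grad, idx, lr=0.1):
    """Gradient step on the shared values only."""
    return centroids - lr * centroid_gradients(grad, idx, len(centroids))

# Four shared values (hypothetical, standing in for k-means centroids).
centroids = np.array([-1.0, 0.0, 1.5, 2.0])

w = np.array([[2.1, -0.9, 1.4, 0.1],
              [0.0, -0.1, -1.1, 2.2]])

idx = assign_to_centroids(w, centroids)   # small integer index per weight
quantized = centroids[idx]                # weights as the network sees them

grad = np.full(w.shape, 0.5)              # dummy gradient
new_centroids = update_centroids(centroids, grad, idx)
```

Only `idx` (a few bits per weight) and `centroids` (k full-precision values) need to be stored, which is where the saving comes from.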

for a network with n connections and each connection is represented with b bits, constraining the connections to have only k shared weights will result in a compression rate of:
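
As given in the paper, the rate is r = nb / (n log2(k) + kb): the numerator is the original storage, and the denominator is n indices of log2(k) bits each plus a codebook of k values at b bits each. A quick sketch to plug numbers in (the function name is my own):

```python
import math

def compression_rate(n, b, k):
    """Compression rate from weight sharing.

    n: number of connections
    b: bits per original weight
    k: number of shared weights (clusters)
    """
    index_bits = math.log2(k)          # bits to store one cluster index
    return (n * b) / (n * index_bits + k * b)

# 16 weights at 32 bits each, shared among 4 clusters:
# 512 bits originally vs. 16*2 + 4*32 = 160 bits after sharing.
rate = compression_rate(n=16, b=32, k=4)
```

For a layer this small the codebook term kb dominates; with millions of connections it becomes negligible and the rate approaches b / log2(k).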