## Abstract

Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems. Also, conventional networks fix the architecture before training starts; as a result, training cannot improve the architecture. To address these limitations, we describe a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy, by learning only the important connections. Our method prunes redundant connections using a three-step method. First, we train the network to learn which connections are important. Next, we prune the unimportant connections. Finally, we retrain the network to fine-tune the weights of the remaining connections. On the ImageNet dataset, our method reduced the number of parameters of AlexNet by a factor of 9x, from 61 million to 6.7 million, without incurring accuracy loss. Similar experiments with VGG-16 found that the number of parameters can be reduced by 13x, from 138 million to 10.3 million, again with no loss of accuracy.

## Introduction

• A single 32-bit coefficient access to on-chip SRAM costs 5 pJ
• A single 32-bit coefficient access to off-chip DRAM costs 640 pJ

## Three-Stage Pruning

• Step 1: train normally to learn which connections are important;
• Step 2: prune low-weight connections, removing every connection whose weight falls below a single global threshold;
• Step 3: retrain to fine-tune the remaining weights and recover the accuracy lost by pruning.
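Steps 2 and 3 can be sketched with NumPy as masked thresholding followed by a masked gradient update. The weight shape, threshold, and learning rate below are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trained weight matrix standing in for one layer.
W = rng.normal(size=(8, 8))

# Step 2: prune connections whose magnitude is below a global threshold.
threshold = 0.5
mask = np.abs(W) >= threshold   # True for surviving connections
W_pruned = W * mask

# Step 3: fine-tune; masking the gradient keeps pruned connections at zero.
grad = rng.normal(size=W.shape)
lr = 0.01
W_pruned -= lr * grad * mask
```

The key detail is that the mask is applied to the gradient as well as the weights, so pruned connections never reappear during retraining.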

### L1/L2 Regularization

1. Whether to use L1 or L2 regularization;
2. Whether to retrain after pruning;
3. Alternating between L1 and L2 regularization.
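For reference, the two penalty terms compared above can be written as simple functions of the weights (the weight values and coefficient `lam` here are made up for illustration):

```python
import numpy as np

def l1_penalty(weights, lam):
    """L1 regularization: lam * sum(|w|); drives many weights toward zero."""
    return lam * np.sum(np.abs(weights))

def l2_penalty(weights, lam):
    """L2 regularization: lam * sum(w^2); penalizes large weights quadratically."""
    return lam * np.sum(weights ** 2)

W = np.array([[0.5, -1.0], [0.0, 2.0]])
loss_l1 = l1_penalty(W, lam=0.01)   # 0.01 * 3.5  = 0.035
loss_l2 = l2_penalty(W, lam=0.01)   # 0.01 * 5.25 = 0.0525
```

Either penalty is added to the training loss; which one yields better accuracy depends on whether a retraining phase follows pruning.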

### Dropout Rate Adjustment

• $$C_{i}$$ is the number of connections in layer $$i$$;
• $$C_{i0}$$ refers to the original network;
• $$C_{ir}$$ refers to the network after retraining;
• $$N_{i}$$ is the number of neurons in layer $$i$$.

The number of connections $$C_{i}$$ in layer $$i$$ is determined by the neuron counts of layers $$i$$ and $$i-1$$:

$$C_{i}=N_{i}N_{i-1}$$

$$D_{r}=D_{0}\sqrt{\frac{C_{ir}}{C_{i0}}}$$

• $$D_{0}$$ is the original dropout rate;
• $$D_{r}$$ is the dropout rate used during retraining;
• $$C_{ir}$$ is the number of connections in layer $$i$$ after pruning;
• $$C_{i0}$$ is the original number of connections in layer $$i$$.
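The adjustment above is a one-line computation. The connection counts and original dropout rate in this example are invented for illustration:

```python
import math

def adjusted_dropout(d0, c_i0, c_ir):
    """Scale the original dropout rate D_0 by sqrt(C_ir / C_i0)."""
    return d0 * math.sqrt(c_ir / c_i0)

# A layer pruned from 1,000,000 down to 90,000 connections, with D_0 = 0.5:
d_r = adjusted_dropout(0.5, 1_000_000, 90_000)   # 0.5 * sqrt(0.09) = 0.15
```

Because dropout scales with the square root of the surviving connection ratio, a heavily pruned layer uses a much lower dropout rate during retraining: fewer connections remain, so less regularization is needed.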

## Experiments

### Layer-wise Pruning Results

1. Convolutional layers are pruned less aggressively than fully connected layers;
2. The closer a convolutional layer is to the input, the lower its pruning rate.