ACNet

Paper: ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks

Official implementation: DingXiaoH/ACNet

Custom implementation: ZJCV/ZCls

Abstract

As designing appropriate Convolutional Neural Network (CNN) architecture in the context of a given application usually involves heavy human works or numerous GPU hours, the research community is soliciting the architecture-neutral CNN structures, which can be easily plugged into multiple mature architectures to improve the performance on our real-world applications. We propose Asymmetric Convolution Block (ACB), an architecture-neutral structure as a CNN building block, which uses 1D asymmetric convolutions to strengthen the square convolution kernels. For an off-the-shelf architecture, we replace the standard square-kernel convolutional layers with ACBs to construct an Asymmetric Convolutional Network (ACNet), which can be trained to reach a higher level of accuracy. After training, we equivalently convert the ACNet into the same original architecture, thus requiring no extra computations anymore. We have observed that ACNet can improve the performance of various models on CIFAR and ImageNet by a clear margin. Through further experiments, we attribute the effectiveness of ACB to its capability of enhancing the model's robustness to rotational distortions and strengthening the central skeleton parts of square convolution kernels. 

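Interpretation

The core of ACNet is a new building block, the Asymmetric Convolution Block (ACB). During training, ACBs replace the standard convolutions; at inference, each ACB is converted back into a standard convolution. Decoupling the training-time model from the inference-time model improves generalization during training while keeping the original model's inference speed.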

Derivation

Convolution layer

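  • \(M\in R^{U\times V\times C}\) denotes the input feature map, with spatial size \(U\times V\) and \(C\) channels
  • \(F\in R^{H\times W\times C}\) denotes a filter with kernel size \(H\times W\) and \(C\) channels (the layer has \(D\) such filters \(F^{(1)},\dots,F^{(D)}\))
  • \(O\in R^{R\times T\times D}\) denotes the output feature map, with spatial size \(R\times T\) and \(D\) channels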

The output of the \(j\)-th filter is computed as:

\[ O_{:,:,j}=\sum_{k=1}^{C}M_{:,:,k}\ast F_{:,:,k}^{(j)} \]

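where \(\ast\) denotes the 2D convolution operator, \(M_{:,:,k}\) is the \(k\)-th channel of the input feature map (a \(U\times V\) matrix), and \(F_{:,:,k}^{(j)}\) is the \(k\)-th input channel of filter \(F^{(j)}\), an \(H\times W\) kernel.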
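The per-filter formula can be checked with a small NumPy sketch. It assumes stride 1 and no padding and, following the usual deep-learning convention, implements \(\ast\) as cross-correlation (no kernel flip). The helper names `conv2d_valid` and `conv_layer`, and the layout of `F` as (D, H, W, C), are illustrative choices, not taken from the paper or either implementation.

```python
import numpy as np

def conv2d_valid(x, k):
    """Single-channel 2D cross-correlation, stride 1, no padding."""
    kh, kw = k.shape
    rows, cols = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((rows, cols))
    for r in range(rows):
        for t in range(cols):
            out[r, t] = np.sum(x[r:r + kh, t:t + kw] * k)
    return out

def conv_layer(M, F):
    """M: (U, V, C) input; F: (D, H, W, C) filters -> O: (R, T, D)."""
    channels = [
        # O[:, :, j] = sum over k of M[:, :, k] * F^{(j)}[:, :, k]
        sum(conv2d_valid(M[:, :, k], F[j, :, :, k]) for k in range(M.shape[2]))
        for j in range(F.shape[0])
    ]
    return np.stack(channels, axis=-1)

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6, 3))     # U = V = 6, C = 3
F = rng.standard_normal((4, 3, 3, 3))  # D = 4 filters, each 3 x 3 x 3
O = conv_layer(M, F)
print(O.shape)  # (4, 4, 4)
```

Each output channel \(j\) sums \(C\) single-channel convolutions, which is exactly the sum over \(k\) in the formula.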

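Batch normalization layer

Batch normalization (BN) usually follows a convolution layer. Combined with the convolution above, its output is: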

\[ O_{:,:,j} = (\sum_{k=1}^{C}M_{:,:,k}\ast F_{:,:,k}^{(j)} - \mu_{j})\frac{\gamma_{j}}{\sigma_{j}}+\beta_{j} \]

  • \(\mu_{j}\): per-channel mean
  • \(\sigma_{j}\): per-channel standard deviation
  • \(\gamma_{j}\): learned scaling factor
  • \(\beta_{j}\): learned bias (shift)
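A minimal NumPy sketch of inference-time BN applied to a conv output of shape (R, T, D). Here \(\mu_j\) and \(\sigma_j\) are fixed numbers (at inference they are running statistics collected during training). The small `eps` added to the variance before taking the square root is an assumption borrowed from common framework defaults (PyTorch's `BatchNorm2d` uses `eps=1e-5`); it is not part of the formula above. The function name `batch_norm_inference` is hypothetical.

```python
import numpy as np

def batch_norm_inference(O, mu, sigma, gamma, beta):
    """Per-channel BN: (O - mu) * gamma / sigma + beta.
    O: (R, T, D); mu, sigma, gamma, beta: (D,), broadcast over the last axis."""
    return (O - mu) * (gamma / sigma) + beta

rng = np.random.default_rng(0)
O = rng.standard_normal((4, 4, 2))  # conv output with D = 2 channels
mu = np.array([0.5, -1.0])
var = np.array([4.0, 0.25])
eps = 1e-5                          # assumption: framework-style epsilon
sigma = np.sqrt(var + eps)
gamma = np.array([1.0, 2.0])
beta = np.array([0.0, 0.1])
Y = batch_norm_inference(O, mu, sigma, gamma, beta)
```

Because every quantity is per channel, BN at inference is just an affine map on each output channel; this is what makes the later BN fusion possible.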

Additivity of convolution

if several 2D kernels with compatible sizes operate on the same input with the same stride to produce outputs of the same resolution, and their outputs are summed up, we can add up these kernels on the corresponding positions to obtain an equivalent kernel which will produce the same output

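Put simply, if several convolution filters applied to the same input produce outputs of the same size, their kernels can be added together: convolving with the summed kernel gives the same result as convolving separately and summing the outputs: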

\[ I\ast K^{(1)} + I\ast K^{(2)} = I\ast (K^{(1)}\bigoplus K^{(2)}) \]

  • \(I\) denotes the input matrix
  • \(K^{(1)}\) and \(K^{(2)}\) denote 2D kernels
  • \(\bigoplus\) denotes element-wise addition
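Additivity follows from the linearity of convolution in the kernel. A quick numerical check in NumPy, for two kernels of the same size (stride 1, no padding, cross-correlation convention; `conv2d_valid` is an illustrative helper):

```python
import numpy as np

def conv2d_valid(x, k):
    """Single-channel 2D cross-correlation, stride 1, no padding."""
    kh, kw = k.shape
    rows, cols = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((rows, cols))
    for r in range(rows):
        for t in range(cols):
            out[r, t] = np.sum(x[r:r + kh, t:t + kw] * k)
    return out

rng = np.random.default_rng(0)
I = rng.standard_normal((5, 5))
K1 = rng.standard_normal((3, 3))
K2 = rng.standard_normal((3, 3))

lhs = conv2d_valid(I, K1) + conv2d_valid(I, K2)  # two convolutions, outputs summed
rhs = conv2d_valid(I, K1 + K2)                   # one convolution with K1 (+) K2
print(np.allclose(lhs, rhs))  # True
```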

Note 1: the kernels must be compatible, which holds when:

\[ M^{(p)} = M^{(q)}, H_{p} \leq H_{q}, W_{p} \leq W_{q}, D_{p}=D_{q} \]

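\(M\) denotes the input feature map, and \(H\), \(W\), \(D\) denote the height, width, and depth, respectively. For example, \(3\times 1\) and \(1\times 3\) kernels are both compatible with a \(3\times 3\) kernel.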

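Note 2: the input matrix \(I\) may need to be cropped or padded, as shown in the figure below. Relative to the input of the \(d\times d\) kernel, zero-padded by \(\lfloor d/2\rfloor\) on every side:

  • For a \(1\times d\) filter, the input feature map should be cropped along dimension \(H\) by \(\lfloor d/2\rfloor\) (rows at the top and bottom)
  • Likewise, for a \(d\times 1\) filter, the input feature map should be cropped along dimension \(W\) by \(\lfloor d/2\rfloor\) (columns at the left and right)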
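The cropping rule can be verified numerically for \(d=3\): a \(1\times d\) kernel applied to the padded input with \(\lfloor d/2\rfloor\) rows cropped from top and bottom gives the same output as a \(d\times d\) kernel that holds the \(1\times d\) weights in its central row and zeros elsewhere. This NumPy sketch assumes stride 1 and the cross-correlation convention; the helper and variable names are illustrative.

```python
import numpy as np

def conv2d_valid(x, k):
    """Single-channel 2D cross-correlation, stride 1, no padding."""
    kh, kw = k.shape
    rows, cols = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((rows, cols))
    for r in range(rows):
        for t in range(cols):
            out[r, t] = np.sum(x[r:r + kh, t:t + kw] * k)
    return out

d = 3
p = d // 2
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))
k_hor = rng.standard_normal((1, d))  # 1 x d kernel

# The d x d branch pads both H and W by p.
x_pad = np.pad(x, ((p, p), (p, p)))
# 1 x d branch: same padded input, with p rows cropped from top and bottom (dimension H).
x_crop = x_pad[p:x_pad.shape[0] - p, :]
out_hor = conv2d_valid(x_crop, k_hor)

# Equivalent: embed the 1 x d kernel in the central row of a d x d zero kernel.
k_embed = np.zeros((d, d))
k_embed[p, :] = k_hor[0]
out_embed = conv2d_valid(x_pad, k_embed)

print(out_hor.shape, np.allclose(out_hor, out_embed))  # (6, 6) True
```

The same argument, with rows and columns swapped, covers the \(d\times 1\) filter.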

ACB

Training phase

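  • Let the standard convolution have kernel size \(d\times d\)
  • The ACB builds three parallel convolutions with kernel sizes \(d\times d\), \(1\times d\) and \(d\times 1\)
  • Each convolution is followed by its own BN layer
  • The input passes through all three conv+BN branches, and their outputs are summed (to enrich the feature space)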
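A minimal NumPy sketch of the training-time ACB forward pass described above. It is not the code from DingXiaoH/ACNet or ZJCV/ZCls; `conv`, `bn` and `acb_forward` are hypothetical helpers. For simplicity, BN uses fixed per-channel statistics (as at inference); real training normalizes with mini-batch statistics. The \(1\times d\) and \(d\times 1\) branches skip the padding along \(H\) and \(W\) respectively, which implements the cropping rule so all three outputs have the same size.

```python
import numpy as np

def conv(M, F, pad_h=0, pad_w=0):
    """Stride-1 conv (cross-correlation) with zero padding.
    M: (U, V, C) input; F: (D, H, W, C) filters -> (R, T, D) output."""
    Mp = np.pad(M, ((pad_h, pad_h), (pad_w, pad_w), (0, 0)))
    D, kh, kw, _ = F.shape
    R, T = Mp.shape[0] - kh + 1, Mp.shape[1] - kw + 1
    O = np.zeros((R, T, D))
    for r in range(R):
        for t in range(T):
            # Contract each filter with the H x W x C patch under it.
            O[r, t] = np.tensordot(F, Mp[r:r + kh, t:t + kw, :], axes=([1, 2, 3], [0, 1, 2]))
    return O

def bn(O, mu, sigma, gamma, beta):
    """Per-output-channel BN with given statistics."""
    return (O - mu) * (gamma / sigma) + beta

def acb_forward(M, F_sq, F_hor, F_ver, bn_sq, bn_hor, bn_ver):
    """Three parallel conv+BN branches whose outputs are summed.
    F_sq: (D, d, d, C), F_hor: (D, 1, d, C), F_ver: (D, d, 1, C)."""
    p = F_sq.shape[1] // 2
    out_sq = bn(conv(M, F_sq, p, p), *bn_sq)     # d x d, padding p on both dimensions
    out_hor = bn(conv(M, F_hor, 0, p), *bn_hor)  # 1 x d, no padding along H (the crop)
    out_ver = bn(conv(M, F_ver, p, 0), *bn_ver)  # d x 1, no padding along W (the crop)
    return out_sq + out_hor + out_ver

d, C, D = 3, 2, 4
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6, C))
identity_bn = (np.zeros(D), np.ones(D), np.ones(D), np.zeros(D))  # mu, sigma, gamma, beta
out = acb_forward(M,
                  rng.standard_normal((D, d, d, C)),
                  rng.standard_normal((D, 1, d, C)),
                  rng.standard_normal((D, d, 1, C)),
                  identity_bn, identity_bn, identity_bn)
print(out.shape)  # (6, 6, 4)
```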

Inference phase

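After training, each ACB is converted back into a standard convolution, so the improved generalization comes without any extra computation. The conversion takes two steps:

  1. BN fusion: merge the Conv+BN pair within each branch into a single Conv
  2. Branch fusion: merge the Convs of all branches into one Conv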

BN fusion

\[ \begin{aligned} O_{:,:,j} &= (\sum_{k=1}^{C}M_{:,:,k}\ast F_{:,:,k}^{(j)} - \mu_{j})\frac{\gamma_{j}}{\sigma_{j}}+\beta_{j}\\ &=\sum_{k=1}^{C}M_{:,:,k}\ast (\frac{\gamma_{j}}{\sigma_{j}}F_{:,:,k}^{(j)}) - \mu_{j}\times \frac{\gamma_{j}}{\sigma_{j}}+\beta_{j} \end{aligned} \]

Taking \(\frac{\gamma_{j}}{\sigma_{j}}F_{:,:,k}^{(j)}\) as the new kernel weights and \(- \mu_{j}\times \frac{\gamma_{j}}{\sigma_{j}}+\beta_{j}\) as the bias completes the Conv+BN fusion.
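The fusion rule can be checked with NumPy: scale each filter \(F^{(j)}\) by \(\gamma_j/\sigma_j\), set the bias to \(\beta_j-\mu_j\gamma_j/\sigma_j\), and compare against conv followed by BN. This sketch assumes a bias-free convolution and inference-mode BN; `conv` and `fuse_conv_bn` are hypothetical helper names.

```python
import numpy as np

def conv(M, F, pad_h=0, pad_w=0):
    """Stride-1 conv (cross-correlation) with zero padding.
    M: (U, V, C) input; F: (D, H, W, C) filters -> (R, T, D) output."""
    Mp = np.pad(M, ((pad_h, pad_h), (pad_w, pad_w), (0, 0)))
    D, kh, kw, _ = F.shape
    R, T = Mp.shape[0] - kh + 1, Mp.shape[1] - kw + 1
    O = np.zeros((R, T, D))
    for r in range(R):
        for t in range(T):
            O[r, t] = np.tensordot(F, Mp[r:r + kh, t:t + kw, :], axes=([1, 2, 3], [0, 1, 2]))
    return O

def fuse_conv_bn(F, mu, sigma, gamma, beta):
    """Fold BN into the preceding bias-free conv.
    Returns kernel gamma_j / sigma_j * F^(j) and bias beta_j - mu_j * gamma_j / sigma_j."""
    scale = gamma / sigma                     # shape (D,)
    F_fused = F * scale[:, None, None, None]  # scale each filter j by its own factor
    b_fused = beta - mu * scale
    return F_fused, b_fused

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5, 3))
F = rng.standard_normal((4, 3, 3, 3))
mu, gamma, beta = rng.standard_normal((3, 4))
sigma = rng.uniform(0.5, 2.0, 4)

conv_then_bn = (conv(M, F, 1, 1) - mu) * (gamma / sigma) + beta
F_fused, b_fused = fuse_conv_bn(F, mu, sigma, gamma, beta)
fused = conv(M, F_fused, 1, 1) + b_fused
print(np.allclose(conv_then_bn, fused))  # True
```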

Branch fusion

\[ \begin{aligned} {F}^{'(j)} &= \frac{\gamma_{j}}{\sigma_{j}}F^{(j)} \bigoplus \frac{\bar{\gamma}_{j}}{\bar{\sigma}_{j}}\bar{F}^{(j)}\bigoplus \frac{\hat{\gamma}_{j}}{\hat{\sigma}_{j}}\hat{F}^{(j)}\\ b_{j} &= - \mu_{j}\times \frac{\gamma_{j}}{\sigma_{j}}- \bar{\mu}_{j}\times \frac{\bar{\gamma}_{j}}{\bar{\sigma}_{j}}- \hat{\mu}_{j}\times \frac{\hat{\gamma}_{j}}{\hat{\sigma}_{j}}+\beta_{j}+\bar{\beta}_{j} +\hat{\beta}_{j} \end{aligned} \]

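  • \({F}^{'(j)}\) is the fused kernel
  • \(b_{j}\) is the fused bias
  • \(\bar{F}^{(j)}\) and \(\hat{F}^{(j)}\) are the corresponding \(1\times 3\) and \(3\times 1\) kernels, and the barred and hatted \(\mu, \sigma, \gamma, \beta\) are the BN parameters of those two branches; \(\bigoplus\) adds the \(1\times 3\) kernel onto the central row and the \(3\times 1\) kernel onto the central column of the \(3\times 3\) kernel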

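After this conversion, the sum of the three branch outputs equals a single convolution with the fused kernel plus the fused bias: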

\[ O_{:,:,j} + \bar{O}_{:,:,j}+\hat{O}_{:,:,j} = \sum_{k=1}^{C}M_{:,:,k}\ast {F}^{'(j)}_{:,:,k}+ b_{j} \]
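An end-to-end NumPy check of both conversion steps, for \(d=3\): compute the three-branch ACB output, fuse BN into each branch, add the \(1\times d\) and \(d\times 1\) kernels onto the central row and column of the \(d\times d\) kernel, sum the biases, and compare. Stride 1, cross-correlation, and inference-mode BN are assumed; all helper names are hypothetical, and this is not the conversion code of the official repository.

```python
import numpy as np

def conv(M, F, pad_h=0, pad_w=0):
    """Stride-1 conv (cross-correlation) with zero padding.
    M: (U, V, C) input; F: (D, H, W, C) filters -> (R, T, D) output."""
    Mp = np.pad(M, ((pad_h, pad_h), (pad_w, pad_w), (0, 0)))
    D, kh, kw, _ = F.shape
    R, T = Mp.shape[0] - kh + 1, Mp.shape[1] - kw + 1
    O = np.zeros((R, T, D))
    for r in range(R):
        for t in range(T):
            O[r, t] = np.tensordot(F, Mp[r:r + kh, t:t + kw, :], axes=([1, 2, 3], [0, 1, 2]))
    return O

def bn(O, mu, sigma, gamma, beta):
    return (O - mu) * (gamma / sigma) + beta

def fuse_conv_bn(F, mu, sigma, gamma, beta):
    scale = gamma / sigma
    return F * scale[:, None, None, None], beta - mu * scale

def random_bn(rng, D):
    mu, gamma, beta = rng.standard_normal((3, D))
    return mu, rng.uniform(0.5, 2.0, D), gamma, beta  # (mu, sigma, gamma, beta)

d, C, D = 3, 2, 4
p = d // 2
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6, C))
F_sq = rng.standard_normal((D, d, d, C))
F_hor = rng.standard_normal((D, 1, d, C))
F_ver = rng.standard_normal((D, d, 1, C))
bn_sq, bn_hor, bn_ver = (random_bn(rng, D) for _ in range(3))

# Training-time ACB: three conv+BN branches, outputs summed.
acb_out = (bn(conv(M, F_sq, p, p), *bn_sq)
           + bn(conv(M, F_hor, 0, p), *bn_hor)   # 1 x d, cropped along H
           + bn(conv(M, F_ver, p, 0), *bn_ver))  # d x 1, cropped along W

# Step 1: BN fusion in every branch.
K_sq, b_sq = fuse_conv_bn(F_sq, *bn_sq)
K_hor, b_hor = fuse_conv_bn(F_hor, *bn_hor)
K_ver, b_ver = fuse_conv_bn(F_ver, *bn_ver)

# Step 2: branch fusion onto the central row and column of the d x d kernel.
K = K_sq.copy()
K[:, p, :, :] += K_hor[:, 0, :, :]
K[:, :, p, :] += K_ver[:, :, 0, :]
b = b_sq + b_hor + b_ver

fused_out = conv(M, K, p, p) + b
print(acb_out.shape, np.allclose(acb_out, fused_out))  # (6, 6, 4) True
```

The deployed layer is an ordinary \(d\times d\) convolution with a bias, so its cost matches the original architecture.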

Experiments

The experiments show that ACB improves the performance of several models (VGG, ResNet, WRN, DenseNet) on multiple datasets (CIFAR-10, CIFAR-100, ImageNet).

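Ablation study

Ablation experiments confirm the contribution of each component: the horizontal kernels, the vertical kernels, and the BN layers in the branches.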

Related reading