## Abstract

We present a simple, highly modularized network architecture for image classification. Our network is constructed by repeating a building block that aggregates a set of transformations with the same topology. Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which we call “cardinality” (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width. On the ImageNet-1K dataset, we empirically show that even under the restricted condition of maintaining complexity, increasing cardinality is able to improve classification accuracy. Moreover, increasing cardinality is more effective than going deeper or wider when we increase the capacity. Our models, named ResNeXt, are the foundations of our entry to the ILSVRC 2016 classification task in which we secured 2nd place. We further investigate ResNeXt on an ImageNet-5K set and the COCO detection set, also showing better results than its ResNet counterpart. The code and models are publicly available online$$^{1}$$.

$$^{1}$$https://github.com/facebookresearch/ResNeXt

## Template

1. For blocks that produce output feature maps of the same size (spatial dimensions and channels), use the same number and size of filters
2. Each time the feature map is downsampled (spatial size divided by 2), multiply the width of the output feature map (number of channels) by 2
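The two template rules keep the per-layer computational cost roughly constant across stages: halving the spatial size while doubling the width leaves the cost of a 3x3 convolution unchanged. The sketch below checks this arithmetic; the stage sizes follow the standard ResNet/ResNeXt ImageNet layout and are illustrative only.

```python
def conv3x3_macs(h, w, c_in, c_out):
    """Multiply-accumulates of a stride-1, same-padding 3x3 convolution."""
    return h * w * c_in * c_out * 3 * 3

# (spatial size, width) for four stages: each halves the size and doubles the width.
stages = [(56, 64), (28, 128), (14, 256), (7, 512)]
costs = [conv3x3_macs(s, s, c, c) for s, c in stages]
print(costs)  # all four values are equal
```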

## splitting, transforming, and aggregating

$\sum_{i=1}^{D}w_{i}x_{i}$

1. Splitting: the vector $$x$$ is split into low-dimensional embeddings; for a single neuron, these are the one-dimensional subspaces $$x_{i}$$
2. Transforming: each low-dimensional embedding is transformed; for a neuron, this is a weighted scaling: $$w_{i}x_{i}$$
3. Aggregating: all transformed results are aggregated: $$\sum_{i=1}^{D}w_{i}x_{i}$$
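The inner product of a single neuron is itself a split-transform-aggregate operation, which the three steps above make explicit. A minimal numpy check (the dimension $$D$$ and the random data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8
x = rng.standard_normal(D)
w = rng.standard_normal(D)

# 1. split x into 1-D subspaces x_i; 2. transform each as w_i * x_i;
# 3. aggregate by summation.
aggregated = sum(w[i] * x[i] for i in range(D))

assert np.isclose(aggregated, w @ x)  # identical to the inner product
```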

$F(x) = \sum_{i=1}^{C} \tau_{i}(x)$

• $$C$$ is the size of the set of transformations, which the paper exposes as the hyper-parameter cardinality
• $$\tau_{i}$$ denotes a transformation function
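The aggregated transformation $$F(x) = \sum_{i=1}^{C} \tau_{i}(x)$$ replaces each scalar weight $$w_{i}x_{i}$$ with a small network of the same topology in every branch. A numpy sketch, where each branch embeds $$x$$ into a low-dimensional space, applies a nonlinearity, and projects back; all sizes here are illustrative, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(1)
C, d, bottleneck = 4, 16, 3            # cardinality, input dim, embedding dim
x = rng.standard_normal(d)
W_down = rng.standard_normal((C, bottleneck, d))  # per-branch embedding
W_up = rng.standard_normal((C, d, bottleneck))    # per-branch projection back

def tau(i, v):
    """One branch: embed, ReLU, project back (same topology for every i)."""
    return W_up[i] @ np.maximum(W_down[i] @ v, 0.0)

# Aggregate the C branch outputs by summation.
F = sum(tau(i, x) for i in range(C))
print(F.shape)  # (16,)
```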

## Building Blocks

The ResNeXt residual building block draws on the Inception module, adding multi-branch operations on top of the residual connection

• For the Basic Block

• For the Bottleneck Block

• The figure below shows the training curves

• The table below shows the training results
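In practice, ResNeXt implements its $$C$$ branches as a single grouped convolution. The sketch below checks this equivalence for a 1x1 convolution (a per-pixel matrix multiply): transforming each group independently and concatenating gives the same result as one transform with a block-diagonal weight matrix. The cardinality and group width are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
C, group_width = 4, 8                # cardinality and channels per group
channels = C * group_width
x = rng.standard_normal(channels)    # channel vector at one spatial position

# Per-group weights: group i maps its group_width channels to group_width channels.
W = rng.standard_normal((C, group_width, group_width))

# Multi-branch form: split, transform each group, concatenate.
groups = x.reshape(C, group_width)
branched = np.concatenate([W[i] @ groups[i] for i in range(C)])

# Grouped-convolution form: one block-diagonal weight applied to the full vector.
block_diag = np.zeros((channels, channels))
for i in range(C):
    s = i * group_width
    block_diag[s:s + group_width, s:s + group_width] = W[i]
grouped = block_diag @ x

assert np.allclose(branched, grouped)
```

This is why frameworks can express a ResNeXt block with an ordinary convolution layer plus a `groups` argument rather than explicit branches.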

## Summary

Building on ResNet, ResNeXt combines the multi-branch (split-transform-merge) idea of the Inception module with the template idea of VGGNet (stacking building blocks of the same shape), performing multi-branch computation inside each residual block. Experiments show that, under a fixed complexity budget, increasing the cardinality (the number of branches) is more effective than going deeper (adding layers) or wider (as in Wide ResNet, increasing per-layer dimensions)