Abstract

To obtain excellent deep neural architectures, a series of techniques has been carefully designed in EfficientNets. The giant formula for simultaneously enlarging the resolution, depth and width provides us with a Rubik's cube for neural networks, so that we can find networks with high efficiency and excellent performance by twisting the three dimensions. This paper aims to explore the twisting rules for obtaining deep neural networks with minimum model sizes and computational costs. Different from network enlarging, we observe that resolution and depth are more important than width for tiny networks. Therefore, the original method, i.e., the compound scaling in EfficientNet, is no longer suitable. To this end, we summarize a tiny formula for downsizing neural architectures, derived from a series of smaller models based on EfficientNet-B0 under a FLOPs constraint. Experimental results on the ImageNet benchmark illustrate that our TinyNet performs much better than the smaller versions of EfficientNets obtained by inverting the giant formula. For instance, our TinyNet-E achieves 59.9% Top-1 accuracy with only 24M FLOPs, which is about 1.9% higher than the previous best MobileNetV3 with similar computational cost. Code will be available at this https URL, and this https URL.

Dimension Evaluation

• $$C_{0}$$ denotes the FLOPs of the baseline model;
• $$R_{0}\times R_{0}$$ denotes the input image resolution of the baseline model;
• $$W_{0}$$ denotes the width of the baseline model;
• $$D_{0}$$ denotes the depth of the baseline model.

• Resolution has a larger impact on model accuracy;
• From Figure 2(a), the best-accuracy models fall in the resolution-multiplier range $$(0.8, 1.4)$$;
• When $$r<0.8$$, larger resolution yields higher accuracy;
• When $$r>1.4$$, larger resolution yields relatively lower accuracy.
• From Figure 2(b), the best-accuracy models fall in the depth-multiplier range $$(0.5, 2)$$;
• Tightly constraining the depth dimension may therefore rule out better models.
• From Figure 2(c), width is negatively correlated with model accuracy;
• The best-accuracy models all fall in the region $$w<1$$;
• In other words, under the FLOPs constraint, smaller width gives higher accuracy.
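The observations above come from training many shrunken variants of EfficientNet-B0 under a fixed FLOPs budget. A minimal sketch of how candidate (r, d, w) triples can be enumerated, using the rule of thumb that FLOPs scale roughly as $d \cdot w^2 \cdot r^2$ (the grid values and tolerance here are illustrative assumptions, not the paper's exact sampling procedure):

```python
def sample_configs(c, tol=0.05):
    """Enumerate (r, d, w) multiplier triples whose estimated relative
    cost w^2 * r^2 * d lies within tol*c of the target FLOPs fraction c.
    Assumes FLOPs scale roughly with d * w^2 * r^2."""
    grid = [x / 10 for x in range(2, 15)]  # multipliers 0.2 .. 1.4
    return [(r, d, w)
            for r in grid for d in grid for w in grid
            if abs(w * w * r * r * d - c) <= tol * c]

# Candidate models at ~50% of the baseline FLOPs
candidates = sample_configs(0.5)
```

Each surviving triple defines one shrunken model to train, and accuracy can then be plotted against each dimension separately, as in Figure 2.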

Tiny Formula

$w = \sqrt{c / (r^{2}d)}, \quad \text{s.t.}\ \ 0 < c < 1$

Experimental Setup

• Epochs: 450
• Optimizer: RMSProp with momentum 0.9 and decay 0.9
• Weight decay: 1e-5
• Batch-norm momentum: 0.99
• Initial learning rate: 0.048, decayed by 0.97 every 2.4 epochs
• Warmup: first 3 epochs
• Batch size: 8 V100 GPUs, 128 images per GPU, 1024 per step in total
• Dropout: 0.2, applied to the final fully connected layer
• Exponential moving average: 0.9999
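The learning-rate settings above can be sketched as a schedule function; only the warmup length, base rate, and decay step are given, so the linear shape of the warmup ramp is an assumption:

```python
def learning_rate(epoch):
    """LR schedule matching the listed hyper-parameters: base LR 0.048
    decayed by 0.97 every 2.4 epochs, with warmup over the first
    3 epochs (linear ramp assumed)."""
    base_lr, decay, every, warmup = 0.048, 0.97, 2.4, 3.0
    if epoch < warmup:
        return base_lr * epoch / warmup  # linear ramp from 0
    return base_lr * decay ** int(epoch // every)
```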

• Epochs: 90
• Batch size: 1024
• Optimizer: SGD with momentum 0.9
• Weight decay: 1e-4
• Initial learning rate: 0.4, decayed by 0.1 every 30 epochs
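The second setup uses a plain step schedule, which can be written directly from the listed values:

```python
def baseline_lr(epoch):
    """Step schedule for the second setup: initial LR 0.4, multiplied
    by 0.1 every 30 epochs over the 90-epoch run."""
    return 0.4 * 0.1 ** (epoch // 30)
```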

TinyNet

In ImageNet-1000 training, TinyNet consistently outperforms other families of small models; the paper also points out the effectiveness of RandAugment. Setting the RandAugment magnitude to 9 with a standard deviation of 0.5 further improves the performance of TinyNet-A and EfficientNet-B0. The overall training results are shown in the figure below.

Summary

1. Showed that EfficientNet's hand-designed enlarging formula is not suitable for inverse (downward) scaling;
2. Showed that model resolution and depth matter more than model width for tiny networks;
3. Provides the PyTorch implementation of TinyNet and the architecture design of GhostNet-A.