GoogLeNet_BN

The paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift applies batch normalization to convolutional neural networks: by correcting the distribution of each layer's inputs, it achieves faster training. At the end of the paper, batch normalization layers are added to the GoogLeNet network, yielding better classification results.
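For a mini-batch \(\mathcal{B}\) with mean \(\mu_{\mathcal{B}}\) and variance \(\sigma_{\mathcal{B}}^2\), the BN transform normalizes each activation and then applies a learned scale \(\gamma\) and shift \(\beta\):

\(\hat{x}_i = (x_i - \mu_{\mathcal{B}}) / \sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}, \quad y_i = \gamma \hat{x}_i + \beta\)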

Parameter Analysis

The paper gives the parameter configuration of GoogLeNet_BN in a table.

Compared with GoogLeNet, the modifications are as follows:

  1. In the Inception modules, each \(5\times 5\) convolutional layer is replaced by two \(3\times 3\) convolutional layers. This makes the network 9 weight layers deeper, increases the number of parameters by 25%, and increases the computational cost by about 30%.
  2. Inception module (3c) is added.
  3. Within the Inception modules, either average pooling or max pooling is used (depending on the module).
  4. There are no longer pooling layers between the Inception modules; instead, the spatial size is halved with stride 2 inside the Inception (3c)/(4e) modules.

In addition, GoogLeNet_BN applies a separable convolution with depth multiplier 8 in the first convolutional layer to reduce the computational cost:

Our model employed separable convolution with depth multiplier 8 on the first convolutional layer. This reduces the computational cost while increasing the memory consumption at training time
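In PyTorch terms, a depthwise separable convolution with depth multiplier 8 could be sketched as follows. This is an illustration of the operation, not the paper's exact configuration; the kernel size, stride, and output width are borrowed from the first layer of the parameter table.

import torch.nn as nn

# depthwise 7x7 convolution: each of the 3 input channels gets 8 filters (depth multiplier 8)
depthwise = nn.Conv2d(3, 3 * 8, kernel_size=7, stride=2, padding=3, groups=3)
# pointwise 1x1 convolution mixes the 24 depthwise outputs into 64 output channels
pointwise = nn.Conv2d(3 * 8, 64, kernel_size=1)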

Note: recomputing the channel counts shows that the output depths listed for Inception (4c/4d/4e) are wrong; they should be \(608/608/1056\) respectively (4c: \(160+160+160+128=608\); 4d: \(96+192+192+128=608\); 4e: \(192+256+608=1056\)).

Derivation

Taking the Inception 3(a/b/c) modules as examples, we derive the implementation of the modified modules.

Assume an input of size \(128\times 192\times 28\times 28\) (batch size \(\times\) channels \(\times\) height \(\times\) width).
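Every step below uses the standard convolution output-size formula. For kernel size \(K\), stride \(S\), and zero-padding \(P\), an input of height \(H_{in}\) produces

\(H_{out} = \lfloor (H_{in} + 2P - K) / S \rfloor + 1\)

For example, a \(3\times 3\) convolution with \(S=1\) and \(P=1\) on a \(28\times 28\) input yields \(\lfloor (28 + 2 - 3) / 1 \rfloor + 1 = 28\), i.e. the spatial size is preserved.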

Inception (3a)

1x1

  • Input volume: \(128\times 192\times 28\times 28\)
  • Kernel size \(1\times 1\), stride \(1\), zero-padding \(0\)
  • Number of filters: \(64\)
  • Output volume: \(128\times 64\times 28\times 28\)

3x3

First, apply a \(1\times 1\) convolution:

  • Input volume: \(128\times 192\times 28\times 28\)
  • Kernel size \(1\times 1\), stride \(1\), zero-padding \(0\)
  • Number of filters: \(64\)
  • Output volume: \(128\times 64\times 28\times 28\)

Then apply a \(3\times 3\) convolution:

  • Input volume: \(128\times 64\times 28\times 28\)
  • Kernel size \(3\times 3\), stride \(1\), zero-padding \(1\)
  • Number of filters: \(64\)
  • Output volume: \(128\times 64\times 28\times 28\)

double 3x3

First, apply a \(1\times 1\) convolution:

  • Input volume: \(128\times 192\times 28\times 28\)
  • Kernel size \(1\times 1\), stride \(1\), zero-padding \(0\)
  • Number of filters: \(64\)
  • Output volume: \(128\times 64\times 28\times 28\)

First \(3\times 3\) convolution:

  • Input volume: \(128\times 64\times 28\times 28\)
  • Kernel size \(3\times 3\), stride \(1\), zero-padding \(1\)
  • Number of filters: \(96\)
  • Output volume: \(128\times 96\times 28\times 28\)

Second \(3\times 3\) convolution:

  • Input volume: \(128\times 96\times 28\times 28\)
  • Kernel size \(3\times 3\), stride \(1\), zero-padding \(1\)
  • Number of filters: \(96\)
  • Output volume: \(128\times 96\times 28\times 28\)

avg pooling

First, apply average pooling:

  • Input volume: \(128\times 192\times 28\times 28\)
  • Pooling window \(3\times 3\), stride \(1\), zero-padding \(1\)
  • Output volume: \(128\times 192\times 28\times 28\)

Then apply a \(1\times 1\) convolution:

  • Input volume: \(128\times 192\times 28\times 28\)
  • Kernel size \(1\times 1\), stride \(1\), zero-padding \(0\)
  • Number of filters: \(32\)
  • Output volume: \(128\times 32\times 28\times 28\)

Concatenation

The four branches above produce outputs with the same spatial size. Concatenating them along the channel dimension (\(64+64+96+32=256\)) gives an output volume of \(128\times 256\times 28\times 28\).

Inception (3b)

1x1

  • Input volume: \(128\times 256\times 28\times 28\)
  • Kernel size \(1\times 1\), stride \(1\), zero-padding \(0\)
  • Number of filters: \(64\)
  • Output volume: \(128\times 64\times 28\times 28\)

3x3

First, apply a \(1\times 1\) convolution:

  • Input volume: \(128\times 256\times 28\times 28\)
  • Kernel size \(1\times 1\), stride \(1\), zero-padding \(0\)
  • Number of filters: \(64\)
  • Output volume: \(128\times 64\times 28\times 28\)

Then apply a \(3\times 3\) convolution:

  • Input volume: \(128\times 64\times 28\times 28\)
  • Kernel size \(3\times 3\), stride \(1\), zero-padding \(1\)
  • Number of filters: \(96\)
  • Output volume: \(128\times 96\times 28\times 28\)

double 3x3

First, apply a \(1\times 1\) convolution:

  • Input volume: \(128\times 256\times 28\times 28\)
  • Kernel size \(1\times 1\), stride \(1\), zero-padding \(0\)
  • Number of filters: \(64\)
  • Output volume: \(128\times 64\times 28\times 28\)

First \(3\times 3\) convolution:

  • Input volume: \(128\times 64\times 28\times 28\)
  • Kernel size \(3\times 3\), stride \(1\), zero-padding \(1\)
  • Number of filters: \(96\)
  • Output volume: \(128\times 96\times 28\times 28\)

Second \(3\times 3\) convolution:

  • Input volume: \(128\times 96\times 28\times 28\)
  • Kernel size \(3\times 3\), stride \(1\), zero-padding \(1\)
  • Number of filters: \(96\)
  • Output volume: \(128\times 96\times 28\times 28\)

avg pooling

First, apply average pooling:

  • Input volume: \(128\times 256\times 28\times 28\)
  • Pooling window \(3\times 3\), stride \(1\), zero-padding \(1\)
  • Output volume: \(128\times 256\times 28\times 28\)

Then apply a \(1\times 1\) convolution:

  • Input volume: \(128\times 256\times 28\times 28\)
  • Kernel size \(1\times 1\), stride \(1\), zero-padding \(0\)
  • Number of filters: \(64\)
  • Output volume: \(128\times 64\times 28\times 28\)

Concatenation

The four branches above produce outputs with the same spatial size. Concatenating them along the channel dimension (\(64+96+96+64=320\)) gives an output volume of \(128\times 320\times 28\times 28\).

Inception (3c)

This module uses stride \(2\) to halve the spatial size, so it does not contain a standalone \(1\times 1\) convolution branch.

3x3

First, apply a \(1\times 1\) convolution:

  • Input volume: \(128\times 320\times 28\times 28\)
  • Kernel size \(1\times 1\), stride \(1\), zero-padding \(0\)
  • Number of filters: \(128\)
  • Output volume: \(128\times 128\times 28\times 28\)

Then apply a \(3\times 3\) convolution:

  • Input volume: \(128\times 128\times 28\times 28\)
  • Kernel size \(3\times 3\), stride \(2\), zero-padding \(1\)
  • Number of filters: \(160\)
  • Output volume: \(128\times 160\times 14\times 14\)

double 3x3

First, apply a \(1\times 1\) convolution:

  • Input volume: \(128\times 320\times 28\times 28\)
  • Kernel size \(1\times 1\), stride \(1\), zero-padding \(0\)
  • Number of filters: \(64\)
  • Output volume: \(128\times 64\times 28\times 28\)

First \(3\times 3\) convolution:

  • Input volume: \(128\times 64\times 28\times 28\)
  • Kernel size \(3\times 3\), stride \(1\), zero-padding \(1\)
  • Number of filters: \(96\)
  • Output volume: \(128\times 96\times 28\times 28\)

Second \(3\times 3\) convolution:

  • Input volume: \(128\times 96\times 28\times 28\)
  • Kernel size \(3\times 3\), stride \(2\), zero-padding \(1\)
  • Number of filters: \(96\)
  • Output volume: \(128\times 96\times 14\times 14\)

max pooling

Apply max pooling:

  • Input volume: \(128\times 320\times 28\times 28\)
  • Pooling window \(3\times 3\), stride \(2\), zero-padding \(1\)
  • Output volume: \(128\times 320\times 14\times 14\)

Concatenation

The four branches above produce outputs with the same spatial size. Concatenating them along the channel dimension (\(160+96+320=576\)) gives an output volume of \(128\times 576\times 14\times 14\). (In the paper, the stride-2 Inception (3c)/(4e) modules take over the role of the pooling layers between stages, which is why the table specifies stride 2. The implementation below instead keeps stride 1 inside these modules and halves the spatial size with a separate max-pooling layer; the resulting output size is the same.)
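Both downsampling options produce the same output size. A minimal sketch comparing them (shapes only, untrained weights):

import torch
import torch.nn as nn

x = torch.randn(128, 576, 28, 28)

# halving inside the module: 3x3 convolution with stride 2 (the paper's stride-2 Inception)
conv = nn.Conv2d(576, 576, kernel_size=3, stride=2, padding=1)
print(conv(x).shape)  # torch.Size([128, 576, 14, 14])

# halving between modules: a separate max-pooling layer (the implementation below)
pool = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)
print(pool(x).shape)  # torch.Size([128, 576, 14, 14])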

PyTorch

BasicConv2d

Batch normalization is applied after the convolution operation:

import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicConv2d(nn.Module):

    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        # bias is omitted because the BN shift parameter replaces it
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels, eps=0.001)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return F.relu(x, inplace=True)
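A quick shape check of BasicConv2d, using the \(1\times 1\) branch of Inception (3a) from the derivation above:

block = BasicConv2d(192, 64, kernel_size=1, stride=1, padding=0)
x = torch.randn(128, 192, 28, 28)
print(block(x).shape)  # torch.Size([128, 64, 28, 28])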

Inception

  1. The standalone \(1\times 1\) convolution branch may be absent (ch1x1 = 0)
  2. The \(5\times 5\) convolution is replaced by two \(3\times 3\) convolutions
  3. Max pooling or average pooling is chosen according to the pool_type argument
class Inception(nn.Module):
    __constants__ = ['branch2', 'branch3', 'branch4']

    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, dch3x3red, dch3x3, pool_proj,
                 conv_block=None, stride_num=1, pool_type='max'):
        super(Inception, self).__init__()
        if conv_block is None:
            conv_block = BasicConv2d
        # the standalone 1x1 branch is absent in the stride-2 modules (3c/4e)
        if ch1x1 == 0:
            self.branch1 = None
        else:
            self.branch1 = conv_block(in_channels, ch1x1, kernel_size=1, stride=1, padding=0)

        self.branch2 = nn.Sequential(
            conv_block(in_channels, ch3x3red, kernel_size=1, stride=1, padding=0),
            conv_block(ch3x3red, ch3x3, kernel_size=3, stride=stride_num, padding=1)
        )

        # two consecutive 3x3 convolutions replace the original 5x5 convolution;
        # only the second one applies the stride
        self.branch3 = nn.Sequential(
            conv_block(in_channels, dch3x3red, kernel_size=1, stride=1, padding=0),
            conv_block(dch3x3red, dch3x3, kernel_size=3, stride=1, padding=1),
            conv_block(dch3x3, dch3x3, kernel_size=3, stride=stride_num, padding=1),
        )

        if pool_proj != 0:
            if pool_type == 'max':
                self.branch4 = nn.Sequential(
                    nn.MaxPool2d(kernel_size=3, stride=stride_num, padding=1, ceil_mode=True),
                    conv_block(in_channels, pool_proj, kernel_size=1, stride=1, padding=0)
                )
            else:
                # avg pooling
                self.branch4 = nn.Sequential(
                    nn.AvgPool2d(kernel_size=3, stride=1, padding=1, ceil_mode=True),
                    conv_block(in_channels, pool_proj, kernel_size=1, stride=1, padding=0)
                )
        else:
            # max pooling only, without a projection (pass-through branch)
            self.branch4 = nn.MaxPool2d(kernel_size=3, stride=stride_num, padding=1, ceil_mode=True)

    def _forward(self, x):
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)

        if self.branch1 is not None:
            branch1 = self.branch1(x)
            outputs = [branch1, branch2, branch3, branch4]
        else:
            outputs = [branch2, branch3, branch4]
        return outputs

    def forward(self, x):
        outputs = self._forward(x)
        # concatenate all branches along the channel dimension
        return torch.cat(outputs, 1)
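A sanity check of the shapes derived earlier, constructing Inception (3a) and (3c) with the same arguments used in the GoogLeNet_BN class below (note that 3c is built with the default stride_num=1, so the halving is left to the subsequent max-pooling layer):

inception3a = Inception(192, 64, 64, 64, 64, 96, 32, pool_type='avg')
x = torch.randn(128, 192, 28, 28)
print(inception3a(x).shape)  # torch.Size([128, 256, 28, 28])

inception3c = Inception(320, 0, 128, 160, 64, 96, 0, pool_type='max')
y = torch.randn(128, 320, 28, 28)
print(inception3c(y).shape)  # torch.Size([128, 576, 28, 28])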

GoogLeNet_BN

class GoogLeNet_BN(nn.Module):
    __constants__ = ['aux_logits', 'transform_input']

    def __init__(self, num_classes=1000, aux_logits=True, transform_input=False, init_weights=True,
                 blocks=None):
        """
        GoogLeNet_BN implementation
        :param num_classes: number of output classes
        :param aux_logits: whether to use the auxiliary classifiers
        :param transform_input: whether to re-normalize the input channels
        :param init_weights: whether to initialize the weights
        :param blocks: the conv / inception / auxiliary-classifier building blocks
        """
        super(GoogLeNet_BN, self).__init__()
        if blocks is None:
            # InceptionAux is the auxiliary classifier (as in torchvision's GoogLeNet)
            blocks = [BasicConv2d, Inception, InceptionAux]
        assert len(blocks) == 3
        conv_block = blocks[0]
        inception_block = blocks[1]
        inception_aux_block = blocks[2]

        self.aux_logits = aux_logits
        self.transform_input = transform_input

        self.conv1 = conv_block(3, 64, kernel_size=7, stride=2, padding=3)
        self.maxpool1 = nn.MaxPool2d(3, stride=2, padding=0, ceil_mode=True)
        self.conv2 = conv_block(64, 64, kernel_size=1, stride=1, padding=0)
        self.conv3 = conv_block(64, 192, kernel_size=3, stride=1, padding=1)
        self.maxpool2 = nn.MaxPool2d(3, stride=2, padding=0, ceil_mode=True)

        self.inception3a = inception_block(192, 64, 64, 64, 64, 96, 32, pool_type='avg')
        self.inception3b = inception_block(256, 64, 64, 96, 64, 96, 64, pool_type='avg')
        self.inception3c = inception_block(320, 0, 128, 160, 64, 96, 0, pool_type='max')
        self.maxpool3 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception4a = inception_block(576, 224, 64, 96, 96, 128, 128, pool_type='avg')
        self.inception4b = inception_block(576, 192, 96, 128, 96, 128, 128, pool_type='avg')
        self.inception4c = inception_block(576, 160, 128, 160, 128, 160, 128, pool_type='avg')
        self.inception4d = inception_block(608, 96, 128, 192, 160, 192, 128, pool_type='avg')
        self.inception4e = inception_block(608, 0, 128, 192, 192, 256, 0, pool_type='max')
        self.maxpool4 = nn.MaxPool2d(2, stride=2, ceil_mode=True)

        self.inception5a = inception_block(1056, 352, 192, 320, 160, 224, 128, pool_type='avg')
        self.inception5b = inception_block(1024, 352, 192, 320, 192, 224, 128, pool_type='max')

        if aux_logits:
            # auxiliary classifiers
            # inception (4a) outputs 14x14x576
            self.aux1 = inception_aux_block(576, num_classes)
            # inception (4d) outputs 14x14x608
            self.aux2 = inception_aux_block(608, num_classes)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(0.2)
        self.fc = nn.Linear(1024, num_classes)

        if init_weights:
            self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
                # truncated normal initialization
                import scipy.stats as stats
                X = stats.truncnorm(-2, 2, scale=0.01)
                values = torch.as_tensor(X.rvs(m.weight.numel()), dtype=m.weight.dtype)
                values = values.view(m.weight.size())
                with torch.no_grad():
                    m.weight.copy_(values)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def _transform_input(self, x):
        # type: (Tensor) -> Tensor
        if self.transform_input:
            x_ch0 = torch.unsqueeze(x[:, 0], 1) * (0.229 / 0.5) + (0.485 - 0.5) / 0.5
            x_ch1 = torch.unsqueeze(x[:, 1], 1) * (0.224 / 0.5) + (0.456 - 0.5) / 0.5
            x_ch2 = torch.unsqueeze(x[:, 2], 1) * (0.225 / 0.5) + (0.406 - 0.5) / 0.5
            x = torch.cat((x_ch0, x_ch1, x_ch2), 1)
        return x

    def _forward(self, x):
        # type: (Tensor) -> Tuple[Tensor, Optional[Tensor], Optional[Tensor]]
        # N x 3 x 224 x 224
        x = self.conv1(x)
        # N x 64 x 112 x 112
        x = self.maxpool1(x)
        # N x 64 x 56 x 56
        x = self.conv2(x)
        # N x 64 x 56 x 56
        x = self.conv3(x)
        # N x 192 x 56 x 56
        x = self.maxpool2(x)

        # N x 192 x 28 x 28
        x = self.inception3a(x)
        # N x 256 x 28 x 28
        x = self.inception3b(x)
        # N x 320 x 28 x 28
        x = self.inception3c(x)
        # N x 576 x 28 x 28
        x = self.maxpool3(x)
        # N x 576 x 14 x 14
        x = self.inception4a(x)
        # N x 576 x 14 x 14
        aux_defined = self.training and self.aux_logits
        if aux_defined:
            aux1 = self.aux1(x)
        else:
            aux1 = None

        x = self.inception4b(x)
        # N x 576 x 14 x 14
        x = self.inception4c(x)
        # N x 608 x 14 x 14
        x = self.inception4d(x)
        # N x 608 x 14 x 14
        if aux_defined:
            aux2 = self.aux2(x)
        else:
            aux2 = None

        x = self.inception4e(x)
        # N x 1056 x 14 x 14
        x = self.maxpool4(x)
        # N x 1056 x 7 x 7
        x = self.inception5a(x)
        # N x 1024 x 7 x 7
        x = self.inception5b(x)
        # N x 1024 x 7 x 7

        x = self.avgpool(x)
        # N x 1024 x 1 x 1
        x = torch.flatten(x, 1)
        # N x 1024
        x = self.dropout(x)
        x = self.fc(x)
        # N x 1000 (num_classes)
        return x, aux2, aux1

    def forward(self, x):
        x = self._transform_input(x)
        x, aux1, aux2 = self._forward(x)
        aux_defined = self.training and self.aux_logits
        if aux_defined:
            # in training mode, return the outputs of all three classifiers
            return x, aux2, aux1
        else:
            # at test time, only the final classifier is used
            return x
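A quick smoke test of the full network; this assumes an InceptionAux block (such as the one in torchvision's GoogLeNet) is defined in scope:

model = GoogLeNet_BN(num_classes=20)
model.eval()  # in eval mode only the final classifier's output is returned
with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 20])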

Testing

We compare GoogLeNet_BN with GoogLeNet. See test_googlenet_bn.py for the test code.

Number of Parameters

[googlenet_bn] param num: 17683640
[googlenet] param num: 13370744
num_googlenet_bn / num_googlenet: 1.32

GoogLeNet_BN has about 17.68 million parameters and GoogLeNet about 13.37 million; the ratio is 1.32.
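Counting parameters is a one-liner in PyTorch; the test script presumably does something equivalent to this sketch:

def param_num(model):
    # total number of trainable parameters
    return sum(p.numel() for p in model.parameters() if p.requires_grad)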

Inference Time

[googlenet_bn] time: 0.0596
[googlenet] time: 0.0602
time_googlenet / time_googlenet_bn: 1.010

Average time per test image, computed over 100 runs:

  1. GoogLeNet_BN: 0.0596
  2. GoogLeNet: 0.0602

The computation times of the two networks are nearly identical.
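The timing loop is likely similar to the following sketch (the actual test_googlenet_bn.py may differ; CPU wall-clock time is used here for simplicity):

import time

def average_inference_time(model, num=100):
    model.eval()
    x = torch.randn(1, 3, 224, 224)
    start = time.time()
    with torch.no_grad():
        for _ in range(num):
            model(x)
    return (time.time() - start) / num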

Training

We train GoogLeNet_BN and GoogLeNet under the same settings:

  1. Dataset: PASCAL VOC 07+12, 20 classes, with 40,058 training samples and 12,032 test samples
  2. Batch size: 128
  3. Optimizer: Adam with learning rate 1e-3
  4. Step decay: the learning rate decays by 4% every 8 epochs (factor 0.96), as sketched below
  5. Epochs: 100
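Under these settings the optimizer and learning-rate schedule would look roughly like this (a sketch; variable names are illustrative):

import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=1e-3)
# step decay: multiply the learning rate by 0.96 every 8 epochs
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=8, gamma=0.96)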

The results after 100 epochs of training are as follows:

{'train': 40058, 'test': 12032}
Epoch 0/99
----------
train Loss: 4.2452 Acc: 0.2644
test Loss: 2.4459 Acc: 0.3763
Epoch 1/99
----------
...
...
----------
train Loss: 0.9129 Acc: 0.8467
test Loss: 0.9284 Acc: 0.7454
Epoch 98/99
----------
train Loss: 0.8963 Acc: 0.8524
test Loss: 0.9539 Acc: 0.7406
Epoch 99/99
----------
train Loss: 0.8869 Acc: 0.8526
test Loss: 0.9968 Acc: 0.7409
Training complete in 194m 38s
Best test Acc: 0.747839
train googlenet_bn done

Epoch 0/99
----------
train Loss: 4.2141 Acc: 0.2787
test Loss: 2.4076 Acc: 0.3763
Epoch 1/99
----------
train Loss: 3.9860 Acc: 0.3354
test Loss: 2.2959 Acc: 0.3969
Epoch 2/99
----------
...
...
----------
train Loss: 0.9720 Acc: 0.8304
test Loss: 0.9777 Acc: 0.7278
Epoch 98/99
----------
train Loss: 0.9744 Acc: 0.8279
test Loss: 0.9249 Acc: 0.7358
Epoch 99/99
----------
train Loss: 0.9632 Acc: 0.8336
test Loss: 0.9337 Acc: 0.7350
Training complete in 152m 5s
Best test Acc: 0.742852
train googlenet done

After 100 epochs, GoogLeNet_BN achieved a best test accuracy of 74.78%, while GoogLeNet achieved 74.29%.
