GoogLeNet

Reference: [Going deeper with convolutions]

After studying the paper Going deeper with convolutions, this post works through its model in more detail and implements the network in PyTorch

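Inception module

The Inception module is what most distinguishes GoogLeNet from earlier network architectures. In a sense it does not introduce a new kind of layer: it simply runs several independent convolutions ($1\times 1, 3\times 3, 5\times 5$) in parallel within the same layer. Their outputs share the same spatial size, so they can be concatenated along the depth dimension and passed on to the next layer

A $1\times 1$ convolution is applied before the $3\times 3$ and $5\times 5$ convolutions and after the max pooling operation. Limiting its number of filters reduces the depth of the output volume, which achieves dimensionality reduction, and the activation function that follows it increases the representational power of the convolutions

The Inception module has the following advantages:

  1. It allows the number of units at each stage to increase significantly without an uncontrolled blow-up in computational complexity
  2. It matches the intuition that visual information should be processed at several scales and then aggregated, so that the next stage can extract features from different scales at the same time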
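
To make the cost argument concrete, the sketch below counts the weights of a $5\times 5$ convolution with and without a preceding $1\times 1$ reduction. It uses the channel counts that the $5\times 5$ branch of inception (3a) has in the derivation further down (192 in, 16 after reduction, 32 out); `conv_weights` is a hypothetical helper and bias terms are ignored.

```python
def conv_weights(in_channels, out_channels, kernel_size):
    """Number of weights in a 2D convolution (bias terms ignored)."""
    return in_channels * out_channels * kernel_size * kernel_size


# Channel counts of the 5x5 branch of inception (3a): 192 -> 16 -> 32.
in_channels, reduced, out_channels = 192, 16, 32

# A 5x5 convolution applied directly to the 192-channel input.
direct = conv_weights(in_channels, out_channels, 5)

# A 1x1 reduction to 16 channels followed by the 5x5 convolution.
with_reduction = (conv_weights(in_channels, reduced, 1)
                  + conv_weights(reduced, out_channels, 5))

print(direct)          # 153600
print(with_reduction)  # 15872
```

Under these assumptions the reduced branch needs roughly a tenth of the weights, and the number of multiplications per output position shrinks by the same factor.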

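Parameter analysis

The paper gives GoogLeNet's parameter settings in a table

However, the table does not list every layer parameter in detail (padding, for example), so the derivation is worked through from scratch below. Assume the input size is $N\times C\times H\times W = 128\times 3\times 224\times 224$; each layer is then computed as follows: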
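
The spatial sizes in this derivation follow the usual output-size formula $\lfloor (H + 2p - k) / s \rfloor + 1$. As a quick check, the sketch below applies it to the layers in front of the first Inception module; `out_size` is a hypothetical helper, and the kernel, stride and padding values are the ones derived in this section.

```python
def out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) for the layers in front of inception (3a)
stem = [
    ("conv 7x7", 7, 2, 3),
    ("max pool", 3, 2, 1),
    ("conv 1x1", 1, 1, 0),
    ("conv 3x3", 3, 1, 1),
    ("max pool", 3, 2, 1),
]

size = 224
sizes = []
for name, kernel, stride, padding in stem:
    size = out_size(size, kernel, stride, padding)
    sizes.append(size)
    print(f"{name}: {size} x {size}")
```

The loop should reproduce the sequence 112, 56, 56, 56, 28 that the layer-by-layer lists arrive at.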

convolution

  • Input volume: $128\times 3\times 224\times 224$
  • Kernel size $7\times 7$, stride $2$, zero padding $3$
  • Number of filters: $64$
  • Output volume: $128\times 64\times 112\times 112$

max pool

  • Input volume: $128\times 64\times 112\times 112$
  • Pooling window $3\times 3$, stride $2$, zero padding $1$
  • Output volume: $128\times 64\times 56\times 56$

convolution

First apply a $1\times 1$ convolution

  • Input volume: $128\times 64\times 56\times 56$
  • Kernel size $1\times 1$, stride $1$, zero padding $0$
  • Number of filters: $64$
  • Output volume: $128\times 64\times 56\times 56$

Then apply a $3\times 3$ convolution

  • Input volume: $128\times 64\times 56\times 56$
  • Kernel size $3\times 3$, stride $1$, zero padding $1$
  • Number of filters: $192$
  • Output volume: $128\times 192\times 56\times 56$

max pool

  • Input volume: $128\times 192\times 56\times 56$
  • Pooling window $3\times 3$, stride $2$, zero padding $1$
  • Output volume: $128\times 192\times 28\times 28$

inception (3a)

1x1

  • Input volume: $128\times 192\times 28\times 28$
  • Kernel size $1\times 1$, stride $1$, zero padding $0$
  • Number of filters: $64$
  • Output volume: $128\times 64\times 28\times 28$

3x3

First apply a $1\times 1$ convolution

  • Input volume: $128\times 192\times 28\times 28$
  • Kernel size $1\times 1$, stride $1$, zero padding $0$
  • Number of filters: $96$
  • Output volume: $128\times 96\times 28\times 28$

Then apply a $3\times 3$ convolution

  • Input volume: $128\times 96\times 28\times 28$
  • Kernel size $3\times 3$, stride $1$, zero padding $1$
  • Number of filters: $128$
  • Output volume: $128\times 128\times 28\times 28$

5x5

First apply a $1\times 1$ convolution

  • Input volume: $128\times 192\times 28\times 28$
  • Kernel size $1\times 1$, stride $1$, zero padding $0$
  • Number of filters: $16$
  • Output volume: $128\times 16\times 28\times 28$

Then apply a $5\times 5$ convolution

  • Input volume: $128\times 16\times 28\times 28$
  • Kernel size $5\times 5$, stride $1$, zero padding $2$
  • Number of filters: $32$
  • Output volume: $128\times 32\times 28\times 28$

max pooling

First apply max pooling

  • Input volume: $128\times 192\times 28\times 28$
  • Pooling window $3\times 3$, stride $1$, zero padding $1$
  • Output volume: $128\times 192\times 28\times 28$

Then apply a $1\times 1$ convolution

  • Input volume: $128\times 192\times 28\times 28$
  • Kernel size $1\times 1$, stride $1$, zero padding $0$
  • Number of filters: $32$
  • Output volume: $128\times 32\times 28\times 28$

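Concatenation

The four branches above produce output volumes with the same spatial size. They are concatenated along the depth (channel) dimension, which gives a final output volume of size $128\times 256\times 28\times 28$ (64 + 128 + 32 + 32 = 256 channels)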
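
The concatenation step can be sketched with stand-in tensors. The example assumes PyTorch's `N x C x H x W` layout and uses a batch of 2 instead of 128 to keep it light; the zero tensors only carry the shapes of the four branch outputs.

```python
import torch

n = 2  # small batch for illustration; the derivation above uses 128
branch_channels = [64, 128, 32, 32]  # 1x1, 3x3, 5x5 and pooling branches of inception (3a)

# Stand-in tensors with the shape each branch produces.
branches = [torch.zeros(n, c, 28, 28) for c in branch_channels]

# dim=1 is the channel (depth) axis in the N x C x H x W layout.
out = torch.cat(branches, dim=1)
print(out.shape)  # torch.Size([2, 256, 28, 28])
```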

Subsequent layers

The remaining layers follow the same pattern as the operations above
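
One way to check that the remaining Inception modules chain together is to sum the branch outputs of each module and compare the result with the input depth of the next one. The filter counts below are copied from the GoogLeNet class in the PyTorch section of this post; the loop itself is only an illustrative sketch.

```python
# (name, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj)
configs = [
    ("3a", 192, 64, 96, 128, 16, 32, 32),
    ("3b", 256, 128, 128, 192, 32, 96, 64),
    ("4a", 480, 192, 96, 208, 16, 48, 64),
    ("4b", 512, 160, 112, 224, 24, 64, 64),
    ("4c", 512, 128, 128, 256, 24, 64, 64),
    ("4d", 512, 112, 144, 288, 32, 64, 64),
    ("4e", 528, 256, 160, 320, 32, 128, 128),
    ("5a", 832, 256, 160, 320, 32, 128, 128),
    ("5b", 832, 384, 192, 384, 48, 128, 128),
]

out_channels = {}
expected_in = 192  # depth of the volume entering inception (3a)
for name, in_ch, ch1x1, _, ch3x3, _, ch5x5, pool_proj in configs:
    # Each module must accept exactly what the previous one produced.
    assert in_ch == expected_in, f"inception ({name}) expects {expected_in} input channels"
    # The reduction layers do not appear in the output; only the four branch outputs do.
    expected_in = ch1x1 + ch3x3 + ch5x5 + pool_proj
    out_channels[name] = expected_in
    print(f"inception ({name}): {in_ch} -> {expected_in}")
```

The pooling layers between the stages change only the spatial size, which is why the depth can be tracked without them.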

PyTorch

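PyTorch 1.4 provides a GoogLeNet-v3 model implementation in vision/torchvision/models/googlenet.py; the custom GoogLeNet below is based on that implementation

Note: for the complete implementation see zjZSTU/GoogLeNet

First define three classes that implement the convolution layer, the Inception module and the auxiliary classifier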

  • BasicConv2d
  • Inception
  • InceptionAux

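Note: the max pooling layers in GoogLeNet need an additional padding=1 to produce the sizes derived above; the PyTorch GoogLeNet-v3 implementation obtains the expected output size by setting ceil_mode=True instead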
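
The two settings can be compared with plain arithmetic. The sketch below assumes the output-size rules documented for `nn.MaxPool2d`: floor rounding by default, ceil rounding with `ceil_mode=True`, plus the rule that a last window starting beyond the input is dropped. Both helper names are hypothetical.

```python
import math


def pool_out_floor(size, kernel, stride, padding):
    """Output size with explicit zero padding and the default floor rounding."""
    return math.floor((size + 2 * padding - kernel) / stride) + 1


def pool_out_ceil(size, kernel, stride):
    """Output size with padding=0 and ceil rounding (ceil_mode=True)."""
    out = math.ceil((size - kernel) / stride) + 1
    # Drop the last window if it would start beyond the input.
    if (out - 1) * stride >= size:
        out -= 1
    return out


for size in (112, 56, 28):
    with_padding = pool_out_floor(size, kernel=3, stride=2, padding=1)
    with_ceil = pool_out_ceil(size, kernel=3, stride=2)
    print(size, with_padding, with_ceil)
```

For these three inputs both variants halve the spatial size. They place the pooling windows differently, though, so only the shapes are guaranteed to agree, not the pooled values.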

BasicConv2d

Wraps the convolution operation so that the network can be adjusted later (for example, by adding a batch normalization layer)

import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicConv2d(nn.Module):
    """Convolution followed by ReLU."""

    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        # self.bn = nn.BatchNorm2d(out_channels, eps=0.001)

    def forward(self, x):
        x = self.conv(x)
        # x = self.bn(x)
        return F.relu(x, inplace=True)
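
As an illustration of the kind of adjustment this wrapper allows, here is a sketch of a variant with the batch normalization lines enabled. `BasicConv2dBN` is a hypothetical name used only for this example; it is not part of the code above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicConv2dBN(nn.Module):
    """Hypothetical variant of BasicConv2d: convolution -> batch norm -> ReLU."""

    def __init__(self, in_channels, out_channels, **kwargs):
        super().__init__()
        # The convolution bias is redundant when batch normalization follows.
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels, eps=0.001)

    def forward(self, x):
        x = self.bn(self.conv(x))
        return F.relu(x, inplace=True)


# Same settings as the first convolution of the network, on a batch of 2.
block = BasicConv2dBN(3, 64, kernel_size=7, stride=2, padding=3)
y = block(torch.randn(2, 3, 224, 224))
print(y.shape)  # torch.Size([2, 64, 112, 112])
```

Because every other class takes the convolution wrapper as a `conv_block` argument, such a variant could be passed in without touching the rest of the network.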

Inception

Besides the number of input channels, each Inception module takes the following arguments:

  1. Number of $1\times 1$ filters
  2. Number of $3\times 3$ filters and number of $1\times 1$ filters applied before them
  3. Number of $5\times 5$ filters and number of $1\times 1$ filters applied before them
  4. Number of $1\times 1$ filters applied after max pooling

class Inception(nn.Module):
    __constants__ = ['branch2', 'branch3', 'branch4']

    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj,
                 conv_block=None):
        super(Inception, self).__init__()
        if conv_block is None:
            conv_block = BasicConv2d
        # Branch 1: 1x1 convolution
        self.branch1 = conv_block(in_channels, ch1x1, kernel_size=1, stride=1, padding=0)

        # Branch 2: 1x1 reduction followed by a 3x3 convolution
        self.branch2 = nn.Sequential(
            conv_block(in_channels, ch3x3red, kernel_size=1, stride=1, padding=0),
            conv_block(ch3x3red, ch3x3, kernel_size=3, stride=1, padding=1)
        )

        # Branch 3: 1x1 reduction followed by a 5x5 convolution
        self.branch3 = nn.Sequential(
            conv_block(in_channels, ch5x5red, kernel_size=1, stride=1, padding=0),
            conv_block(ch5x5red, ch5x5, kernel_size=5, stride=1, padding=2)
        )

        # Branch 4: 3x3 max pooling followed by a 1x1 projection
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1, ceil_mode=True),
            conv_block(in_channels, pool_proj, kernel_size=1, stride=1, padding=0)
        )

    def _forward(self, x):
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)

        outputs = [branch1, branch2, branch3, branch4]
        return outputs

    def forward(self, x):
        outputs = self._forward(x)
        # Concatenate the four branches along the channel dimension
        return torch.cat(outputs, 1)

InceptionAux

The auxiliary classifier receives an input of size $14\times 14\times 512$ from Inception (4a) and an input of size $14\times 14\times 528$ from Inception (4d)

  1. First apply average pooling (filter size $5\times 5$, stride $3$), which gives an output volume with a spatial size of $4\times 4$
  2. Apply 128 $1\times 1$ convolution filters for dimensionality reduction, followed by rectified linear activation. The output volume is now $4\times 4\times 128$
  3. A fully connected layer with 1024 units
  4. A dropout layer with a drop probability of 70%
  5. A softmax classifier over the 1000 classes

class InceptionAux(nn.Module):

    def __init__(self, in_channels, num_classes, conv_block=None):
        super(InceptionAux, self).__init__()
        if conv_block is None:
            conv_block = BasicConv2d
        self.conv = conv_block(in_channels, 128, kernel_size=1, stride=1, padding=0)

        # 128 channels x 4 x 4 positions = 2048 input features
        self.fc1 = nn.Linear(2048, 1024)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        # aux1: N x 512 x 14 x 14, aux2: N x 528 x 14 x 14
        x = F.adaptive_avg_pool2d(x, (4, 4))
        # aux1: N x 512 x 4 x 4, aux2: N x 528 x 4 x 4
        x = self.conv(x)
        # N x 128 x 4 x 4
        x = torch.flatten(x, 1)
        # N x 2048
        x = F.relu(self.fc1(x), inplace=True)
        # N x 1024
        x = F.dropout(x, 0.7, training=self.training)
        # N x 1024
        x = self.fc2(x)
        # N x 1000 (num_classes)

        return x
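
The class above uses adaptive average pooling, while the description lists a $5\times 5$ window with stride $3$. The sketch below checks, on a random stand-in tensor, that both produce a $4\times 4$ map from a $14\times 14$ input. Only the shapes are compared: adaptive pooling chooses its own windows, so the pooled values are not expected to be identical.

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 512, 14, 14)  # stand-in for the inception (4a) output

fixed = F.avg_pool2d(x, kernel_size=5, stride=3)  # pooling as listed in step 1
adaptive = F.adaptive_avg_pool2d(x, (4, 4))       # pooling used in InceptionAux

print(fixed.shape)     # torch.Size([2, 512, 4, 4])
print(adaptive.shape)  # torch.Size([2, 512, 4, 4])

# After the 1x1 convolution with 128 filters, flattening gives the fc1 input size.
flat_features = 128 * 4 * 4
print(flat_features)  # 2048
```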

GoogLeNet

Combine the three classes above to implement the GoogLeNet network

class GoogLeNet(nn.Module):
    __constants__ = ['aux_logits', 'transform_input']

    def __init__(self, num_classes=1000, aux_logits=True, transform_input=False, init_weights=True,
                 blocks=None):
        """
        GoogLeNet implementation
        :param num_classes: number of output classes
        :param aux_logits: whether to use the auxiliary classifiers
        :param transform_input: whether to re-normalize the input in _transform_input
        :param init_weights: whether to initialize the weights with _initialize_weights
        :param blocks: list of three classes [conv_block, inception_block, inception_aux_block]
        """
        super(GoogLeNet, self).__init__()
        if blocks is None:
            blocks = [BasicConv2d, Inception, InceptionAux]
        assert len(blocks) == 3
        conv_block = blocks[0]
        inception_block = blocks[1]
        inception_aux_block = blocks[2]

        self.aux_logits = aux_logits
        self.transform_input = transform_input

        self.conv1 = conv_block(3, 64, kernel_size=7, stride=2, padding=3)
        self.maxpool1 = nn.MaxPool2d(3, stride=2, padding=0, ceil_mode=True)
        self.conv2 = conv_block(64, 64, kernel_size=1, stride=1, padding=0)
        self.conv3 = conv_block(64, 192, kernel_size=3, stride=1, padding=1)
        self.maxpool2 = nn.MaxPool2d(3, stride=2, padding=0, ceil_mode=True)

        self.inception3a = inception_block(192, 64, 96, 128, 16, 32, 32)
        self.inception3b = inception_block(256, 128, 128, 192, 32, 96, 64)
        self.maxpool3 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception4a = inception_block(480, 192, 96, 208, 16, 48, 64)
        self.inception4b = inception_block(512, 160, 112, 224, 24, 64, 64)
        self.inception4c = inception_block(512, 128, 128, 256, 24, 64, 64)
        self.inception4d = inception_block(512, 112, 144, 288, 32, 64, 64)
        self.inception4e = inception_block(528, 256, 160, 320, 32, 128, 128)
        self.maxpool4 = nn.MaxPool2d(2, stride=2, ceil_mode=True)

        self.inception5a = inception_block(832, 256, 160, 320, 32, 128, 128)
        self.inception5b = inception_block(832, 384, 192, 384, 48, 128, 128)

        if aux_logits:
            # Auxiliary classifiers
            self.aux1 = inception_aux_block(512, num_classes)
            self.aux2 = inception_aux_block(528, num_classes)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(0.2)
        self.fc = nn.Linear(1024, num_classes)

        if init_weights:
            self._initialize_weights()

    def _initialize_weights(self):
        import scipy.stats as stats
        for m in self.modules():
            if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
                # Truncated normal in [-2, 2] standard deviations, scaled to std 0.01
                X = stats.truncnorm(-2, 2, scale=0.01)
                values = torch.as_tensor(X.rvs(m.weight.numel()), dtype=m.weight.dtype)
                values = values.view(m.weight.size())
                with torch.no_grad():
                    m.weight.copy_(values)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def _transform_input(self, x):
        # type: (Tensor) -> Tensor
        if self.transform_input:
            x_ch0 = torch.unsqueeze(x[:, 0], 1) * (0.229 / 0.5) + (0.485 - 0.5) / 0.5
            x_ch1 = torch.unsqueeze(x[:, 1], 1) * (0.224 / 0.5) + (0.456 - 0.5) / 0.5
            x_ch2 = torch.unsqueeze(x[:, 2], 1) * (0.225 / 0.5) + (0.406 - 0.5) / 0.5
            x = torch.cat((x_ch0, x_ch1, x_ch2), 1)
        return x

    def _forward(self, x):
        # type: (Tensor) -> Tuple[Tensor, Optional[Tensor], Optional[Tensor]]
        # N x 3 x 224 x 224
        x = self.conv1(x)
        # N x 64 x 112 x 112
        x = self.maxpool1(x)
        # N x 64 x 56 x 56
        x = self.conv2(x)
        # N x 64 x 56 x 56
        x = self.conv3(x)
        # N x 192 x 56 x 56
        x = self.maxpool2(x)

        # N x 192 x 28 x 28
        x = self.inception3a(x)
        # N x 256 x 28 x 28
        x = self.inception3b(x)
        # N x 480 x 28 x 28
        x = self.maxpool3(x)
        # N x 480 x 14 x 14
        x = self.inception4a(x)
        # N x 512 x 14 x 14
        aux_defined = self.training and self.aux_logits
        if aux_defined:
            aux1 = self.aux1(x)
        else:
            aux1 = None

        x = self.inception4b(x)
        # N x 512 x 14 x 14
        x = self.inception4c(x)
        # N x 512 x 14 x 14
        x = self.inception4d(x)
        # N x 528 x 14 x 14
        if aux_defined:
            aux2 = self.aux2(x)
        else:
            aux2 = None

        x = self.inception4e(x)
        # N x 832 x 14 x 14
        x = self.maxpool4(x)
        # N x 832 x 7 x 7
        x = self.inception5a(x)
        # N x 832 x 7 x 7
        x = self.inception5b(x)
        # N x 1024 x 7 x 7

        x = self.avgpool(x)
        # N x 1024 x 1 x 1
        x = torch.flatten(x, 1)
        # N x 1024
        x = self.dropout(x)
        x = self.fc(x)
        # N x 1000 (num_classes)
        # Same final output order as before: the original swapped aux1 and aux2
        # here and swapped them back again in forward()
        return x, aux1, aux2

    def forward(self, x):
        x = self._transform_input(x)
        x, aux1, aux2 = self._forward(x)
        aux_defined = self.training and self.aux_logits
        if aux_defined:
            # Training: return the outputs of all three classifiers
            return x, aux1, aux2
        else:
            # Evaluation: use only the final classifier
            return x
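
In training mode with aux_logits=True the forward pass returns three sets of logits, so the loss has to combine them. The paper weights the auxiliary losses by 0.3 when adding them to the main loss, and the sketch below assumes that weighting. `googlenet_loss` is a hypothetical helper name, random logits stand in for real network outputs, and whether the training runs reported later used exactly this weighting is not stated in this post.

```python
import torch
import torch.nn.functional as F


def googlenet_loss(outputs, targets, aux_weight=0.3):
    """Hypothetical helper: main loss plus down-weighted auxiliary losses."""
    if isinstance(outputs, tuple):
        # Training mode with aux_logits=True: main logits and two auxiliary logits.
        main, aux_a, aux_b = outputs
        aux_loss = F.cross_entropy(aux_a, targets) + F.cross_entropy(aux_b, targets)
        return F.cross_entropy(main, targets) + aux_weight * aux_loss
    # Evaluation mode: a single tensor of logits.
    return F.cross_entropy(outputs, targets)


# Random logits stand in for the network outputs (batch of 4, 20 classes).
targets = torch.randint(0, 20, (4,))
train_outputs = tuple(torch.randn(4, 20) for _ in range(3))

train_loss = googlenet_loss(train_outputs, targets)
eval_loss = googlenet_loss(train_outputs[0], targets)
print(train_loss.item(), eval_loss.item())
```

Both auxiliary losses get the same weight here, so the order of the two auxiliary outputs does not affect the result.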

Testing

Compare GoogLeNet with AlexNet. For the test code see test_googlenet.py

Number of parameters

[alexnet] param num: 61100840
[googlenet] param num: 13370744
num_alexnet / num_googlenet: 4.57

AlexNet has about 61 million parameters while GoogLeNet has only about 13.37 million, so AlexNet has 4.57 times as many
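
Parameter totals like these are commonly obtained by summing `numel()` over a module's parameters. The sketch below assumes that approach (the actual test script is not shown here) and uses a single linear layer as a stand-in model; `count_parameters` is a hypothetical helper.

```python
import torch.nn as nn


def count_parameters(model):
    """Total number of elements over all parameter tensors of a module."""
    return sum(p.numel() for p in model.parameters())


# Stand-in module: a 1024 -> 1000 linear layer, like GoogLeNet's final classifier.
fc = nn.Linear(1024, 1000)
print(count_parameters(fc))  # 1025000 = 1024 * 1000 weights + 1000 biases

# Ratio of the two totals reported above.
ratio = 61100840 / 13370744
print(round(ratio, 2))  # 4.57
```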

Inference time

[alexnet] time: 0.0193
[googlenet] time: 0.0592
time_googlenet / time_alexnet: 3.070

Average time per test image over 100 runs:

  1. AlexNet: 0.0252
  2. GoogLeNet: 0.0764

GoogLeNet takes roughly 3 times as long as AlexNet
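
A measurement of this kind could be set up roughly as follows. This is a sketch under assumptions, not the script behind the numbers above: it times CPU forward passes with `time.perf_counter` (so the result is in seconds), and a tiny stand-in network replaces the real models. `average_forward_time` is a hypothetical helper.

```python
import time

import torch
import torch.nn as nn


def average_forward_time(model, x, runs=100):
    """Mean wall-clock seconds per forward pass over `runs` repetitions (CPU)."""
    model.eval()
    with torch.no_grad():
        model(x)  # warm-up pass, not timed
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        elapsed = time.perf_counter() - start
    return elapsed / runs


# Tiny stand-in network; swap in the real models to compare them.
toy = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())
t = average_forward_time(toy, torch.randn(1, 3, 64, 64), runs=10)
print(f"{t:.6f} s per forward pass")
```

On a GPU the timing would also need `torch.cuda.synchronize()` before reading the clock, because CUDA kernels run asynchronously.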

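Training 1

Train the GoogLeNet model with the following settings:

  1. Dataset: PASCAL VOC 07+12, 20 classes, with 40058 training samples and 12032 test samples
  2. Batch size: 128
  3. Optimizer: Adam with a learning rate of 1e-3
  4. Step decay: the learning rate is reduced by 4% every 8 epochs (decay factor 0.96)
  5. Number of epochs: 100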
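
In PyTorch these optimizer settings map naturally onto `torch.optim.Adam` and `torch.optim.lr_scheduler.StepLR`. The sketch below assumes that mapping, which the post does not spell out, and uses a small linear layer as a stand-in for the network so that the learning-rate schedule can be inspected on its own.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(4, 2)  # stand-in for the network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = StepLR(optimizer, step_size=8, gamma=0.96)  # x0.96 every 8 epochs

lrs = []
for epoch in range(100):
    lrs.append(optimizer.param_groups[0]["lr"])
    # ... one epoch of training would run here ...
    optimizer.step()   # placeholder so the scheduler is stepped after the optimizer
    scheduler.step()

# Learning rate in the first epoch, after the first decay, and in the last epoch.
print(lrs[0], lrs[8], lrs[99])
```

With a step size of 8 the rate is decayed 12 times over 100 epochs, so the final rate is about 61% of the initial one.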

Results after 100 epochs of training:

{'train': 40058, 'test': 12032}
Epoch 0/99
----------
train Loss: 4.1952 Acc: 0.3144
test Loss: 2.3778 Acc: 0.3763
Epoch 1/99
----------
train Loss: 3.8275 Acc: 0.3430
test Loss: 2.4141 Acc: 0.4031
...
...
----------
train Loss: 0.9364 Acc: 0.8376
test Loss: 0.9345 Acc: 0.7376
Epoch 98/99
----------
train Loss: 0.9406 Acc: 0.8383
test Loss: 0.9211 Acc: 0.7417
Epoch 99/99
----------
train Loss: 0.9241 Acc: 0.8415
test Loss: 0.9402 Acc: 0.7420
Training complete in 152m 46s
Best val Acc: 0.742603
train googlenet done

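Training 2

Compare the training of GoogLeNet and AlexNet with the following settings:

  1. Dataset: PASCAL VOC 07+12, 20 classes, with 40058 training samples and 12032 test samples
  2. Batch size: 128
  3. Optimizer: Adam with a learning rate of 1e-3
  4. Step decay: the learning rate is reduced by 10% every 4 epochs (decay factor 0.9)
  5. Number of epochs: 100

Results after 100 epochs of training: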

{'train': 40058, 'test': 12032}
Epoch 0/99
----------
train Loss: 4.2119 Acc: 0.2785
test Loss: 2.4186 Acc: 0.3763
Epoch 1/99
----------
train Loss: 3.8248 Acc: 0.3500
test Loss: 2.1310 Acc: 0.4271
Epoch 2/99
----------
...
...
----------
train Loss: 0.8651 Acc: 0.8511
test Loss: 0.9694 Acc: 0.7302
Epoch 98/99
----------
train Loss: 0.8523 Acc: 0.8514
test Loss: 0.9841 Acc: 0.7325
Epoch 99/99
----------
train Loss: 0.8524 Acc: 0.8520
test Loss: 0.9624 Acc: 0.7271
Training complete in 152m 21s
Best val Acc: 0.742354
train googlenet done

Epoch 0/99
----------
train Loss: 2.5005 Acc: 0.3253
test Loss: 2.1131 Acc: 0.4341
Epoch 1/99
----------
train Loss: 2.1525 Acc: 0.3861
test Loss: 1.8203 Acc: 0.4796
Epoch 2/99
----------
...
...
----------
train Loss: 0.8249 Acc: 0.7347
test Loss: 0.9704 Acc: 0.7027
Epoch 98/99
----------
train Loss: 0.8132 Acc: 0.7400
test Loss: 0.9865 Acc: 0.6968
Epoch 99/99
----------
train Loss: 0.8181 Acc: 0.7370
test Loss: 0.9812 Acc: 0.7037
Training complete in 62m 33s
Best val Acc: 0.707613
train alexnet done

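After 100 epochs, GoogLeNet reached a best test accuracy of 74.24% and AlexNet a best test accuracy of 70.76% (the "Best val Acc" values in the logs above; AlexNet's accuracy in the final epoch was 70.37%)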
