## Overall Architecture

• $$Input = 299\times 299\times 3$$
• $$Stem = 35\times 35\times 384$$
• $$4\times Inception-A = 35\times 35\times 384$$
• $$Reduction-A = 17\times 17\times 1024$$
• $$7\times Inception-B = 17\times 17\times 1024$$
• $$Reduction-B = 8\times 8\times 1536$$
• $$3\times Inception-C = 8\times 8\times 1536$$
• $$Average Pooling = 1\times 1\times 1536$$
• $$Dropout(keep 0.8) = 1536$$
• $$Softmax = 1000$$

All convolutions not marked with “V” in the figures are same-padded, meaning that their output grid matches the size of their input. Convolutions marked with “V” are valid-padded, meaning that the input patch of each unit is fully contained in the previous layer and the grid size of the output activation map is reduced accordingly.
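The padding rule above can be checked with the usual output-size formula, floor((n + 2p - k) / s) + 1. The following is a minimal sketch; the helper name `conv_output_size` is an illustrative assumption, not something taken from the original text.

```python
def conv_output_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


# Valid padding (marked "V"): no padding, so the grid shrinks.
valid_stride2 = conv_output_size(299, kernel=3, stride=2)
valid_stride1 = conv_output_size(149, kernel=3)

# Same padding for a stride-1 3x3 kernel corresponds to padding = 1,
# so the grid size is preserved.
same_stride1 = conv_output_size(147, kernel=3, padding=1)

print(valid_stride2, valid_stride1, same_stride1)
```

These three calls correspond to the first three convolutions of the Stem (299 to 149, 149 to 147, and 147 kept at 147).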

## Stem

The Stem module implements the network's early computation. The shapes are derived as follows:

• $$Input = 299\times 299\times 3$$
• $$Conv$$
  • $$3\times 3, S=2, N=32$$
  • $$Output = 149\times 149\times 32$$
• $$Conv$$
  • $$3\times 3, N=32$$
  • $$Output = 147\times 147\times 32$$
• $$Conv$$
  • $$3\times 3, N=64, P=1$$
  • $$Output = 147\times 147\times 64$$
• $$Concat$$
  • $$Max Pool$$
    • $$S=2$$
    • $$Output = 73\times 73\times 64$$
  • $$Conv$$
    • $$3\times 3, S=2, N=96$$
    • $$Output=73\times 73\times 96$$
  • $$Cat$$
    • $$Output = 73\times 73\times 160$$
• $$Concat$$
  • $$One$$
    • $$Conv$$
      • $$1\times 1, N=64$$
      • $$Output = 73\times 73\times 64$$
    • $$Conv$$
      • $$3\times 3, N=96$$
      • $$Output = 71\times 71\times 96$$
  • $$Two$$
    • $$Conv$$
      • $$1\times 1, N=64$$
      • $$Output = 73\times 73\times 64$$
    • $$Conv$$
      • $$7\times 1, N=64, P=(3, 0)$$
      • $$Output = 73\times 73\times 64$$
    • $$Conv$$
      • $$1\times 7, N=64, P=(0, 3)$$
      • $$Output = 73\times 73\times 64$$
    • $$Conv$$
      • $$3\times 3, N=96$$
      • $$Output = 71\times 71\times 96$$
  • $$Cat$$
    • $$Output = 71\times 71\times 192$$
• $$Concat$$
  • $$Conv$$
    • $$3\times 3, S=2, N=192$$
    • $$Output = 35\times 35\times 192$$
  • $$Max Pool$$
    • $$S=2$$
    • $$Output = 35\times 35\times 192$$
  • $$Cat$$
    • $$Output = 35\times 35\times 384$$
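The Stem derivation above can be replayed as a small shape trace. This is a sketch under stated assumptions: the function names `out_size` and `trace_stem` are illustrative, and the max-pooling layers are assumed to use a 3×3 window, since the list only gives their stride.

```python
def out_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size along one axis after a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_stem(size: int = 299) -> list:
    """Return (spatial size, channels) after each stage of the Stem."""
    shapes = []
    size = out_size(size, 3, stride=2)   # 3x3, S=2, N=32, valid
    shapes.append((size, 32))
    size = out_size(size, 3)             # 3x3, N=32, valid
    shapes.append((size, 32))
    size = out_size(size, 3, padding=1)  # 3x3, N=64, P=1
    shapes.append((size, 64))
    # First concat: max pool (64 channels, assumed 3x3 window) next to
    # a 3x3, S=2 conv (96 channels).
    size = out_size(size, 3, stride=2)
    shapes.append((size, 64 + 96))
    # Second concat: the 1x1 and padded 7x1 / 1x7 layers keep the grid,
    # and each branch ends in a valid 3x3 conv with 96 filters.
    size = out_size(size, 3)
    shapes.append((size, 96 + 96))
    # Third concat: 3x3, S=2 conv (192 channels) next to a max pool
    # that keeps the 192 input channels.
    size = out_size(size, 3, stride=2)
    shapes.append((size, 192 + 192))
    return shapes


for size, channels in trace_stem():
    print(f"{size} x {size} x {channels}")
```

The last entry of the trace should match the Stem output listed in the overall architecture.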

## Inception-A

• $$Input = 35\times 35\times 384$$
• $$1\times 1$$
  • $$Conv$$
    • $$1\times 1, N=96$$
    • $$Output = 35\times 35\times 96$$
• $$3\times 3$$
  • $$Conv$$
    • $$1\times 1, N=64$$
    • $$Output = 35\times 35\times 64$$
  • $$Conv$$
    • $$3\times 3, N=96, P=1$$
    • $$Output = 35\times 35\times 96$$
• $$double 3\times 3$$
  • $$Conv$$
    • $$1\times 1, N=64$$
    • $$Output = 35\times 35\times 64$$
  • $$Conv$$
    • $$3\times 3, N=96, P=1$$
    • $$Output = 35\times 35\times 96$$
  • $$Conv$$
    • $$3\times 3, N=96, P=1$$
    • $$Output = 35\times 35\times 96$$
• $$Pool$$
  • $$Avg Pool$$
    • $$3\times 3, P=1$$
    • $$Output = 35\times 35\times 384$$
  • $$Conv$$
    • $$1\times 1, N=96$$
    • $$Output = 35\times 35\times 96$$
• $$Concat$$
  • $$Output = 35\times 35\times 384$$
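The concatenated depth of Inception-A is the sum of the last filter count in each branch. The snippet below is a small sketch of that bookkeeping; the dictionary layout and the name `concat_channels` are assumptions made for illustration.

```python
# Filter counts per branch, in layer order, as listed above.
# The pooling branch only records its 1x1 conv, because average
# pooling does not change the channel count.
INCEPTION_A = {
    "1x1": [96],
    "3x3": [64, 96],
    "double 3x3": [64, 96, 96],
    "pool": [96],
}


def concat_channels(branches: dict) -> int:
    """Depth after channel-wise concatenation of all branch outputs."""
    return sum(filters[-1] for filters in branches.values())


input_channels = 384
output_channels = concat_channels(INCEPTION_A)
print(output_channels)
```

Because every layer in the block is same-padded with stride 1 and the output depth equals the input depth, the block can be stacked four times without any shape adjustment.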

## Reduction-A

• $$Input = 35\times 35\times 384$$
• $$3\times 3$$
  • $$Conv$$
    • $$3\times 3, S=2, N=384$$
    • $$Output = 17\times 17\times 384$$
• $$double 3\times 3$$
  • $$Conv$$
    • $$1\times 1, N=192$$
    • $$Output = 35\times 35\times 192$$
  • $$Conv$$
    • $$3\times 3, N=224, P=1$$
    • $$Output = 35\times 35\times 224$$
  • $$Conv$$
    • $$3\times 3, S=2, N=256$$
    • $$Output = 17\times 17\times 256$$
• $$Pool$$
  • $$MaxPool$$
    • $$S=2$$
    • $$Output = 17\times 17\times 384$$
• $$Concat$$
  • $$Output = 17\times 17\times 1024$$
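A quick check of the Reduction-A arithmetic, written as a sketch: the names `out_size` and `trace_reduction_a` are illustrative, and the max pool is assumed to use a 3×3 window because the list only states its stride.

```python
def out_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size along one axis after a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_reduction_a(size: int = 35, channels: int = 384) -> tuple:
    """Return (spatial size, channels) after the Reduction-A concat."""
    # All three branches end in a stride-2, unpadded 3x3 window.
    reduced = out_size(size, 3, stride=2)
    # The middle 3x3, P=1 conv of the double 3x3 branch keeps the grid.
    assert out_size(size, 3, padding=1) == size
    branch_channels = [
        384,       # 3x3 branch
        256,       # double 3x3 branch (192 -> 224 -> 256)
        channels,  # max-pool branch keeps the input depth
    ]
    return reduced, sum(branch_channels)


print(trace_reduction_a())
```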

## Inception-B

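| type | patch size/stride | input size | output size | depth | #1x1 | #1x7 | #7x1 | #1x7 | #7x1 |
|---|---|---|---|---|---|---|---|---|---|
| conv | | 17x17x1024 | 17x17x384 | 1 | 384 | | | | |
| conv | | 17x17x1024 | 17x17x256 | 3 | 192 | 224 | 256 | | |
| conv | | 17x17x1024 | 17x17x256 | 5 | 192 | 192 | 224 | 224 | 256 |
| avg pooling | 3x3/1 | 17x17x1024 | 17x17x128 | 2 | 128 | | | | |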
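The Inception-B table can be replayed branch by branch. The sketch below assumes that every layer is same-padded with stride 1 (consistent with the 17x17 input and output sizes in the table) and that the kernels are applied in the column order of the header. The branch labels and function names are illustrative assumptions.

```python
def same_conv_hw(h: int, w: int, kh: int, kw: int) -> tuple:
    """Grid size after a stride-1 layer padded by (k - 1) // 2 per axis."""
    ph, pw = (kh - 1) // 2, (kw - 1) // 2
    return h + 2 * ph - kh + 1, w + 2 * pw - kw + 1


# Each layer is ((kernel height, kernel width), filters).
# filters=None marks a pooling layer, which keeps the channel count.
BRANCHES = {
    "1x1": [((1, 1), 384)],
    "1x7-7x1": [((1, 1), 192), ((1, 7), 224), ((7, 1), 256)],
    "double 1x7-7x1": [((1, 1), 192), ((1, 7), 192), ((7, 1), 224),
                       ((1, 7), 224), ((7, 1), 256)],
    "avg pool": [((3, 3), None), ((1, 1), 128)],
}


def trace_inception_b(h: int = 17, w: int = 17, channels: int = 1024) -> tuple:
    """Return (height, width, channels) after the Inception-B concat."""
    total = 0
    for layers in BRANCHES.values():
        bh, bw, bc = h, w, channels
        for (kh, kw), filters in layers:
            bh, bw = same_conv_hw(bh, bw, kh, kw)
            if filters is not None:
                bc = filters
        # Every branch must keep the grid so the outputs can be concatenated.
        assert (bh, bw) == (h, w)
        total += bc
    return h, w, total


print(trace_inception_b())
```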

## Reduction-B

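| type | patch size/stride | input size | output size | depth | #1x1 | #1x7 | #7x1 | #3x3 |
|---|---|---|---|---|---|---|---|---|
| conv | stride=2 | 17x17x1024 | 8x8x192 | 2 | 192 | | | 192 |
| conv | stride=2 | 17x17x1024 | 8x8x320 | 4 | 256 | 256 | 320 | 320 |
| max pooling | 3x3/2 | 17x17x1024 | 8x8x1024 | 1 | | | | |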
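The Reduction-B output shape follows from the same arithmetic. This sketch assumes that the stride-2 layer in each branch uses an unpadded 3×3 window, which is what the 17 to 8 reduction implies; the function names are illustrative.

```python
def out_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size along one axis after a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_reduction_b(size: int = 17, channels: int = 1024) -> tuple:
    """Return (spatial size, channels) after the Reduction-B concat."""
    reduced = out_size(size, 3, stride=2)
    branch_channels = [
        192,       # 1x1 (192) -> 3x3 stride 2 (192)
        320,       # 1x1 (256) -> 1x7 (256) -> 7x1 (320) -> 3x3 stride 2 (320)
        channels,  # max pooling keeps the input depth
    ]
    return reduced, sum(branch_channels)


print(trace_reduction_b())
```

The result should match the input shape listed for Inception-C.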

## Inception-C

• $$Input = 8\times 8\times 1536$$
• $$Output = 8\times 8\times 1536$$