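This post reimplements GoogLeNet, the network proposed by Google. Note the spelling: it is GoogLeNet, not GoogleNet. Why? Reportedly the name is Google's homage to LeNet~ The world of the big names is fun like that.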

Abstract

We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

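GoogLeNet comes from a paper published by a group at Google at CVPR 2015; its abstract is quoted above. In short, the paper proposes a deep convolutional network architecture (Inception) that achieved state-of-the-art results. Through careful design, the network gains depth and width while the computational budget stays constant.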

Network architecture

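The main feature of the Inception network is that it is "deep", and "deep" here has two meanings:

the architecture pushed research on deep convolutional networks into a new stage (my personal reading);
the network has more layers, that is, it is literally deeper.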

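The full GoogLeNet architecture diagram is too long to include here (see reference [1]), but the network is essentially built out of Inception modules, so this post reimplements that core module.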

The structure of the Inception module is shown in the figure below:

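As 沐神 (Mu Li) puts it, the core idea is "I want it all". Between a module's input and its output there are four paths: a 1x1 convolution; a 1x1 convolution followed by a 3x3 convolution; a 1x1 convolution followed by a 5x5 convolution; and a 3x3 max pooling followed by a 1x1 convolution. The module's output is the outputs of these four paths concatenated along the channel dimension. Why do it this way? I don't know yet. Which only goes to show that deep learning is black magic...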

Code

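With the Inception module diagram above, writing the code is just a matter of following the figure. Two things to watch:

the four paths must produce outputs with the same spatial size;
at the output, the results of the four paths are concatenated along the channel dimension;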
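The two points above come down to simple arithmetic, sketched here in plain Python. This is only an illustration under assumptions: the helper names `conv_out_size` and `inception_out_channels` are hypothetical, the 28x28 input size is an arbitrary example, and the kernel/padding pairs are the usual choices for the four paths (stride 1 everywhere). The channel numbers are those of the inception (3a) block in the paper's Table 1.

```python
def conv_out_size(n, kernel, padding, stride=1):
    """Output size along one spatial dimension for a conv or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def inception_out_channels(c1, c2, c3, c4):
    """Channels after concatenation: only the last layer of each path counts."""
    return c1 + c2[1] + c3[1] + c4


n = 28  # example input height/width (assumption)
layers = {
    "1x1 conv, padding 0": (1, 0),
    "3x3 conv, padding 1": (3, 1),
    "5x5 conv, padding 2": (5, 2),
    "3x3 max pool, padding 1": (3, 1),
}
# Each kernel/padding pair keeps the spatial size unchanged at stride 1
sizes = {name: conv_out_size(n, k, p) for name, (k, p) in layers.items()}
for name, size in sizes.items():
    print(f"{name}: {n} -> {size}")

# Channel counts of inception (3a): 64 + 128 + 32 + 32
out_channels = inception_out_channels(64, (96, 128), (16, 32), 32)
print("output channels:", out_channels)
```

Because every path keeps the height and width unchanged, the four outputs differ only in channel count, which is what makes the concatenation along the channel dimension valid.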

With those two points in mind, the code is as follows:

import torch
from torch import nn


class Inception(nn.Module):
    def __init__(self, in_channels, c1, c2, c3, c4):
        super(Inception, self).__init__()

        # Path 1: a single 1x1 convolution
        self.p1 = nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=c1, kernel_size=(1, 1), stride=(1, 1), padding=(0, 0)),
            nn.ReLU()
        )
        # Path 2: 1x1 convolution (channel reduction), then 3x3 convolution
        self.p2 = nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=c2[0], kernel_size=(1, 1), stride=(1, 1), padding=(0, 0)),
            nn.ReLU(),
            nn.Conv2d(in_channels=c2[0], out_channels=c2[1], kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
        )
        # Path 3: 1x1 convolution (channel reduction), then 5x5 convolution;
        # padding of 2 keeps the spatial size unchanged
        self.p3 = nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=c3[0], kernel_size=(1, 1), stride=(1, 1), padding=(0, 0)),
            nn.ReLU(),
            nn.Conv2d(in_channels=c3[0], out_channels=c3[1], kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)),
            nn.ReLU(),
        )
        # Path 4: 3x3 max pooling (stride 1), then 1x1 convolution
        self.p4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.Conv2d(in_channels=in_channels, out_channels=c4, kernel_size=(1, 1), stride=(1, 1), padding=(0, 0)),
            nn.ReLU()
        )

    def forward(self, x):
        y1 = self.p1(x)
        y2 = self.p2(x)
        y3 = self.p3(x)
        y4 = self.p4(x)

        # Concatenate the four path outputs along the channel dimension (dim=1)
        y = torch.cat([y1, y2, y3, y4], dim=1)

        return y
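As a quick sanity check of the module above, the sketch below pushes a random tensor through one block and prints the output shape. It is a compact restatement written for illustration, not the only way to do it: the `conv_relu` helper is hypothetical, the third path uses a 5x5 convolution as in the paper's module, the batch size of 2 is arbitrary, and the channel numbers are those of the inception (3a) block in the paper's Table 1 (192 input channels at 28x28).

```python
import torch
from torch import nn


def conv_relu(in_ch, out_ch, kernel_size, padding):
    """Stride-1 Conv2d followed by ReLU (hypothetical helper for brevity)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=kernel_size, padding=padding),
        nn.ReLU(),
    )


class Inception(nn.Module):
    """Compact four-path Inception block, used here only for a shape check."""

    def __init__(self, in_channels, c1, c2, c3, c4):
        super().__init__()
        self.p1 = conv_relu(in_channels, c1, 1, 0)
        self.p2 = nn.Sequential(conv_relu(in_channels, c2[0], 1, 0),
                                conv_relu(c2[0], c2[1], 3, 1))
        self.p3 = nn.Sequential(conv_relu(in_channels, c3[0], 1, 0),
                                conv_relu(c3[0], c3[1], 5, 2))
        self.p4 = nn.Sequential(nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
                                conv_relu(in_channels, c4, 1, 0))

    def forward(self, x):
        # Concatenate along dim=1, the channel dimension of an NCHW tensor
        return torch.cat([self.p1(x), self.p2(x), self.p3(x), self.p4(x)], dim=1)


# inception (3a): 64 + 128 + 32 + 32 = 256 output channels
block = Inception(192, 64, (96, 128), (16, 32), 32)
x = torch.randn(2, 192, 28, 28)  # (batch, channels, height, width)
y = block(x)
print(y.shape)  # torch.Size([2, 256, 28, 28])
```

The height and width stay at 28 while the channel count becomes the sum of the last-layer channels of the four paths.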

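A few more notes on the code:

different Inception blocks in the network have different numbers of input channels, so the input channel count is itself a parameter;
of the four paths, the second and third each have two learnable layers, so their arguments should be a list of two numbers, the output channel counts of the two convolution layers;
the max-pooling layer has no learnable parameters, so no activation layer is needed after it;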
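The last point can be checked directly by counting parameters. This is a small sketch under assumptions: `count_params` is a hypothetical helper, and the 192 input and 32 output channels are example values matching the pooling path of inception (3a).

```python
import torch
from torch import nn


def count_params(module):
    """Number of trainable parameters in a module (hypothetical helper)."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
conv = nn.Conv2d(192, 32, kernel_size=1)

pool_params = count_params(pool)
conv_params = count_params(conv)
print(pool_params)  # 0: max pooling has nothing to learn
print(conv_params)  # 6176 = 192 * 32 weights + 32 biases
```

All the learnable weights of the fourth path therefore sit in the 1x1 convolution that follows the pooling layer.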

Summary

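The code above reimplements the Inception module, but only one variant of it. There are actually several versions (search for them, or read the corresponding papers), and I think you can also modify the module to suit your own research (that is, hack on it freely). Use whichever works best.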

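My impression is that the networks that grew out of the Inception module are here to prove that deep learning is black magic. It really is an "experimental" science, and it is hard to do without enough hardware.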

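That said, the idea behind the Inception module is worth learning from. In essence it feels like the multi-scale fusion idea from the geosciences. Nice~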

References

[1] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. Going Deeper with Convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1-9.
[2] https://www.bilibili.com/video/BV1b5411g7Xo?p=3&vd_source=878f5382c5ebdd5fc558a620040e965f