Computing the Receptive Field

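1 The concept of the receptive field

In a convolutional neural network, the receptive field of a layer is the size of the region in the original input image that a single pixel of that layer's output feature map maps back to.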

The R-CNN paper notes that the pixels of the feature map output by AlexNet's pool5 layer have very large receptive fields ($195 \times 195$ pixels) and strides ($32 \times 32$ pixels) in the input image. How are these two numbers obtained?

2 Computing the receptive field size

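A few points need to be stated before computing the receptive field:

(1) For the first convolutional layer, the receptive field of a pixel on the output feature map equals the filter size.

(2) The receptive field of a deeper layer depends on the filter sizes and strides of all the layers before it.

(3) The calculation ignores the effect of image borders, that is, $padding$ is not taken into account. $stride$ only affects the receptive field of the next layer's feature map, while $fsize$ affects the receptive field of the layer itself.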
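Points (1) and (3) can be checked on a tiny example. The sketch below is only an illustration: the helper names `output_size` and `receptive_field` and the two three-layer stacks are made up for this purpose, with each layer written as `(fsize, stride, pad)`. It suggests that padding changes the output size but leaves the receptive field untouched.

```python
def output_size(insize, layers):
    """Spatial size of the feature map after applying all layers."""
    for fsize, stride, pad in layers:
        insize = (insize - fsize + 2 * pad) // stride + 1
    return insize


def receptive_field(layers):
    """Receptive field of one pixel of the last layer (padding is ignored)."""
    rf = 1
    for fsize, stride, _pad in reversed(layers):
        rf = (rf - 1) * stride + fsize
    return rf


# Hypothetical stacks: two 3x3 convolutions followed by a 2x2 pooling layer.
padded = [(3, 1, 1), (3, 1, 1), (2, 2, 0)]
unpadded = [(3, 1, 0), (3, 1, 0), (2, 2, 0)]

print(output_size(224, padded), receptive_field(padded))      # 112 6
print(output_size(224, unpadded), receptive_field(unpadded))  # 110 6

# Point (1): a single layer's receptive field equals its filter size.
print(receptive_field([(3, 1, 1)]))  # 3
```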

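Each layer also has an associated quantity called $strides$, which is the product of the $stride$ values of all the layers before it:

$$strides(i) = stride(1) \times stride(2) \times \dots \times stride(i-1)$$

Note that the `Strides` column printed by the code below also multiplies in the layer's own $stride$, so it reports the stride of that layer's output feature map.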
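As a small illustration, the cumulative strides can be computed with a running product. The sketch below uses the per-layer strides of AlexNet from conv1 to pool5 as they appear in this post; the variable names are made up for the example. It shows both variants: the product through the layer itself, and the product over the earlier layers only, which is $strides(i)$ as defined above.

```python
from itertools import accumulate
from operator import mul

# Per-layer strides of AlexNet, conv1 .. pool5 (values used in this post).
alexnet_strides = [4, 2, 1, 2, 1, 1, 1, 2]

# Running product through each layer: the stride of that layer's output
# feature map, measured in input-image pixels.
output_strides = list(accumulate(alexnet_strides, mul))

# strides(i) as defined above: product of the strides of layers 1..i-1,
# i.e. the pixel spacing of the feature map that layer i reads from.
input_strides = [1] + output_strides[:-1]

print(output_strides)  # [4, 8, 8, 16, 16, 16, 16, 32]
print(input_strides)   # [1, 4, 8, 8, 16, 16, 16, 16]
```

The last entry of `output_strides` is the 32-pixel stride quoted for pool5.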

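The receptive field size is computed top-down: first compute the receptive field of the deepest layer on the layer just before it, then propagate the result layer by layer down to the first layer. The procedure can be written as:

$$RF = 1$$ (the receptive field of one pixel on the feature map of interest)

for each layer, from the top (deepest) layer down to the first layer:

$$RF = (RF - 1) \times stride + fsize$$

where $stride$ is the stride of the layer and $fsize$ is its filter size.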
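Unrolling the recurrence above gives an equivalent form, $RF = 1 + \sum_i (fsize_i - 1) \times strides(i)$, where $strides(i)$ is the product of the strides of the earlier layers. The sketch below is a cross-check rather than part of the original method: the function names are hypothetical, and the `(fsize, stride)` pairs are the AlexNet values used in this post.

```python
def rf_top_down(layers):
    """Top-down recurrence: start at the deepest layer with RF = 1."""
    rf = 1
    for fsize, stride in reversed(layers):
        rf = (rf - 1) * stride + fsize
    return rf


def rf_unrolled(layers):
    """Unrolled form: RF = 1 + sum_i (fsize_i - 1) * strides(i)."""
    rf, jump = 1, 1  # jump = product of the strides of all earlier layers
    for fsize, stride in layers:
        rf += (fsize - 1) * jump
        jump *= stride
    return rf


# (fsize, stride) pairs for AlexNet conv1 .. pool5, as used in this post.
alexnet = [(11, 4), (3, 2), (5, 1), (3, 2), (3, 1), (3, 1), (3, 1), (3, 2)]

# Both formulations agree at every depth.
for depth in range(1, len(alexnet) + 1):
    assert rf_top_down(alexnet[:depth]) == rf_unrolled(alexnet[:depth])

print(rf_top_down(alexnet))  # 195
```

The unrolled form walks the layers from first to last, which is convenient when the receptive field of every intermediate layer is wanted in a single pass.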

The following Python 3 code computes the receptive field of the output feature map of every layer in AlexNet, ZF-5, and VGG16:

    # -*- coding:utf-8 -*-

    # Each layer is described as [fsize, stride, pad].
    net_struct = {
        'alexnet': {
            'net': [[11, 4, 0], [3, 2, 0], [5, 1, 2], [3, 2, 0],
                    [3, 1, 1], [3, 1, 1], [3, 1, 1], [3, 2, 0]],
            'name': ['conv1', 'pool1', 'conv2', 'pool2',
                     'conv3', 'conv4', 'conv5', 'pool5']},
        'vgg16': {
            'net': [[3, 1, 1], [3, 1, 1], [2, 2, 0], [3, 1, 1], [3, 1, 1], [2, 2, 0],
                    [3, 1, 1], [3, 1, 1], [3, 1, 1], [2, 2, 0], [3, 1, 1], [3, 1, 1],
                    [3, 1, 1], [2, 2, 0], [3, 1, 1], [3, 1, 1], [3, 1, 1], [2, 2, 0]],
            'name': ['conv1_1', 'conv1_2', 'pool1', 'conv2_1', 'conv2_2', 'pool2',
                     'conv3_1', 'conv3_2', 'conv3_3', 'pool3', 'conv4_1', 'conv4_2',
                     'conv4_3', 'pool4', 'conv5_1', 'conv5_2', 'conv5_3', 'pool5']},
        'zf-5': {
            'net': [[7, 2, 3], [3, 2, 1], [5, 2, 2], [3, 2, 1],
                    [3, 1, 1], [3, 1, 1], [3, 1, 1]],
            'name': ['conv1', 'pool1', 'conv2', 'pool2', 'conv3', 'conv4', 'conv5']}}

    imsize = 224  # side length of the square input image


    def outFromIn(isz, net, layernum):
        """Return the output size and cumulative stride after the first `layernum` layers."""
        totstride = 1
        insize = isz
        for layer in range(layernum):
            fsize, stride, pad = net[layer]
            # Floor division keeps the size an integer under Python 3.
            outsize = (insize - fsize + 2 * pad) // stride + 1
            insize = outsize
            totstride = totstride * stride
        return outsize, totstride


    def inFromOut(net, layernum):
        """Return the receptive field of one output pixel of layer `layernum` (top-down)."""
        RF = 1
        for layer in reversed(range(layernum)):
            # Padding is unpacked but deliberately ignored.
            fsize, stride, pad = net[layer]
            RF = ((RF - 1) * stride) + fsize
        return RF


    if __name__ == '__main__':
        print("layer output sizes given image = %dx%d" % (imsize, imsize))

        for net in net_struct.keys():
            print('************net structrue name is %s**************' % net)
            for i in range(len(net_struct[net]['net'])):
                p = outFromIn(imsize, net_struct[net]['net'], i + 1)
                rf = inFromOut(net_struct[net]['net'], i + 1)
                print("Layer Name = %s, Output size = %3d, Strides = % 3d, RF size = %3d"
                      % (net_struct[net]['name'][i], p[0], p[1], rf))

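The output is shown below. The networks may be listed in a different order depending on the Python version, since older versions do not preserve dict insertion order.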

    layer output sizes given image = 224x224
    ************net structrue name is vgg16**************
    Layer Name = conv1_1, Output size = 224, Strides = 1, RF size = 3
    Layer Name = conv1_2, Output size = 224, Strides = 1, RF size = 5
    Layer Name = pool1, Output size = 112, Strides = 2, RF size = 6
    Layer Name = conv2_1, Output size = 112, Strides = 2, RF size = 10
    Layer Name = conv2_2, Output size = 112, Strides = 2, RF size = 14
    Layer Name = pool2, Output size = 56, Strides = 4, RF size = 16
    Layer Name = conv3_1, Output size = 56, Strides = 4, RF size = 24
    Layer Name = conv3_2, Output size = 56, Strides = 4, RF size = 32
    Layer Name = conv3_3, Output size = 56, Strides = 4, RF size = 40
    Layer Name = pool3, Output size = 28, Strides = 8, RF size = 44
    Layer Name = conv4_1, Output size = 28, Strides = 8, RF size = 60
    Layer Name = conv4_2, Output size = 28, Strides = 8, RF size = 76
    Layer Name = conv4_3, Output size = 28, Strides = 8, RF size = 92
    Layer Name = pool4, Output size = 14, Strides = 16, RF size = 100
    Layer Name = conv5_1, Output size = 14, Strides = 16, RF size = 132
    Layer Name = conv5_2, Output size = 14, Strides = 16, RF size = 164
    Layer Name = conv5_3, Output size = 14, Strides = 16, RF size = 196
    Layer Name = pool5, Output size = 7, Strides = 32, RF size = 212
    ************net structrue name is zf-5**************
    Layer Name = conv1, Output size = 112, Strides = 2, RF size = 7
    Layer Name = pool1, Output size = 56, Strides = 4, RF size = 11
    Layer Name = conv2, Output size = 28, Strides = 8, RF size = 27
    Layer Name = pool2, Output size = 14, Strides = 16, RF size = 43
    Layer Name = conv3, Output size = 14, Strides = 16, RF size = 75
    Layer Name = conv4, Output size = 14, Strides = 16, RF size = 107
    Layer Name = conv5, Output size = 14, Strides = 16, RF size = 139
    ************net structrue name is alexnet**************
    Layer Name = conv1, Output size = 54, Strides = 4, RF size = 11
    Layer Name = pool1, Output size = 26, Strides = 8, RF size = 19
    Layer Name = conv2, Output size = 26, Strides = 8, RF size = 51
    Layer Name = pool2, Output size = 12, Strides = 16, RF size = 67
    Layer Name = conv3, Output size = 12, Strides = 16, RF size = 99
    Layer Name = conv4, Output size = 12, Strides = 16, RF size = 131
    Layer Name = conv5, Output size = 12, Strides = 16, RF size = 163
    Layer Name = pool5, Output size = 5, Strides = 32, RF size = 195


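As a worked example, for AlexNet's pool1 layer the calculation is $(1-1) \times 2 + 3 = 3$, then $(3-1) \times 4 + 11 = 19$.

For AlexNet's pool2 layer it is $(1-1) \times 2 + 3 = 3$, $(3-1) \times 1 + 5 = 7$, $(7-1) \times 2 + 3 = 15$, and finally $(15-1) \times 4 + 11 = 67$.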
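The same hand calculation can be reproduced step by step in code. The helper below, `trace_receptive_field`, is a hypothetical name introduced only for this illustration. It walks from pool2 back to conv1 and records each intermediate value, using the `(fsize, stride)` pairs of the first four AlexNet layers from this post.

```python
def trace_receptive_field(layers, names):
    """Return (layer name, arithmetic step) pairs, from the last layer to the first."""
    rf = 1
    steps = []
    for (fsize, stride), name in zip(reversed(layers), reversed(names)):
        new_rf = (rf - 1) * stride + fsize
        steps.append((name, f"({rf}-1)*{stride}+{fsize}={new_rf}"))
        rf = new_rf
    return steps


names = ['conv1', 'pool1', 'conv2', 'pool2']
layers = [(11, 4), (3, 2), (5, 1), (3, 2)]  # (fsize, stride) per layer

for name, step in trace_receptive_field(layers, names):
    print(f"{name}: {step}")
# pool2: (1-1)*2+3=3
# conv2: (3-1)*1+5=7
# pool1: (7-1)*2+3=15
# conv1: (15-1)*4+11=67
```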

Reference: http://www.cnblogs.com/objectDetect/p/5947169.html