1. 卷积
卷积运算是用一个卷积核(小矩阵)与输入矩阵的子区域做点积,得到一个标量;然后在子区域上进行位移(依照stride参数),计算得到一个新的矩阵,称作feature map。
一个动图(同样源自网络)做更完整说明:
在卷积运算y=conv(x,w)+b
中,x通常是4维Tensor:NCHW
(cuDNN中的默认格式,下同)。其中N
是batch size;C
是input channel,或者叫input depth;H*W
则是每张输入image的大小。在上面的动图中,N=1,C=3,H=W=5
。
相应地,filter也是4维Tensor:OIHW
。其中O
是Filter的数量;I
是input depth,与x的C
对应;H*W
则是每个Filter矩阵的大小。在上面的动图中,O=2,I=3,H=W=3
。batch size不对filter的维度产生任何影响,它反映的是计算过程中的“并行度”。在并行之间,用的是仅有的同一份filter数据。
值得注意的是,在常见网络中,H=W
,且以3,5,7三个值最常见,其中又以3×3的filter最常见。所以FIlter的H*W
经常会表示为K*K
,这是业界惯例。
2. 简单的脚手架
为了方便程序联调,我实现了一个简单的脚手架,用以导出或导入float类型数组。基本思路非常简单:1. 以二进制方式存储float数组到文件中;2. 从保存好的二进制文件中读取内容并初始化float数组。
以二进制格式保存,避免以文本格式保存带来的精度损失以及蹩脚的通用性。
两个核心函数定义在arrstream.c
中:
#include "arrstream.h"
/*** Write float array ARRAY of size SIZE into file FILENAME in binary format.
The file open and close operations are handled inside the function body.
*/
void rtarray2file_float(const float * array, int size, const char* filename){
FILE * fp = fopen(filename, "wb");
if(fp == NULL){
fprintf(stderr, "Error: file %s cannot be opened\n", filename);
exit(0);
}
int bytes = fwrite((const char*) array, 1, sizeof(float)*size, fp);
fprintf(stderr, "%d bytes written into file %s\n", bytes, filename);
fclose(fp);
}
/*** Read float array ARRAY of size SIZE from file FILENAME in binary format.
The ARRAY must be malloced before calling this function.
*/
void rdarray4file_float(float * array, int size, const char* filename){
FILE * fp = fopen(filename, "rb+");
if(fp == NULL){
fprintf(stderr, "Error: file %s cannot be opened\n", filename);
exit(0);
}
int bytes = fread((char*) array, 1, sizeof(float)*size, fp);
fprintf(stderr, "Read %d bytes from file %s\n", bytes, filename);
fclose(fp);
}
而对比两个二进制文件中保存的float数组的主要代码为:
rdarray4file_float(rdarray, sizearray, fname);
rdarray4file_float(rdarray2, sizearray, fname2);
static int num_err = 0;
float max_diff = 0.0;
float diff;
for (i = 0; i < sizearray; i++) {
diff = fabs(rdarray[i] - rdarray2[i]);
if(diff > 0.000001f){
if(max_diff < diff){
max_diff = diff;
}
num_err ++;
}
}
printf("Total diffs: %d, max diff: %f\n", num_err, max_diff);
其中直接以1e-6f作为两个float值是否相等的判断标准,算是很粗糙的实现了。
3. 卷积操作实现
用Python+SciPy可以很方便地实现卷积运算。其中用到卷积运算convolve2d
,官方文档参考:convolve2d官方文档
需要注意,在卷积时要先对每次filter做值的顺序翻转(flipping),这是因为,卷积网络中的卷积与数学中的卷积定义是不同的(数学中X卷积Y是第i个X元素与第n-i个Y元素计算)。
只考虑一种简单情况:padding=”SAME”(有关padding更多信息可参看1.4卷积神经网络笔记—Padding 。所以调用卷积接口为: conv_y = signal.convolve2d(conv_in, conv_filter, mode="same")
。
逻辑简单,所以直接贴上完整的Python代码。其中二进制文件都是从TensorFlow中用上面的脚手架导出的。可以看到python中有现成的从二进制文件初始化数组的接口array.fromfile()
。
#/usr/bin/env python
import array
import numpy as np
import math
from scipy import signal
batfilename_x="sw_bats/Ni=512_Ri=14_Ci=14_No=512_K=3_Ro=14_Co=14_B=4_pad=1_iter=10_NCHW_in.bat"
batfilename_w="sw_bats/Ni=512_Ri=14_Ci=14_No=512_K=3_Ro=14_Co=14_B=4_pad=1_iter=10_NCHW_weight.bat"
batfilename_y_tf="sw_bats/Ni=512_Ri=14_Ci=14_No=512_K=3_B=4_pad=1_size=401408_outiter=10_NCHW_out.bat"
N = 4
C = 512
H = 14
W = 14
Co = 512
K = 3
length_x=N*C*H*W
length_w=Co*C*K*K
x = array.array('f')
w = array.array('f')
y_tf = array.array('f')
with open(batfilename_x, "rb") as f:
x.fromfile(f, length_x)
with open(batfilename_w, "rb") as f:
w.fromfile(f, length_w)
with open(batfilename_y_tf, "rb") as f:
y_tf.fromfile(f, length_x)
whole_max_diff=0
for n in range(0, N): # Batch as the most outer loop
for co in range(0, Co): # Output channels as the second outer loop
conv_y_final = np.zeros((H, W))
print "n:", n, "co:",co
for c in range(0, C): # For every input channel, calculate a H*W output
conv_in = np.array(x[(n*C*H*W+c*H*W):(n*C*H*W+c*H*W+H*W)])
conv_in = conv_in.reshape((H, W))
conv_filter = np.array(w[co*C*K*K+c*K*K:co*C*K*K+c*K*K+K*K])
conv_filter_ori = conv_filter.reshape((K, K))
# Reverse filter since the definition of convolution in CNN
# is differnt from that in math:
conv_filter = conv_filter[::-1]
conv_filter = conv_filter.reshape((K, K))
conv_y = signal.convolve2d(conv_in, conv_filter, mode="same",boundary='fill', fillvalue=0)
conv_y_final += conv_y
conv_y_final1d = conv_y_final.reshape((H*W))
tf_y = np.array(y_tf[(n*C*H*W+co*H*W):(n*C*H*W+co*H*W+H*W)])
diff_nums = 0
max_diff=0.0
for i in range(0, H*W):
diff = math.fabs(tf_y[i] - conv_y_final1d[i])
max_diff = max(max_diff, diff)
if diff > 0.000001:
diff_nums += 1
#print "i:", i, "tf_y:", tf_y[i], "conv_y:", conv_y[i]
print " Total diffs:", diff_nums, "max diff:", max_diff
whole_max_diff = max(whole_max_diff, max_diff)
print "whole max:", whole_max_diff
输出结果(看得出用1e-6作比较标准并不合理):