2D卷积:原理、脚手架、实现

1. 卷积

卷积运算是用一个卷积核(小矩阵)与输入矩阵的子区域做点积,得到一个标量;然后在子区域上进行位移(依照stride参数),计算得到一个新的矩阵,称作feature map。

图源自网络:

一个动图(同样源自网络)做更完整说明:

在卷积运算y=conv(x,w)+b中,x通常是4维Tensor:NCHW(cuDNN中的默认格式,下同)。其中N是batch size;C是input channel,或者叫input depth;H*W则是每张输入image的大小。在上面的动图中,N=1,C=3,H=W=5

相应地,filter也是4维Tensor:OIHW。其中O是Filter的数量;I是input depth,与x的C对应;H*W则是每个Filter矩阵的大小。在上面的动图中,O=2,I=3,H=W=3。batch size不对filter的维度产生任何影响,它反映的是计算过程中的“并行度”。在并行之间,用的是仅有的同一份filter数据。

值得注意的是,在常见网络中,H=W,且以3,5,7三个值最常见,其中又以3×3的filter最常见。所以FIlter的H*W经常会表示为K*K,这是业界惯例。

2. 简单的脚手架

为了方便程序联调,我实现了一个简单的脚手架,用以导出或导入float类型数组。基本思路非常简单:1. 以二进制方式存储float数组到文件中;2. 从保存好的二进制文件中读取内容并初始化float数组。

以二进制格式保存,避免以文本格式保存带来的精度损失以及蹩脚的通用性。

两个核心函数定义在arrstream.c中:

#include "arrstream.h"

/*** Write float array ARRAY of size SIZE into file FILENAME in binary format.
     The file open and close operations are handled inside the function body.
*/
void rtarray2file_float(const float * array, int size, const char* filename){
  FILE * fp = fopen(filename, "wb");
  if(fp == NULL){
    fprintf(stderr, "Error: file %s cannot be opened\n", filename);
    exit(0);
  }

  int bytes = fwrite((const char*) array, 1, sizeof(float)*size, fp);

  fprintf(stderr, "%d bytes written into file %s\n", bytes, filename);

  fclose(fp);
}

/*** Read float array ARRAY of size SIZE from file FILENAME in binary format.
     The ARRAY must be malloced before calling this function.
*/
void rdarray4file_float(float * array, int size, const char* filename){
  FILE * fp = fopen(filename, "rb+");
  if(fp == NULL){
    fprintf(stderr, "Error: file %s cannot be opened\n", filename);
    exit(0);
  }

  int bytes = fread((char*) array, 1, sizeof(float)*size, fp);

  fprintf(stderr, "Read %d bytes from file %s\n", bytes, filename);

  fclose(fp);
}

而对比两个二进制文件中保存的float数组的主要代码为:

  rdarray4file_float(rdarray, sizearray, fname);
  rdarray4file_float(rdarray2, sizearray, fname2);

  static int num_err = 0;
  float max_diff = 0.0;
  float diff;
  for (i = 0; i < sizearray; i++) {
    diff = fabs(rdarray[i] - rdarray2[i]);
    if(diff > 0.000001f){
      if(max_diff < diff){
    max_diff = diff;
      }
      num_err ++;
    }
  }
  printf("Total diffs: %d, max diff: %f\n", num_err, max_diff);

其中直接以1e-6f作为两个float值是否相等的判断标准,算是很粗糙的实现了。

3. 卷积操作实现

用Python+SciPy可以很方便地实现卷积运算。其中用到卷积运算convolve2d,官方文档参考:convolve2d官方文档

需要注意,在卷积时要先对每次filter做值的顺序翻转(flipping),这是因为,卷积网络中的卷积与数学中的卷积定义是不同的(数学中X卷积Y是第i个X元素与第n-i个Y元素计算)。

只考虑一种简单情况:padding=”SAME”(有关padding更多信息可参看1.4卷积神经网络笔记—Padding 。所以调用卷积接口为: conv_y = signal.convolve2d(conv_in, conv_filter, mode="same")

逻辑简单,所以直接贴上完整的Python代码。其中二进制文件都是从TensorFlow中用上面的脚手架导出的。可以看到python中有现成的从二进制文件初始化数组的接口array.fromfile()

#/usr/bin/env python

import array
import numpy as np
import math
from scipy import signal

batfilename_x="sw_bats/Ni=512_Ri=14_Ci=14_No=512_K=3_Ro=14_Co=14_B=4_pad=1_iter=10_NCHW_in.bat"
batfilename_w="sw_bats/Ni=512_Ri=14_Ci=14_No=512_K=3_Ro=14_Co=14_B=4_pad=1_iter=10_NCHW_weight.bat"
batfilename_y_tf="sw_bats/Ni=512_Ri=14_Ci=14_No=512_K=3_B=4_pad=1_size=401408_outiter=10_NCHW_out.bat"

N = 4
C = 512
H = 14
W = 14
Co = 512
K = 3
length_x=N*C*H*W
length_w=Co*C*K*K

x = array.array('f')
w = array.array('f')
y_tf = array.array('f')
with open(batfilename_x, "rb") as f:
    x.fromfile(f, length_x)

with open(batfilename_w, "rb") as f:
    w.fromfile(f, length_w)

with open(batfilename_y_tf, "rb") as f:
    y_tf.fromfile(f, length_x)

whole_max_diff=0
for n in range(0, N): # Batch as the most outer loop
    for co in range(0, Co): # Output channels as the second outer loop
        conv_y_final = np.zeros((H, W))
        print "n:", n, "co:",co
        for c in range(0, C): # For every input channel, calculate a H*W output
            conv_in = np.array(x[(n*C*H*W+c*H*W):(n*C*H*W+c*H*W+H*W)])
            conv_in = conv_in.reshape((H, W))
            conv_filter = np.array(w[co*C*K*K+c*K*K:co*C*K*K+c*K*K+K*K])
            conv_filter_ori = conv_filter.reshape((K, K))

            # Reverse filter since the definition of convolution in CNN
            # is differnt from that in math:
            conv_filter =  conv_filter[::-1]
            conv_filter = conv_filter.reshape((K, K))

            conv_y = signal.convolve2d(conv_in, conv_filter, mode="same",boundary='fill', fillvalue=0)
            conv_y_final += conv_y

        conv_y_final1d = conv_y_final.reshape((H*W))
        tf_y = np.array(y_tf[(n*C*H*W+co*H*W):(n*C*H*W+co*H*W+H*W)])
        diff_nums = 0
        max_diff=0.0
        for i in range(0, H*W):
            diff = math.fabs(tf_y[i] - conv_y_final1d[i])
            max_diff = max(max_diff, diff)
            if diff > 0.000001:
                diff_nums += 1
                #print "i:", i, "tf_y:", tf_y[i], "conv_y:", conv_y[i]
        print "  Total diffs:", diff_nums, "max diff:", max_diff
        whole_max_diff = max(whole_max_diff, max_diff)
print "whole max:", whole_max_diff

输出结果(看得出用1e-6作比较标准并不合理):
输出结果

4. 参考

SciPy.signal.convolve2d

Padding

Filter翻转问题

发表评论

电子邮件地址不会被公开。 必填项已用*标注