CAFFE源码学习笔记之内积层-inner_product_layer_caffe innerproduct-程序员宅基地

一、前言
内积层实际就是全连接。经过之前的卷积层、池化层和非线性变换层，样本已经被映射到隐藏层的特征空间之中，而全连接层就是将学习到的特征又映射到样本分类空间。虽然已经出现了全局池化可以替代全连接，但是仍然不能说全连接就不能用了。
二、源码分析
1、成员变量

全连接的输入时一个M*K的矩阵，权重是K*N的矩阵，所以输出是一个M*N的矩阵

  int M_;//num_input
  int K_;//C*H*W
  int N_;//N_ = num_output
  bool bias_term_;
  Blob<Dtype> bias_multiplier_;
  bool transpose_;  ///< if true, assume transposed weights```

2、layersetup
权重是K*N的矩阵：（C*H*W）*num_output

template <typename Dtype>
void InnerProductLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  const int num_output = this->layer_param_.inner_product_param().num_output();//输出
  bias_term_ = this->layer_param_.inner_product_param().bias_term();
  transpose_ = this->layer_param_.inner_product_param().transpose();
  N_ = num_output;
  const int axis = bottom[0]->CanonicalAxisIndex(
      this->layer_param_.inner_product_param().axis());
  // Dimensions starting from "axis" are "flattened" into a single
  // length K_ vector. For example, if bottom[0]'s shape is (N, C, H, W),
  // and axis == 1, N inner products with dimension CHW are performed.
  K_ = bottom[0]->count(axis);
  // Check if we need to set up the weights
  if (this->blobs_.size() > 0) {
    LOG(INFO) << "Skipping parameter initialization";
  } else {
    if (bias_term_) {
      this->blobs_.resize(2);
    } else {
      this->blobs_.resize(1);
    }
    // Initialize the weights
    vector<int> weight_shape(2);
    if (transpose_) {
      weight_shape[0] = K_;
      weight_shape[1] = N_;
    } else {
      weight_shape[0] = N_;
      weight_shape[1] = K_;
    }
    this->blobs_[0].reset(new Blob<Dtype>(weight_shape));
    // fill the weights
    shared_ptr<Filler<Dtype> > weight_filler(GetFiller<Dtype>(
        this->layer_param_.inner_product_param().weight_filler()));
    weight_filler->Fill(this->blobs_[0].get());
    // If necessary, intiialize and fill the bias term
    if (bias_term_) {
      vector<int> bias_shape(1, N_);
      this->blobs_[1].reset(new Blob<Dtype>(bias_shape));
      shared_ptr<Filler<Dtype> > bias_filler(GetFiller<Dtype>(
          this->layer_param_.inner_product_param().bias_filler()));
      bias_filler->Fill(this->blobs_[1].get());
    }
  }  // parameter initialization
  this->param_propagate_down_.resize(this->blobs_.size(), true);
}

3、reshape
输入是一个M*K的矩阵：num_input*（C*H*W）
经过转换：top_shape.resize(axis + 1)
输出是M*N的矩阵：

$num\_input*num\_output$

template <typename Dtype>
void InnerProductLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  // Figure out the dimensions
  const int axis = bottom[0]->CanonicalAxisIndex(
      this->layer_param_.inner_product_param().axis());
  const int new_K = bottom[0]->count(axis);
  CHECK_EQ(K_, new_K)
      << "Input size incompatible with inner product parameters.";
  // The first "axis" dimensions are independent inner products; the total
  // number of these is M_, the product over these dimensions.
  M_ = bottom[0]->count(0, axis);//num_input
  // The top shape will be the bottom shape with the flattened axes dropped,
  // and replaced by a single axis with dimension num_output (N_).
  vector<int> top_shape = bottom[0]->shape();
  top_shape.resize(axis + 1);
  top_shape[axis] = N_;
  top[0]->Reshape(top_shape);
  // Set up the bias multiplier
  if (bias_term_) {
    vector<int> bias_shape(1, M_);
    bias_multiplier_.Reshape(bias_shape);
    caffe_set(M_, Dtype(1), bias_multiplier_.mutable_cpu_data());
  }
}

3、前向计算

这里写图片描述
直接调用矩阵内积函数caffe_gpu_gemm

template <typename Dtype>
void InnerProductLayer<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->gpu_data();
  Dtype* top_data = top[0]->mutable_gpu_data();
  const Dtype* weight = this->blobs_[0]->gpu_data();
  if (M_ == 1) {
    caffe_gpu_gemv<Dtype>(CblasNoTrans, N_, K_, (Dtype)1.,weight, bottom_data, (Dtype)0., top_data);//这个是向量内积函数
    if (bias_term_)
      caffe_gpu_axpy<Dtype>(N_, bias_multiplier_.cpu_data()[0],this->blobs_[1]->gpu_data(), top_data);
  } else {
    caffe_gpu_gemm<Dtype>(CblasNoTrans,transpose_ ? CblasNoTrans : CblasTrans,M_, N_, K_, (Dtype)1.,bottom_data, weight, (Dtype)0., top_data);
    if (bias_term_)
      caffe_gpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, N_, 1, (Dtype)1.,bias_multiplier_.gpu_data(),this->blobs_[1]->gpu_data(), (Dtype)1., top_data);
  }
}

4、反向传播
对参数求偏导
$\frac{\partial loss}{\partial w_kj}=\frac{\partial loss}{\partial z_k}*\frac{\partial z_k}{\partial w_kj}=\frac{\partial loss}{\partial z_k}*u_j$

转换成向量：
$\frac{\partial loss}{\partial W_j}==\frac{\partial loss}{\partial Z}*u_j$

转换成矩阵：
$\frac{\partial loss}{\partial W}==\frac{\partial loss}{\partial Z^T}*U$

即 $layer\_blobs\_ = top_diff*bottom\_data$

对输出求偏导：
公式：
$\frac{\partial loss}{\partial u_j}=\sum_k^{n=M}\frac{\partial loss}{\partial z_k}*\frac{\partial z_k}{\partial u_j}$

转化为向量
$\frac{\partial loss}{\partial U^T}=\frac{\partial loss}{\partial Z^T}*W$

M为需要分的类别数

转换成矩阵的形式：
$\frac{\partial loss}{\partial U}=\frac{\partial loss}{\partial Z}*W$

即
$bottom\_diff = top\_diff*layer\_blobs\_$

template <typename Dtype>
void InnerProductLayer<Dtype>::Backward_gpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (this->param_propagate_down_[0]) {
    const Dtype* top_diff = top[0]->gpu_diff();
    const Dtype* bottom_data = bottom[0]->gpu_data();
    // Gradient with respect to weight
    if (transpose_) {
      caffe_gpu_gemm<Dtype>(CblasTrans, CblasNoTrans,
          K_, N_, M_,
          (Dtype)1., bottom_data, top_diff,
          (Dtype)1., this->blobs_[0]->mutable_gpu_diff());
    } else {
      caffe_gpu_gemm<Dtype>(CblasTrans, CblasNoTrans,
          N_, K_, M_,
          (Dtype)1., top_diff, bottom_data,
          (Dtype)1., this->blobs_[0]->mutable_gpu_diff());
    }
  }
  if (bias_term_ && this->param_propagate_down_[1]) {
    const Dtype* top_diff = top[0]->gpu_diff();
    // Gradient with respect to bias
    caffe_gpu_gemv<Dtype>(CblasTrans, M_, N_, (Dtype)1., top_diff,
        bias_multiplier_.gpu_data(), (Dtype)1.,
        this->blobs_[1]->mutable_gpu_diff());
  }
  if (propagate_down[0]) {
    const Dtype* top_diff = top[0]->gpu_diff();
    // Gradient with respect to bottom data
    if (transpose_) {
      caffe_gpu_gemm<Dtype>(CblasNoTrans, CblasTrans,
          M_, K_, N_,
          (Dtype)1., top_diff, this->blobs_[0]->gpu_data(),
          (Dtype)0., bottom[0]->mutable_gpu_diff());
    } else {
      caffe_gpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans,
          M_, K_, N_,
         (Dtype)1., top_diff, this->blobs_[0]->gpu_data(),
         (Dtype)0., bottom[0]->mutable_gpu_diff());
    }
  }
}

本文链接：https://blog.csdn.net/sinat_22336563/article/details/70053720

原作者删帖不实内容删帖广告或垃圾文章投诉

智能推荐

51单片机的中断系统_51单片机中断篇-程序员宅基地

文章浏览阅读3.3k次，点赞7次，收藏39次。CPU 执行现行程序的过程中，出现某些急需处理的异常情况或特殊请求，CPU暂时中止现行程序，而转去对异常情况或特殊请求进行处理，处理完毕后再返回现行程序断点处，继续执行原程序。void 函数名(void) interrupt n using m {中断函数内容 //尽量精简 }编译器会把该函数转化为中断函数，表示中断源编号为n，中断源对应一个中断入口地址，而中断入口地址的内容为跳转指令，转入本函数。using m用于指定本函数内部使用的工作寄存器组，m取值为0~3。该修饰符可省略，由编译器自动分配。_51单片机中断篇

oracle项目经验求职,网络工程师简历中的项目经验怎么写-程序员宅基地

文章浏览阅读396次。项目经验(案例一)项目时间：2009-10 - 2009-12项目名称：中驰别克信息化管理整改完善项目描述：项目介绍一，建立中驰别克硬件档案(PC,服务器，网络设备，办公设备等)二，建立中驰别克软件档案(每台PC安装的软件，财务，HR，OA，专用系统等)三，能过建立的档案对中驰别克信息化办公环境优化(合理使用ADSL宽带资源，对域进行调整，对文件服务器进行优化，对共享打印机进行调整)四，优化完成后..._网络工程师项目经历

LVS四层负载均衡集群-程序员宅基地

文章浏览阅读1k次，点赞31次，收藏30次。LVS：Linux Virtual Server，负载调度器，内核集成，阿里的四层SLB(Server Load Balance)是基于LVS+keepalived实现。NATTUNDR优点端口转换WAN性能最好缺点性能瓶颈服务器支持隧道模式不支持跨网段真实服务器要求anyTunneling支持网络private（私网）LAN/WAN（私网/公网）LAN（私网）真实服务器数量High (100)High (100)真实服务器网关lvs内网地址。

「技术综述」一文道尽传统图像降噪方法_噪声很大的图片可以降噪吗-程序员宅基地

文章浏览阅读899次。https://www.toutiao.com/a6713171323893318151/作者 | 黄小邪/言有三编辑 | 黄小邪/言有三图像预处理算法的好坏直接关系到后续图像处理的效果，如图像分割、目标识别、边缘提取等，为了获取高质量的数字图像，很多时候都需要对图像进行降噪处理，尽可能的保持原始信息完整性（即主要特征）的同时，又能够去除信号中无用的信息。并且，降噪还引出了一..._噪声很大的图片可以降噪吗

Effective Java 【对于所有对象都通用的方法】第13条谨慎地覆盖clone_为继承设计类有两种选择,但无论选择其中的-程序员宅基地

文章浏览阅读152次。目录谨慎地覆盖cloneCloneable接口并没有包含任何方法，那么它到底有什么作用呢？Object类中的clone()方法如何重写好一个clone()方法1.对于数组类型我可以采用clone()方法的递归2.如果对象是非数组，建议提供拷贝构造器（copy constructor）或者拷贝工厂（copy factory）3.如果为线程安全的类重写clone()方法4.如果为需要被继承的类重写clone()方法总结谨慎地覆盖cloneCloneable接口地目的是作为对象的一个mixin接口（详见第20_为继承设计类有两种选择,但无论选择其中的

毕业设计基于协同过滤的电影推荐系统-程序员宅基地

文章浏览阅读958次，点赞21次，收藏24次。今天学长向大家分享一个毕业设计项目基于协同过滤的电影推荐系统项目运行效果：项目获取：https://gitee.com/assistant-a/project-sharing21世纪是信息化时代，随着信息技术和网络技术的发展，信息化已经渗透到人们日常生活的各个方面，人们可以随时随地浏览到海量信息，但是这些大量信息千差万别，需要费事费力的筛选、甄别自己喜欢或者感兴趣的数据。对网络电影服务来说，需要用到优秀的协同过滤推荐功能去辅助整个系统。系统基于Python技术，使用UML建模，采用Django框架组合进行设

随便推点

你想要的10G SFP+光模块大全都在这里-程序员宅基地

文章浏览阅读614次。10G SFP+光模块被广泛应用于10G以太网中，在下一代移动网络、固定接入网、城域网、以及数据中心等领域非常常见。下面易天光通信（ETU-LINK）就为大家一一盘点下10G SFP+光模块都有哪些吧。一、10G SFP+双纤光模块10G SFP+双纤光模块是一种常规的光模块，有两个LC光纤接口，传输距离最远可达100公里，常用的10G SFP+双纤光模块有10G SFP+ SR、10G SFP+ LR，其中10G SFP+ SR的传输距离为300米，10G SFP+ LR的传输距离为10公里。_10g sfp+

计算机毕业设计Node.js+Vue基于Web美食网站设计(程序+源码+LW+部署)_基于vue美食网站源码-程序员宅基地

文章浏览阅读239次。该项目含有源码、文档、程序、数据库、配套开发软件、软件安装教程。欢迎交流项目运行环境配置：项目技术：Express框架 + Node.js+ Vue 等等组成，B/S模式 +Vscode管理+前后端分离等等。环境需要1.运行环境：最好是Nodejs最新版，我们在这个版本上开发的。其他版本理论上也可以。2.开发环境：Vscode或HbuilderX都可以。推荐HbuilderX;3.mysql环境：建议是用5.7版本均可4.硬件环境：windows 7/8/10 1G内存以上；_基于vue美食网站源码