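Neural Network Derivation: Batch Data

This post derives forward and backward propagation for a neural network that takes a batch of data as input.

The TestNet network

TestNet is a 2-layer neural network with the following structure:

  • The input layer has 3 neurons
  • The hidden layer has 4 neurons
  • The output layer has 2 neurons

  • The activation function is ReLU
  • The scoring function is softmax regression
  • The cost function is the cross-entropy loss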

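Network notation

This section fixes the notation used in the computations.

Neurons and layers

  • $L$ is the number of layers (the input layer is not counted)
    • $L=2$: the input layer is layer 0, the hidden layer is layer 1, and the output layer is layer 2
  • $n^{(l)}$ is the number of neurons in layer $l$ (bias neurons excluded)
    • $n^{(0)}=3$: the input layer has 3 neurons
    • $n^{(1)}=4$: the hidden layer has 4 neurons
    • $n^{(2)}=2$: the output layer has 2 neurons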

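Weight matrices and biases

  • $W^{(l)}$ is the weight matrix from layer $l-1$ to layer $l$; its number of rows is the number of neurons in layer $l-1$ and its number of columns is the number of neurons in layer $l$
    • $W^{(1)}$ is the weight matrix from the input layer to the hidden layer, $W^{(1)}\in R^{3\times 4}$
    • $W^{(2)}$ is the weight matrix from the hidden layer to the output layer, $W^{(2)}\in R^{4\times 2}$
  • $W^{(l)}_{i,j}$ is the weight from neuron $i$ of layer $l-1$ to neuron $j$ of layer $l$
    • $i$ ranges over $[1,n^{(l-1)}]$
    • $j$ ranges over $[1, n^{(l)}]$
  • $W^{(l)}_{i}$ is the weight vector of neuron $i$ in layer $l-1$ (row $i$ of $W^{(l)}$), of length $n^{(l)}$
  • $W^{(l)}_{,j}$ is the weight vector of neuron $j$ in layer $l$ (column $j$ of $W^{(l)}$), of length $n^{(l-1)}$
  • $b^{(l)}$ is the bias vector of layer $l$
    • $b^{(1)}$ is the bias vector from the input layer to the hidden layer, $b^{(1)}\in R^{1\times 4}$
    • $b^{(2)}$ is the bias vector from the hidden layer to the output layer, $b^{(2)}\in R^{1\times 2}$
  • $b^{(l)}_{i}$ is the bias of neuron $i$ in layer $l$
    • $b^{(1)}_{2}$ is the bias of the 2nd neuron in layer 1 (the hidden layer)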

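Input and output vectors of neurons

  • $a^{(l)}$ is the output of layer $l$ for a batch of $m$ samples, one row per sample: $a^{(l)}=[a^{(l)}_{1},a^{(l)}_{2},…,a^{(l)}_{m}]^{T}$

    • $a^{(0)}$ is the output of the input layer, $a^{(0)}\in R^{m\times 3}$
    • $a^{(1)}$ is the output of the hidden layer, $a^{(1)}\in R^{m\times 4}$
    • $a^{(2)}$ is the output of the output layer, $a^{(2)}\in R^{m\times 2}$
  • $a^{(l)}_{i,j}$ is the output value of unit $j$ in layer $l$ for the $i$-th sample; it is the corresponding input value after the activation function has been applied

    • $a^{(1)}_{1,3}$ is the output value of the 3rd hidden neuron for the 1st sample, $a^{(1)}_{1,3}=g(z^{(1)}_{1,3})$
  • $z^{(l)}$ is the input of layer $l$ for the batch, one row per sample: $z^{(l)}=[z^{(l)}_{1},z^{(l)}_{2},…,z^{(l)}_{m}]^{T}$

    • $z^{(1)}$ is the input of the hidden layer, $z^{(1)}\in R^{m\times 4}$
    • $z^{(2)}$ is the input of the output layer, $z^{(2)}\in R^{m\times 2}$
  • $z^{(l)}_{i,j}$ is the input value of unit $j$ in layer $l$ for the $i$-th sample: the weighted sum of the $i$-th row of the previous layer's output with the weight vector of neuron $j$, plus the bias of neuron $j$

    • $z^{(1)}_{1,2}$ is the input value of the 2nd hidden neuron for the 1st sample, $z^{(1)}_{1,2}=b^{(1)}_{2}+a^{(0)}_{1,1}\cdot W^{(1)}_{1,2}+a^{(0)}_{1,2}\cdot W^{(1)}_{2,2}+a^{(0)}_{1,3}\cdot W^{(1)}_{3,2}$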

Activation function

  • $g()$ denotes the activation function

Scoring function and loss function

  • $h()$ denotes the scoring function
  • $J()$ denotes the cost function

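Neuron computation steps

Each layer of neurons performs 2 computation steps:

  1. The input vector $z^{(l)}$ is the weighted sum of the previous layer's output vector $a^{(l-1)}$ with the weight matrix $W^{(l)}$, plus the bias vector

$$
z^{(l)}_{i,j}=a^{(l-1)}_{i}\cdot W^{(l)}_{,j} + b^{(l)}_{j} \Rightarrow
z^{(l)}=a^{(l-1)}\cdot W^{(l)} + b^{(l)}
$$

  2. The output vector $a^{(l)}$ is the activation function applied to the input vector $z^{(l)}$

$$
a^{(l)}_{i}=g(z^{(l)}_{i})
\Rightarrow
a^{(l)}=g(z^{(l)})
$$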
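The two steps above can be sketched in plain Python. This is only an illustration: the helper names `matmul`, `affine` and `relu`, the use of nested lists for matrices, and all numeric values are assumptions made for this example and do not come from the original post.

```python
def matmul(a, b):
    """Multiply an (m x n) matrix by an (n x k) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def affine(a_prev, w, b):
    """Step 1: z = a_prev . W + b, adding the bias row to every sample row."""
    return [[v + b_j for v, b_j in zip(row, b)] for row in matmul(a_prev, w)]

def relu(z):
    """Step 2: element-wise activation, here g = ReLU."""
    return [[max(0.0, v) for v in row] for row in z]

# Hypothetical batch of m = 2 samples with 3 features each
a0 = [[1.0, 2.0, 3.0],
      [-1.0, 0.0, 1.0]]
# Hypothetical 3 x 4 weight matrix and bias row of length 4
w1 = [[0.1, -0.2, 0.3, 0.0],
      [0.0, 0.5, -0.1, 0.2],
      [-0.3, 0.1, 0.2, -0.4]]
b1 = [0.1, 0.0, -0.1, 0.2]

z1 = affine(a0, w1, b1)  # shape m x 4
a1 = relu(z1)            # shape m x 4
```

Each row of `z1` and `a1` belongs to one sample, which matches the $R^{m\times 4}$ shapes used in the text.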

Network structure

Input layer

$$
a^{(0)}
=\begin{bmatrix}
a^{(0)}_{1}\\
\vdots\\
a^{(0)}_{m}
\end{bmatrix}
=\begin{bmatrix}
a^{(0)}_{1,1} & a^{(0)}_{1,2} & a^{(0)}_{1,3}\\
\vdots & \vdots & \vdots\\
a^{(0)}_{m,1} & a^{(0)}_{m,2} & a^{(0)}_{m,3}
\end{bmatrix}\in R^{m\times 3}
$$

Hidden layer

$$
W^{(1)}
=\begin{bmatrix}
W^{(1)}_{1,1} & W^{(1)}_{1,2} & W^{(1)}_{1,3} & W^{(1)}_{1,4}\\
W^{(1)}_{2,1} & W^{(1)}_{2,2} & W^{(1)}_{2,3} & W^{(1)}_{2,4}\\
W^{(1)}_{3,1} & W^{(1)}_{3,2} & W^{(1)}_{3,3} & W^{(1)}_{3,4}
\end{bmatrix}
\in R^{3\times 4}
$$

$$
b^{(1)}=\begin{bmatrix}b^{(1)}_{1} & b^{(1)}_{2} & b^{(1)}_{3} & b^{(1)}_{4}\end{bmatrix}\in R^{1\times 4}
$$

$$
z^{(1)}
=\begin{bmatrix}
z^{(1)}_{1,1} & z^{(1)}_{1,2} & z^{(1)}_{1,3} & z^{(1)}_{1,4}\\
\vdots & \vdots & \vdots & \vdots\\
z^{(1)}_{m,1} & z^{(1)}_{m,2} & z^{(1)}_{m,3} & z^{(1)}_{m,4}
\end{bmatrix}\in R^{m\times 4}
$$

$$
a^{(1)}
=\begin{bmatrix}
a^{(1)}_{1,1} & a^{(1)}_{1,2} & a^{(1)}_{1,3} & a^{(1)}_{1,4}\\
\vdots & \vdots & \vdots & \vdots\\
a^{(1)}_{m,1} & a^{(1)}_{m,2} & a^{(1)}_{m,3} & a^{(1)}_{m,4}
\end{bmatrix}\in R^{m\times 4}
$$

Output layer

$$
W^{(2)}
=\begin{bmatrix}
W^{(2)}_{1,1} & W^{(2)}_{1,2}\\
W^{(2)}_{2,1} & W^{(2)}_{2,2}\\
W^{(2)}_{3,1} & W^{(2)}_{3,2}\\
W^{(2)}_{4,1} & W^{(2)}_{4,2}
\end{bmatrix}
\in R^{4\times 2}
$$

$$
b^{(2)}=\begin{bmatrix}b^{(2)}_{1} & b^{(2)}_{2}\end{bmatrix}\in R^{1\times 2}
$$

$$
z^{(2)}
=\begin{bmatrix}
z^{(2)}_{1,1} & z^{(2)}_{1,2}\\
\vdots & \vdots\\
z^{(2)}_{m,1} & z^{(2)}_{m,2}
\end{bmatrix}\in R^{m\times 2}
$$

Scores

$$
h(z^{(2)})
=\begin{bmatrix}
p(y_{1}=1) & p(y_{1}=2)\\
\vdots & \vdots\\
p(y_{m}=1) & p(y_{m}=2)
\end{bmatrix}\in R^{m\times 2}
$$

Loss

$$
J(z^{(2)})=(-1)\sum_{i=1}^{m} \sum_{j=1}^{2} 1(y_{i}=j)\ln p(y_{i}=j)\in R^{1}
$$

Forward propagation

From the input layer to the hidden layer

$$
z^{(1)}_{i,1}=a^{(0)}_{i}\cdot W^{(1)}_{,1}+b^{(1)}_{1}
=a^{(0)}_{i,1}\cdot W^{(1)}_{1,1}
+a^{(0)}_{i,2}\cdot W^{(1)}_{2,1}
+a^{(0)}_{i,3}\cdot W^{(1)}_{3,1}
+b^{(1)}_{1}
$$

$$
z^{(1)}_{i,2}=a^{(0)}_{i}\cdot W^{(1)}_{,2}+b^{(1)}_{2}
=a^{(0)}_{i,1}\cdot W^{(1)}_{1,2}
+a^{(0)}_{i,2}\cdot W^{(1)}_{2,2}
+a^{(0)}_{i,3}\cdot W^{(1)}_{3,2}
+b^{(1)}_{2}
$$

$$
z^{(1)}_{i,3}=a^{(0)}_{i}\cdot W^{(1)}_{,3}+b^{(1)}_{3}
=a^{(0)}_{i,1}\cdot W^{(1)}_{1,3}
+a^{(0)}_{i,2}\cdot W^{(1)}_{2,3}
+a^{(0)}_{i,3}\cdot W^{(1)}_{3,3}
+b^{(1)}_{3}
$$

$$
z^{(1)}_{i,4}=a^{(0)}_{i}\cdot W^{(1)}_{,4}+b^{(1)}_{4}
=a^{(0)}_{i,1}\cdot W^{(1)}_{1,4}
+a^{(0)}_{i,2}\cdot W^{(1)}_{2,4}
+a^{(0)}_{i,3}\cdot W^{(1)}_{3,4}
+b^{(1)}_{4}
$$

$$
\Rightarrow z^{(1)}_{i}
=[z^{(1)}_{i,1},z^{(1)}_{i,2},z^{(1)}_{i,3},z^{(1)}_{i,4}]
=a^{(0)}_{i}\cdot W^{(1)}+b^{(1)}
$$

$$
\Rightarrow z^{(1)}
=a^{(0)}\cdot W^{(1)}+b^{(1)}
$$

From the hidden layer's input vector to its output vector

$$
a^{(1)}_{i,1}=relu(z^{(1)}_{i,1}) \\
a^{(1)}_{i,2}=relu(z^{(1)}_{i,2}) \\
a^{(1)}_{i,3}=relu(z^{(1)}_{i,3}) \\
a^{(1)}_{i,4}=relu(z^{(1)}_{i,4})
$$

$$
\Rightarrow
a^{(1)}_{i}=[a^{(1)}_{i,1},a^{(1)}_{i,2},a^{(1)}_{i,3},a^{(1)}_{i,4}]
=relu(z^{(1)}_{i})
$$

$$
\Rightarrow
a^{(1)}=relu(z^{(1)})
$$

From the hidden layer to the output layer

$$
z^{(2)}_{i,1}=a^{(1)}_{i}\cdot W^{(2)}_{,1}+b^{(2)}_{1}
=a^{(1)}_{i,1}\cdot W^{(2)}_{1,1}
+a^{(1)}_{i,2}\cdot W^{(2)}_{2,1}
+a^{(1)}_{i,3}\cdot W^{(2)}_{3,1}
+a^{(1)}_{i,4}\cdot W^{(2)}_{4,1}
+b^{(2)}_{1}
$$

$$
z^{(2)}_{i,2}=a^{(1)}_{i}\cdot W^{(2)}_{,2}+b^{(2)}_{2}
=a^{(1)}_{i,1}\cdot W^{(2)}_{1,2}
+a^{(1)}_{i,2}\cdot W^{(2)}_{2,2}
+a^{(1)}_{i,3}\cdot W^{(2)}_{3,2}
+a^{(1)}_{i,4}\cdot W^{(2)}_{4,2}
+b^{(2)}_{2}
$$

$$
\Rightarrow z^{(2)}_{i}
=[z^{(2)}_{i,1},z^{(2)}_{i,2}]
=a^{(1)}_{i}\cdot W^{(2)}+b^{(2)}
$$

$$
\Rightarrow z^{(2)}
=a^{(1)}\cdot W^{(2)}+b^{(2)}
$$

Scoring

$$
p(y_{i}=1)=\frac {exp(z^{(2)}_{i,1})}{\sum exp(z^{(2)}_{i})} \\
p(y_{i}=2)=\frac {exp(z^{(2)}_{i,2})}{\sum exp(z^{(2)}_{i})}
$$

$$
\Rightarrow h(z^{(2)}_{i})
=[p(y_{i}=1),p(y_{i}=2)]
=[\frac {exp(z^{(2)}_{i,1})}{\sum exp(z^{(2)}_{i})}, \frac {exp(z^{(2)}_{i,2})}{\sum exp(z^{(2)}_{i})}]
$$

$$
\Rightarrow h(z^{(2)})
=\begin{bmatrix}
p(y_{1}=1) & p(y_{1}=2) \\
\vdots & \vdots\\
p(y_{m}=1) & p(y_{m}=2)
\end{bmatrix}
$$
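As a small illustration of the scoring step, the row-wise softmax can be sketched as follows. The function name `softmax_rows` and the sample values are assumptions for this example. Subtracting the row maximum before exponentiating is a common numerical-stability measure that is not part of the derivation; it leaves the probabilities unchanged because the shift cancels in the ratio.

```python
import math

def softmax_rows(z):
    """Apply softmax to each row (one row per sample) of a list-of-rows matrix."""
    out = []
    for row in z:
        shift = max(row)  # stability shift; cancels between numerator and denominator
        exps = [math.exp(v - shift) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

# Hypothetical output-layer inputs for m = 3 samples and 2 classes
z2 = [[2.0, 0.0],
      [-1.0, 1.0],
      [0.5, 0.5]]
probs = softmax_rows(z2)
```

Each row of `probs` sums to 1, and equal inputs (the third sample) give equal probabilities.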

Loss

$$
J(z^{(2)})=(-1)\sum_{i=1}^{m} \sum_{j=1}^{2} 1(y_{i}=j)\ln p(y_{i}=j)
$$
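A minimal sketch of the cross-entropy computation, assuming the probabilities are already available. The function name `cross_entropy`, the `average` flag and the sample values are assumptions for this example, and class labels are zero-based here while the text counts classes from 1. The formula in the text sums over the batch; the `average=True` variant divides by $m$, which is the scaling that matches the $\frac{1}{m}$ factor used in the weight gradients of this post.

```python
import math

def cross_entropy(probs, labels, average=True):
    """Negative log-probability of the true class, summed or averaged over the batch."""
    total = -sum(math.log(row[y]) for row, y in zip(probs, labels))
    return total / len(labels) if average else total

# Hypothetical softmax outputs for m = 3 samples and their true classes
probs = [[0.9, 0.1],
         [0.2, 0.8],
         [0.5, 0.5]]
labels = [0, 1, 0]

loss_sum = cross_entropy(probs, labels, average=False)
loss_mean = cross_entropy(probs, labels)
```

Only the probability assigned to the true class of each sample contributes, which is what the indicator $1(y_{i}=j)$ selects.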

Backward propagation

Gradient with respect to the output layer's input vector

The derivatives in this step are taken for a single sample $i$; averaging over the batch (the factor $\frac{1}{m}$) enters later, in the weight gradients.

$$
\frac {\partial J}{\partial z^{(2)}_{i,1}}=
(-1)\cdot \frac {1(y_{i}=1)}{p(y_{i}=1)}\cdot \frac {\partial p(y_{i}=1)}{\partial z^{(2)}_{i,1}}
+(-1)\cdot \frac {1(y_{i}=2)}{p(y_{i}=2)}\cdot \frac {\partial p(y_{i}=2)}{\partial z^{(2)}_{i,1}}
$$

$$
\frac {\partial p(y_{i}=1)}{\partial z^{(2)}_{i,1}}
=\frac {exp(z^{(2)}_{i,1})\cdot \sum exp(z^{(2)}_{i})-exp(z^{(2)}_{i,1})\cdot exp(z^{(2)}_{i,1})}{(\sum exp(z^{(2)}_{i}))^2}
=\frac {exp(z^{(2)}_{i,1})}{\sum exp(z^{(2)}_{i})}
-(\frac {exp(z^{(2)}_{i,1})}{\sum exp(z^{(2)}_{i})})^2
=p(y_{i}=1)-(p(y_{i}=1))^2
$$

$$
\frac {\partial p(y_{i}=2)}{\partial z^{(2)}_{i,1}}
=\frac {-exp(z^{(2)}_{i,2})\cdot exp(z^{(2)}_{i,1})}{(\sum exp(z^{(2)}_{i}))^2}
=(-1)\cdot \frac {exp(z^{(2)}_{i,1})}{\sum exp(z^{(2)}_{i})}\cdot \frac {exp(z^{(2)}_{i,2})}{\sum exp(z^{(2)}_{i})}
=(-1)\cdot p(y_{i}=1)p(y_{i}=2)
$$

$$
\Rightarrow \frac {\partial J}{\partial z^{(2)}_{i,1}}
=(-1)\cdot \frac {1(y_{i}=1)}{p(y_{i}=1)}\cdot (p(y_{i}=1)-(p(y_{i}=1))^2)
+(-1)\cdot \frac {1(y_{i}=2)}{p(y_{i}=2)}\cdot (-1)\cdot p(y_{i}=1)p(y_{i}=2) \\
=(-1)\cdot 1(y_{i}=1)\cdot (1-p(y_{i}=1))
+1(y_{i}=2)\cdot p(y_{i}=1)
=p(y_{i}=1)-1(y_{i}=1)
$$

The last equality uses $1(y_{i}=1)+1(y_{i}=2)=1$, since every sample belongs to exactly one class.

$$
\Rightarrow \frac {\partial J}{\partial z^{(2)}_{i,2}}
=p(y_{i}=2)-1(y_{i}=2)
$$

$$
\Rightarrow \frac {\partial J}{\partial z^{(2)}_{i}}
=[p(y_{i}=1)-1(y_{i}=1), p(y_{i}=2)-1(y_{i}=2)]
$$

$$
\Rightarrow \frac {\partial J}{\partial z^{(2)}}
=\begin{bmatrix}
p(y_{1}=1)-1(y_{1}=1) & p(y_{1}=2)-1(y_{1}=2)\\
\vdots & \vdots\\
p(y_{m}=1)-1(y_{m}=1) & p(y_{m}=2)-1(y_{m}=2)
\end{bmatrix}
$$
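The result "probability minus indicator" can be checked numerically for one sample. The sketch below compares the analytic gradient with a central finite difference of the per-sample loss $-\ln p(y_{i})$. The function names, the step size `eps` and the sample values are assumptions for this example, and the class label is zero-based.

```python
import math

def softmax(z):
    """Softmax of a single score vector (with the usual max shift for stability)."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def sample_loss(z, y):
    """Cross-entropy loss of one sample with true class y."""
    return -math.log(softmax(z)[y])

def grad_z(z, y):
    """Analytic gradient from the derivation: p(y = j) - 1(y = j)."""
    return [p_j - (1.0 if j == y else 0.0) for j, p_j in enumerate(softmax(z))]

z = [0.3, -1.2]  # hypothetical output-layer inputs of one sample
y = 1            # hypothetical true class (zero-based)

analytic = grad_z(z, y)

eps = 1e-6
numeric = []
for j in range(len(z)):
    z_plus = list(z)
    z_plus[j] += eps
    z_minus = list(z)
    z_minus[j] -= eps
    numeric.append((sample_loss(z_plus, y) - sample_loss(z_minus, y)) / (2 * eps))
```

The two lists should agree to within the accuracy of the finite difference, and the components of the analytic gradient sum to zero because the probabilities sum to 1.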

Gradient with respect to the output layer's weights

$$
\frac {\partial J}{\partial W^{(2)}_{1,1}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial W^{(2)}_{1,1}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,1})
$$

$$
\frac {\partial J}{\partial W^{(2)}_{2,1}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial W^{(2)}_{2,1}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,2})
$$

$$
\frac {\partial J}{\partial W^{(2)}_{3,1}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial W^{(2)}_{3,1}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,3})
$$

$$
\frac {\partial J}{\partial W^{(2)}_{4,1}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial W^{(2)}_{4,1}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,4})
$$

$$
\frac {\partial J}{\partial W^{(2)}_{1,2}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial W^{(2)}_{1,2}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,1})
$$

$$
\frac {\partial J}{\partial W^{(2)}_{2,2}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial W^{(2)}_{2,2}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,2})
$$

$$
\frac {\partial J}{\partial W^{(2)}_{3,2}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial W^{(2)}_{3,2}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,3})
$$

$$
\frac {\partial J}{\partial W^{(2)}_{4,2}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial W^{(2)}_{4,2}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,4})
$$

$$
\Rightarrow \frac {\partial J}{\partial W^{(2)}}
=\begin{bmatrix}
\frac {\partial J}{\partial W^{(2)}_{1,1}} & \frac {\partial J}{\partial W^{(2)}_{1,2}}\\
\frac {\partial J}{\partial W^{(2)}_{2,1}} & \frac {\partial J}{\partial W^{(2)}_{2,2}}\\
\frac {\partial J}{\partial W^{(2)}_{3,1}} & \frac {\partial J}{\partial W^{(2)}_{3,2}}\\
\frac {\partial J}{\partial W^{(2)}_{4,1}} & \frac {\partial J}{\partial W^{(2)}_{4,2}}
\end{bmatrix}
$$

$$
=\begin{bmatrix}
\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,1}) & \frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,1})\\
\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,2}) & \frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,2})\\
\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,3}) & \frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,3})\\
\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot a^{(1)}_{i,4}) & \frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=2)-1(y_{i}=2))\cdot a^{(1)}_{i,4})
\end{bmatrix}
$$

$$
=\frac {1}{m}\sum_{i=1}^{m}
\begin{bmatrix}
a^{(1)}_{i,1}\\
a^{(1)}_{i,2}\\
a^{(1)}_{i,3}\\
a^{(1)}_{i,4}
\end{bmatrix}
\begin{bmatrix}
p(y_{i}=1)-1(y_{i}=1) & p(y_{i}=2)-1(y_{i}=2)
\end{bmatrix}\\
=\frac {1}{m}\sum_{i=1}^{m} ((a^{(1)}_{i})^{T}\cdot \frac {\partial J}{\partial z^{(2)}_{i}})
=\frac {1}{m} (a^{(1)})^{T}\cdot \frac {\partial J}{\partial z^{(2)}}
=\frac {1}{m} (R^{4\times m}\cdot R^{m\times 2})
=R^{4\times 2}
$$
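The last step replaces a sum of per-sample outer products with a single matrix product. The sketch below computes the gradient both ways and lets the two results be compared. The helper names and all numeric values are assumptions for this example.

```python
def matmul(a, b):
    """Multiply an (m x n) matrix by an (n x k) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

# Hypothetical hidden-layer outputs (m = 3 samples, 4 units)
a1 = [[0.0, 1.1, 0.6, 0.0],
      [0.0, 0.3, 0.0, 0.0],
      [0.5, 0.0, 0.2, 1.0]]
# Hypothetical per-sample gradients p - 1(y) (m = 3 samples, 2 classes)
dz2 = [[0.2, -0.2],
       [-0.7, 0.7],
       [0.4, -0.4]]
m = len(a1)

# Matrix form: (1/m) * a1^T . dz2, shape 4 x 2
dw2 = [[v / m for v in row] for row in matmul(transpose(a1), dz2)]

# Element-wise form: average of the outer products (a1_i)^T . dz2_i
dw2_outer = [[0.0, 0.0] for _ in range(4)]
for a_i, d_i in zip(a1, dz2):
    for k in range(4):
        for j in range(2):
            dw2_outer[k][j] += a_i[k] * d_i[j] / m
```

Both `dw2` and `dw2_outer` have the shape of $W^{(2)}$, so the gradient can be subtracted from the weights directly.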

Gradient with respect to the hidden layer's output vector

$$
\frac {\partial J}{\partial a^{(1)}_{i,1}}
=\frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial a^{(1)}_{i,1}}
+\frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial a^{(1)}_{i,1}}
=(p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{1,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{1,2}
$$

$$
\frac {\partial J}{\partial a^{(1)}_{i,2}}
=\frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial a^{(1)}_{i,2}}
+\frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial a^{(1)}_{i,2}}
=(p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{2,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{2,2}
$$

$$
\frac {\partial J}{\partial a^{(1)}_{i,3}}
=\frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial a^{(1)}_{i,3}}
+\frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial a^{(1)}_{i,3}}
=(p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{3,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{3,2}
$$

$$
\frac {\partial J}{\partial a^{(1)}_{i,4}}
=\frac {\partial J}{\partial z^{(2)}_{i,1}}\cdot \frac {\partial z^{(2)}_{i,1}}{\partial a^{(1)}_{i,4}}
+\frac {\partial J}{\partial z^{(2)}_{i,2}}\cdot \frac {\partial z^{(2)}_{i,2}}{\partial a^{(1)}_{i,4}}
=(p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{4,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{4,2}
$$

$$
\Rightarrow \frac {\partial J}{\partial a^{(1)}_{i}}
=\begin{bmatrix}
p(y_{i}=1)-1(y_{i}=1) & p(y_{i}=2)-1(y_{i}=2)
\end{bmatrix}
\begin{bmatrix}
W^{(2)}_{1,1} & W^{(2)}_{2,1} & W^{(2)}_{3,1} & W^{(2)}_{4,1}\\
W^{(2)}_{1,2} & W^{(2)}_{2,2} & W^{(2)}_{3,2} & W^{(2)}_{4,2}
\end{bmatrix} \\
=\frac {\partial J}{\partial z^{(2)}_{i}}\cdot (W^{(2)})^T
=R^{1\times 2}\cdot R^{2\times 4}
=R^{1\times 4}
$$

$$
\Rightarrow \frac {\partial J}{\partial a^{(1)}}
=\begin{bmatrix}
p(y_{1}=1)-1(y_{1}=1) & p(y_{1}=2)-1(y_{1}=2)\\
\vdots & \vdots\\
p(y_{m}=1)-1(y_{m}=1) & p(y_{m}=2)-1(y_{m}=2)
\end{bmatrix}
\begin{bmatrix}
W^{(2)}_{1,1} & W^{(2)}_{2,1} & W^{(2)}_{3,1} & W^{(2)}_{4,1}\\
W^{(2)}_{1,2} & W^{(2)}_{2,2} & W^{(2)}_{3,2} & W^{(2)}_{4,2}
\end{bmatrix} \\
=\frac {\partial J}{\partial z^{(2)}}\cdot (W^{(2)})^T
=R^{m\times 2}\cdot R^{2\times 4}
=R^{m\times 4}
$$

Gradient with respect to the hidden layer's input vector

$$
\frac {\partial J}{\partial z^{(1)}_{i,1}}
=\frac {\partial J}{\partial a^{(1)}_{i,1}}\cdot \frac {\partial a^{(1)}_{i,1}}{\partial z^{(1)}_{i,1}}
=((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{1,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{1,2})\cdot 1(z^{(1)}_{i,1}\geq 0)
$$

$$
\frac {\partial J}{\partial z^{(1)}_{i,2}}
=\frac {\partial J}{\partial a^{(1)}_{i,2}}\cdot \frac {\partial a^{(1)}_{i,2}}{\partial z^{(1)}_{i,2}}
=((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{2,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{2,2})\cdot 1(z^{(1)}_{i,2}\geq 0)
$$

$$
\frac {\partial J}{\partial z^{(1)}_{i,3}}
=\frac {\partial J}{\partial a^{(1)}_{i,3}}\cdot \frac {\partial a^{(1)}_{i,3}}{\partial z^{(1)}_{i,3}}
=((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{3,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{3,2})\cdot 1(z^{(1)}_{i,3}\geq 0)
$$

$$
\frac {\partial J}{\partial z^{(1)}_{i,4}}
=\frac {\partial J}{\partial a^{(1)}_{i,4}}\cdot \frac {\partial a^{(1)}_{i,4}}{\partial z^{(1)}_{i,4}}
=((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{4,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{4,2})\cdot 1(z^{(1)}_{i,4}\geq 0)
$$

In the vector forms below, $\ast$ denotes element-wise multiplication.

$$
\Rightarrow \frac {\partial J}{\partial z^{(1)}_{i}}
=(\begin{bmatrix}
p(y_{i}=1)-1(y_{i}=1) & p(y_{i}=2)-1(y_{i}=2)
\end{bmatrix}
\begin{bmatrix}
W^{(2)}_{1,1} & W^{(2)}_{2,1} & W^{(2)}_{3,1} & W^{(2)}_{4,1}\\
W^{(2)}_{1,2} & W^{(2)}_{2,2} & W^{(2)}_{3,2} & W^{(2)}_{4,2}
\end{bmatrix})
\ast
\begin{bmatrix}
\frac {\partial a^{(1)}_{i,1}}{\partial z^{(1)}_{i,1}}&
\frac {\partial a^{(1)}_{i,2}}{\partial z^{(1)}_{i,2}}&
\frac {\partial a^{(1)}_{i,3}}{\partial z^{(1)}_{i,3}}&
\frac {\partial a^{(1)}_{i,4}}{\partial z^{(1)}_{i,4}}
\end{bmatrix}\\
=(R^{1\times 2}\cdot R^{2\times 4})\ast R^{1\times 4}
=R^{1\times 4}
$$

$$
\Rightarrow \frac {\partial J}{\partial z^{(1)}_{i}}
=(\begin{bmatrix}
p(y_{i}=1)-1(y_{i}=1) & p(y_{i}=2)-1(y_{i}=2)
\end{bmatrix}
\begin{bmatrix}
W^{(2)}_{1,1} & W^{(2)}_{2,1} & W^{(2)}_{3,1} & W^{(2)}_{4,1}\\
W^{(2)}_{1,2} & W^{(2)}_{2,2} & W^{(2)}_{3,2} & W^{(2)}_{4,2}
\end{bmatrix})
\ast
\begin{bmatrix}
1(z^{(1)}_{i,1}\geq 0) & 1(z^{(1)}_{i,2}\geq 0) & 1(z^{(1)}_{i,3}\geq 0) & 1(z^{(1)}_{i,4}\geq 0)
\end{bmatrix}\\
=(R^{1\times 2}\cdot R^{2\times 4})\ast R^{1\times 4}
=R^{1\times 4}
$$

$$
\Rightarrow \frac {\partial J}{\partial z^{(1)}}
=(\begin{bmatrix}
p(y_{1}=1)-1(y_{1}=1) & p(y_{1}=2)-1(y_{1}=2)\\
\vdots & \vdots\\
p(y_{m}=1)-1(y_{m}=1) & p(y_{m}=2)-1(y_{m}=2)
\end{bmatrix}
\begin{bmatrix}
W^{(2)}_{1,1} & W^{(2)}_{2,1} & W^{(2)}_{3,1} & W^{(2)}_{4,1}\\
W^{(2)}_{1,2} & W^{(2)}_{2,2} & W^{(2)}_{3,2} & W^{(2)}_{4,2}
\end{bmatrix})
\ast
\begin{bmatrix}
1(z^{(1)}_{1,1}\geq 0) & 1(z^{(1)}_{1,2}\geq 0) & 1(z^{(1)}_{1,3}\geq 0) & 1(z^{(1)}_{1,4}\geq 0)\\
\vdots & \vdots & \vdots & \vdots\\
1(z^{(1)}_{m,1}\geq 0) & 1(z^{(1)}_{m,2}\geq 0) & 1(z^{(1)}_{m,3}\geq 0) & 1(z^{(1)}_{m,4}\geq 0)
\end{bmatrix}\\
=\frac {\partial J}{\partial a^{(1)}} \ast 1(z^{(1)}\geq 0)
=(R^{m\times 2}\cdot R^{2\times 4})\ast R^{m\times 4}
=R^{m\times 4}
$$
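A short sketch of the two steps that lead to this result: the matrix product with $(W^{(2)})^{T}$, followed by the element-wise ReLU mask $1(z^{(1)}\geq 0)$. The helper names and all numeric values are assumptions for this example.

```python
def matmul(a, b):
    """Multiply an (m x n) matrix by an (n x k) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

# Hypothetical values for m = 2 samples
dz2 = [[0.2, -0.2],
       [-0.7, 0.7]]                 # gradient w.r.t. z2, shape m x 2
w2 = [[0.1, -0.1],
      [0.2, 0.3],
      [-0.4, 0.5],
      [0.0, 0.6]]                   # output-layer weights, shape 4 x 2
z1 = [[-0.7, 1.1, 0.6, -0.6],
      [-0.3, 0.3, -0.2, -0.2]]      # hidden-layer inputs, shape m x 4

# Gradient w.r.t. the hidden outputs: dz2 . W2^T, shape m x 4
da1 = matmul(dz2, transpose(w2))

# Element-wise mask: the gradient passes only where z1 >= 0 (the convention used in the text)
dz1 = [[g if z >= 0 else 0.0 for g, z in zip(g_row, z_row)]
       for g_row, z_row in zip(da1, z1)]
```

Units whose input was negative in the forward pass receive a zero gradient, so they do not contribute to the weight gradients of the hidden layer for that sample.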

Gradient with respect to the hidden layer's weights

$$
\frac {\partial J}{\partial W^{(1)}_{1,1}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot
\frac {\partial z^{(1)}_{i,1}}{\partial W^{(1)}_{1,1}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{1,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{1,2})\cdot 1(z^{(1)}_{i,1}\geq 0)\cdot a^{(0)}_{i,1}
$$

$$
\frac {\partial J}{\partial W^{(1)}_{1,2}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot
\frac {\partial z^{(1)}_{i,2}}{\partial W^{(1)}_{1,2}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{2,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{2,2})\cdot 1(z^{(1)}_{i,2}\geq 0)\cdot a^{(0)}_{i,1}
$$

In general, for input neuron $k$ and hidden neuron $j$ (the index $j$ is used here to avoid a clash with the layer index $l$):

$$
\Rightarrow \frac {\partial J}{\partial W^{(1)}_{k,j}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,j}}\cdot
\frac {\partial z^{(1)}_{i,j}}{\partial W^{(1)}_{k,j}}
=\frac {1}{m}\sum_{i=1}^{m} ((p(y_{i}=1)-1(y_{i}=1))\cdot W^{(2)}_{j,1}
+(p(y_{i}=2)-1(y_{i}=2))\cdot W^{(2)}_{j,2})\cdot 1(z^{(1)}_{i,j}\geq 0)\cdot a^{(0)}_{i,k}
$$

$$
\Rightarrow \frac {\partial J}{\partial W^{(1)}}
=\begin{bmatrix}
\frac {\partial J}{\partial W^{(1)}_{1,1}} & \frac {\partial J}{\partial W^{(1)}_{1,2}} & \frac {\partial J}{\partial W^{(1)}_{1,3}} & \frac {\partial J}{\partial W^{(1)}_{1,4}}\\
\frac {\partial J}{\partial W^{(1)}_{2,1}} & \frac {\partial J}{\partial W^{(1)}_{2,2}} & \frac {\partial J}{\partial W^{(1)}_{2,3}} & \frac {\partial J}{\partial W^{(1)}_{2,4}}\\
\frac {\partial J}{\partial W^{(1)}_{3,1}} & \frac {\partial J}{\partial W^{(1)}_{3,2}} & \frac {\partial J}{\partial W^{(1)}_{3,3}} & \frac {\partial J}{\partial W^{(1)}_{3,4}}
\end{bmatrix}\\
=\begin{bmatrix}
\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot \frac {\partial z^{(1)}_{i,1}}{\partial W^{(1)}_{1,1}}
& \frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot \frac {\partial z^{(1)}_{i,2}}{\partial W^{(1)}_{1,2}}
& \frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot \frac {\partial z^{(1)}_{i,3}}{\partial W^{(1)}_{1,3}}
& \frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot \frac {\partial z^{(1)}_{i,4}}{\partial W^{(1)}_{1,4}}\\
\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot \frac {\partial z^{(1)}_{i,1}}{\partial W^{(1)}_{2,1}}
& \frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot \frac {\partial z^{(1)}_{i,2}}{\partial W^{(1)}_{2,2}}
& \frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot \frac {\partial z^{(1)}_{i,3}}{\partial W^{(1)}_{2,3}}
& \frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot \frac {\partial z^{(1)}_{i,4}}{\partial W^{(1)}_{2,4}}\\
\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot \frac {\partial z^{(1)}_{i,1}}{\partial W^{(1)}_{3,1}}
& \frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot \frac {\partial z^{(1)}_{i,2}}{\partial W^{(1)}_{3,2}}
& \frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot \frac {\partial z^{(1)}_{i,3}}{\partial W^{(1)}_{3,3}}
& \frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot \frac {\partial z^{(1)}_{i,4}}{\partial W^{(1)}_{3,4}}
\end{bmatrix}\\
=\frac {1}{m}\sum_{i=1}^{m} \begin{bmatrix}
\frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot \frac {\partial z^{(1)}_{i,1}}{\partial W^{(1)}_{1,1}}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot \frac {\partial z^{(1)}_{i,2}}{\partial W^{(1)}_{1,2}}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot \frac {\partial z^{(1)}_{i,3}}{\partial W^{(1)}_{1,3}}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot \frac {\partial z^{(1)}_{i,4}}{\partial W^{(1)}_{1,4}}\\
\frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot \frac {\partial z^{(1)}_{i,1}}{\partial W^{(1)}_{2,1}}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot \frac {\partial z^{(1)}_{i,2}}{\partial W^{(1)}_{2,2}}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot \frac {\partial z^{(1)}_{i,3}}{\partial W^{(1)}_{2,3}}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot \frac {\partial z^{(1)}_{i,4}}{\partial W^{(1)}_{2,4}}\\
\frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot \frac {\partial z^{(1)}_{i,1}}{\partial W^{(1)}_{3,1}}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot \frac {\partial z^{(1)}_{i,2}}{\partial W^{(1)}_{3,2}}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot \frac {\partial z^{(1)}_{i,3}}{\partial W^{(1)}_{3,3}}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot \frac {\partial z^{(1)}_{i,4}}{\partial W^{(1)}_{3,4}}
\end{bmatrix}\\
=\frac {1}{m}\sum_{i=1}^{m} \begin{bmatrix}
\frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot a^{(0)}_{i,1}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot a^{(0)}_{i,1}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot a^{(0)}_{i,1}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot a^{(0)}_{i,1}\\
\frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot a^{(0)}_{i,2}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot a^{(0)}_{i,2}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot a^{(0)}_{i,2}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot a^{(0)}_{i,2}\\
\frac {\partial J}{\partial z^{(1)}_{i,1}}\cdot a^{(0)}_{i,3}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}\cdot a^{(0)}_{i,3}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}\cdot a^{(0)}_{i,3}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}\cdot a^{(0)}_{i,3}
\end{bmatrix}\\
=\frac {1}{m}\sum_{i=1}^{m}
\begin{bmatrix}
a^{(0)}_{i,1}\\
a^{(0)}_{i,2}\\
a^{(0)}_{i,3}
\end{bmatrix}
\begin{bmatrix}
\frac {\partial J}{\partial z^{(1)}_{i,1}}
& \frac {\partial J}{\partial z^{(1)}_{i,2}}
& \frac {\partial J}{\partial z^{(1)}_{i,3}}
& \frac {\partial J}{\partial z^{(1)}_{i,4}}
\end{bmatrix}
=\frac {1}{m}\sum_{i=1}^{m} (a^{(0)}_{i})^T\cdot \frac {\partial J}{\partial z^{(1)}_{i}}
=\frac {1}{m} (a^{(0)})^T\cdot \frac {\partial J}{\partial z^{(1)}}
=\frac {1}{m} (R^{3\times m}\cdot R^{m\times 4})
=R^{3\times 4}
$$

Summary

The forward pass of TestNet is:

$$
z^{(1)}
=a^{(0)}\cdot W^{(1)}+b^{(1)}
$$

$$
a^{(1)}=relu(z^{(1)})
$$

$$
z^{(2)}
=a^{(1)}\cdot W^{(2)}+b^{(2)}
$$

$$
h(z^{(2)})
=\begin{bmatrix}
p(y_{1}=1) & p(y_{1}=2) \\
\vdots & \vdots\\
p(y_{m}=1) & p(y_{m}=2)
\end{bmatrix}
$$

$$
J(z^{(2)})=(-1)\sum_{i=1}^{m} \sum_{j=1}^{2} 1(y_{i}=j)\ln p(y_{i}=j)
$$

The backward pass is:

$$
\frac {\partial J}{\partial z^{(2)}}
=\begin{bmatrix}
p(y_{1}=1)-1(y_{1}=1) & p(y_{1}=2)-1(y_{1}=2)\\
\vdots & \vdots\\
p(y_{m}=1)-1(y_{m}=1) & p(y_{m}=2)-1(y_{m}=2)
\end{bmatrix}
$$

$$
\frac {\partial J}{\partial W^{(2)}}
=\frac {1}{m} (a^{(1)})^{T}\cdot \frac {\partial J}{\partial z^{(2)}}
$$

$$
\frac {\partial J}{\partial b^{(2)}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(2)}_{i}}
$$

$$
\frac {\partial J}{\partial a^{(1)}}
=\frac {\partial J}{\partial z^{(2)}}\cdot (W^{(2)})^T
$$

$$
\frac {\partial J}{\partial z^{(1)}}
=\frac {\partial J}{\partial a^{(1)}} * 1(z^{(1)}\geq 0)
$$

$$
\frac {\partial J}{\partial W^{(1)}}
=\frac {1}{m} (a^{(0)})^T\cdot \frac {\partial J}{\partial z^{(1)}}
$$

$$
\frac {\partial J}{\partial b^{(1)}}
=\frac {1}{m}\sum_{i=1}^{m} \frac {\partial J}{\partial z^{(1)}_{i}}
$$
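The summary formulas can be sanity-checked with a numerical gradient check. The sketch below is one possible way to do that under stated assumptions: every function name, the step size `eps` and all parameter values are made up for this example, class labels are zero-based, and the loss is averaged over the batch so that it matches the $\frac{1}{m}$ factor in the weight gradients.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def forward(a0, w1, b1, w2, b2):
    """Forward pass: affine -> ReLU -> affine -> row-wise softmax."""
    z1 = [[v + b for v, b in zip(row, b1)] for row in matmul(a0, w1)]
    a1 = [[max(0.0, v) for v in row] for row in z1]
    z2 = [[v + b for v, b in zip(row, b2)] for row in matmul(a1, w2)]
    probs = []
    for row in z2:
        shift = max(row)  # stability shift, not part of the derivation
        exps = [math.exp(v - shift) for v in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return z1, a1, probs

def mean_loss(probs, labels):
    """Cross-entropy averaged over the batch (assumed scaling, see lead-in)."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def backward(a0, z1, a1, probs, labels, w2):
    """Backward pass following the summary formulas."""
    m = len(labels)
    dz2 = [[p - (1.0 if j == y else 0.0) for j, p in enumerate(row)]
           for row, y in zip(probs, labels)]
    dw2 = [[v / m for v in row] for row in matmul(transpose(a1), dz2)]
    db2 = [sum(col) / m for col in zip(*dz2)]
    da1 = matmul(dz2, transpose(w2))
    dz1 = [[g if z >= 0 else 0.0 for g, z in zip(gr, zr)] for gr, zr in zip(da1, z1)]
    dw1 = [[v / m for v in row] for row in matmul(transpose(a0), dz1)]
    db1 = [sum(col) / m for col in zip(*dz1)]
    return dw1, db1, dw2, db2

def numeric_grad(w, loss_fn, eps=1e-6):
    """Central finite differences over every entry of w (w is modified in place and restored)."""
    grad = [[0.0] * len(w[0]) for _ in w]
    for r in range(len(w)):
        for c in range(len(w[0])):
            old = w[r][c]
            w[r][c] = old + eps
            plus = loss_fn()
            w[r][c] = old - eps
            minus = loss_fn()
            w[r][c] = old
            grad[r][c] = (plus - minus) / (2 * eps)
    return grad

def max_abs_diff(x, y):
    return max(abs(p - q) for rx, ry in zip(x, y) for p, q in zip(rx, ry))

# Hypothetical data and parameters with the TestNet shapes (3 -> 4 -> 2), m = 2
a0 = [[1.0, 2.0, 3.0], [-1.0, 0.0, 1.0]]
labels = [0, 1]
w1 = [[0.1, -0.2, 0.3, 0.0], [0.0, 0.5, -0.1, 0.2], [-0.3, 0.1, 0.2, -0.4]]
b1 = [0.1, 0.0, -0.1, 0.2]
w2 = [[0.1, -0.1], [0.2, 0.3], [-0.4, 0.5], [0.0, 0.6]]
b2 = [0.05, -0.05]

z1, a1, probs = forward(a0, w1, b1, w2, b2)
dw1, db1, dw2, db2 = backward(a0, z1, a1, probs, labels, w2)

loss_fn = lambda: mean_loss(forward(a0, w1, b1, w2, b2)[2], labels)
err_w1 = max_abs_diff(dw1, numeric_grad(w1, loss_fn))
err_w2 = max_abs_diff(dw2, numeric_grad(w2, loss_fn))
```

If the formulas are implemented consistently, `err_w1` and `err_w2` should be close to zero. The check is only meaningful when no entry of `z1` sits at the ReLU kink, which holds for the values chosen here.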

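Assume the batch size is $m$, the data dimension is $D$, the number of layers is $L$ (indexed $1,2,…,l,…,L$), and the number of output classes is $C$

Following the references 反向传导算法 and 神经网络反向传播的数学原理, define the residual of each layer as the gradient with respect to its input vector, $\delta^{(l)}=\frac{\partial J(W, b)}{\partial z^{(l)}}$, which expresses how much that layer contributes to the residual of the final output; the residual of the final output, $\delta^{(L)}$, is the gradient of the loss function with respect to the output layer's input vector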

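Forward propagation steps

  1. Between layers, the previous layer's output is multiplied by the weight matrix and the bias is added, and the resulting input vector is passed through the activation function (using ReLU as an example)

    $$
    z^{(l)} = a^{(l-1)}\cdot W^{(l)}+b^{(l)} \\
    a^{(l)} = relu(z^{(l)})
    $$

  2. Once the output layer has produced its result, the scoring function is applied to obtain the final scores (using softmax classification as an example)

    $$
    h(z^{(L)})
    =\begin{bmatrix}
    p(y_{1}=1) & \dots & p(y_{1}=C) \\
    \vdots & \vdots & \vdots\\
    p(y_{m}=1) & \dots & p(y_{m}=C)
    \end{bmatrix}
    =\begin{bmatrix}
    \frac {exp(z^{(L)}_{1,1})}{\sum exp(z^{(L)}_{1})} & \dots & \frac {exp(z^{(L)}_{1,C})}{\sum exp(z^{(L)}_{1})} \\
    \vdots & \vdots & \vdots\\
    \frac {exp(z^{(L)}_{m,1})}{\sum exp(z^{(L)}_{m})} & \dots & \frac {exp(z^{(L)}_{m,C})}{\sum exp(z^{(L)}_{m})}
    \end{bmatrix}
    $$

  3. The loss function computes the final loss from these scores (using the cross-entropy loss as an example)

    $$
    J(z^{(L)})=(-1)\sum_{i=1}^{m} \sum_{j=1}^{C} 1(y_{i}=j)\ln p(y_{i}=j)
    $$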

Backward propagation steps

  1. Compute the gradient of the loss function with respect to the output layer's input vector (the residual of the final layer)

    $$
    \delta^{(L)}=
    \frac {\partial J}{\partial z^{(L)}}
    =\begin{bmatrix}
    p(y_{1}=1)-1(y_{1}=1) & \dots & p(y_{1}=C)-1(y_{1}=C)\\
    \vdots & \vdots & \vdots\\
    p(y_{m}=1)-1(y_{m}=1) & \dots & p(y_{m}=C)-1(y_{m}=C)
    \end{bmatrix}
    $$

  2. Compute the residuals of the hidden layers (for $l=L-1,L-2,…,1$)

    $$
    \delta^{(l)}=
    \frac{\partial J}{\partial z^{(l)}}
    =(\frac{\partial J}{\partial z^{(l+1)}}\cdot \frac{\partial z^{(l+1)}}{\partial a^{(l)}})
    \ast\frac{\partial a^{(l)}}{\partial z^{(l)}}
    =(\delta^{(l+1)}\cdot (W^{(l+1)})^{T})
    \ast 1(z^{(l)}\geq 0)
    $$

  3. Compute the gradients of all learnable parameters (weight matrices and bias vectors)

    $$
    \nabla_{W^{(l)}} J(W, b)= \frac {1}{m} (a^{(l-1)})^{T}\cdot \delta^{(l)}\\
    \nabla_{b^{(l)}} J(W, b)= \frac {1}{m}\sum_{i=1}^{m} \delta^{(l)}_{i}
    $$

  4. Update the weight matrices and bias vectors (with learning rate $\alpha$ and weight-decay coefficient $\lambda$)

    $$
    W^{(l)}=W^{(l)}-\alpha\left[\nabla_{W^{(l)}} J(W, b)+\lambda W^{(l)}\right] \\
    b^{(l)}=b^{(l)}-\alpha \nabla_{b^{(l)}} J(W, b)
    $$
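The update in the last step can be sketched as a small function. The name `sgd_step`, its parameters `lr` (for $\alpha$) and `reg` (for $\lambda$), and all numeric values are assumptions for this example. As in the formula, the weight-decay term is applied to the weights only and not to the biases.

```python
def sgd_step(w, b, dw, db, lr, reg):
    """One gradient-descent update: W <- W - lr * (dW + reg * W), b <- b - lr * db."""
    new_w = [[w_kj - lr * (g_kj + reg * w_kj) for w_kj, g_kj in zip(w_row, g_row)]
             for w_row, g_row in zip(w, dw)]
    new_b = [b_j - lr * g_j for b_j, g_j in zip(b, db)]
    return new_w, new_b

# Hypothetical parameters and gradients of one small layer
w = [[1.0, -2.0],
     [0.5, 0.0]]
b = [0.1, -0.1]
dw = [[0.2, 0.0],
      [-0.4, 1.0]]
db = [0.5, -0.5]

new_w, new_b = sgd_step(w, b, dw, db, lr=0.1, reg=0.01)
```

Returning new lists instead of modifying `w` and `b` in place keeps the old parameters available, which can be convenient when comparing the loss before and after a step.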

Related reading