# How the Error Backpropagation Algorithm Works in Neural Networks


\begin{equation}\label{eq1} f(x)=\frac{1}{1+\exp(-x)} \end{equation}

\begin{equation} f'(x)=f(x)(1-f(x)) \end{equation}
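The derivative identity for the sigmoid can be sanity-checked numerically. The sketch below is only an illustration: the function names and the central-difference helper are assumptions of this example, not part of the original derivation.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x: float) -> float:
    """Closed-form derivative f'(x) = f(x) * (1 - f(x))."""
    fx = sigmoid(x)
    return fx * (1.0 - fx)

def central_difference(func, x: float, eps: float = 1e-6) -> float:
    """Numerical derivative (func(x + eps) - func(x - eps)) / (2 * eps)."""
    return (func(x + eps) - func(x - eps)) / (2.0 * eps)

# Largest gap between the closed form and the numerical estimate on a few points.
max_gap = max(
    abs(sigmoid_prime(x) - central_difference(sigmoid, x))
    for x in (-3.0, -0.5, 0.0, 0.5, 3.0)
)
```

If the identity holds, `max_gap` should be tiny, limited only by floating-point rounding in the finite difference.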

\begin{align} z^{(l+1)} &= W^{(l)}a^{(l)}+b^{(l)} \label{eq3} \\ a^{(l+1)} &= f\left(z^{(l+1)}\right) \label{eq4} \end{align}
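The forward recursion above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the layer sizes, the random initialization, and the names `forward`, `weights`, and `biases` are hypothetical, and Python lists are 0-indexed, so `weights[0]` plays the role of $$W^{(1)}$$.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(weights, biases, x):
    """Apply z^(l+1) = W^(l) a^(l) + b^(l) and a^(l+1) = f(z^(l+1)) layer by layer.

    Returns the pre-activations zs and the activations (activations[0] is the input).
    """
    activations = [x]
    zs = []
    for W, b in zip(weights, biases):
        z = W @ activations[-1] + b
        zs.append(z)
        activations.append(sigmoid(z))
    return zs, activations

# Hypothetical 3-4-2 network with random parameters (assumption for illustration).
rng = np.random.default_rng(0)
layer_sizes = [3, 4, 2]
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.standard_normal(n_out) for n_out in layer_sizes[1:]]
zs, activations = forward(weights, biases, rng.standard_normal(3))
```

Caching both `zs` and `activations` is a design choice that pays off later, since the backward pass needs them to evaluate $$f'(z^{(l)})$$.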

# Starting the Derivation

$\left\{(x^{(i)}, y^{(i)})\right\}, i= 1,2,\ldots,m$

\begin{equation}\label{eq5} J\left(W,b;x^{(i)}, y^{(i)}\right) = \frac{1}{2}\left\lVert h\left(x^{(i)}\right)- y^{(i)} \right\rVert^2 \end{equation}

\begin{equation}\label{eq6} \begin{aligned} J(W, b) &= \frac{1}{m}\sum_{i=1}^{m}J\left(W,b;x^{(i)}, y^{(i)}\right) + \frac{\lambda}{2}\sum_{l=1}^{L-1}\left\lVert W^{(l)} \right\rVert_{F}^{2}\\ &= \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\lVert h(x^{(i)}) - y^{(i)}\right\rVert^2 + \frac{\lambda}{2}\sum_{l=1}^{L-1}\left\lVert W^{(l)} \right\rVert_{F}^{2} \end{aligned} \end{equation}
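As a rough illustration of the regularized cost, the following sketch evaluates the mean squared-error term plus the Frobenius-norm penalty. The names `predict` and `total_cost`, and the tiny single-layer example, are assumptions made for this example only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(weights, biases, x):
    """Network output h(x): repeated affine map followed by the sigmoid."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

def total_cost(weights, biases, xs, ys, lam):
    """J(W, b): mean of 0.5 * ||h(x) - y||^2 plus (lam / 2) * sum of ||W||_F^2."""
    data_term = np.mean([0.5 * np.sum((predict(weights, biases, x) - y) ** 2)
                         for x, y in zip(xs, ys)])
    reg_term = 0.5 * lam * sum(np.sum(W ** 2) for W in weights)
    return data_term + reg_term

# With all-zero parameters every output is sigmoid(0) = 0.5,
# so the cost for the target (1, 0) is 0.5 * (0.25 + 0.25) = 0.25.
example_cost = total_cost([np.zeros((2, 3))], [np.zeros(2)],
                          [np.ones(3)], [np.array([1.0, 0.0])], lam=0.1)
```

Note that only the weight matrices are penalized; the biases do not appear in the regularization term, which matches the formula above.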

## A Note on the Frobenius Norm

\begin{equation} \lVert A \rVert_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} (a_{ij})^2} \end{equation}

\begin{equation}\label{eq8} \frac{\partial \lVert A \rVert_F^2 }{\partial A}= 2A \end{equation}
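The gradient of the squared Frobenius norm can be checked entry by entry with finite differences. This is a small sketch; the helper `numerical_gradient` and the example matrix are assumptions of this illustration.

```python
import numpy as np

def frobenius_sq(A):
    """Squared Frobenius norm: the sum of squared entries."""
    return np.sum(A ** 2)

def numerical_gradient(func, A, eps=1e-6):
    """Central-difference estimate of d func / d A, one entry at a time."""
    grad = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        A_plus = A.copy()
        A_plus[idx] += eps
        A_minus = A.copy()
        A_minus[idx] -= eps
        grad[idx] = (func(A_plus) - func(A_minus)) / (2 * eps)
    return grad

A = np.array([[1.0, -2.0, 0.5],
              [3.0, 0.0, -1.5]])

# Gap between the numerical gradient and the closed form 2A.
gap = np.max(np.abs(numerical_gradient(frobenius_sq, A) - 2 * A))
```

This identity is why the regularization term contributes $$\lambda W^{(l)}$$ to the gradient: the factor 2 cancels against the $$\frac{\lambda}{2}$$ in the cost.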

# Digging Deeper, Layer by Layer

\begin{align} W_{ij}^{(l)} &= W_{ij}^{(l)} - \alpha \frac{\partial J(W,b)}{\partial W_{ij}^{(l)}} \label{eq9} \\ b_i^{(l)} &= b_i^{(l)} - \alpha \frac{\partial J(W, b)}{\partial b_i^{(l)}} \label{eq10} \end{align}

\begin{align} \frac{\partial J(W, b) }{\partial W_{ij}^{(l)}} &= \left[\frac{1}{m}\sum_{k=1}^{m}\frac{\partial J\left(W, b; x^{(k)}, y^{(k)}\right)}{\partial W_{ij}^{(l)}} \right] + \lambda W_{ij}^{(l)} \label{eq11}\\ \frac{\partial J(W, b) }{\partial b_i^{(l)}} &= \frac{1}{m}\sum_{k=1}^{m}\frac{\partial J\left(W, b; x^{(k)}, y^{(k)}\right)}{\partial b_i^{(l)}} \label{eq12} \end{align}

# Computing the Auxiliary Variables

\begin{equation}\label{eq13} \begin{aligned} \delta_{i}^{(L)}&=\frac{\partial}{\partial z_{i}^{(L)}}\left(\frac{1}{2}\left\lVert y - h(x) \right\rVert^2\right) \\ &= \frac{\partial}{\partial a_{i}^{(L)}}\left(\frac{1}{2}\left\lVert y - h(x) \right\rVert^2\right)\cdot \frac{\partial a_{i}^{(L)}}{\partial z_{i}^{(L)}}\\ &=\frac{1}{2}\left[ \frac{\partial}{\partial a_{i}^{(L)}}\sum_{j=1}^{s_L} \left(y_j - a_j^{(L)}\right)^2 \right] \cdot \frac{\partial a_{i}^{(L)}}{\partial z_{i}^{(L)}}\\ &=-\left(y_{i}-a_{i}^{(L)}\right)f'\left(z_{i}^{(L)}\right) \end{aligned} \end{equation}

\begin{equation}\label{eq14} \begin{aligned} \delta_{i}^{(l)}&=\frac{\partial}{\partial z_{i}^{(l)}}J(W,b;x,y) \\ &=\sum_{j=1}^{s_{l+1}}\frac{\partial J}{\partial z_{j}^{(l+1)}}\cdot \frac{\partial z_{j}^{(l+1)}}{\partial z_{i}^{(l)}}\\ &=\sum_{j=1}^{s_{l+1}}\delta_{j}^{(l+1)}\cdot W_{ji}^{(l)}f'\left(z_{i}^{(l)}\right)\\ &=\left(\sum_{j=1}^{s_{l+1}} W_{ji}^{(l)}\delta_{j}^{(l+1)}\right)\cdot f'\left(z_{i}^{(l)}\right) \end{aligned} \end{equation}

\begin{equation}\label{eq15} \frac{\partial \, obj}{\partial x}=\frac{\partial \, obj}{\partial P}\cdot \frac{\partial P}{\partial x}+\frac{\partial \, obj}{\partial Q}\cdot \frac{\partial Q}{\partial x}+\frac{\partial \, obj}{\partial R}\cdot \frac{\partial R}{\partial x} \end{equation}

\begin{equation}\label{eq16} \begin{aligned} z_{j}^{(l+1)}&= \left [\sum_{i=1}^{s_l} W_{ji}^{(l)}\cdot a_{i}^{(l)} \right]+b_{j}^{(l)} \\ &=\left [\sum_{i=1}^{s_l} W_{ji}^{(l)}\cdot f\left(z_i^{(l)}\right)\right]+b_{j}^{(l)} \end{aligned} \end{equation}

\begin{equation} \frac{\partial z_{j}^{(l+1)}}{\partial z_{i}^{(l)}}=W_{ji}^{(l)}f'(z_{i}^{(l)}) \end{equation}
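The layer-to-layer derivative $$W_{ji}^{(l)}f'(z_{i}^{(l)})$$ is the key ingredient of the recursion for $$\delta^{(l)}$$, so it is worth a quick numerical check. The sketch below assumes a sigmoid activation and a hypothetical layer with 3 inputs and 2 outputs; the names `next_z`, `analytic`, and `numeric` are made up for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def next_z(W, b, z):
    """z^(l+1) as a function of z^(l): W f(z^(l)) + b."""
    return W @ sigmoid(z) + b

rng = np.random.default_rng(1)
W = rng.standard_normal((2, 3))
b = rng.standard_normal(2)
z = rng.standard_normal(3)

# Analytic Jacobian: entry (j, i) is W_ji * f'(z_i).
f_prime = sigmoid(z) * (1.0 - sigmoid(z))
analytic = W * f_prime  # broadcasting scales column i by f'(z_i)

# Numerical Jacobian: perturb one z_i at a time with a central difference.
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(z.size):
    step = np.zeros_like(z)
    step[i] = eps
    numeric[:, i] = (next_z(W, b, z + step) - next_z(W, b, z - step)) / (2 * eps)

jacobian_gap = np.max(np.abs(analytic - numeric))
```

Multiplying the transpose of this Jacobian by $$\delta^{(l+1)}$$ reproduces the sum over $$j$$ in the recursion for $$\delta_i^{(l)}$$.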

# Computing the Partial Derivatives of the Error with Respect to Weight-Matrix and Bias-Vector Entries

\begin{equation} \begin{aligned} \frac{\partial J}{\partial W_{ij}^{(l)}}&=\frac{\partial J}{\partial z_{i}^{(l+1)}}\cdot\frac{\partial z_{i}^{(l+1)}}{\partial W_{ij}^{(l)}} \\ &= \delta_{i}^{(l+1)}\cdot a_{j}^{(l)} \end{aligned} \end{equation}

\begin{equation} \begin{aligned} \frac{\partial J}{\partial b_{i}^{(l)}}&=\frac{\partial J}{\partial z_{i}^{(l+1)}}\cdot\frac{\partial z_{i}^{(l+1)}}{\partial b_{i}^{(l)}}\\ &= \delta_{i}^{(l+1)}\cdot 1 \\ &= \delta_{i}^{(l+1)} \end{aligned} \end{equation}

# Vectorized Form

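\begin{equation} \delta^{(L)}=-\left(y-a^{(L)}\right)\odot f'\left(z^{(L)}\right) \end{equation}

\begin{equation} \delta^{(l)}=\left[\left(W^{(l)}\right)^{T}\delta^{(l+1)}\right]\odot f'\left(z^{(l)}\right) \end{equation}

Here $$\odot$$ denotes the element-wise (Hadamard) product, and $$f'$$ is applied element-wise.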

\begin{align} \frac{\partial J}{\partial W^{(l)}} &= \delta^{(l+1)}\left(a^{(l)}\right)^{T}\\ \frac{\partial J}{\partial b^{(l)}} &= \delta^{(l+1)} \end{align}
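The vectorized formulas translate almost line for line into NumPy. The following is a minimal sketch for a single training sample, assuming a sigmoid activation (so that $$f'(z)=a(1-a)$$ can be computed from the cached activations). The 3-4-2 network, the random data, and all function names are hypothetical. One entry of the result is compared against a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_cost(weights, biases, x, y):
    """Per-sample cost 0.5 * ||h(x) - y||^2."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return 0.5 * np.sum((a - y) ** 2)

def backprop(weights, biases, x, y):
    """Return per-sample gradients (dJ/dW, dJ/db) for every layer."""
    # Forward pass, caching the activation of every layer.
    activations = [x]
    for W, b in zip(weights, biases):
        activations.append(sigmoid(W @ activations[-1] + b))
    # Output-layer delta: -(y - a) * f'(z), with f'(z) = a * (1 - a).
    a_out = activations[-1]
    delta = -(y - a_out) * a_out * (1.0 - a_out)
    grads_W, grads_b = [], []
    for l in range(len(weights) - 1, -1, -1):
        a_prev = activations[l]
        grads_W.append(np.outer(delta, a_prev))  # delta^(l+1) (a^(l))^T
        grads_b.append(delta)                    # delta^(l+1)
        if l > 0:
            # Propagate back: (W^T delta) * f'(z) of the previous layer.
            delta = (weights[l].T @ delta) * a_prev * (1.0 - a_prev)
    return grads_W[::-1], grads_b[::-1]

rng = np.random.default_rng(0)
sizes = [3, 4, 2]
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(n_out) for n_out in sizes[1:]]
x, y = rng.standard_normal(3), np.array([1.0, 0.0])
grads_W, grads_b = backprop(weights, biases, x, y)

# Finite-difference check of one first-layer weight.
eps = 1e-6
W_plus = [W.copy() for W in weights]
W_plus[0][1, 2] += eps
W_minus = [W.copy() for W in weights]
W_minus[0][1, 2] -= eps
numeric = (sample_cost(W_plus, biases, x, y)
           - sample_cost(W_minus, biases, x, y)) / (2 * eps)
gap = abs(numeric - grads_W[0][1, 2])
```

A gradient check of this kind is a common way to catch index or transpose mistakes before training on real data.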

# Putting All the Formulas Together

1. Initialization: for every layer ($$l=1,2,\cdots,L-1$$), set $$\Delta W^{(l)}=0$$ and $$\Delta b^{(l)}=0$$. The former is a matrix and the latter is a vector; they hold the accumulated updates to the weight matrix and the bias vector, respectively.

2. For every training sample in a batch (for i = 1 to m):

   - Use backpropagation to compute $$\nabla_{W^{(l)}}J\left(W,b;x^{(i)},y^{(i)}\right)$$ and $$\nabla_{b^{(l)}}J\left(W,b;x^{(i)},y^{(i)}\right)$$.
   - $$\Delta W^{(l)}:= \Delta W^{(l)}+\nabla_{W^{(l)}}J\left(W,b;x^{(i)},y^{(i)}\right)$$
   - $$\Delta b^{(l)}:= \Delta b^{(l)}+\nabla_{b^{(l)}}J\left(W,b;x^{(i)},y^{(i)}\right)$$
3. Update the parameters:

\begin{align} W^{(l)} &= W^{(l)}-\alpha\left[\left(\frac{1}{m}\Delta W^{(l)}\right) + \lambda W^{(l)} \right] \\ b^{(l)} &= b^{(l)}-\alpha\left[\frac{1}{m}\Delta b^{(l)}\right] \end{align}
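The three steps above can be sketched as one batch gradient-descent update. This is an illustrative implementation under stated assumptions, not a reference one: it assumes a sigmoid activation, and the XOR-style toy data, the 2-3-1 layer sizes, and the values of `alpha` and `lam` are all hypothetical choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(weights, biases, x):
    """Return the activations of every layer, starting with the input."""
    activations = [x]
    for W, b in zip(weights, biases):
        activations.append(sigmoid(W @ activations[-1] + b))
    return activations

def backprop(weights, biases, x, y):
    """Per-sample gradients of 0.5 * ||h(x) - y||^2 for every W and b."""
    activations = forward(weights, biases, x)
    a_out = activations[-1]
    delta = -(y - a_out) * a_out * (1.0 - a_out)
    grads_W, grads_b = [], []
    for l in range(len(weights) - 1, -1, -1):
        a_prev = activations[l]
        grads_W.append(np.outer(delta, a_prev))
        grads_b.append(delta)
        if l > 0:
            delta = (weights[l].T @ delta) * a_prev * (1.0 - a_prev)
    return grads_W[::-1], grads_b[::-1]

def batch_cost(weights, biases, xs, ys, lam):
    """Regularized cost J(W, b) over the whole batch."""
    data = np.mean([0.5 * np.sum((forward(weights, biases, x)[-1] - y) ** 2)
                    for x, y in zip(xs, ys)])
    return data + 0.5 * lam * sum(np.sum(W ** 2) for W in weights)

def gradient_step(weights, biases, xs, ys, alpha, lam):
    m = len(xs)
    # Step 1: zero-initialize the accumulators Delta W and Delta b.
    delta_W = [np.zeros_like(W) for W in weights]
    delta_b = [np.zeros_like(b) for b in biases]
    # Step 2: accumulate per-sample gradients from backpropagation.
    for x, y in zip(xs, ys):
        grads_W, grads_b = backprop(weights, biases, x, y)
        for l in range(len(weights)):
            delta_W[l] += grads_W[l]
            delta_b[l] += grads_b[l]
    # Step 3: averaged gradient plus weight decay on W only.
    new_weights = [W - alpha * (dW / m + lam * W)
                   for W, dW in zip(weights, delta_W)]
    new_biases = [b - alpha * (db / m) for b, db in zip(biases, delta_b)]
    return new_weights, new_biases

# Toy XOR-style batch (assumption for illustration).
rng = np.random.default_rng(0)
sizes = [2, 3, 1]
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(n_out) for n_out in sizes[1:]]
xs = [np.array(p, dtype=float) for p in ((0, 0), (0, 1), (1, 0), (1, 1))]
ys = [np.array([t], dtype=float) for t in (0, 1, 1, 0)]

cost_before = batch_cost(weights, biases, xs, ys, lam=1e-3)
for _ in range(100):
    weights, biases = gradient_step(weights, biases, xs, ys, alpha=0.1, lam=1e-3)
cost_after = batch_cost(weights, biases, xs, ys, lam=1e-3)
```

With a small enough learning rate the cost should go down from `cost_before` to `cost_after`; how far it drops depends on the initialization and the number of steps.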

# References

