
# How the Error Backpropagation (Back Propagation) Algorithm Works in Neural Networks


The activation function used throughout is the sigmoid:

$$\label{eq1} f(x)=\frac{1}{1+\exp(-x)}$$

Its derivative has a convenient closed form in terms of the function itself:

$$f'(x)=f(x)(1-f(x))$$
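The identity above is easy to verify numerically. A minimal NumPy sketch (function names are my own) that checks $f'(x)=f(x)(1-f(x))$ against a central finite difference:

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # f'(x) = f(x) * (1 - f(x))
    fx = sigmoid(x)
    return fx * (1.0 - fx)

# Compare against a central finite difference at several points
xs = np.linspace(-3.0, 3.0, 7)
h = 1e-6
numeric = (sigmoid(xs + h) - sigmoid(xs - h)) / (2 * h)
assert np.allclose(numeric, sigmoid_prime(xs), atol=1e-8)
```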

Forward propagation from layer $l$ to layer $l+1$ computes:

\begin{align} z^{(l+1)} &= W^{(l)}a^{(l)}+b^{(l)} \label{eq3} \\ a^{(l+1)} &= f\left(z^{(l+1)}\right) \label{eq4} \end{align}
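Equations \eqref{eq3}–\eqref{eq4} can be sketched as a loop over layers. This is a minimal illustration with hypothetical layer sizes, not the article's own code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical 3-layer network: 4 inputs -> 5 hidden -> 2 outputs
sizes = [4, 5, 2]
W = [rng.standard_normal((sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
b = [np.zeros(sizes[l + 1]) for l in range(len(sizes) - 1)]

def forward(x):
    # Repeatedly apply z = W a + b, a = f(z), caching both for backprop
    a, zs, activations = x, [], [x]
    for Wl, bl in zip(W, b):
        z = Wl @ a + bl          # z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}
        a = sigmoid(z)           # a^{(l+1)} = f(z^{(l+1)})
        zs.append(z)
        activations.append(a)
    return zs, activations

zs, activations = forward(rng.standard_normal(4))
```

Caching every $z^{(l)}$ and $a^{(l)}$ during the forward pass is what makes the backward pass below cheap.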

# Starting the Derivation

Consider a training set of $m$ samples $\left\{(x^{(i)}, y^{(i)})\right\},\; i=1,2,\ldots,m$. The cost for a single sample is defined as:

$$\label{eq5} J\left(W,b;x^{(i)}, y^{(i)}\right) = \frac{1}{2}\left\lVert h\left(x^{(i)}\right)- y^{(i)} \right\rVert^2$$

The overall cost over the batch adds a weight-decay (Frobenius norm) regularization term:

\begin{aligned} J(W, b) &= \frac{1}{m}\sum_{i=1}^{m}J\left(W,b;x^{(i)}, y^{(i)}\right) + \frac{\lambda}{2}\sum_{l=1}^{L-1}\left\lVert W^{(l)} \right\rVert_{F}^{2}\\ &= \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\lVert h(x^{(i)}) - y^{(i)}\right\rVert^2 + \frac{\lambda}{2}\sum_{l=1}^{L-1}\left\lVert W^{(l)} \right\rVert_{F}^{2} \end{aligned}\label{eq6}
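The overall cost \eqref{eq6} translates directly into code. A hand-checkable sketch (the `cost` helper is hypothetical, not from the article):

```python
import numpy as np

def cost(W, preds, targets, lam):
    # Average per-sample squared error plus Frobenius weight decay
    m = len(preds)
    data_term = sum(0.5 * np.sum((h - y) ** 2) for h, y in zip(preds, targets)) / m
    reg_term = 0.5 * lam * sum(np.sum(Wl ** 2) for Wl in W)
    return data_term + reg_term

# One sample, one 1x1 weight: data term = 0.5, reg term = 0.5*2*1 = 1.0
J = cost([np.ones((1, 1))], [np.array([1.0])], [np.array([0.0])], lam=2.0)
```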

## A Quick Note on the Frobenius Norm

$$\lVert A \rVert_F = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} (a_{ij})^2}$$

$$\label{eq8} \frac{\partial \lVert A \rVert_F^2 }{\partial A}= 2A$$
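Equation \eqref{eq8} can be confirmed entry by entry with finite differences. A minimal sketch, with an arbitrary test matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))

def fro_sq(M):
    # ||M||_F^2 = sum of squared entries
    return np.sum(M ** 2)

# Numerically differentiate ||A||_F^2 with respect to each entry
h = 1e-6
grad = np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        E = np.zeros_like(A)
        E[i, j] = h
        grad[i, j] = (fro_sq(A + E) - fro_sq(A - E)) / (2 * h)

assert np.allclose(grad, 2 * A, atol=1e-6)
```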

# Peeling Back the Layers

Gradient descent updates each parameter by stepping along the negative gradient with learning rate $\alpha$:

\begin{align} W_{ij}^{(l)} &= W_{ij}^{(l)} - \alpha \frac{\partial J(W,b)}{\partial W_{ij}^{(l)}} \label{eq9} \\ b_i^{(l)} &= b_i^{(l)} - \alpha \frac{\partial J(W, b)}{\partial b_i^{(l)}} \label{eq10} \end{align}

Differentiating the overall cost, with $k$ indexing samples so it does not clash with the matrix subscripts $i, j$:

\begin{align} \frac{\partial J(W, b) }{\partial W_{ij}^{(l)}} &= \left[\frac{1}{m}\sum_{k=1}^{m}\frac{\partial J\left(W, b; x^{(k)}, y^{(k)}\right)}{\partial W_{ij}^{(l)}} \right] + \lambda W_{ij}^{(l)} \label{eq11}\\ \frac{\partial J(W, b) }{\partial b_i^{(l)}} &= \frac{1}{m}\sum_{k=1}^{m}\frac{\partial J\left(W, b; x^{(k)}, y^{(k)}\right)}{\partial b_i^{(l)}} \label{eq12} \end{align}

# Computing the Auxiliary Variables

Define the error term $\delta_{i}^{(l)} = \partial J / \partial z_{i}^{(l)}$. At the output layer $L$:

\begin{aligned} \delta_{i}^{(L)}&=\frac{\partial}{\partial z_{i}^{(L)}}\left(\frac{1}{2}\left\lVert y - h(x) \right\rVert^2\right) \\ &= \frac{\partial}{\partial a_{i}^{(L)}}\left(\frac{1}{2}\left\lVert y - h(x) \right\rVert^2\right)\cdot \frac{\partial a_{i}^{(L)}}{\partial z_{i}^{(L)}}\\ &=\frac{1}{2}\left[ \frac{\partial}{\partial a_{i}^{(L)}}\sum_{j=1}^{s_L} \left(y_j - a_j^{(L)}\right)^2 \right] \cdot \frac{\partial a_{i}^{(L)}}{\partial z_{i}^{(L)}}\\ &=-\left(y_{i}-a_{i}^{(L)}\right)f'\left(z_{i}^{(L)}\right) \end{aligned}\label{eq13}

For a hidden layer $l$, the error propagates backward through every unit of layer $l+1$:

\begin{aligned} \delta_{i}^{(l)}&=\frac{\partial}{\partial z_{i}^{(l)}}J(W,b;x,y) \\ &=\sum_{j=1}^{s_{l+1}}\frac{\partial J}{\partial z_{j}^{(l+1)}}\cdot \frac{\partial z_{j}^{(l+1)}}{\partial z_{i}^{(l)}}\\ &=\sum_{j=1}^{s_{l+1}}\delta_{j}^{(l+1)}\cdot W_{ji}^{(l)}f'\left(z_{i}^{(l)}\right)\\ &=\left(\sum_{j=1}^{s_{l+1}} W_{ji}^{(l)}\delta_{j}^{(l+1)}\right)\cdot f'\left(z_{i}^{(l)}\right) \end{aligned}\label{eq14}

The second line above applies the multivariable chain rule: when an objective depends on $x$ only through intermediate variables $P$, $Q$, and $R$,

$$\label{eq15} \frac{\partial \, obj}{\partial x}=\frac{\partial \, obj}{\partial P}\cdot \frac{\partial P}{\partial x}+\frac{\partial \, obj}{\partial Q}\cdot \frac{\partial Q}{\partial x}+\frac{\partial \, obj}{\partial R}\cdot \frac{\partial R}{\partial x}$$

\begin{aligned} z_{j}^{(l+1)}&= \left [\sum_{i=1}^{s_l} W_{ji}^{(l)}\cdot a_{i}^{(l)} \right]+b_{j}^{(l)} \\ &=\left [\sum_{i=1}^{s_l} W_{ji}^{(l)}\cdot f\left(z_i^{(l)}\right)\right]+b_{j}^{(l)} \end{aligned}\label{eq16}

$$\frac{\partial z_{j}^{(l+1)}}{\partial z_{i}^{(l)}}=W_{ji}^{(l)}f'(z_{i}^{(l)})$$

# Partial Derivatives of the Cost with Respect to Weights and Biases

\begin{aligned} \frac{\partial J}{\partial W_{ij}^{(l)}}&=\frac{\partial J}{\partial z_{i}^{(l+1)}}\cdot\frac{\partial z_{i}^{(l+1)}}{\partial W_{ij}^{(l)}} \\ &= \delta_{i}^{(l+1)}\cdot a_{j}^{(l)} \end{aligned}

\begin{aligned} \frac{\partial J}{\partial b_{i}^{(l)}}&=\frac{\partial J}{\partial z_{i}^{(l+1)}}\cdot\frac{\partial z_{i}^{(l+1)}}{\partial b_{i}^{(l)}}\\ &= \delta_{i}^{(l+1)}\cdot 1 \\ &= \delta_{i}^{(l+1)} \end{aligned}

# Vectorized Form

At the output layer (with $\odot$ denoting element-wise multiplication):

$$\delta^{(L)}=-\left(y-a^{(L)}\right)\odot f'\left(z^{(L)}\right)$$

$$\delta^{(l)}=\left[\left(W^{(l)}\right)^{T}\delta^{(l+1)}\right]\odot f'\left(z^{(l)}\right)$$

\begin{align} \frac{\partial J}{\partial W^{(l)}} &= \delta^{(l+1)}\left(a^{(l)}\right)^{T}\\ \frac{\partial J}{\partial b^{(l)}} &= \delta^{(l+1)} \end{align}
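The four vectorized formulas above fit in one short backward pass. A minimal NumPy sketch with hypothetical layer sizes, ending with a finite-difference sanity check on a single weight:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    fx = sigmoid(x)
    return fx * (1.0 - fx)

# Hypothetical shapes: 3 inputs -> 4 hidden -> 2 outputs
sizes = [3, 4, 2]
W = [rng.standard_normal((sizes[l + 1], sizes[l])) for l in range(2)]
b = [np.zeros(sizes[l + 1]) for l in range(2)]

def backprop(x, y):
    # Forward pass, caching z and a at every layer
    a, zs, acts = x, [], [x]
    for Wl, bl in zip(W, b):
        z = Wl @ a + bl
        a = sigmoid(z)
        zs.append(z)
        acts.append(a)
    # delta^(L) = -(y - a^(L)) ⊙ f'(z^(L))
    delta = -(y - acts[-1]) * sigmoid_prime(zs[-1])
    grads_W, grads_b = [], []
    for l in range(len(W) - 1, -1, -1):
        grads_W.append(np.outer(delta, acts[l]))  # dJ/dW^(l) = delta^(l+1) (a^(l))^T
        grads_b.append(delta)                     # dJ/db^(l) = delta^(l+1)
        if l > 0:
            # delta^(l) = (W^(l)^T delta^(l+1)) ⊙ f'(z^(l))
            delta = (W[l].T @ delta) * sigmoid_prime(zs[l - 1])
    return grads_W[::-1], grads_b[::-1]

x = rng.standard_normal(3)
y = rng.standard_normal(2)
gW, gb = backprop(x, y)

# Sanity check: perturb one weight and compare against the analytic gradient
def loss():
    a = x
    for Wl, bl in zip(W, b):
        a = sigmoid(Wl @ a + bl)
    return 0.5 * np.sum((a - y) ** 2)

h = 1e-6
W[0][0, 0] += h; up = loss()
W[0][0, 0] -= 2 * h; down = loss()
W[0][0, 0] += h
assert np.isclose((up - down) / (2 * h), gW[0][0, 0], atol=1e-6)
```

Gradient checking of this kind is the standard way to validate a hand-written backprop implementation before training.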

# Putting All the Formulas Together

1. Initialization: for every layer ($$l=1,2,\cdots,L-1$$), set $$\Delta W^{(l)}=0$$ and $$\Delta b^{(l)}=0$$. The former is a matrix and the latter a vector; they accumulate the updates to the weight matrix and the bias vector, respectively.

2. For every training sample in the batch (for $$i=1$$ to $$m$$):

• Use backpropagation to compute $$\nabla_{W^{(l)}}J\left(W,b;x^{(i)},y^{(i)}\right)$$ and $$\nabla_{b^{(l)}}J\left(W,b;x^{(i)},y^{(i)}\right)$$
• $$\Delta W^{(l)}:= \Delta W^{(l)}+\nabla_{W^{(l)}}J\left(W,b;x^{(i)},y^{(i)}\right)$$
• $$\Delta b^{(l)}:= \Delta b^{(l)}+\nabla_{b^{(l)}}J\left(W,b;x^{(i)},y^{(i)}\right)$$
3. Update the parameters:

\begin{align} W^{(l)} &= W^{(l)}-\alpha\left[\left(\frac{1}{m}\Delta W^{(l)}\right) + \lambda W^{(l)} \right] \\ b^{(l)} &= b^{(l)}-\alpha\left[\frac{1}{m}\Delta b^{(l)}\right] \end{align}
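The three steps above can be sketched as a single batch update. This assumes a `backprop(x, y)` routine returning per-sample gradient lists, as derived earlier; the function and parameter names here are illustrative:

```python
import numpy as np

def batch_step(W, b, batch, backprop, alpha=0.1, lam=1e-3):
    # One gradient-descent step over a batch, following steps 1-3:
    # zero the accumulators, sum per-sample gradients, then update.
    m = len(batch)
    dW = [np.zeros_like(Wl) for Wl in W]
    db = [np.zeros_like(bl) for bl in b]
    for x, y in batch:                      # step 2: accumulate
        gW, gb = backprop(x, y)
        for l in range(len(W)):
            dW[l] += gW[l]
            db[l] += gb[l]
    for l in range(len(W)):                 # step 3: update
        W[l] -= alpha * (dW[l] / m + lam * W[l])
        b[l] -= alpha * (db[l] / m)
    return W, b

# Smoke test with a stub gradient of all ones
W0, b0 = [np.zeros((2, 2))], [np.zeros(2)]
stub = lambda x, y: ([np.ones((2, 2))], [np.ones(2)])
W0, b0 = batch_step(W0, b0, [(None, None)] * 2, stub, alpha=1.0, lam=0.0)
```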
