
BP Neural Networks

Posted: 2019-03-11 00:14:15

Theoretical Derivation

In a neural network, the first layer is usually called the input layer and the last layer \(L\) the output layer; the layers in between, \(l\) with \((1<l<L)\), are called hidden layers.

Let the input vector be:

\(x = (x_1,x_2,...,x_i,...,x_m),\quad i = 1,2,...,m\)

The output vector is:

\(y = (y_1, y_2,...,y_k,...,y_n),\quad k = 1,2,...,n\)

The output of hidden layer \(l\) is:

\(h^{(l)} = (h^{(l)}_1,h^{(l)}_2,...,h^{(l)}_i,...,h^{(l)}_{s_l}), \quad i = 1,2,...,s_l\)

where \(s_l\) is the number of neurons in layer \(l\).

Let \(W_{ij}^{(l)}\) denote the weight connecting neuron \(i\) in layer \(l\) to neuron \(j\) in layer \(l-1\), and let \(b_i^{(l)}\) be the bias of neuron \(i\) in layer \(l\). Then:

\(h_i^{(l)} = f(net_i^{(l)})\)

\(net_i^{(l)} = \sum_{j=1}^{s_{l-1}} W_{ij}^{(l)}h_j^{(l-1)} + b_i^{(l)}\)

where \(net_i^{(l)}\) is the net input of neuron \(i\) in layer \(l\), and \(f(x)\) is the activation function (here the sigmoid):

\(f(x) = \frac{1}{1+e^{-x}} \quad f'(x) = f(x)(1-f(x))\)
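As a minimal sketch of these definitions (NumPy assumed; the function names are my own), the sigmoid, its derivative, and one layer's forward computation can be written as:

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # f'(x) = f(x) * (1 - f(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def layer_forward(W, b, h_prev):
    # net_i = sum_j W_ij * h_j^{(l-1)} + b_i ;  h_i = f(net_i)
    net = W @ h_prev + b
    return net, sigmoid(net)
```

Returning both `net` and `h` matters later: backpropagation needs the pre-activation values \(net_i^{(l)}\) to evaluate \(f'\).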

Algorithm Derivation - Method 1

Given \(m\) training samples \(\{(x(1),y(1)), (x(2),y(2)), (x(3), y(3)), \ldots, (x(m), y(m))\}\), with desired outputs \(d(i)\).

The error function is:
\[ E=\frac{1}{m}\sum_{i=1}^{m}E(i) \]
where \(E(i)\) is the training error on a single sample:
\[ E(i) = \frac{1}{2}\sum^n_{k=1}(d_k(i) - y_k(i))^2, \qquad y_k(i) = h^{(L)}_k(i) \]
Substituting gives:
\[ E = \frac{1}{2m}\sum_{i=1}^{m}\sum^n_{k=1}(d_k(i) - y_k(i))^2 \]
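The batch error above can be sketched directly (NumPy assumed; `total_error` is a name of my own choosing):

```python
import numpy as np

def total_error(D, Y):
    # E = (1/(2m)) * sum_i sum_k (d_k(i) - y_k(i))^2
    # D, Y: arrays of shape (m, n) -- m samples, n outputs
    m = D.shape[0]
    return np.sum((D - Y) ** 2) / (2.0 * m)
```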
Weight update:
\[ W_{ij}^{(l)} = W_{ij}^{(l)} - \alpha \frac{\partial E}{\partial W_{ij}^{(l)}} \]
Bias update:
\[ b_{i}^{(l)} = b_{i}^{(l)} - \alpha \frac{\partial E}{\partial b_{i}^{(l)}} \]
where \(\alpha\) is the learning rate.
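The two update rules amount to one gradient-descent step, sketched here (NumPy; the function name and signature are my own assumptions):

```python
import numpy as np

def sgd_step(W, b, dE_dW, dE_db, alpha=0.1):
    # W <- W - alpha * dE/dW ;  b <- b - alpha * dE/db
    return W - alpha * dE_dW, b - alpha * dE_db
```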

For a single sample, the derivative of the error with respect to an output-layer weight is:
\[ \begin{aligned} \frac{\partial E(i)}{\partial W_{kj}^{(L)}} &= \frac{\partial}{\partial W_{kj}^{(L)}}\Big(\frac{1}{2}\sum^n_{k=1}(d_k(i) - y_k(i))^2\Big)\\ &= \frac{\partial}{\partial W_{kj}^{(L)}}\Big(\frac{1}{2}(d_k(i) - y_k(i))^2\Big)\\ &= -(d_k(i) - y_k(i))\frac{\partial y_k(i)}{\partial W_{kj}^{(L)}}\\ &= -(d_k(i) - y_k(i))\frac{\partial y_k(i)}{\partial net_k^{(L)}}\frac{\partial net_k^{(L)}}{\partial W_{kj}^{(L)}}\\ &= -(d_k(i) - y_k(i))f'(x)|_{x=net_k^{(L)}}\frac{\partial net_k^{(L)}}{\partial W_{kj}^{(L)}}\\ &= -(d_k(i) - y_k(i))f'(x)|_{x=net_k^{(L)}}h_j^{(L-1)} \end{aligned} \]
Thus:
\[ \frac{\partial E(i)}{\partial W_{kj}^{(L)} } =-(d_k(i) - y_k(i))f'(x)|_{x=net_k^{(L)}}h_j^{(L-1)} \]
and similarly:
\[ \frac{\partial E(i)}{\partial b_k^{(L)} } =-(d_k(i) - y_k(i))f'(x)|_{x=net_k^{(L)}} \]
Define:
\[ \delta_k^{(L)} = \frac{\partial E(i)}{\partial b_k^{(L)} } \]
Then:
\[ \frac{\partial E(i)}{\partial W_{kj}^{(L)} } = \delta_k^{(L)}h_j^{(L-1)} \]
For hidden layer \(L-1\):
\[ \begin{aligned} \frac{\partial E(i)}{\partial W_{ji}^{(L-1)}} &= \frac{\partial}{\partial W_{ji}^{(L-1)}}\Big(\frac{1}{2}\sum_{k=1}^{n} (d_k(i) - y_k(i))^2\Big)\\ &= \frac{\partial}{\partial W_{ji}^{(L-1)}}\Big(\frac{1}{2}\sum_{k=1}^{n} \Big(d_k(i) - f\Big(\sum_{j=1}^{s_{L-1}} W_{kj}^{(L)} h_j^{(L-1)} + b_k^{(L)}\Big)\Big)^2\Big)\\ &= \frac{\partial}{\partial W_{ji}^{(L-1)}}\Big(\frac{1}{2}\sum_{k=1}^{n} \Big(d_k(i) - f\Big(\sum_{j=1}^{s_{L-1}} W_{kj}^{(L)} f\Big(\sum_{i=1}^{s_{L-2}} W_{ji}^{(L-1)} h_i^{(L-2)} + b_j^{(L-1)}\Big) + b_k^{(L)}\Big)\Big)^2\Big)\\ &= -\sum^n_{k=1}(d_k(i)-y_k(i))f'(x)|_{x=net_k^{(L)}}\frac{\partial net_k^{(L)}}{\partial W_{ji}^{(L-1)}} \end{aligned} \]
where:
\[ \begin{aligned} net_k^{(L)} &= \sum_{j=1}^{s_{L-1}} W_{kj}^{(L)}h_j^{(L-1)} + b_k^{(L)}\\ &= \sum_{j=1}^{s_{L-1}} W_{kj}^{(L)} f(net_j^{(L-1)}) + b_k^{(L)}\\ &= \sum_{j=1}^{s_{L-1}} W_{kj}^{(L)} f\Big(\sum^{s_{L-2}}_{i=1} W_{ji}^{(L-1)} h_i^{(L-2)} + b_j^{(L-1)}\Big) + b_k^{(L)} \end{aligned} \]
Substituting:
\[ \begin{aligned} \frac{\partial E(i)}{\partial W_{ji}^{(L-1)}} &= -\sum^n_{k=1}(d_k(i)-y_k(i))f'(x)|_{x=net_k^{(L)}}\frac{\partial net_k^{(L)}}{\partial W_{ji}^{(L-1)}}\\ &= -\sum^n_{k=1}(d_k(i)-y_k(i))f'(x)|_{x=net_k^{(L)}} \frac{\partial net_k^{(L)}}{\partial f(net_j^{(L-1)})} \frac{\partial f(net_j^{(L-1)})}{\partial net_j^{(L-1)}} \frac{\partial net_j^{(L-1)}}{\partial W_{ji}^{(L-1)}}\\ &= -\sum^n_{k=1}(d_k(i)-y_k(i))f'(x)|_{x=net_k^{(L)}} W_{kj}^{(L)} f'(x)|_{x=net_j^{(L-1)}} h_i^{(L-2)} \end{aligned} \]
Similarly:
\[ \frac{\partial E(i)}{\partial b_j^{(L-1)}} = -\sum^n_{k=1}(d_k(i)-y_k(i))f'(x)|_{x=net_k^{(L)}} W_{kj}^{(L)} f'(x)|_{x=net_j^{(L-1)}} \]
Define:
\[ \delta_j^{(L-1)} = \frac{\partial E(i)}{\partial b_j^{(L-1)}} \]
which gives:
\[ \begin{aligned} \delta_j^{(L-1)} &= -\sum^n_{k=1}(d_k(i)-y_k(i))f'(x)|_{x=net_k^{(L)}} W_{kj}^{(L)} f'(x)|_{x=net_j^{(L-1)}}\\ &= \sum^n_{k=1}\delta_k^{(L)} W_{kj}^{(L)} f'(x)|_{x=net_j^{(L-1)}} \end{aligned} \]

\[ \frac{\partial E(i)}{\partial W_{ji}^{(L-1)}} = \delta_j^{(L-1)}h_i^{(L-2)} \]

It follows that for any layer \(l\) \((1<l<L)\), the derivatives with respect to the weights and biases are:
\[ \begin{aligned} \frac{\partial E(i)}{\partial W_{ji}^{(l)}} &= \delta_j^{(l)}h_i^{(l-1)}\\ \frac{\partial E(i)}{\partial b_j^{(l)}} &= \delta_j^{(l)}\\ \delta_j^{(l)} &= \sum_{k=1}^{s_{l+1}} \delta_k^{(l+1)} W_{kj}^{(l+1)}f'(x)|_{x=net_j^{(l)}} \end{aligned} \]
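The \(\delta\) recursion above can be sketched for a single sample as follows (NumPy; keeping `nets`, `hs`, and `Ws` in dicts keyed by layer index \(1..L\) is an assumption of this sketch, not the author's notation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_deltas(nets, hs, Ws, d):
    """Compute delta^{(l)} = dE(i)/db^{(l)} for one sample.

    nets[l], hs[l]: net inputs and activations of parameter layer l,
    Ws[l]: weight matrix of layer l, shape (s_l, s_{l-1}),
    d: desired output vector. Returns a dict delta[l].
    """
    L = max(Ws)  # index of the output layer
    fprime = lambda x: sigmoid(x) * (1.0 - sigmoid(x))
    delta = {}
    # Output layer: delta_k^{(L)} = -(d_k - y_k) * f'(net_k^{(L)})
    delta[L] = -(d - hs[L]) * fprime(nets[L])
    # Hidden layers: delta^{(l)} = (W^{(l+1)T} delta^{(l+1)}) * f'(net^{(l)})
    for l in range(L - 1, 0, -1):
        delta[l] = (Ws[l + 1].T @ delta[l + 1]) * fprime(nets[l])
    return delta
```

Each `Ws[l+1].T @ delta[l+1]` is exactly the sum \(\sum_k \delta_k^{(l+1)} W_{kj}^{(l+1)}\) from the recursion, and the gradients then follow as `delta[l]` (biases) and outer products `delta[l] * h^{(l-1)}` (weights).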

Algorithm Derivation - Method 2

For the output layer, apply the chain rule directly. Differentiating with respect to a weight:
\[ \begin{aligned} \frac{\partial E(i)}{\partial W_{kj}^{(L)}} &= \frac{\partial E(i)}{\partial h_k^{(L)}} \frac{\partial h_k^{(L)}}{\partial net_k^{(L)}} \frac{\partial net_k^{(L)}}{\partial W_{kj}^{(L)}}\\ &= -(d_k(i) - y_k(i))f'(x)|_{x=net_k^{(L)}}h_j^{(L-1)} \end{aligned} \]

Thus:
\[ \frac{\partial E(i)}{\partial W_{kj}^{(L)} } =-(d_k(i) - y_k(i))f'(x)|_{x=net_k^{(L)}}h_j^{(L-1)} \]
Differentiating with respect to the bias:
\[ \begin{aligned} \frac{\partial E(i)}{\partial b_k^{(L)}} &= \frac{\partial E(i)}{\partial h_k^{(L)}} \frac{\partial h_k^{(L)}}{\partial net_k^{(L)}} \frac{\partial net_k^{(L)}}{\partial b_k^{(L)}}\\ &= -(d_k(i) - y_k(i))f'(x)|_{x=net_k^{(L)}} \end{aligned} \]
Thus:
\[ \frac{\partial E(i)}{\partial b_k^{(L)} } =-(d_k(i) - y_k(i))f'(x)|_{x=net_k^{(L)}} \]
Define:
\[ \delta_k^{(L)} = \frac{\partial E(i)}{\partial b_k^{(L)} } \]
Then:
\[ \frac{\partial E(i)}{\partial W_{kj}^{(L)} } = \delta_k^{(L)}h_j^{(L-1)} \]

For the hidden layer:

Differentiating with respect to the weight matrix (note that \(E(i)\) depends on every output \(h_k^{(L)}\), hence the sum over \(k\)):
\[ \begin{aligned} \frac{\partial E(i)}{\partial W_{ji}^{(L-1)}} &= \sum_{k=1}^{n}\frac{\partial E(i)}{\partial h_k^{(L)}} \frac{\partial h_k^{(L)}}{\partial net_k^{(L)}} \frac{\partial net_k^{(L)}}{\partial h_j^{(L-1)}} \frac{\partial h_j^{(L-1)}}{\partial net_j^{(L-1)}} \frac{\partial net_j^{(L-1)}}{\partial W_{ji}^{(L-1)}}\\ &= -\sum^n_{k=1}(d_k(i)-y_k(i))f'(x)|_{x=net_k^{(L)}} W_{kj}^{(L)} f'(x)|_{x=net_j^{(L-1)}} h_i^{(L-2)} \end{aligned} \]
Differentiating with respect to the bias vector:
\[ \begin{aligned} \frac{\partial E(i)}{\partial b_j^{(L-1)}} &= \sum_{k=1}^{n}\frac{\partial E(i)}{\partial h_k^{(L)}} \frac{\partial h_k^{(L)}}{\partial net_k^{(L)}} \frac{\partial net_k^{(L)}}{\partial h_j^{(L-1)}} \frac{\partial h_j^{(L-1)}}{\partial net_j^{(L-1)}} \frac{\partial net_j^{(L-1)}}{\partial b_j^{(L-1)}}\\ &= -\sum^n_{k=1}(d_k(i)-y_k(i))f'(x)|_{x=net_k^{(L)}} W_{kj}^{(L)} f'(x)|_{x=net_j^{(L-1)}} \end{aligned} \]

Notes on the Derivation

  • Intuitively, backpropagation propagates information from back to front, using later layers' information to update earlier layers' parameters.
  • Mathematically, it is the chain rule: like traversing a linked list, a parameter far from the output is related to the error only through intermediate variables, so the derivation has to pass through those intermediates.
  • Writing the intermediate relations out explicitly makes both the mathematical derivation and the code implementation easier.
  • When differentiating an expression containing a summation, watch the indices: in \(\frac{\partial net_j^{(L)}}{\partial W_{ji}^{(L)}}\), the indices \(ji\) of \(W_{ji}^{(L)}\) range over the sum, so they must not be fixed to particular values during the differentiation, or the result will be wrong.
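Putting the derivation together, a minimal batch-gradient-descent training loop for a single hidden layer might look like the sketch below (NumPy; the network size, seed, learning rate, and function names are illustrative assumptions, not part of the original article):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen for reproducibility

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(X, D, s_hidden=4, alpha=1.0, epochs=3000):
    """Train a 2-layer (one hidden layer) sigmoid network by batch GD."""
    m, n_in = X.shape
    n_out = D.shape[1]
    W1 = rng.normal(0.0, 1.0, (s_hidden, n_in)); b1 = np.zeros(s_hidden)
    W2 = rng.normal(0.0, 1.0, (n_out, s_hidden)); b2 = np.zeros(n_out)
    for _ in range(epochs):
        # Forward pass: net = h_prev W^T + b, h = f(net)
        net1 = X @ W1.T + b1; h1 = sigmoid(net1)   # (m, s_hidden)
        net2 = h1 @ W2.T + b2; y = sigmoid(net2)   # (m, n_out)
        # Deltas as derived above, one row per sample
        d2 = -(D - y) * y * (1.0 - y)              # delta^{(L)}
        d1 = (d2 @ W2) * h1 * (1.0 - h1)           # delta^{(L-1)}
        # Gradient step, averaging gradients over the batch
        W2 -= alpha * (d2.T @ h1) / m; b2 -= alpha * d2.mean(axis=0)
        W1 -= alpha * (d1.T @ X) / m;  b1 -= alpha * d1.mean(axis=0)
    return W1, b1, W2, b2

def predict(X, params):
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1.T + b1) @ W2.T + b2)
```

Note that `y * (1 - y)` reuses the identity \(f'(x) = f(x)(1-f(x))\) on the already computed activations, so \(f'\) never has to be evaluated from scratch.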


Original article: https://www.cnblogs.com/niubidexiebiao/p/10508145.html
