Layer Normalization (LN) training procedure
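- Compute the mean and variance over the \(n^{(l)}\) activations of layer \(l\)
\[\mu^{(l)} = \frac{\sum_{i=1}^{n^{(l)}}x_i^{(l)}}{n^{(l)}}
\]
\[\left(\sigma^{(l)}\right)^2 = \frac{\sum_{i=1}^{n^{(l)}}\left(x_i^{(l)}-\mu^{(l)}\right)^2}{n^{(l)}}
\]
- Normalize each activation; the small constant \(\epsilon\) keeps the denominator away from zero
\[\hat{x}_i^{(l)} = \frac{x_i^{(l)}-\mu^{(l)}}{\sqrt{\left(\sigma^{(l)}\right)^2+\epsilon}}
\]
- Apply a trainable scale \(\gamma\) and shift \(\beta\)
\[y_i^{(l)} = \gamma \hat{x}_i^{(l)}+\beta
\]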
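
The three steps above can be sketched in a few lines of plain Python for a single sample. This is a minimal illustration rather than a reference implementation: the function name `layer_norm`, the default `eps=1e-5`, and the use of scalar `gamma` and `beta` are assumptions made here for brevity (in practice the scale and shift are usually learned per unit).

```python
import math

def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Layer-normalize one sample's activations x (a list of floats).

    gamma and beta are scalars here as a simplifying assumption.
    """
    n = len(x)
    mu = sum(x) / n                              # step 1: mean over the layer's units
    var = sum((xi - mu) ** 2 for xi in x) / n    # step 1: variance (divide by n)
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [(xi - mu) * inv_std for xi in x]    # step 2: normalized activations
    return [gamma * xh + beta for xh in x_hat]   # step 3: scale and shift

activations = [1.0, 2.0, 3.0, 4.0]
out = layer_norm(activations)
print(out)
```

With `gamma=1` and `beta=0` the output has mean approximately 0 and variance slightly below 1 (because of `eps`). Note that the statistics are computed per sample over the units of the layer, so the result does not depend on the batch size.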