
Gaussian Distribution | Derivation | Notes

Posted: 2020-02-03 17:08:25

Some formulas in this post have formatting problems; please see the Yuque version at https://www.yuque.com/leesamoyed/bvsayi/pd6nte#V4xDr

Linear Gaussian Model
Kalman Filter
The Gaussian distribution is extremely important!!!

I. Maximum Likelihood Estimation:

\(X=(x_1,x_2,\dots,x_N)^T = \begin{pmatrix} x_1\\x_2\\ \vdots \\x_N \end{pmatrix}\)
\(x_i \in \mathbb{R}^p, \quad x_i \overset{iid}{\sim} N(\mu,\Sigma), \quad \theta = (\mu,\Sigma)\)
where iid means independent and identically distributed.
\(MLE:\theta_{MLE} = \arg\max\limits_{\theta} P(X|\theta)\)
Take \(p=1\), so \(\theta=(\mu,\sigma^2)\).
(The one-dimensional case keeps the calculation simple.)
One-dimensional:
\(p(x) = \frac {1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\)
Multivariate (say \(p\)-dimensional):
\(p(x) = \frac {1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)\)
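To make both density formulas concrete, here is a minimal numerical sketch (assuming numpy and scipy are available; the variable names and values are ours, for illustration) that checks the hand-written formulas against library implementations:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# 1-D density: p(x) = 1/(sqrt(2*pi)*sigma) * exp(-(x-mu)^2 / (2*sigma^2))
mu, sigma = 1.0, 2.0
x = 0.5
manual_1d = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
assert np.isclose(manual_1d, norm.pdf(x, loc=mu, scale=sigma))

# p-dimensional density with mean vector mu_vec and covariance Sigma
mu_vec = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
xv = np.array([0.3, 0.7])
p = len(mu_vec)
diff = xv - mu_vec
manual_pd = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / (
    (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
)
assert np.isclose(manual_pd, multivariate_normal.pdf(xv, mean=mu_vec, cov=Sigma))
```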


We can now write out the log-likelihood:

\(\log P(X|\theta) = \log \prod\limits_{i=1}^{N} p(x_i|\theta) = \sum\limits_{i=1}^{N} \log p(x_i|\theta) \\ =\sum\limits_{i=1}^{N}\log\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right) \\ =\sum\limits_{i=1}^{N}\left[\log\frac{1}{\sqrt{2\pi}}+\log{\frac{1}{\sigma}}-\frac{(x_i-\mu)^2}{2\sigma^2} \right]\)
Estimating \(\mu\):
\(\mu_{MLE} = \arg\max\limits_{\mu}\log P(X|\theta)\\=\arg\max\limits_{\mu} \sum\limits_{i=1}^{N}-\frac{(x_i-\mu)^2}{2\sigma^2}\\=\arg\min\limits_{\mu} \sum\limits_{i=1}^{N}(x_i-\mu)^2\)
Setting the derivative with respect to \(\mu\) to zero:
\(\frac{\partial}{\partial\mu}\sum\limits_{i=1}^{N}(x_i-\mu)^2 = \sum\limits_{i=1}^{N} 2\cdot(x_i-\mu) \cdot (-1)=0\\\sum\limits_{i=1}^{N}(x_i-\mu)=0\\\sum\limits_{i=1}^{N}x_i -\sum\limits_{i=1}^{N} \mu =0\\ \mu_{MLE} = \frac{1}{N}\sum\limits_{i=1}^{N}x_i\)
This estimate is unbiased: \(E[\mu_{MLE}] = \frac{1}{N}\sum\limits_{i=1}^{N}E[x_i] = \frac{1}{N}\sum\limits_{i=1}^{N}\mu = \frac{1}{N}\cdot N\mu =\mu\)
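A small sketch (assumptions: numpy, an arbitrary seed and true parameters of our choosing) checking that the closed-form \(\mu_{MLE}\), the sample mean, indeed maximizes the log-likelihood over a grid:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.5, size=1000)

mu_mle = x.mean()  # closed form: mu_MLE = (1/N) * sum(x_i)

# Log-likelihood as a function of mu (sigma held fixed); it should peak at mu_mle.
def log_lik(mu, sigma=1.5):
    return np.sum(-np.log(np.sqrt(2 * np.pi) * sigma) - (x - mu) ** 2 / (2 * sigma ** 2))

grid = np.linspace(2.0, 4.0, 2001)
mu_grid_best = grid[np.argmax([log_lik(m) for m in grid])]
print(mu_mle, mu_grid_best)  # agree up to the grid resolution
```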


Now estimate \(\sigma^2\):
\(\sigma^2_{MLE} = \arg\max\limits_{\sigma} \log P(X|\theta) = \arg\max\limits_{\sigma} \underbrace{\sum\limits_{i=1}^{N} \left(-\log\sigma - \frac{1}{2\sigma^2}(x_i-\mu)^2\right)}_{\alpha}\)
(the constant term \(\log\frac{1}{\sqrt{2\pi}}\) does not depend on \(\sigma\) and is dropped)
Setting the derivative to zero:
\(\frac{\partial\alpha}{\partial\sigma} = \sum\limits_{i=1}^{N}\left[-\frac{1}{\sigma}+\frac{1}{2}(x_i-\mu)^2 \cdot 2\sigma^{-3}\right] = 0\\\sum\limits_{i=1}^{N}\left[-\frac{1}{\sigma}+ (x_i-\mu)^2 \sigma^{-3}\right] = 0\\\sum\limits_{i=1}^{N}\left[-\sigma^2 + (x_i-\mu)^2\right] = 0\\-\sum\limits_{i=1}^{N}\sigma^2 + \sum\limits_{i=1}^{N}(x_i-\mu)^2 = 0\\ N\sigma^2 = \sum\limits_{i=1}^{N}(x_i-\mu)^2\\\sigma^2_{MLE} = \frac{1}{N}\sum\limits_{i=1}^{N}(x_i-\mu)^2\)
Replacing \(\mu\) with its estimate: \(\sigma^2_{MLE} = \frac{1}{N}\sum\limits_{i=1}^{N}(x_i-\mu_{MLE})^2\)
This estimate is biased.
In fact: \(E[\sigma^2_{MLE}] = \frac{N-1}{N}\sigma^2\)
The truly unbiased estimate is: \(\widehat{\sigma}^2 = \frac{1}{N-1}\sum\limits_{i=1}^{N}(x_i-\mu_{MLE})^2\)
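A Monte Carlo sketch of the bias (numpy assumed; the seed, \(N\) and trial count are arbitrary choices): dividing by \(N\) should land near \(\frac{N-1}{N}\sigma^2\), while dividing by \(N-1\) should land near \(\sigma^2\):

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true, sigma2_true, N, trials = 0.0, 4.0, 10, 200_000

samples = rng.normal(mu_true, np.sqrt(sigma2_true), size=(trials, N))
mu_mle = samples.mean(axis=1, keepdims=True)

sigma2_mle = ((samples - mu_mle) ** 2).mean(axis=1)           # divide by N   (biased)
sigma2_hat = ((samples - mu_mle) ** 2).sum(axis=1) / (N - 1)  # divide by N-1 (unbiased)

print(sigma2_mle.mean())          # ~ (N-1)/N * sigma^2 = 3.6
print((N - 1) / N * sigma2_true)  # 3.6
print(sigma2_hat.mean())          # ~ 4.0
```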

II. Unbiased & Biased:

Unbiased estimate: \(\mu_{MLE} = \frac{1}{N}\sum\limits_{i=1}^{N}x_i\)
Biased estimate: \(\sigma^2_{MLE} = \frac{1}{N}\sum\limits_{i=1}^{N}(x_i-\mu_{MLE})^2\)
where \(x_i \sim N(\mu,\sigma^2)\).
To judge whether an estimator \(T(x)\) is biased or unbiased, check whether \(E[T(x)]\) equals the true parameter, e.g. whether \(E[\widehat{\mu}] = \mu\) and \(E[\widehat{\sigma}^2] = \sigma^2\). Equal means unbiased; unequal means biased.


\(E[\mu_{MLE}] = E\left[\frac{1}{N}\sum\limits_{i=1}^{N}x_i\right] = \frac{1}{N}\sum\limits_{i=1}^{N}E[x_i] = \frac{1}{N}\sum\limits_{i=1}^{N}\mu = \mu\), so \(\mu_{MLE}\) is unbiased.

For the bias, the question is whether:
\(E[\sigma^2_{MLE}] \overset{?}{=} \sigma^2\)
\(\sigma^2_{MLE} = \frac{1}{N}\sum\limits_{i=1}^{N}(x_i-\mu_{MLE})^2= \frac{1}{N}\sum\limits_{i=1}^{N}(x_i^2 - 2x_i\mu_{MLE} + \mu^2_{MLE}) =\frac{1}{N}\sum\limits_{i=1}^{N}x_i^2 - \underbrace{\frac{1}{N}\sum\limits_{i=1}^{N}2x_i\mu_{MLE}}_{2\mu_{MLE}^2} + \underbrace{\frac{1}{N}\sum\limits_{i=1}^{N}\mu^2_{MLE}}_{\mu^2_{MLE}}\\=\frac{1}{N}\sum\limits_{i=1}^{N}x_i^2 -\mu^2_{MLE}\)
\(E[\sigma^2_{MLE}] = E\left[\frac{1}{N}\sum\limits_{i=1}^{N}x_i^2 -\mu^2_{MLE}\right]= \underbrace{E\left[\frac{1}{N}\sum\limits_{i=1}^{N}x_i^2 -\mu^2\right]}_{(1)} - \underbrace{E\left[\mu^2_{MLE}- \mu^2\right]}_{(2)}\)
Term (1): \(E\left[\frac{1}{N}\sum\limits_{i=1}^{N}(x_i^2 -\mu^2)\right]=\frac{1}{N}\sum\limits_{i=1}^{N}\left(E[x_i^2] -\mu^2\right)=\frac{1}{N}\sum\limits_{i=1}^{N} Var(x_i)=\frac{1}{N}\sum\limits_{i=1}^{N} \sigma^2=\sigma^2\)
Term (2): using \(E[\mu_{MLE}]=\mu\), \(E[\mu^2_{MLE}]- \mu^2 = E[\mu^2_{MLE}]- (E[\mu_{MLE}])^2 = Var(\mu_{MLE}) = \frac{1}{N}\sigma^2\)
Hence \(E[\sigma^2_{MLE}] = \sigma^2 - \frac{1}{N}\sigma^2 = \frac{N-1}{N}\sigma^2\)
So for a Gaussian, maximum likelihood underestimates the variance.
The \(Var(\mu_{MLE})\) used above is:
\(Var[\mu_{MLE}] = Var\left[\frac{1}{N}\sum\limits_{i=1}^{N}x_i\right] = \frac{1}{N^2}\sum\limits_{i=1}^{N}Var(x_i) = \frac{1}{N^2}\sum\limits_{i=1}^{N} \sigma^2 = \frac{1}{N^2}\cdot N \cdot \sigma^2 = \frac{\sigma^2}{N}\)
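A quick sketch (numpy assumed, arbitrary constants) confirming \(Var[\mu_{MLE}] = \sigma^2/N\) empirically:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, N, trials = 4.0, 25, 200_000

# Sample mean of N draws, repeated over many trials
mu_mle = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N)).mean(axis=1)
print(mu_mle.var())  # ~ sigma^2 / N = 0.16
print(sigma2 / N)    # 0.16
```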

III. Viewing the Probability Density Function (PDF):

\(x \sim N(\mu,\Sigma), \quad p(x) = \frac {1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)\)
where (in general \(\Sigma\) is positive semi-definite; here we assume it is positive definite):
\(x \in \mathbb{R}^p\) is a random vector,
\(x=(x_1,x_2,\dots,x_p)^T = \begin{pmatrix} x_1\\x_2\\ \vdots\\x_p \end{pmatrix}\) \(\mu=\begin{pmatrix} \mu_1\\\mu_2\\ \vdots\\\mu_p \end{pmatrix}\) \(\Sigma=\begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}_{p×p}\)
We focus on the part of the expression that involves \(x\): \((x-\mu)^T\Sigma^{-1}(x-\mu)\), which can be read as the (squared) Mahalanobis distance between \(x\) and \(\mu\).
Dimensionally, \((1×p)×(p×p)×(p×1)=1×1\): the quadratic form is just a number (as it must be, inside a probability density).

When \(\Sigma\) is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance.


\(\Sigma = U \Lambda U^T\) (eigendecomposition), with \(UU^T=U^TU=I\), \(\Lambda = \mathrm{diag}(\lambda_i),\ i=1,\dots,p\), and \(U=(u_1,u_2,\dots,u_p)_{p×p}\).

\(\Sigma = U\Lambda U^T =(u_1 \quad u_2 \quad \cdots \quad u_p) \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_p \end{pmatrix} \begin{pmatrix} u_1^T \\u_2^T \\ \vdots \\u_p^T \end{pmatrix} =(u_1\lambda_1 \quad u_2\lambda_2 \quad \cdots \quad u_p\lambda_p) \begin{pmatrix} u_1^T \\u_2^T \\ \vdots \\u_p^T \end{pmatrix} =\sum\limits_{i=1}^{p}u_i\lambda_iu_i^T\)
\(\Sigma^{-1}=(U\Lambda U^T)^{-1}=(U^T)^{-1}\Lambda^{-1}U^{-1}=U\Lambda^{-1}U^T=\sum\limits_{i=1}^{p}u_i\frac{1}{\lambda_i}u_i^T\), where \(\Lambda^{-1}=\mathrm{diag}\left(\frac{1}{\lambda_i}\right),\ i=1,\dots,p\).
\((x-\mu)^T \Sigma^{-1}(x-\mu) = (x-\mu)^T\left(\sum\limits_{i=1}^{p}u_i\frac{1}{\lambda_i}u_i^T\right)(x-\mu) = \sum\limits_{i=1}^{p}(x-\mu)^Tu_i\frac{1}{\lambda_i}u_i^T(x-\mu)\)
Let \(y=\begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_p \end{pmatrix}\) with \(y_i=u_i^T(x-\mu)\), a scalar (the projection of \(x-\mu\) onto \(u_i\)). Then:
\((x-\mu)^T \Sigma^{-1}(x-\mu) = \sum\limits_{i=1}^{p}y_i\frac{1}{\lambda_i}y_i = \sum\limits_{i=1}^{p}\frac{y_i^2}{\lambda_i}\)


Take \(p=2\) and write \(\Delta=(x-\mu)^T \Sigma^{-1}(x-\mu)\):
\(\Delta=\frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2}=1\): fixing \(\Delta\) at a constant traces an ellipse in the \((y_1,y_2)\) coordinates, so the density's contour lines are ellipses whose axes point along \(u_1,u_2\) with lengths proportional to \(\sqrt{\lambda_1},\sqrt{\lambda_2}\).
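A short sketch (numpy assumed; the matrix \(\Sigma\) below is an arbitrary positive-definite example) verifying that the quadratic form equals \(\sum_i y_i^2/\lambda_i\) under the eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -1.0])
Sigma = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite
x = rng.normal(size=2)

# Direct quadratic form
delta_direct = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)

# Via the eigendecomposition Sigma = U Lambda U^T (eigh handles symmetric matrices)
lam, U = np.linalg.eigh(Sigma)
y = U.T @ (x - mu)  # y_i = u_i^T (x - mu)
delta_eig = np.sum(y ** 2 / lam)

assert np.isclose(delta_direct, delta_eig)
# Setting delta = 1 traces an ellipse with axes u_i and semi-axis lengths sqrt(lam_i).
```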
(P.S. This section is a real pain to understand.)

IV. Limitations:

  1. Number of parameters: a full symmetric \(\Sigma_{p×p}\) has \(\frac{p^2-p}{2}+p=\frac{p^2+p}{2}=O(p^2)\) free parameters; with this many parameters the computation gets expensive.
    A common simplification: restrict \(\Sigma\) to a diagonal matrix.
  2. A single Gaussian may be unable to express the model accurately; this motivates mixing several Gaussians (Gaussian mixture models).

V. Marginal and Conditional Probabilities:

This is the most complicated stretch of the derivation for high-dimensional Gaussians!!! (Stay sharp.)
Given: \(x=\begin{pmatrix}x_a\\x_b\end{pmatrix}\begin{matrix}\rightarrow m\\\rightarrow n\end{matrix}\) with \(m+n=p\), \(\mu=\begin{pmatrix}\mu_a\\\mu_b\end{pmatrix}\),
\(\Sigma=\begin{pmatrix}\Sigma_{aa} & \Sigma_{ab}\\\Sigma_{ba} & \Sigma_{bb}\end{pmatrix}\)
Find: \(P(x_a)\) and \(P(x_b|x_a)\) (\(P(x_b)\) and \(P(x_a|x_b)\) follow by symmetry).
The general-purpose method is completing the square; the derivation below takes a cleaner route via the following theorem.


Remember this theorem:
Given: \(x \sim N(\mu,\Sigma)\) and \(y=Ax+B\),
Conclusion: \(y \sim N(A\mu+B,\ A\Sigma A^T)\), because
\(E[y]=E[Ax+B]=AE[x]+B=A\mu+B\)
\(Var[y]=Var[Ax+B]=Var[Ax]+Var[B]=A\Sigma A^T\)
To get the order of \(A\) and \(A^T\) right, just reason about the matrix dimensions.
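A sampling sketch of the theorem (numpy assumed; \(A\), \(B\), \(\mu\), \(\Sigma\) below are arbitrary examples): the empirical mean and covariance of \(y=Ax+B\) should match \(A\mu+B\) and \(A\Sigma A^T\):

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([0.0, 2.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
A = np.array([[1.0, 1.0], [0.5, -1.0]])
B = np.array([1.0, 0.0])

x = rng.multivariate_normal(mu, Sigma, size=500_000)
y = x @ A.T + B  # y = A x + B applied row-wise

print(y.mean(axis=0), A @ mu + B)    # empirical vs theoretical mean
print(np.cov(y.T), A @ Sigma @ A.T)  # empirical vs theoretical covariance
```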


Now write \(x_a=\underbrace{\begin{pmatrix}I_m & 0\end{pmatrix}}_{A}\underbrace{\begin{pmatrix}x_a\\ x_b\end{pmatrix}}_{x}+0\). By the theorem:
\(E[x_a]=\begin{pmatrix}I_m & 0\end{pmatrix}\begin{pmatrix}\mu_a\\ \mu_b\end{pmatrix}=\mu_a\)
\(Var[x_a]=\begin{pmatrix}I_m & 0\end{pmatrix}\begin{pmatrix}\Sigma_{aa} & \Sigma_{ab}\\ \Sigma_{ba} & \Sigma_{bb}\end{pmatrix}\begin{pmatrix}I_m\\ 0\end{pmatrix}=\begin{pmatrix}\Sigma_{aa} & \Sigma_{ab}\end{pmatrix}\begin{pmatrix}I_m\\ 0\end{pmatrix}=\Sigma_{aa}\)
So \(x_a \sim N(\mu_a,\Sigma_{aa})\).

For \(x_b|x_a\), first define (a construction that takes practice to see):
\(x_{b \cdot a}=x_b - \Sigma_{ba} \Sigma_{aa}^{-1}x_a\)
\(\mu_{b \cdot a}=\mu_b - \Sigma_{ba} \Sigma_{aa}^{-1}\mu_a\)
\(\Sigma_{bb \cdot a}=\Sigma_{bb} - \Sigma_{ba} \Sigma_{aa}^{-1}\Sigma_{ab}\) (the Schur complement of \(\Sigma_{aa}\))
Since \(x_{b \cdot a}=\underbrace{\begin{pmatrix}-\Sigma_{ba} \Sigma_{aa}^{-1} & I\end{pmatrix}}_{A}\underbrace{\begin{pmatrix}x_a\\ x_b\end{pmatrix}}_{x}\), the theorem gives:
\(E[x_{b \cdot a}]=\begin{pmatrix}-\Sigma_{ba} \Sigma_{aa}^{-1} & I\end{pmatrix}\begin{pmatrix}\mu_a\\ \mu_b\end{pmatrix}=\mu_b-\Sigma_{ba}\Sigma_{aa}^{-1}\mu_a=\mu_{b \cdot a}\)
\(Var[x_{b \cdot a}]=\begin{pmatrix}-\Sigma_{ba} \Sigma_{aa}^{-1} & I\end{pmatrix}\begin{pmatrix}\Sigma_{aa} & \Sigma_{ab}\\ \Sigma_{ba} & \Sigma_{bb}\end{pmatrix}\begin{pmatrix}-\Sigma_{aa}^{-1}\Sigma_{ab}\\ I\end{pmatrix}=\begin{pmatrix}0 & \Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab}\end{pmatrix}\begin{pmatrix}-\Sigma_{aa}^{-1}\Sigma_{ab}\\ I\end{pmatrix}=\Sigma_{bb \cdot a}\)
So \(x_{b \cdot a} \sim N(\mu_{b \cdot a},\Sigma_{bb \cdot a})\).
Now find \(x_b|x_a\):
Note that \(x_b = x_{b \cdot a}+ \Sigma_{ba}\Sigma_{aa}^{-1}x_a\), and \(x_{b \cdot a}\) is independent of \(x_a\) (they are jointly Gaussian with zero cross-covariance, as the \(0\) block in the variance computation above shows). Conditioned on \(x_a\), the second term is a constant, so:
\(E[x_b|x_a] = \mu_{b \cdot a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_a\) \(\quad Var[x_b|x_a] = Var[x_{b \cdot a}]=\Sigma_{bb \cdot a}\)
Finally: \(x_b|x_a \sim N(\mu_{b \cdot a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_a,\ \Sigma_{bb \cdot a})\). For \(x_a|x_b\), simply swap every \(a\) and \(b\).
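A sketch checking the conditional formulas by brute force (numpy assumed; a 2-D example with \(m=n=1\) and arbitrary constants): we keep samples whose \(x_a\) falls in a thin slice around a chosen value and compare the slice's mean and variance to the formulas:

```python
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.5]])
S_aa, S_ab = Sigma[0, 0], Sigma[0, 1]
S_ba, S_bb = Sigma[1, 0], Sigma[1, 1]

xa0 = 2.0  # condition on x_a = xa0
cond_mean = (mu[1] - S_ba / S_aa * mu[0]) + S_ba / S_aa * xa0  # mu_{b.a} + S_ba S_aa^{-1} x_a
cond_var = S_bb - S_ba / S_aa * S_ab                           # Schur complement

# Empirical check: keep the x_b of samples whose x_a lands near xa0
z = rng.multivariate_normal(mu, Sigma, size=2_000_000)
near = z[np.abs(z[:, 0] - xa0) < 0.02, 1]
print(near.mean(), cond_mean)
print(near.var(), cond_var)
```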

VI. Joint Distribution:

Given: \(p(x)=N(x|\mu,\Lambda^{-1})\) and \(p(y|x)=N(y|Ax+b,L^{-1})\), where \(\Lambda\) and \(L\) are precision matrices.
Find: \(p(y)\) and \(p(x|y)\). Solution: write \(y=Ax+b+\varepsilon\) with \(\varepsilon \sim N(0,L^{-1})\) and \(\varepsilon\) independent of \(x\).


① Step 1:

\(E[y]=E[Ax+b+\varepsilon]=A\mu+b \qquad Var[y]=Var[Ax+b]+Var[\varepsilon]=A\Lambda^{-1}A^T+L^{-1}\)
\(y \sim N(A\mu+b,\ A\Lambda^{-1}A^T +L^{-1})\)


② Step 2:
Let \(z=\begin{pmatrix} x\\y\end{pmatrix} \sim N\left( \underbrace{\begin{bmatrix} \mu \\ A\mu+b\end{bmatrix}}_{E[z]}, \begin{bmatrix} \Lambda^{-1} & \Delta \\ \Delta^T & L^{-1}+A\Lambda^{-1}A^T \end{bmatrix}\right)\). The two off-diagonal blocks are the missing pieces; they carry the same information, since one is the transpose of the other. Define \(\Delta=Cov(x,y)\):
\(\Delta=Cov(x,y)\\=E[(x-E[x])(y-E[y])^T]\\=E[(x-\mu)(y-A\mu-b)^T]\\=E[(x- \mu)(Ax+b+\varepsilon-A\mu-b)^T]\\=E[(x-\mu)(Ax-A\mu+\varepsilon)^T]\\=E[(x-\mu)(Ax-A\mu)^T]+\underbrace{E[(x-\mu)\varepsilon^T]}_{=E[x-\mu]\,E[\varepsilon]^T=0,\ \text{since}\ x\perp\varepsilon}\\=E[(x-\mu)(x-\mu)^T]\cdot A^T\\=Var[x]\cdot A^T \\=\Lambda^{-1}A^T\)
In the second-to-last step, \(\mu\) is exactly \(E[x]\), so that expectation is just the variance formula.


Substituting \(\Delta=\Lambda^{-1}A^T\) into the off-diagonal blocks completes the joint distribution of \(z\).
For \(p(x|y)\), plug this joint distribution into the conditional formula from Part V (with \(x_a=y\), \(x_b=x\)):
\(x|y \sim N\left(\mu + \Lambda^{-1}A^T(L^{-1}+A\Lambda^{-1}A^T)^{-1}(y-A\mu-b),\ \Lambda^{-1} - \Lambda^{-1}A^T(L^{-1}+A\Lambda^{-1}A^T)^{-1}A\Lambda^{-1}\right)\)
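A final sketch of the linear Gaussian model (numpy assumed; \(\Lambda^{-1}\), \(A\), \(b\), \(L^{-1}\) below are arbitrary examples) checking the marginal \(p(y)\) and the cross-covariance \(\Delta=\Lambda^{-1}A^T\) by sampling:

```python
import numpy as np

rng = np.random.default_rng(6)
# p(x) = N(mu, Lambda^{-1}),  p(y|x) = N(Ax + b, L^{-1})
mu = np.array([0.0, 1.0])
Lam_inv = np.array([[1.0, 0.2], [0.2, 0.5]])  # Lambda^{-1}
A = np.array([[1.0, -1.0]])
b = np.array([0.5])
L_inv = np.array([[0.4]])                     # L^{-1}

n = 1_000_000
x = rng.multivariate_normal(mu, Lam_inv, size=n)
eps = rng.multivariate_normal(np.zeros(1), L_inv, size=n)
y = x @ A.T + b + eps

# Marginal: y ~ N(A mu + b, A Lambda^{-1} A^T + L^{-1})
print(y.mean(axis=0), A @ mu + b)
print(np.cov(y.T), A @ Lam_inv @ A.T + L_inv)

# Cross-covariance block of z = (x, y): Cov(x, y) = Lambda^{-1} A^T
xy = np.hstack([x, y])
print(np.cov(xy.T)[:2, 2:], Lam_inv @ A.T)
```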


Original post: https://www.cnblogs.com/leesamoyed/p/12256235.html
