Sampling Survey: Proofs and Exercises
Proofs
Proof 1
Claim: for the simple estimator \(\bar{y}\) under simple random sampling, \(E(\bar{y})=\bar{Y}\) and \(V(\bar{y})=\dfrac{1-f}{n}S^2\).
Let \(a_i\) be the indicator that population unit \(Y_i\) enters the sample. Then \(a_i\) is a random variable with
\[\begin{aligned}
&E(a_i)=f,\quad V(a_i)=E(a_i^2)-[E(a_i)]^2=f(1-f),\\
&E(a_ia_j)=\frac{n(n-1)}{N(N-1)},\\
&\mathrm{cov}(a_i,a_j)=E(a_ia_j)-E(a_i)E(a_j)=-\frac{f(1-f)}{N-1}\quad (i\ne j).
\end{aligned}
\]
The estimator \(\bar{y}\) can then be rewritten as
\[\bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i=\frac{1}{n}\sum_{i=1}^{N}a_iY_i.
\]
For the expectation,
\[\begin{aligned}
E(\bar{y})&=\frac{1}{n}E\left(\sum_{i=1}^{N}a_iY_i \right)\\
&=\frac{1}{n}\sum_{i=1}^{N}E(a_i)Y_i\\
&=\frac{f}{n}\sum_{i=1}^{N}Y_i\\
&=\bar{Y}.
\end{aligned}
\]
For the variance, using \(fN=n\) in the last two steps,
\[\begin{aligned}
V(\bar{y})&=\frac{1}{n^2}V\left(\sum_{i=1}^{N}a_iY_i \right)\\
&=\frac{1}{n^2}\left[\sum_{i=1}^{N}Y_i^2V(a_i)+2\sum_{i<j}^{N}Y_iY_j\mathrm{cov}(a_i,a_j) \right]\\
&=\frac{1}{n^2}\left[f(1-f)\sum_{i=1}^{N}Y_i^2-2\frac{f(1-f)}{N-1}\sum_{i<j}^{N}Y_iY_j \right]\\
&=\frac{f(1-f)}{n^2}\left[\sum_{i=1}^{N}Y_i^2-\frac{1}{N-1}\sum_{i<j}^{N}2Y_iY_j \right]\\
&=\frac{f(1-f)}{n^2}\left[\sum_{i=1}^{N}Y_i^2-\frac{1}{N-1}\left(\sum_{i=1}^{N}Y_i \right)^2+\frac{1}{N-1}\sum_{i=1}^{N}Y_i^2 \right]\\
&=\frac{f(1-f)}{n^2}\left[\frac{N}{N-1}\sum_{i=1}^{N}Y_i^2-\frac{1}{N-1}\left(\sum_{i=1}^{N}Y_i \right)^2 \right]\\
&=\frac{f(1-f)}{n^2}\frac{N}{N-1}\left[\sum_{i=1}^{N}Y_i^2-N\bar{Y}^2 \right]\\
&=\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(Y_i-\bar{Y})^2\\
&=\frac{1-f}{n}S^2.
\end{aligned}
\]
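The two identities of Proof 1 can be checked by brute force on a small population. The following is a minimal sketch, assuming a made-up population of six values (not taken from the text): it enumerates every sample of size \(n\) drawn without replacement and compares the mean and variance of \(\bar{y}\) with \(\bar{Y}\) and \(\frac{1-f}{n}S^2\).

```python
from itertools import combinations
from statistics import mean, variance


def srs_moments(population, n):
    """Enumerate all size-n samples without replacement and return
    (expectation of the sample mean, variance of the sample mean)."""
    means = [mean(s) for s in combinations(population, n)]
    e = mean(means)
    v = mean([(m - e) ** 2 for m in means])
    return e, v


population = [4, 6, 8, 5, 4, 9]    # hypothetical toy population, N = 6
n = 3
N = len(population)
f = n / N
S2 = variance(population)          # divisor N - 1, matching S^2 in the text

e, v = srs_moments(population, n)
print(e, mean(population))         # expectation of y-bar vs. population mean
print(v, (1 - f) / n * S2)         # variance of y-bar vs. (1 - f) / n * S^2
```

Because all \(\binom{N}{n}\) samples are equally likely under simple random sampling, averaging over the enumeration gives the exact design-based moments rather than a simulation estimate.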
Proof 2
Claim: the sample variance is an unbiased estimator of the population variance, \(E(s^2)=S^2\), and the sample covariance is an unbiased estimator of the population covariance, \(E(s_{yx})=S_{yx}\).
With the notation of Proof 1, and using \(fN=n\) in the last step,
\[\begin{aligned}
E(s^2)&=E\left[\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2 \right]\\
&=\frac{1}{n-1}E\left(\sum_{i=1}^{n}y_i^2\right)-\frac{n}{n-1}E(\bar{y}^2)\\
&=\frac{1}{n-1}E\left(\sum_{i=1}^{N}a_iY_i^2 \right)-\frac{n}{n-1}\left[\frac{1-f}{n}S^2+\bar{Y}^2 \right]\\
&=\frac{f}{n-1}\sum_{i=1}^{N}Y_i^2-\frac{1-f}{n-1}S^2-\frac{n}{n-1}\bar{Y}^2\\
&=\frac{f}{n-1}\left[(N-1)S^2+N\bar{Y}^2 \right]-\frac{1-f}{n-1}S^2-\frac{n}{n-1}\bar{Y}^2\\
&=S^2\left[\frac{f(N-1)-(1-f)}{n-1} \right]+\bar{Y}^2\left(\frac{fN-n}{n-1} \right)\\
&=S^2.
\end{aligned}
\]
To prove the second claim we first need \(\mathrm{cov}(\bar{y},\bar{x})\). Introduce the variable \(U=Y+X\) and define \(u_i\), \(\bar{u}\) and \(S_u^2\) analogously. Then
\[V(\bar u)=V(\bar y)+V(\bar x)+2\mathrm{cov}(\bar y, \bar x),
\]
so that
\[\begin{aligned}
\mathrm{cov}(\bar y,\bar x)&=\frac{1}{2}[V(\bar u)-V(\bar y)-V(\bar x)]\\
&=\frac{1}{2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}\left[(U_i-\bar{U})^2-(Y_i-\bar{Y})^2-(X_i-\bar{X})^2 \right]\\
&=\frac{1-f}{2n}\frac{1}{N-1}\cdot \sum_{i=1}^{N}2(Y_i-\bar{Y})(X_i-\bar{X})\\
&=\frac{1-f}{n}S_{yx}.
\end{aligned}
\]
It follows that
\[\begin{aligned}
E(s_{yx})&=E\left[\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})(x_i-\bar{x}) \right]\\
&=\frac{1}{n-1}E\left(\sum_{i=1}^{n}y_ix_i \right)-\frac{n}{n-1}E(\bar{y}\bar{x})\\
&=\frac{f}{n-1}\sum_{i=1}^{N}Y_iX_i-\frac{n}{n-1}\bar{Y}\bar{X}-\frac{n}{n-1}\frac{1-f}{n}S_{yx}\\
&=\frac{f}{n-1}\left[(N-1)S_{yx}+N\bar{Y}\bar{X} \right]-\frac{n}{n-1}\bar{Y}\bar{X}-\frac{n}{n-1}\frac{1-f}{n}S_{yx}\\
&=S_{yx}\left[\frac{f(N-1)-(1-f)}{n-1}\right]+\bar{Y}\bar{X}\left(\frac{fN-n}{n-1}\right)\\
&=S_{yx}.
\end{aligned}
\]
Proof 3
Claim: the variance of the ratio estimator \(r\) is
\[V(r)\approx \frac{1}{\bar{X}^2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(Y_i-RX_i)^2=\frac{1}{\bar{X}^2}\frac{1-f}{n}(S^2-2RS_{yx}+R^2S_x^2).
\]
Define \(G=Y-RX\), and define \(g_i\), \(\bar{g}\), \(\bar{G}\) analogously. Since \(R=\bar{Y}/\bar{X}\), we have \(\bar{G}=\bar{Y}-R\bar{X}=0\), and therefore
\[\begin{aligned}
V(r)&\approx E(r-R)^2\\
&=E\left(\frac{\bar{y}-R\bar{x}}{\bar{x}} \right)^2\\
&\approx\frac{1}{\bar{X}^2}E(\bar{y}-R\bar{x})^2\\
&=\frac{1}{\bar{X}^2}E(\bar g^2)=\frac{1}{\bar{X}^2}V(\bar g)\\
&=\frac{1}{\bar{X}^2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(G_i-\bar{G})^2\\
&=\frac{1}{\bar{X}^2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(Y_i-RX_i)^2.
\end{aligned}
\]
For the second equality, again using \(\bar{Y}=R\bar{X}\),
\[\begin{aligned}
V(r)&\approx \frac{1}{\bar{X}^2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(Y_i-RX_i)^2\\
&=\frac{1}{\bar{X}^2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}[(Y_i-\bar{Y})-R(X_i-\bar{X})]^2\\
&=\frac{1}{\bar{X}^2}\frac{1-f}{n}\left[S^2-2RS_{yx}+R^2S_{x}^2 \right].
\end{aligned}
\]
Proof 4
Claim: for \(\bar{y}_{RC}=\dfrac{\bar{y}_{st}}{\bar{x}_{st}}\bar{X}\), we have \(E(\bar{y}_{RC})\approx \bar{Y}\) and \(\displaystyle{V(\bar{y}_{RC})\approx\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}(S_{yh}^2-2RS_{yxh}+R^2S_{xh}^2) }\).
Since \(\bar{x}_{st}\approx \bar{X}\) for large samples,
\[E(\bar{y}_{RC})=\bar{X}E\left(\frac{\bar{y}_{st}}{\bar{x}_{st}} \right)\approx E(\bar{y}_{st})=\bar{Y}.
\]
Apply the transformation \(G=Y-RX\) and define \(G_{hi}\), \(\bar{G}_h\), \(\bar{g}_{st}\) analogously. Then \(\bar{G}_h=\bar{Y}_h-R\bar{X}_h\) and \(\bar{g}_{st}=\bar{y}_{st}-R\bar{x}_{st}\), so \(E(\bar{g}_{st})=\bar{Y}-R\bar{X}=0\). Therefore
\[\begin{aligned}
V(\bar{y}_{RC})&\approx E(\bar{y}_{RC}-\bar{Y})^2\\
&\approx E(\bar{y}_{st}-R\bar{x}_{st})^2\\
&=V(\bar{g}_{st})\\
&=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}S_{gh}^2\\
&=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}\left[\frac{1}{N_h-1}\sum_{i=1}^{N_h}(G_{hi}-\bar{G}_h)^2 \right]\\
&=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}(S_{yh}^2-2RS_{yxh}+R^2S_{xh}^2).
\end{aligned}
\]
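Proof 5
Claim: the optimal allocation in stratified sampling satisfies
\[n_h\propto\frac{W_hS_h}{\sqrt{c_h}}.
\]
Here \(c_h\) is the cost of surveying one sampled unit in stratum \(h\).
Write the variable cost as \(C=\sum_{h=1}^{L}n_hc_h\) and note that \(V(\bar{y}_{st})+\sum_{h=1}^{L}W_hS_h^2/N=\sum_{h=1}^{L}W_h^2S_h^2/n_h\). Minimizing the variance for a fixed cost (or the cost for a fixed variance) amounts to minimizing
\[z=\left(\sum_{h=1}^{L}n_hc_h \right)\left(\sum_{h=1}^{L}\frac{W_h^2S_h^2}{n_h} \right).
\]
By the Cauchy-Schwarz inequality,
\[z\ge \left(\sum_{h=1}^{L}\sqrt{c_h}W_hS_h \right)^2,
\]
with equality if and only if every stratum satisfies
\[\frac{n_h^2c_h}{W_h^2S_h^2}=K
\]
for some constant \(K\), that is,
\[n_h\propto \frac{W_hS_h}{\sqrt{c_h}}.
\]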
Proof 6
Claim: the design effect of cluster sampling is approximately
\[\mathrm{deff}=\frac{V(\bar{\bar{y}})}{V_{srs}(\bar{\bar{y}})}\approx 1+(M-1)\rho_{c}.
\]
Here \(\rho_c\) is the intra-cluster correlation coefficient,
\[\rho_c=\frac{2\sum\limits_{i=1}^{N}\sum\limits_{j<k}^{M}(Y_{ij}-\bar{\bar{Y}})(Y_{ik}-\bar{\bar{Y}})}{(M-1)(NM-1)S^2}.
\]
Assume \(N\) and \(M\) are both large, so that \(N-1\approx N\) and \(NM-1\approx NM\). Then
\[\begin{aligned}
V(\bar{\bar{y}})&=\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_i-\bar{\bar{Y}})^2\\
&=\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}\left(\frac{1}{M}\sum_{j=1}^{M}Y_{ij}-\bar{\bar{Y}} \right)^2\\
&=\frac{1}{M^2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}\left[\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}}) \right]^2\\
&=\frac{1-f}{nM}\frac{1}{M(N-1)}\sum_{i=1}^{N}\left[\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2+2\sum_{j<k}^{M}(Y_{ij}-\bar{\bar{Y}})(Y_{ik}-\bar{\bar{Y}}) \right]\\
&=\frac{1-f}{nM}\frac{1}{M(N-1)}\left[(NM-1)S^2+(M-1)(NM-1)S^2\rho_c \right]\\
&=\frac{1-f}{nM}\frac{(NM-1)S^2}{M(N-1)}[1+(M-1)\rho_c]\\
&\approx \frac{1-f}{nM}S^2[1+(M-1)\rho_c].
\end{aligned}
\]
Since \(V_{srs}(\bar{\bar{y}})=\dfrac{1-f}{nM}S^2\), it follows that
\[\mathrm{deff}\approx 1+(M-1)\rho_c.
\]
Proof 7
Claim: for two-stage sampling,
\[\begin{aligned}
&E(\hat\theta)=E_1E_2(\hat\theta),\\
&V(\hat\theta)=V_1[E_2(\hat\theta)]+E_1[V_2(\hat\theta)].
\end{aligned}
\]
The first identity is the law of total expectation. Writing \(E(\hat\theta)=\theta\), the variance satisfies
\[\begin{aligned}
V(\hat\theta)&=E(\hat\theta-\theta)^2\\
&=E_1E_2(\hat\theta-\theta)^2\\
&=E_1[E_2(\hat\theta^2)-2\theta E_2(\hat\theta)+\theta^2]\\
&=E_1[V_2(\hat\theta)+[E_2(\hat\theta)]^2]-\theta^2\\
&=E_1V_2(\hat\theta)+E_1[E_2(\hat\theta)]^2-[E_1E_2(\hat\theta)]^2\\
&=E_1V_2(\hat\theta)+V_1E_2(\hat\theta).
\end{aligned}
\]
Proof 8
Claim: for two-stage sampling,
\[V(\bar{\bar{y}})=\frac{1-f_1}{n}S_1^2+\frac{1-f_2}{nm}S_2^2.
\]
By Proof 7, \(V(\bar{\bar{y}})=V_1E_2(\bar{\bar{y}})+E_1V_2(\bar{\bar{y}})\), and we compute the two terms separately. For the first term,
\[\begin{aligned}
V_1E_2(\bar{\bar{y}})&=V_1E_2\left(\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i \right)\\
&=V_1\left(\frac{1}{n}\sum_{i=1}^{n}\bar{Y}_i \right)\\
&=\frac{1-f_1}{n}\frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_i-\bar{\bar{Y}})^2\\
&=\frac{1-f_1}{n}S_1^2.
\end{aligned}
\]
For the second term,
\[\begin{aligned}
E_1V_2(\bar{\bar{y}})&=E_1V_2\left(\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i \right)\\
&=E_1\left[\frac{1}{n^2}\sum_{i=1}^{n}\frac{1-f_2}{m}\frac{1}{M-1}\sum_{j=1}^{M}(Y_{ij}-\bar{Y}_i)^2 \right]\\
&=\frac{1}{n}E_1\left[\frac{1}{n}\sum_{i=1}^{n}\frac{1-f_2}{m}S_{2i}^2 \right]\\
&=\frac{1-f_2}{nm}E_1\left(\frac{1}{n}\sum_{i=1}^{n}S_{2i}^2 \right)\\
&=\frac{1-f_2}{nm}\left(\frac{1}{N}\sum_{i=1}^{N}S_{2i}^2 \right)\\
&=\frac{1-f_2}{nm}S_2^2.
\end{aligned}
\]
This proves the claim.
Proof 9
Claim: for two-stage sampling,
\[\begin{aligned}
&E(s_1^2)=S_1^2+\frac{1-f_2}{m}S_2^2,\\
&E(s_2^2)=S_2^2.
\end{aligned}
\]
For \(s_1^2\),
\[\begin{aligned}
E_2[(n-1)s_1^2]&=E_2\left[\sum_{i=1}^{n}(\bar{y}_i-\bar{\bar{y}})^2 \right]\\
&=\sum_{i=1}^{n}E_2(\bar{y}_i^2)-nE_2(\bar{\bar{y}}^2)\\
&=\sum_{i=1}^{n}\{[E_2(\bar{y}_i)]^2+V_2(\bar{y}_i)\}-n\left\{[E_2(\bar{\bar{y}})]^2+V_2(\bar{\bar{y}}) \right\}\\
&=\sum_{i=1}^{n}\bar{Y}_i^2+\sum_{i=1}^{n}\frac{1-f_2}{m}S_{2i}^2-n\left(\frac{1}{n}\sum_{i=1}^{n}\bar{Y}_i \right)^2-\frac{1-f_2}{nm}\sum_{i=1}^{n}S_{2i}^2.
\end{aligned}
\]
Introducing \(\bar{Y}_n=\displaystyle{\frac{1}{n}\sum_{i=1}^{n}\bar{Y}_i}\), we obtain
\[\begin{aligned}
E_2[(n-1)s_1^2]&=\sum_{i=1}^{n}(\bar{Y}_i-\bar{Y}_{n})^2+\frac{(n-1)(1-f_2)}{nm}\sum_{i=1}^{n}S_{2i}^2,\\
E(s_1^2)&=E_1E_2(s_1^2)\\
&=E_1\left[\frac{1}{n-1}\sum_{i=1}^{n}(\bar{Y}_i-\bar{Y}_n)^2+\frac{1-f_2}{nm}\sum_{i=1}^{n}S_{2i}^2 \right]\\
&=\frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_i-\bar{\bar{Y}})^2+\frac{1-f_2}{m}E_1\left(\frac{1}{n}\sum_{i=1}^{n}S_{2i}^2 \right)\\
&=S_1^2+\frac{1-f_2}{m}\frac{1}{N}\sum_{i=1}^{N}S_{2i}^2\\
&=S_1^2+\frac{1-f_2}{m}S_{2}^2.
\end{aligned}
\]
For \(s_2^2\),
\[\begin{aligned}
E_2(s_2^2)&=E_2\left(\frac{1}{n}\sum_{i=1}^{n}s_{2i}^2 \right)\\
&=\frac{1}{n}\sum_{i=1}^{n}E_2(s_{2i}^2)\\
&=\frac{1}{n}\sum_{i=1}^{n}S_{2i}^2,\\
E(s_2^2)&=E_1E_2(s_2^2)\\
&=E_1\left(\frac{1}{n}\sum_{i=1}^{n}S_{2i}^2 \right)\\
&=\frac{1}{N}\sum_{i=1}^{N}S_{2i}^2\\
&=S_{2}^2.
\end{aligned}
\]
This proves the claim.
Proof 10
Claim: an unbiased estimator of \(V(\hat{Y}_{HH})\) is
\[v(\hat{Y}_{HH})=\frac{1}{n}\frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-\hat{Y}_{HH} \right)^2.
\]
In sums over the sample, \(y_i\) and \(Z_i\) denote the value and the selection probability of the \(i\)-th drawn unit. Let \(t_i\) be the number of times population unit \(Y_i\) is drawn, so that \(\displaystyle{\sum_{i=1}^{N}t_i=n}\). The \(t_i\) follow a multinomial distribution with parameters \((n;Z_1,Z_2,\cdots,Z_N)\), hence
\[E(t_i)=nZ_i,\quad V(t_i)=nZ_i(1-Z_i),\quad \mathrm{cov}(t_i,t_j)=-nZ_iZ_j.
\]
Recall that \(V(\hat{Y}_{HH})=\dfrac{1}{n}\displaystyle{\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2}\). Then
\[\begin{aligned}
E\left[\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-\hat{Y}_{HH} \right)^2\right]&=E\left[\sum_{i=1}^{n}\left(\frac{y_i}{Z_i} \right)^2-n\hat{Y}_{HH}^2 \right]\\
&=E\left[\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-Y \right)^2-n(\hat{Y}_{HH}-Y)^2 \right]\\
&=E\left[\sum_{i=1}^{N}t_i\left(\frac{Y_i}{Z_i}-Y \right)^2 \right]-nE(\hat{Y}_{HH}-Y)^2\\
&=\sum_{i=1}^{N}nZ_i\left(\frac{Y_i}{Z_i}-Y \right)^2-nV(\hat{Y}_{HH})\\
&=(n^2-n)V(\hat{Y}_{HH}).
\end{aligned}
\]
Dividing by \(n(n-1)\) gives \(E[v(\hat{Y}_{HH})]=V(\hat{Y}_{HH})\), which proves the claim.
Proof 11
Claim: when \(n\) is fixed, the variance of the Horvitz-Thompson estimator satisfies
\[V(\hat{Y}_{HT})=\sum_{i=1}^{N}\frac{1-\pi_i}{\pi_i}Y_i^2+2\sum_{i<j}^{N}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}Y_iY_j=\sum_{i<j}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i}{\pi_i}-\frac{Y_j}{\pi_j} \right)^2.
\]
For fixed \(n\) and any given \(i\),
\[\sum_{j\ne i}^{N}(\pi_{ij}-\pi_i\pi_j)=\sum_{j\ne i}^{N}\pi_{ij}-\pi_i\sum_{j\ne i}^{N}\pi_j=(n-1)\pi_i-\pi_i(n-\pi_i)=-\pi_i(1-\pi_i),
\]
so
\[\begin{aligned}
\sum_{i=1}^{N}\frac{1-\pi_i}{\pi_i}Y_i^2&=\sum_{i=1}^{N}\frac{\pi_i(1-\pi_i)Y_i^2}{\pi_i^2}\\
&=\sum_{i=1}^{N}\sum_{j\ne i}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i^2}{\pi_i^2} \right)\\
&=\sum_{i<j}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i^2}{\pi_i^2}+\frac{Y_j^2}{\pi_j^2} \right).
\end{aligned}
\]
Adding the second term gives
\[\begin{aligned}
V(\hat{Y}_{HT})&=\sum_{i<j}^{N}\left[(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i^2}{\pi_i^2}+\frac{Y_j^2}{\pi_j^2}-2\frac{Y_iY_j}{\pi_i\pi_j} \right) \right]\\
&=\sum_{i<j}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i}{\pi_i}-\frac{Y_j}{\pi_j} \right)^2.
\end{aligned}
\]
Proof 12
Claim: Brewer's sampling method for \(n=2\), namely
- draw the first unit with probability proportional to \(\dfrac{Z_i(1-Z_i)}{1-2Z_i}\);
- draw the second unit from the remaining units with probability proportional to \(Z_i\);
is a \(\mathrm{\pi PS}\) design, with \(\pi_i=2Z_i\) and \(\pi_{ij}=\dfrac{4Z_iZ_j(1-Z_i-Z_j)}{(1-2Z_i)(1-2Z_j)\left(1+\sum\limits_{k=1}^{N}\dfrac{Z_k}{1-2Z_k}\right)}\).
Let \(D\) be the normalizing constant of the first draw. Using \(\sum_{i=1}^{N}Z_i=1\),
\[\begin{aligned}
D&=\sum_{i=1}^{N}\frac{Z_i(1-Z_i)}{1-2Z_i}\\
&=\sum_{i=1}^{N}\left(\frac{Z_i(1-Z_i)}{1-2Z_i}-\frac{1}{2}Z_i\right)+\frac{1}{2}\\
&=\sum_{i=1}^{N}\frac{Z_i}{2(1-2Z_i)}+\frac{1}{2}\\
&=\frac{1}{2}\left(\sum_{i=1}^{N}\frac{Z_i}{1-2Z_i}+1\right).
\end{aligned}
\]
Then
\[\begin{aligned}
\pi_i&=\frac{Z_i(1-Z_i)}{D(1-2Z_i)}+\sum_{j\ne i}^{N}\frac{Z_iZ_j}{D(1-2Z_j)}\\
&=\frac{Z_i}{D}\left[ 1+\frac{Z_i}{1-2Z_i}+\sum_{j\ne i}^{N}\frac{Z_j}{1-2Z_j}\right]\\
&=\frac{Z_i}{D}(2D)\\
&=2Z_i,
\end{aligned}
\]
and
\[\begin{aligned}
\pi_{ij}&=\frac{Z_i(1-Z_i)}{D(1-2Z_i)}\cdot \frac{Z_j}{1-Z_i}+\frac{Z_j(1-Z_j)}{D(1-2Z_j)}\cdot\frac{Z_i}{1-Z_j}\\
&=\frac{Z_iZ_j(1-2Z_j)+Z_iZ_j(1-2Z_i)}{D(1-2Z_i)(1-2Z_j)}\\
&=\frac{2Z_iZ_j(1-Z_i-Z_j)}{D(1-2Z_i)(1-2Z_j)}\\
&=\frac{4Z_iZ_j(1-Z_i-Z_j)}{(1-2Z_i)(1-2Z_j)\displaystyle{\left(1+\sum_{k=1}^{N}\frac{Z_k}{1-2Z_k} \right)}}.
\end{aligned}
\]
This proves the claim.
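The inclusion probabilities of Brewer's method can also be checked numerically. The sketch below is only an illustration under stated assumptions: the probability vector `Z` is a made-up example that sums to one with every entry below \(1/2\), and the function names are hypothetical. It sums the probabilities of both draw orders for every pair and compares the result with \(\pi_i=2Z_i\) and with the closed form for \(\pi_{ij}\).

```python
def brewer_inclusion(Z):
    """Return (pi, pi2): first- and second-order inclusion probabilities
    of Brewer's n = 2 scheme, summing over both possible draw orders."""
    N = len(Z)
    w = [z * (1 - z) / (1 - 2 * z) for z in Z]   # requires every z < 1/2
    D = sum(w)
    first = [wi / D for wi in w]                 # P(unit i is drawn first)
    pi2 = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            if i != j:
                # i first then j, plus j first then i
                pi2[i][j] = (first[i] * Z[j] / (1 - Z[i])
                             + first[j] * Z[i] / (1 - Z[j]))
    pi = [sum(row) for row in pi2]               # n = 2: pi_i = sum_j pi_ij
    return pi, pi2


def closed_form_pi2(Z, i, j):
    """Closed-form joint inclusion probability stated in the claim."""
    tail = 1 + sum(z / (1 - 2 * z) for z in Z)
    return (4 * Z[i] * Z[j] * (1 - Z[i] - Z[j])
            / ((1 - 2 * Z[i]) * (1 - 2 * Z[j]) * tail))


# Illustrative selection probabilities (assumed, sum to 1, all < 1/2).
Z = [0.2, 0.2, 0.1, 0.05, 0.05, 0.05, 0.05, 0.1, 0.1, 0.1]
pi, pi2 = brewer_inclusion(Z)
print(max(abs(p - 2 * z) for p, z in zip(pi, Z)))   # deviation from 2 * Z_i
print(abs(pi2[0][2] - closed_form_pi2(Z, 0, 2)))    # deviation from closed form
```

Both printed deviations should be at the level of floating-point rounding.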
Proof 13
Claim: the variance of the systematic sampling estimator is
\[V(\bar{y}_{sy})=\frac{N-1}{N}S^2-\frac{k(n-1)}{N}S_{wsy}^2,
\]
where
\[\begin{aligned}
&S^2=\frac{1}{N-1}\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y})^2,\\
&S_{wsy}^2=\frac{1}{k}\sum_{r=1}^{k}\frac{1}{n-1}\sum_{j=1}^{n}(Y_{rj}-\bar{Y}_{r})^2.
\end{aligned}
\]
Decomposing \(S^2\) (the cross term vanishes) and using \(N=nk\) and \(V(\bar{y}_{sy})=\frac{1}{k}\sum_{r=1}^{k}(\bar{Y}_r-\bar{Y})^2\), we get
\[\begin{aligned}
(N-1)S^2&=\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y})^2\\
&=\sum_{r=1}^{k}\sum_{j=1}^{n}({Y}_{rj}-\bar{Y}_r)^2+\sum_{r=1}^{k}\sum_{j=1}^{n}(\bar{Y}_r-\bar{Y})^2\\
&=\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y}_r)^2+n\sum_{r=1}^{k}(\bar{Y}_{r}-\bar{Y})^2\\
&=\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y}_r)^2+N\left[\frac{1}{k}\sum_{r=1}^{k}(\bar{Y}_r-\bar{Y})^2 \right]\\
&=k(n-1)S_{wsy}^2+NV(\bar{y}_{sy}),
\end{aligned}
\]
and therefore
\[V(\bar{y}_{sy})=\frac{N-1}{N}S^2-\frac{k(n-1)}{N}S_{wsy}^2.
\]
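The decomposition in Proof 13 is an exact algebraic identity, so it can be verified on any list whose length is a multiple of \(k\). A minimal sketch, assuming a made-up population of 12 values and \(k=3\) (the function name is hypothetical):

```python
from statistics import mean, variance


def systematic_variance_check(Y, k):
    """Return (direct V(y_sy), value of the decomposition formula)
    for a population Y whose length N is a multiple of k."""
    N = len(Y)
    n = N // k
    samples = [Y[r::k] for r in range(k)]        # the k possible systematic samples
    Y_bar = mean(Y)
    # Direct definition: average squared deviation of the k sample means.
    V_sy = mean([(mean(s) - Y_bar) ** 2 for s in samples])
    S2 = variance(Y)                             # divisor N - 1
    S2_wsy = mean([variance(s) for s in samples])  # within-sample variance
    formula = (N - 1) / N * S2 - k * (n - 1) / N * S2_wsy
    return V_sy, formula


Y = [3, 7, 1, 9, 4, 6, 2, 8, 5, 10, 12, 11]     # hypothetical population, N = 12
direct, formula = systematic_variance_check(Y, k=3)
print(direct, formula)                           # the two values agree
```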
Proof 14
Claim: for stratified double sampling,
\[\begin{aligned}
&E(\bar{y}_{stD})=\bar{Y},\\
&V(\bar{y}_{stD})=\left(\frac{1}{n'}-\frac{1}{N} \right)S^2+\sum_{h=1}^{L}\frac{W_hS_h^2}{n'}\left(\frac{1}{f_{hD}}-1 \right).
\end{aligned}
\]
For the mean, note that \(\displaystyle{\sum_{h=1}^{L}w_h'\bar{y}_h'=\bar{y}'}\), and that \(\bar{y}'\) is the mean of a simple random sample drawn from the population with sampling fraction \(f_1=\dfrac{n'}{N}\). Hence
\[\begin{aligned}
E(\bar{y}_{stD})&=E_1E_2\left(\sum_{h=1}^{L}w_h'\bar{y}_h \right)\\
&=E_1\left(\sum_{h=1}^{L}w_h'\bar{y}_h' \right)\\
&=E_1(\bar{y}')\\
&=\bar{Y}.
\end{aligned}
\]
For the variance, \(V(\bar{y}_{stD})=V_1E_2(\bar{y}_{stD})+E_1V_2(\bar{y}_{stD})\). Computing the two terms separately (note that \(n_h=n_h'f_{hD}\) and \(n_h'=w_h'n'\)),
\[\begin{aligned}
V_1E_2(\bar{y}_{stD})&=V_1\left(\sum_{h=1}^{L}w_h'\bar{y}_h' \right)\\
&=V_1(\bar{y}')\\
&=\left(\frac{1}{n'}-\frac{1}{N} \right)S^2;\\
E_1V_2(\bar{y}_{stD})&=E_1\left[\sum_{h=1}^{L}(w_h')^2(s_h')^2\left(\frac{1}{n_h}-\frac{1}{n_h'} \right) \right]\\
&=E_1\left[\sum_{h=1}^{L}\frac{w_h'(s_h')^2}{n'}\left(\frac{1}{f_{hD}}-1 \right) \right]\\
&=\frac{1}{n'}\sum_{h=1}^{L}\left(\frac{1}{f_{hD}}-1 \right)E_1\left(w_h'(s_h')^2\right)\\
&=\frac{1}{n'}\sum_{h=1}^{L}\left(\frac{1}{f_{hD}}-1 \right)E_1\left[E_1\left(w_h'(s_h')^2\mid w_h'\right)\right]\\
&=\frac{1}{n'}\sum_{h=1}^{L}\left(\frac{1}{f_{hD}}-1 \right)S_h^2E_1(w_h')\\
&=\sum_{h=1}^{L}\frac{W_hS_h^2}{n'}\left(\frac{1}{f_{hD}}-1 \right).
\end{aligned}
\]
The law of total expectation is used here; adding the two terms gives the result.
Proof 15
Claim: under the cost function \(C_{T}^*=c_1n'+\displaystyle{\sum_{h=1}^{L}c_{2h}n_h}\), the optimal sample allocation for stratified double sampling is
\[\begin{aligned}
&f_{hD}=S_h\sqrt{\frac{c_1}{c_{2h}\displaystyle{\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)}}},\\
&n'=\frac{C_{T}^*}{c_1+\displaystyle{\sum_{h=1}^{L}c_{2h}W_hf_{hD}}}.
\end{aligned}
\]
The variance is
\[V(\bar{y}_{stD})=\left(\frac{1}{n'}-\frac{1}{N} \right)S^2+\sum_{h=1}^{L}\frac{W_hS_h^2}{n'}\left(\frac{1}{f_{hD}}-1 \right)=\frac{S^2}{n'}+\sum_{h=1}^{L}\frac{W_hS_h^2}{n'f_{hD}}-\sum_{h=1}^{L}\frac{W_hS_h^2}{n'}-\frac{S^2}{N}.
\]
Since \(n_h=n'w_h'f_{hD}\), the expected cost is \(n'\left(c_1+\sum_{h=1}^{L}c_{2h}f_{hD}W_h\right)\), so we minimize
\[C_{T}^*\left(V+\frac{S^2}{N} \right)=\left(c_1+\sum_{h=1}^{L}c_{2h}f_{hD}W_h \right)\left[\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)+\sum_{h=1}^{L}\frac{W_hS_h^2}{f_{hD}} \right].
\]
By the Cauchy-Schwarz inequality,
\[C_{T}^{*}\left(V+\frac{S^2}{N} \right)\ge \left[\sqrt{c_1\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)}+\sum_{h=1}^{L}\sqrt{c_{2h}}W_hS_h \right]^2,
\]
with equality if and only if
\[\frac{c_{2h}f_{hD}W_h}{W_hS_h^2/f_{hD}}=\frac{c_1}{\displaystyle{\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)}},
\]
that is,
\[f_{hD}=S_h\sqrt{\frac{c_1}{c_{2h}\displaystyle{\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)}}}.
\]
To obtain \(n'\), substitute \(f_{hD}\) back into the cost function.
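Exercises
1. Simple random sampling
Consider the following data set, where \(Y\) is the study variable and \(X\) is the auxiliary variable.
\[\begin{array}{c|ccccc}
\hline
Y & 4 & 6 & 8 & 5 & 4 \\
X & 2 & 3 & 3 & 2 & 1 \\
\hline
\end{array}
\]
Given \(N=50\), \(n=5\) and \(\bar{X}=2\), find:
- the simple estimate of \(\bar{Y}\) and its \(95\%\) confidence interval;
- the ratio estimate of \(\bar{Y}\) and its \(95\%\) confidence interval;
- the regression estimate of \(\bar{Y}\) and its \(95\%\) confidence interval.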
-
For the simple estimator,
\[\bar{y}=5.4,\quad s^2=2.8,\quad v(\bar{y})=\frac{1-f}{n}s^2=0.504.
\]
Computing \(\bar{y}\pm u_{\alpha/2}\sqrt{v(\bar{y})}\) gives the confidence interval
\[[4.0085,6.7915].
\]
-
For the ratio estimator, first compute
\[\bar{x}=2.2,\quad s_x^2=0.7,\quad s_y^2=2.8,\quad s_{yx}=1.15.
\]
Then
\[\begin{aligned}
&r = \frac{\bar{y}}{\bar{x}}=2.4545,\\
&\bar{y}_{R}=\frac{\bar{y}}{\bar{x}}\bar{X}=4.9091,\\
&v(\bar{y}_{R})=\frac{1-f}{n}(s_y^2-2rs_{yx}+r^2s_x^2)=0.2469.
\end{aligned}
\]
Computing \(\bar{y}_{R}\pm u_{\alpha/2}\sqrt{v(\bar{y}_{R})}\) gives the confidence interval
\[[3.9351,5.8831].
\]
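-
For the regression estimator, first compute the regression coefficient:
\[\begin{aligned}
&b=\frac{s_{yx}}{s_{x}^2}=1.6429,\\
&\bar{y}_{lr}=\bar{y}+b(\bar{X}-\bar{x})=5.0714.
\end{aligned}
\]
Estimating its variance requires the correlation coefficient:
\[\begin{aligned}
&\hat\rho=\frac{s_{yx}}{s_ys_x}=0.8214,\\
&v(\bar{y}_{lr})=\frac{1-f}{n}s_y^2(1-\hat\rho^2)=0.1639.
\end{aligned}
\]
Computing \(\bar{y}_{lr}\pm u_{\alpha/2}\sqrt{v(\bar{y}_{lr})}\) gives the confidence interval
\[[4.2779,5.8650].
\]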
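The arithmetic of Exercise 1 is easy to reproduce in a few lines. The following sketch uses only the data given in the exercise; the variable names and the choice \(u_{\alpha/2}=1.96\) are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance

y = [4, 6, 8, 5, 4]
x = [2, 3, 3, 2, 1]
N, X_bar, u = 50, 2.0, 1.96       # u = 1.96 is the assumed normal quantile
n = len(y)
f = n / N

y_bar, x_bar = mean(y), mean(x)
s2_y, s2_x = variance(y), variance(x)            # divisor n - 1
s_yx = sum((a - y_bar) * (b - x_bar) for a, b in zip(y, x)) / (n - 1)


def ci(est, var):
    """Normal-approximation confidence interval est +/- u * sqrt(var)."""
    half = u * sqrt(var)
    return est - half, est + half


# Simple estimator
v_simple = (1 - f) / n * s2_y

# Ratio estimator
r = y_bar / x_bar
y_R = r * X_bar
v_R = (1 - f) / n * (s2_y - 2 * r * s_yx + r ** 2 * s2_x)

# Regression estimator with estimated slope b
b = s_yx / s2_x
y_lr = y_bar + b * (X_bar - x_bar)
rho2 = s_yx ** 2 / (s2_y * s2_x)
v_lr = (1 - f) / n * s2_y * (1 - rho2)

for name, est, var in [("simple", y_bar, v_simple),
                       ("ratio", y_R, v_R),
                       ("regression", y_lr, v_lr)]:
    lo, hi = ci(est, var)
    print(f"{name}: estimate={est:.4f}, v={var:.4f}, CI=[{lo:.4f}, {hi:.4f}]")
```

On these data the ratio and regression estimators have a smaller estimated variance than the simple estimator, which reflects the strong positive correlation between \(Y\) and \(X\).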
2. Ratio estimation in stratified random sampling
The population data for two strata are \(N_1=15\), \(N_2=10\), \(\bar X_1=20\), \(\bar X_2=50\). Three units are sampled from each stratum, with the following results:
\[\begin{array}{c|ccc}
\hline
Y_1 & 30 & 35 & 40 \\
X_1 & 18 & 18 & 25 \\
\hline
Y_2 & 75 & 82 & 85 \\
X_2 & 55 & 40 & 60 \\
\hline
\end{array}
\]
- Give the separate ratio estimate of \(\bar{Y}\) and estimate its variance.
- Give the combined ratio estimate of \(\bar{Y}\) and estimate its variance.
-
Here \(W_1=0.6\), \(W_2=0.4\), \(f_1=0.2\), \(f_2=0.3\). For the separate ratio estimator,
\[\begin{aligned}
&r_1=1.7213,\\
&r_2=1.5613,\\
&\bar{y}_{RS}=W_1\bar{X}_1r_1+W_2\bar{X}_2r_2=51.8815.
\end{aligned}
\]
Its estimated variance is
\[v(\bar{y}_{RS})=\sum_{h=1}^{2}W_h^2\frac{1-f_h}{n_h}(s_{yh}^2-2r_hs_{yxh}+r_h^2s_{xh}^2)=12.0071.
\]
-
For the combined ratio estimator,
\[\begin{aligned}
&\bar{y}_{st}=\sum_{h=1}^{2}W_h\bar{y}_h=53.2667,\\
&\bar{x}_{st}=\sum_{h=1}^{2}W_h\bar{x}_h=32.8667,
\end{aligned}
\]
so
\[\begin{aligned}
&r=\frac{\bar{y}_{st}}{\bar{x}_{st}}=1.6207,\quad \bar{X}=32,\\
&\bar{y}_{RC}=\frac{\bar{y}_{st}}{\bar{x}_{st}}\bar{X}=51.8621,\\
&v(\bar{y}_{RC})=\sum_{h=1}^{2}W_h^2\frac{1-f_h}{n_h}(s_{yh}^2-2rs_{yxh}+r^2s_{xh}^2)=12.5786.
\end{aligned}
\]
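The separate and combined ratio estimates of Exercise 2 differ only in which ratio is plugged into the residual variance of each stratum. A minimal sketch using the exercise data (the dictionary layout and the helper names `cov` and `term` are illustrative assumptions):

```python
from statistics import mean, variance


def cov(a, b):
    """Sample covariance with divisor n - 1."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


strata = [
    {"N": 15, "X_bar": 20, "y": [30, 35, 40], "x": [18, 18, 25]},
    {"N": 10, "X_bar": 50, "y": [75, 82, 85], "x": [55, 40, 60]},
]
N_total = sum(s["N"] for s in strata)


def term(s, r):
    """Contribution W_h^2 (1 - f_h) / n_h * (s_y^2 - 2 r s_yx + r^2 s_x^2)."""
    n = len(s["y"])
    W = s["N"] / N_total
    f = n / s["N"]
    resid = variance(s["y"]) - 2 * r * cov(s["y"], s["x"]) + r ** 2 * variance(s["x"])
    return W ** 2 * (1 - f) / n * resid


# Separate ratio estimator: one ratio per stratum
y_RS = sum(s["N"] / N_total * mean(s["y"]) / mean(s["x"]) * s["X_bar"] for s in strata)
v_RS = sum(term(s, mean(s["y"]) / mean(s["x"])) for s in strata)

# Combined ratio estimator: a single ratio of stratified means
y_st = sum(s["N"] / N_total * mean(s["y"]) for s in strata)
x_st = sum(s["N"] / N_total * mean(s["x"]) for s in strata)
X_bar = sum(s["N"] / N_total * s["X_bar"] for s in strata)
r = y_st / x_st
y_RC = r * X_bar
v_RC = sum(term(s, r) for s in strata)

print(f"separate: {y_RS:.4f}, v = {v_RS:.4f}")
print(f"combined: {y_RC:.4f}, v = {v_RC:.4f}")
```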
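3. Sample allocation in stratified random sampling
A proportion is surveyed in a two-stratum population with \(N_1=10\), \(N_2=20\) and \(n_1=n_2=5\); the sample gives \(p_1=0.4\), \(p_2=0.2\).
- Estimate \(P\) by stratified random sampling and give the standard deviation of \(p_{st}\).
- Compute the ratio of the two stratum sample sizes under Neyman allocation and under optimal allocation with \(c_2=4c_1\).
-
The estimate is
\[p_{st}=\frac{1}{3}p_1+\frac{2}{3}p_2=0.266667.
\]
For the variance estimate, we have
\[s_h^2=\frac{n_hp_h(1-p_h)}{n_h-1},
\]
so
\[\begin{aligned}
&s_1^2=1.25\times 0.4\times 0.6=0.3,\\
&s_2^2=1.25\times 0.2\times 0.8=0.2,\\
&v(p_{st})=\frac{1}{9}\cdot\frac{1-0.5}{5}\cdot 0.3+\frac{4}{9}\cdot\frac{1-0.25}{5}\cdot 0.2=0.016667,\\
&\sqrt{v(p_{st})}=0.1291.
\end{aligned}
\]
-
For Neyman allocation, \(n_h\propto W_hS_h\), so
\[\frac{n_1}{n_2}=\frac{1/3\times \sqrt{0.3}}{2/3\times\sqrt{0.2}}=0.6124.
\]
For optimal allocation with unequal costs, \(n_h\propto W_hS_h/\sqrt{c_h}\), so with \(c_2=4c_1\)
\[\frac{n_1}{n_2}=\frac{1/3\times \sqrt{0.3}/\sqrt{c_1}}{2/3\times \sqrt{0.2}/\sqrt{4c_1}}=\frac{1/3\times \sqrt{0.3}\times\sqrt{4}}{2/3\times \sqrt{0.2}}=1.2247.
\]
The more expensive stratum 2 receives relatively fewer units, so the ratio \(n_1/n_2\) doubles compared with Neyman allocation.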
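The allocation rule \(n_h\propto W_hS_h/\sqrt{c_h}\) translates directly into code. In the sketch below the function name `allocation_shares` is hypothetical, the sample values \(s_h^2\) stand in for the unknown \(S_h^2\), and the unit costs are expressed relative to \(c_1\).

```python
from math import sqrt


def allocation_shares(W, S, c=None):
    """Shares of the total sample per stratum under n_h ∝ W_h S_h / sqrt(c_h).
    With c=None all unit costs are equal, which is Neyman allocation."""
    if c is None:
        c = [1.0] * len(W)
    raw = [Wh * Sh / sqrt(ch) for Wh, Sh, ch in zip(W, S, c)]
    total = sum(raw)
    return [r / total for r in raw]


W = [1 / 3, 2 / 3]
S = [sqrt(0.3), sqrt(0.2)]           # sample standard deviations used as plug-ins

neyman = allocation_shares(W, S)
optimal = allocation_shares(W, S, c=[1.0, 4.0])   # c_2 = 4 c_1

print(neyman[0] / neyman[1])         # n_1 / n_2 under Neyman allocation
print(optimal[0] / optimal[1])       # n_1 / n_2 when stratum 2 is four times as costly
```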
4. Equal-probability cluster sampling
A population consists of \(10\) clusters of equal size \(M=10\). Four clusters are drawn at random; their cluster totals \(y_i\) and unit values \(y_{ij}\) are
\[\begin{array}{c|c|c}
\hline
i & y_i & y_{ij}\\
\hline
1 & 19 & 1,2,1,3,3,2,1,4,1,1 \\
2 & 20 & 1,3,2,2,3,1,4,1,1,2 \\
3 & 16 & 2,1,1,1,1,3,2,1,3,1 \\
4 & 20 & 1,1,3,2,1,5,1,2,3,1 \\
\hline
\end{array}
\]
- Estimate the population mean \(\bar{\bar{Y}}\) and give the standard deviation of the estimator.
- Find the design effect.
-
The cluster means are \(\bar{y}_1=1.9\), \(\bar{y}_{2}=2\), \(\bar{y}_3=1.6\), \(\bar{y}_4=2\). By the properties of simple random sampling,
\[\bar{\bar{y}}=\frac{1}{4}\sum_{i=1}^{4}\bar{y}_i=1.875,
\]
and
\[\begin{aligned}
&v(\bar{\bar{y}})=\frac{1-0.4}{4}\cdot\frac{1}{3}\sum_{i=1}^{4}(\bar{y}_i-\bar{\bar{y}})^2=0.005375,\\
&\sqrt{v(\bar{\bar{y}})}=0.07331.
\end{aligned}
\]
-
Here
\[\begin{aligned}
&s_{b}^2=\frac{1}{n-1}\sum_{i=1}^{n}M(\bar y_i-\bar{\bar y})^2=0.358333,\\
&s_w^2=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{M-1}\sum_{j=1}^{M}(y_{ij}-\bar{y}_i)^2=1.202778,
\end{aligned}
\]
so
\[\begin{aligned}
&\hat \rho_c=\frac{s_b^2-s_w^2}{s_b^2+(M-1)s_w^2}=-0.0755,\\
&\mathrm{deff}\approx 1+(M-1)\hat\rho_c=0.3204.
\end{aligned}
\]
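The quantities in Exercise 4 follow mechanically from the unit-level data. A short sketch using the table above (variable names are illustrative):

```python
clusters = [
    [1, 2, 1, 3, 3, 2, 1, 4, 1, 1],
    [1, 3, 2, 2, 3, 1, 4, 1, 1, 2],
    [2, 1, 1, 1, 1, 3, 2, 1, 3, 1],
    [1, 1, 3, 2, 1, 5, 1, 2, 3, 1],
]
N, M = 10, 10                     # clusters in the population, units per cluster
n = len(clusters)
f = n / N

means = [sum(c) / M for c in clusters]
grand = sum(means) / n            # estimate of the population mean

# Variance estimate of the cluster-sampling mean
v = (1 - f) / n * sum((m - grand) ** 2 for m in means) / (n - 1)

# Between-cluster and within-cluster mean squares
s2_b = M * sum((m - grand) ** 2 for m in means) / (n - 1)
s2_w = sum((yij - m) ** 2
           for c, m in zip(clusters, means) for yij in c) / (n * (M - 1))

rho_c = (s2_b - s2_w) / (s2_b + (M - 1) * s2_w)
deff = 1 + (M - 1) * rho_c

print(grand, v, v ** 0.5)
print(s2_b, s2_w, rho_c, deff)
```

A design effect below one means that, for these data, units within a cluster are more heterogeneous than units drawn at random, so cluster sampling is estimated to be more efficient than simple random sampling of the same number of units.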
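5. Two-stage sampling
A population consists of \(N=10\) clusters of equal size, each containing \(M=50\) units. Three clusters are drawn and five units are sampled from each, with the following results:
\[\begin{array}{c|ccccc}
\hline
1 & 20 & 25 & 20 & 25 & 20 \\
2 & 18 & 20 & 22 & 25 & 20 \\
3 & 25 & 28 & 18 & 15 & 21 \\
\hline
\end{array}
\]
- Estimate \(\bar{\bar{Y}}\), estimate the variance of the estimator, and give a \(95\%\) confidence interval.
- If drawing one cluster costs \(c_1\) and surveying one unit costs \(c_2\), with all other symbols as defined in the textbook, derive the optimal \(m\).
-
First compute
\[\begin{aligned}
&\bar{y}_1=22,\quad s_{21}^2=7.5;\\
&\bar{y}_2=21,\quad s_{22}^2=7;\\
&\bar{y}_3=21.4,\quad s_{23}^2=27.3.
\end{aligned}
\]
Then
\[\begin{aligned}
&\bar{\bar{y}}=\frac{1}{3}\sum_{i=1}^{3}\bar{y}_i=21.4667,\\
&s_{1}^2=\frac{1}{2}\sum_{i=1}^{3}(\bar{y}_i-\bar{\bar{y}})^2=0.253333,\\
&s_2^2=\frac{1}{3}\sum_{i=1}^{3}s_{2i}^2=13.9333.
\end{aligned}
\]
The estimated variance is
\[v(\bar{\bar{y}})=\frac{1-0.3}{3}s_1^2+\frac{0.3(1-0.1)}{15}s_2^2=0.3099,
\]
and the \(95\%\) confidence interval is
\[[20.3756,22.5578].
\]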
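-
The variance of the two-stage estimator is
\[V=\frac{1}{n}S_1^2-\frac{1}{N}S_1^2+\frac{1}{nm}S_2^2-\frac{1}{n}\frac{S^2_2}{M},
\]
so, dropping the constant \(-S_1^2/N\), we minimize
\[(c_1n+c_2nm)\left(\frac{S_1^2-S_2^2/M}{n}+\frac{S_2^2}{nm} \right)=(c_1+c_2m)\left(S_1^2-\frac{S_2^2}{M}+\frac{S_2^2}{m} \right).
\]
By the equality condition of the Cauchy-Schwarz inequality,
\[\begin{aligned}
&\frac{c_1}{S_1^2-S_2^2/M}=\frac{c_2m^2}{S_2^2},\\
&m_{opt}=\sqrt{\frac{c_1S_2^2}{c_2\left(S_1^2-\dfrac{S_2^2}{M}\right)}}.
\end{aligned}
\]
By Proof 9, the unknown variances can be estimated by
\[\begin{aligned}
&\hat{S}_1^2=s_1^2-\frac{1-f_2}{m}s_2^2,\\
&\hat{S}_2^2=s_2^2.
\end{aligned}
\]
Note: with the data of this exercise the estimate of \(S_1^2-S_2^2/M\) is negative, so the expression under the square root is negative and \(m_{opt}\) should not be computed from these data.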
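The numbers in Exercise 5 can be reproduced as follows. This is a sketch using the exercise data; the estimate of \(S_1^2\) is taken as \(s_1^2-\frac{1-f_2}{m}s_2^2\), which is what the expectation \(E(s_1^2)=S_1^2+\frac{1-f_2}{m}S_2^2\) from Proof 9 suggests, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

sample = [
    [20, 25, 20, 25, 20],
    [18, 20, 22, 25, 20],
    [25, 28, 18, 15, 21],
]
N, M = 10, 50                     # clusters in the population, units per cluster
n, m = len(sample), len(sample[0])
f1, f2 = n / N, m / M

cluster_means = [mean(c) for c in sample]
y_hat = mean(cluster_means)                   # estimate of the population mean
s2_1 = variance(cluster_means)                # between-cluster sample variance
s2_2 = mean([variance(c) for c in sample])    # average within-cluster variance

v = (1 - f1) / n * s2_1 + f1 * (1 - f2) / (n * m) * s2_2
half = 1.96 * sqrt(v)                         # assumed normal quantile 1.96
print(y_hat, v, (y_hat - half, y_hat + half))

# Plug-in estimate of S_1^2 implied by Proof 9; negative for these data,
# so the formula for m_opt has no real value here.
S2_1_hat = s2_1 - (1 - f2) / m * s2_2
print(S2_1_hat, S2_1_hat - s2_2 / M)
```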
6. \(\mathrm{PPS}\) sampling
Unequal-probability sampling with replacement is applied to a population of size \(N=10\), with the following results:
\[\begin{array}{c|cccccccccc}
\hline
i & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\
\hline
Z_i & 0.2 & 0.2 & 0.1 & 0.05 & 0.05 & 0.05 & 0.05 & 0.1 & 0.1 & 0.1 \\
t_i & 2 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 \\
Y_i & 35 & ? & ? & 40 & ? & ? & ? & 20 & 40 & ? \\
\hline
\end{array}
\]
Estimate the population mean and give the corresponding variance estimate.
The Hansen-Hurwitz estimator is
\[\begin{aligned}
&\hat{Y}_{HH}=\frac{1}{5}\sum_{i=1}^{5}\frac{y_i}{Z_i}=350,\\
&\bar{y}_{HH}=\frac{\hat{Y}_{HH}}{N}=35.
\end{aligned}
\]
The variance estimates are
\[\begin{aligned}
&v(\hat{Y}_{HH})=\frac{1}{5\times 4}\sum_{i=1}^{5}\left(\frac{y_i}{Z_i}-\hat{Y}_{HH} \right)^2=14437.5,\\
&v(\bar{y}_{HH})=\frac{v(\hat{Y}_{HH})}{N^2}=144.375.
\end{aligned}
\]
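The Hansen-Hurwitz computation works draw by draw, so a unit drawn twice appears twice in the input. A minimal sketch with the data of Exercise 6 (the function name is an illustrative choice):

```python
def hansen_hurwitz(y, z):
    """Hansen-Hurwitz estimate of the population total and its variance
    estimate; y and z list the value and draw probability of each draw."""
    n = len(y)
    ratios = [yi / zi for yi, zi in zip(y, z)]
    total = sum(ratios) / n
    var = sum((r - total) ** 2 for r in ratios) / (n * (n - 1))
    return total, var


# Five draws: unit 1 twice, then units 4, 8 and 9.
y = [35, 35, 40, 20, 40]
z = [0.2, 0.2, 0.05, 0.1, 0.1]
N = 10

total, var = hansen_hurwitz(y, z)
mean_est, mean_var = total / N, var / N ** 2
print(total, var)
print(mean_est, mean_var)
```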
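7. Two-stage unequal-probability sampling with replacement
A population consists of \(N=10\) clusters, each containing \(M=10\) units. Two-stage unequal-probability sampling with replacement is carried out. In the first stage cluster \(Y_1\) is drawn twice, and clusters \(Y_{2}\) and \(Y_3\) once each; their selection probabilities are
\[Z_1=0.5,\quad Z_2=Z_3=0.1.
\]
Simple random sampling with \(m=4\) is then applied twice to \(Y_1\) and once each to \(Y_2\) and \(Y_3\), with the following results:
\[\begin{array}{c|cccc}
\hline
Y_1^{(1)} & 3 & 5 & 8 & 10 \\
Y_1^{(2)} & 3 & 7 & 7 & 9 \\
Y_2 & 6 & 9 & 10 & 12 \\
Y_3 & 10 & 15 & 18 & 20 \\
\hline
\end{array}
\]
Estimate \(\bar{\bar{Y}}\) and estimate the variance of the estimator.
Estimating the cluster total of each of the four draws by \(M\) times its sample mean gives
\[\hat{Y}_1=65,\quad \hat{Y}_2=65,\quad \hat{Y}_3=92.5,\quad \hat{Y}_4=157.5,
\]
with draw probabilities \(0.5,\ 0.5,\ 0.1,\ 0.1\). Hence
\[\begin{aligned}
&\hat{Y}_{HH}=\frac{1}{4}\sum_{i=1}^{4}\frac{\hat{Y}_i}{Z_i}=690,\quad \bar{y}_{HH}=6.9;\\
&v(\hat{Y}_{HH})=\frac{1}{n(n-1)}\sum_{i=1}^{4}\left(\frac{\hat{Y}_i}{Z_i}-\hat{Y}_{HH} \right)^2=122137.5,\quad v(\bar{y}_{HH})=12.21375.
\end{aligned}
\]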
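8. \(\mathrm{\pi PS}\) sampling
Consider a population of \(N=8\) units from which two units are drawn by Brewer's method: \(y_1=12\), \(y_2=20\), with \(Z_1=0.2\), \(Z_2=0.1\).
- Briefly describe Brewer's sampling method and the condition under which it can be applied.
- Construct the Horvitz-Thompson estimator and estimate the population total.
- Suppose instead that these two units were drawn by the Yates-Grundy draw-by-draw method and that the next draw gives \(y_3=15\), \(Z_3=0.05\). Construct the Raj estimator of the population total and estimate its variance.
-
In Brewer's method, the first unit is drawn with probability proportional to \(\dfrac{Z_i(1-Z_i)}{1-2Z_i}\); call the selected unit \(j\). The second unit is then drawn from the remaining units with probability proportional to \(Z_i\), that is, with probability \(\dfrac{Z_i}{1-Z_j}\).
The condition is \(1-2Z_i>0\), that is, \(Z_i<1/2\) for every \(i\).
-
Since \(\pi_i=2Z_i\), the estimate of the population total is
\[\hat{Y}_{HT}=\frac{y_1}{\pi_1}+\frac{y_2}{\pi_2}=\frac{1}{2}\left(\frac{y_1}{Z_1}+\frac{y_2}{Z_2} \right)=130.
\]
-
We compute
\[\begin{aligned}
&t_1=\frac{y_1}{Z_1}=60,\\
&t_2=y_1+\frac{y_2}{Z_2}(1-Z_1)=172,\\
&t_3=y_1+y_2+\frac{y_3}{Z_3}(1-Z_1-Z_2)=242.
\end{aligned}
\]
Therefore
\[\begin{aligned}
&\hat{Y}_{Raj}=\frac{1}{3}\sum_{i=1}^{3}t_i=158,\\
&v(\hat{Y}_{Raj})=\frac{1}{3\times 2}\sum_{i=1}^{3}(t_i-\hat{Y}_{Raj})^2=2809.333.
\end{aligned}
\]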
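The Raj estimator is built from running sums over the draws, which makes it natural to compute in a single pass. A sketch with the data of Exercise 8 (the function name is an illustrative choice):

```python
def raj_estimator(y, z):
    """Raj estimator for draw-by-draw sampling without replacement.
    y, z: values and initial selection probabilities in draw order.
    Returns (t, estimate of the total, variance estimate)."""
    t = []
    cum_y, cum_z = 0.0, 0.0
    for yi, zi in zip(y, z):
        # t_i = sum of earlier y's + y_i / Z_i * (1 - sum of earlier Z's)
        t.append(cum_y + yi / zi * (1 - cum_z))
        cum_y += yi
        cum_z += zi
    n = len(t)
    est = sum(t) / n
    var = sum((ti - est) ** 2 for ti in t) / (n * (n - 1))
    return t, est, var


t, est, var = raj_estimator([12, 20, 15], [0.2, 0.1, 0.05])
print(t)
print(est, var)
```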
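9. Systematic sampling
Let the population size be \(N=30\), from which \(10\) units are to be drawn.
- If the sample contains \(Y_{16}\), list all units in the sample.
- Under what condition is systematic sampling better than simple random sampling?
-
The sampling interval is \(k=N/n=3\) and \(16 \bmod 3=1\), so the starting unit is \(Y_1\) and the sample is
\[Y_1,Y_4,Y_7,Y_{10},Y_{13},Y_{16},Y_{19},Y_{22},Y_{25},Y_{28}.
\]
-
Let
\[\begin{aligned}
&S^2=\frac{1}{N-1}\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y})^2,\\
&S_{wsy}^2=\frac{1}{k}\sum_{r=1}^{k}\frac{1}{n-1}\sum_{j=1}^{n}(Y_{rj}-\bar{Y}_r)^2.
\end{aligned}
\]
Then
\[\begin{aligned}
(N-1)S^2&=\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y})^2\\
&=\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y}_r)^2+\sum_{r=1}^{k}n(\bar{Y}_{r}-\bar{Y})^2\\
&=k(n-1)S_{wsy}^2+NV(\bar{y}_{sy}),
\end{aligned}
\]
and therefore
\[V(\bar{y}_{sy})=\frac{N-1}{N}S^2-\frac{k(n-1)}{N}S_{wsy}^2.
\]
On the other hand,
\[V(\bar{y}_{srs})=\frac{1-f}{n}S^2=\frac{k-1}{N}S^2.
\]
Taking the difference and using \(N=nk\),
\[V(\bar{y}_{srs})-V(\bar{y}_{sy})=\frac{(k-N)S^2+k(n-1)S_{wsy}^2}{N}=\frac{k(n-1)(S_{wsy}^2-S^2)}{N}.
\]
Hence systematic sampling is better when \(S_{wsy}^2>S^2\), that is, when the units within a systematic sample are more heterogeneous than the population as a whole.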
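10. Stratified double sampling
A population of \(1000000\) units can be divided into \(2\) strata. Because the stratum sizes are unknown, a first-phase sample of \(n'=10000\) units is drawn for a preliminary survey, giving \(n_1'=2000\) and \(n_2'=8000\). Then \(n_1=n_2=5\) units are drawn for a detailed survey, giving \(\bar{y}_1=200\), \(\bar{y}_2=80\), with variances \(s_1^2=4500\) and \(s_2^2=200\).
- Estimate the population mean \(\bar{Y}\) and give a variance estimate; the sampling fractions may be neglected.
- Find the optimal second-phase sampling fraction \(f_{hD}\).
-
With \(w_1'=0.2\) and \(w_2'=0.8\), the stratified double sampling estimate is
\[\bar{y}_{stD}=w_1'\bar{y}_1+w_2'\bar{y}_2=104.
\]
Its estimated variance is
\[v(\bar{y}_{stD})\approx\sum_{h=1}^{L}\frac{(w_h')^2s_h^2}{n_h}+\frac{1}{n'}\sum_{h=1}^{L}w_h'(\bar{y}_h-\bar{y}_{stD})^2=61.6+0.2304=61.8304.
\]
-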
We have
\[\begin{aligned}
V(\bar{y}_{stD})&=\left(\frac{1}{n'}-\frac{1}{N} \right)S^2+\sum_{h=1}^{L}\frac{W_hS_h^2}{n'}\left(\frac{1}{f_{hD}}-1 \right)\\
&=\frac{1}{n'}\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)+\sum_{h=1}^{L}\frac{W_hS_h^2}{n'f_{hD}}-\frac{S^2}{N}.
\end{aligned}
\]
Since \(C_{T}^*=\displaystyle{c_1n'+n'\sum_{h=1}^{L}c_{2h}W_hf_{hD}}\), we minimize
\[\left(c_1+\sum_{h=1}^{L}c_{2h}W_hf_{hD} \right)\left[\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)+\sum_{h=1}^{L}\frac{W_hS_h^2}{f_{hD}} \right].
\]
By the equality condition of the Cauchy-Schwarz inequality,
\[\begin{aligned}
&\frac{c_1}{S^2-\displaystyle{\sum_{h=1}^{L}W_hS_h^2}}=\frac{c_{2h}f_{hD}^2}{S_h^2},\\
&f_{hD}=S_h\sqrt{\frac{c_1}{c_{2h}\displaystyle{\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)}}}.
\end{aligned}
\]
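11. Ratio estimation in double sampling
For a population with very large \(N\) whose auxiliary information is unknown, a first-phase sample of \(n'=10000\) units is drawn to observe the auxiliary variable \(X\), giving \(\bar{x}'=50\). A second-phase sample of \(10\) units then gives \(\bar{y}=80\), \(\bar{x}=40\), \(s_x^2=1600\), \(s_{yx}=2400\), \(s_{y}^2=8000\). Find the double-sampling ratio estimate \(\bar{y}_{RD}\) and estimate its variance.
The double-sampling ratio estimate is
\[\bar{y}_{RD}=\frac{\bar{y}}{\bar{x}}\bar{x}'=100.
\]
Here \(\hat{R}=2\), so the variance estimate is
\[v(\bar{y}_{RD})=\frac{1}{n}s_y^2+\left(\frac{1}{n}-\frac{1}{n'} \right)(\hat{R}^2s_{x}^2-2\hat{R}s_{yx})=480.32.
\]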
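12. Capture-recapture sampling
To estimate the number of fish in a lake, \(1000\) fish are caught, marked and released. Then \(150\) fish are caught, of which \(10\) are marked. Use the Chapman estimator to estimate the total number of fish in the lake, give a variance estimate, and give a \(95\%\) confidence interval.
We compute
\[\begin{aligned}
&\tilde{N}=\frac{1001\times 151}{11}-1=13740,\\
&v(\tilde{N})=\frac{1001\times 151\times 990\times 140}{11^2\times 12}=14428050.
\end{aligned}
\]
The confidence interval is therefore
\[[6295,21185].
\]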
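The Chapman estimate and its normal-approximation interval take only a few lines. A sketch with the numbers of Exercise 12; the function name and the quantile 1.96 are assumptions made for illustration.

```python
from math import sqrt


def chapman(n1, n2, m):
    """Chapman estimator of population size and its variance estimate.
    n1: marked in the first catch, n2: size of the second catch,
    m: marked individuals found in the second catch."""
    N_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    var = ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)
           / ((m + 1) ** 2 * (m + 2)))
    return N_hat, var


N_hat, var = chapman(1000, 150, 10)
half = 1.96 * sqrt(var)           # assumed normal quantile for a 95% interval
print(N_hat, var)
print(round(N_hat - half), round(N_hat + half))
```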
Summary
Sampling methods
-
Simple estimator under simple random sampling.
\[\begin{aligned}
&\bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i,\\
&V(\bar{y})=\frac{1-f}{n}S^2,\\
&v(\bar{y})=\frac{1-f}{n}s^2.
\end{aligned}
\]
-
Ratio estimator under simple random sampling.
\[\begin{aligned}
&\bar{y}_{R}=\frac{\bar{y}}{\bar{x}}\bar{X},\quad r=\frac{\bar{y}}{\bar{x}},\\
&V(\bar{y}_{R})\approx \frac{1-f}{n}(S_y^2-2RS_{yx}+R^2S_x^2),\\
&v(\bar{y}_{R})=\frac{1-f}{n}(s_y^2-2rs_{yx}+r^2s_{x}^2).
\end{aligned}
\]
-
Regression estimator under simple random sampling, regression coefficient known.
\[\begin{aligned}
&\bar{y}_{lr}=\bar{y}+\beta_0(\bar{X}-\bar{x}),\\
&V(\bar{y}_{lr})\approx \frac{1-f}{n}(S_y^2-2\beta_0S_{yx}+\beta_0^2S_{x}^2),\\
&v(\bar{y}_{lr})=\frac{1-f}{n}(s_y^2-2\beta_0s_{yx}+\beta_0^2s_{x}^2).
\end{aligned}
\]
-
Regression estimator under simple random sampling, regression coefficient unknown.
\[\begin{aligned}
&b=\frac{s_{yx}}{s_{x}^2},\\
&\bar{y}_{lr}=\bar{y}+b(\bar{X}-\bar{x}),\\
&V(\bar{y}_{lr})\approx \frac{1-f}{n}S_y^2(1-\rho^2),\\
&v(\bar{y}_{lr})\approx \frac{1-f}{n}s_y^2(1-\hat\rho^2).
\end{aligned}
\]
-
Simple estimator under stratified random sampling.
\[\begin{aligned}
&\bar{y}_{st}=\sum_{h=1}^{L}W_h\bar{y}_{h},\\
&V(\bar{y}_{st})=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}S_h^2,\\
&v(\bar{y}_{st})=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}s_h^2.
\end{aligned}
\]
-
Separate ratio estimator under stratified random sampling.
\[\begin{aligned}
&\bar{y}_{RS}=\sum_{h=1}^{L}W_h\frac{\bar{y}_h}{\bar{x}_h}\bar{X}_h,\quad r_h=\frac{\bar{y}_h}{\bar{x}_h},\\
&V(\bar{y}_{RS})\approx \sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}(S_{yh}^2-2R_hS_{yxh}+R_h^2S_{xh}^2),\\
&v(\bar{y}_{RS})=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}(s_{yh}^2-2r_hs_{yxh}+r_h^2s_{xh}^2).
\end{aligned}
\]
-
Combined ratio estimator under stratified random sampling.
\[\begin{aligned}
&\bar{y}_{RC}=\frac{\bar{y}_{st}}{\bar{x}_{st}}\bar{X},\quad r=\frac{\bar{y}_{st}}{\bar{x}_{st}},\\
&V(\bar{y}_{RC})\approx \sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}(S_{yh}^2-2RS_{yxh}+R^2S_{xh}^2),\\
&v(\bar{y}_{RC})=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}(s_{yh}^2-2rs_{yxh}+r^2s_{xh}^2).
\end{aligned}
\]
-
Equal-probability cluster sampling with clusters of equal size.
\[\begin{aligned}
&\bar{\bar{y}}=\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i,\\
&V(\bar{\bar{y}})=\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_i-\bar{\bar{Y}})^2,\\
&v(\bar{\bar{y}})=\frac{1-f}{n}\frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_i-\bar{\bar{y}})^2.
\end{aligned}
\]
-
Equal-probability two-stage sampling with clusters of equal size.
\[\begin{aligned}
&\bar{\bar{y}}=\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i,\\
&V(\bar{\bar{y}})=\frac{1-f_1}{n}S_1^2+\frac{1-f_2}{nm}S_2^2,\\
&v(\bar{\bar{y}})=\frac{1-f_1}{n}s_1^2+\frac{f_1(1-f_2)}{nm}s_2^2.
\end{aligned}
\]
-
Hansen-Hurwitz estimator for unequal-probability sampling with replacement.
\[\begin{aligned}
&\hat{Y}_{HH}=\frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{Z_i},\\
&V(\hat{Y}_{HH})=\frac{1}{n}\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2,\\
&v(\hat{Y}_{HH})=\frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-\hat{Y}_{HH} \right)^2.
\end{aligned}
\]
-
Hansen-Hurwitz estimator for two-stage unequal-probability sampling with replacement.
\[\begin{aligned}
&\hat{Y}_{HH}=\frac{1}{n}\sum_{i=1}^{n}\frac{\hat{Y}_i}{Z_i},\\
&V(\hat{Y}_{HH})=\frac{1}{n}\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2+\frac{1}{n}\sum_{i=1}^{N}\frac{V_2(\hat{Y}_i)}{Z_i},\\
&v(\hat{Y}_{HH})=\frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{\hat{Y}_i}{Z_i}-\hat{Y}_{HH} \right)^2.
\end{aligned}
\]
-
Horvitz-Thompson estimator for strictly \(\mathrm{\pi PS}\) unequal-probability sampling without replacement, \(n\) fixed.
\[\begin{aligned}
&\hat{Y}_{HT}=\sum_{i=1}^{n}\frac{y_i}{\pi_i},\\
&V(\hat{Y}_{HT})=\sum_{i<j}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i}{\pi_i}-\frac{Y_j}{\pi_j} \right)^2,\\
&v_{YGS}=\sum_{i<j}^{n}\frac{\pi_i\pi_j-\pi_{ij}}{\pi_{ij}}\left(\frac{y_i}{\pi_i}-\frac{y_j}{\pi_j} \right)^2.
\end{aligned}
\]
-
Raj estimator for Yates-Grundy draw-by-draw sampling (not strictly \(\mathrm{\pi PS}\)), \(n\) not fixed.
\[\begin{aligned}
&t_i=\sum_{j=1}^{i-1}y_j+\frac{y_i}{Z_i}\left(1-\sum_{j=1}^{i-1}Z_j \right),\\
&\hat{Y}_{Raj}=\frac{1}{n}\sum_{i=1}^{n}t_i,\\
&v(\hat{Y}_{Raj})=\frac{1}{n(n-1)}\sum_{i=1}^{n}(t_i-\hat{Y}_{Raj})^2.
\end{aligned}
\]
-
Stratified double sampling.
\[\begin{aligned}
&\bar{y}_{stD}=\sum_{h=1}^{L}w_h'\bar{y}_h,\\
&V(\bar{y}_{stD})=\left(\frac{1}{n'}-\frac{1}{N} \right)S^2+\sum_{h=1}^{L}\frac{W_hS_h^2}{n'}\left(\frac{1}{f_{hD}}-1 \right),\\
&v(\bar{y}_{stD})\approx \sum_{h=1}^{L}\frac{(w_h')^2s_h^2}{n_h}+\frac{1}{n'}\sum_{h=1}^{L}w_h'(\bar{y}_h-\bar{y}_{stD})^2.
\end{aligned}
\]
-
Ratio estimator in double sampling.
\[\begin{aligned}
&\bar{y}_{RD}=\frac{\bar{y}}{\bar{x}}\bar{x}',\quad r=\frac{\bar{y}}{\bar{x}},\\
&V(\bar{y}_{RD})\approx \left(\frac{1}{n'}-\frac{1}{N} \right)S_y^2+\left(\frac{1}{n}-\frac{1}{n'} \right)(S_y^2-2RS_{yx}+R^2S_x^2),\\
&v(\bar{y}_{RD})=\frac{1}{n}s_{y}^2+\left(\frac{1}{n}-\frac{1}{n'} \right)(r^2s_{x}^2-2rs_{yx}).
\end{aligned}
\]
-
Equal-interval, equal-probability systematic sampling.
\[\begin{aligned}
&\bar{y}_{sy}=\frac{1}{n}\sum_{i=1}^{n}y_{i},\\
&V(\bar{y}_{sy})=\frac{N-1}{N}S^2-\frac{k(n-1)}{N}S_{wsy}^2.
\end{aligned}
\]
-
Capture-recapture sampling.
\[\begin{aligned}
&\tilde{N}=\frac{(n_1+1)(n_2+1)}{m+1}-1,\\
&v(\tilde{N})=\frac{(n_1+1)(n_2+1)(n_1-m)(n_2-m)}{(m+1)^2(m+2)}.
\end{aligned}
\]
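Other formulas
-
Optimal allocation and Neyman allocation in stratified sampling:
\[n_h\propto\frac{W_hS_h}{\sqrt{c_h}},\qquad\text{and when } c_h\equiv c:\quad n_h\propto W_hS_h.
\]
-
The three variances in cluster sampling and their estimates:
\[\begin{aligned}
&S^2=\frac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2,\\
&S_b^2=\frac{1}{N-1}\sum_{i=1}^{N}M(\bar{Y}_i-\bar{\bar{Y}})^2,\\
&S_{w}^2=\frac{1}{N(M-1)}\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{Y}_i)^2,\\
&s_b^2=\frac{1}{n-1}\sum_{i=1}^{n}M(\bar{y}_i-\bar{\bar{y}})^2,\\
&s_w^2=\frac{1}{n(M-1)}\sum_{i=1}^{n}\sum_{j=1}^{M}(y_{ij}-\bar{y}_i)^2.
\end{aligned}
\]
-
Estimate of the intra-cluster correlation coefficient and the design effect in cluster sampling:
\[\begin{aligned}
&\hat\rho_c=\frac{s_b^2-s_w^2}{s_b^2+(M-1)s_w^2},\\
&\mathrm{deff}\approx 1+(M-1)\hat\rho_c.
\end{aligned}
\]
-
Brewer's method: first-draw probability and inclusion probabilities:
\[\begin{aligned}
&Z_i^*\propto\frac{Z_i(1-Z_i)}{1-2Z_i},\\
&\pi_i=2Z_i,\\
&\pi_{ij}=\frac{4Z_iZ_j(1-Z_i-Z_j)}{(1-2Z_i)(1-2Z_j)\displaystyle{\left(1+\sum_{k=1}^{N}\frac{Z_k}{1-2Z_k}\right)}}.
\end{aligned}
\]
-
Midzuno's method: first-draw probability:
\[Z_i^*=\frac{n(N-1)Z_i}{N-n}-\frac{n-1}{N-n}.
\]
-
Optimal second-phase sampling fraction in stratified double sampling:
\[f_{hD}=S_h\sqrt{\frac{c_1}{c_{2h}\displaystyle{\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)}}}.
\]
-
Optimal second-phase sampling fraction for the double-sampling ratio estimator:
\[f=\sqrt{\frac{c_1(S_y^2+R^2S_x^2-2RS_{yx})}{c_2(2RS_{yx}-R^2S_x^2)}}.
\]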
Source: https://www.cnblogs.com/jy333/p/Sampling_Survey_Extra.html