Mathematical Statistics Course Teaching Resources (Reference Material): Large Sample Properties of MLE 05

Section 8.5. Breakdown of Assumptions

• Non-Existence of the MLE
• Multiple Solutions to the Maximization Problem
• Multiple Roots of the Score Equations
• Number of Parameters Increases with the Sample Size
• Support of p(x; θ) Depends on θ
• Non-I.I.D. Data

Non-Existence of the MLE

The non-existence of the MLE may occur for all values of the sample $x_n$ or for only some of them. In general, this is due either to the parameter space not being compact or to the log-likelihood being discontinuous in $\theta$.

Example 8.1: Suppose that $X \sim \mathrm{Bernoulli}\bigl(1/(1+\exp(\theta))\bigr)$, where $\Theta = \mathbb{R}$. If we observe $x = 1$, then $L(\theta; 1) = 1/(1+\exp(\theta))$. The likelihood function is decreasing in $\theta$, so the maximum is not attained on $\Theta$. If $\Theta$ were closed, i.e., $\Theta = \bar{\mathbb{R}} = [-\infty, +\infty]$, the MLE would be $-\infty$.

Example 8.2: Suppose that $X \sim \mathrm{Normal}(\mu, \sigma^2)$, so $\theta = (\mu, \sigma^2)$ and $\Theta = \mathbb{R} \times \mathbb{R}_+$. Now
$$l(\theta; x) \propto -\log\sigma - \frac{1}{2\sigma^2}(x - \mu)^2.$$
Take $\mu = x$. Then as $\sigma \to 0$, $l(\theta; x) \to +\infty$, so the MLE does not exist.
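To see Example 8.2 numerically, here is a minimal Python sketch (the observed value is an arbitrary choice, not from the notes): along the path $\mu = x$, the log-likelihood reduces to $-\log\sigma$ and blows up as $\sigma \to 0$.

```python
import numpy as np

# Example 8.2 along the path mu = x: the log-likelihood
# -log(sigma) - (x - mu)^2 / (2 sigma^2) reduces to -log(sigma),
# which diverges to +infinity as sigma -> 0.
x = 1.7  # arbitrary observed value (illustrative assumption)
for sigma in [1.0, 0.1, 0.01, 1e-6]:
    loglik = -np.log(sigma) - (x - x) ** 2 / (2 * sigma**2)
    print(f"sigma = {sigma:g}: log-likelihood = {loglik:.3f}")
```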

Multiple Solutions

One reason for multiple solutions to the maximization problem is non-identification of the parameter $\theta$.

Example 8.3: Suppose that $Y \sim \mathrm{Normal}(X\theta, I)$, where $X$ is an $n \times k$ matrix with rank smaller than $k$ and $\theta \in \Theta \subset \mathbb{R}^k$. The density function is
$$p(y; \theta) = (2\pi)^{-n/2} \exp\Bigl(-\tfrac{1}{2}(y - X\theta)'(y - X\theta)\Bigr).$$
Since $X$ is not of full rank, there exist infinitely many solutions to $X\theta = 0$. That means there exist infinitely many $\theta$'s that generate the same density function, so $\theta$ is not identified. Furthermore, note that the likelihood is maximized at all values of $\theta$ satisfying the normal equations $X'X\theta = X'y$.
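A small numerical sketch of Example 8.3 (the design matrix and data below are made up for illustration): with a rank-deficient $X$, the minimum-norm least-squares solution and any translate of it by a null-space vector of $X$ achieve exactly the same fit, hence the same likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)
# Rank-deficient design: the third column duplicates the first, so rank(X) = 2 < k = 3.
Z = rng.normal(size=(20, 2))
X = np.column_stack([Z, Z[:, 0]])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=20)

# Minimum-norm solution of the normal equations X'X theta = X'y.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Any null-space vector of X yields another maximizer of the likelihood.
null = np.array([1.0, 0.0, -1.0])  # X @ null = col1 - col3 = 0 by construction
for theta in (theta_hat, theta_hat + 3.0 * null):
    print(np.round(theta, 3), "residual SS:", round(float(np.sum((y - X @ theta) ** 2)), 6))
```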

Multiple Roots of the Score Equations

Even though the score equations may have multiple roots for fixed $n$, we can still use our theorems to show consistency and asymptotic normality. This works provided that, as $n$ gets large, there is a unique maximum with large probability.

Example 8.4: Suppose that $X_n = (X_1, \ldots, X_n)$, where the $X_i$'s are i.i.d. $\mathrm{Cauchy}(\theta, 1)$. We assume that $\theta_0$ lies in the interior of a compact set $\Theta \subset \mathbb{R}$. So
$$p(x; \theta) = \frac{1}{\pi(1 + (x - \theta)^2)},$$
and the log-likelihood for the full sample is
$$l(\theta; x) = -n\log\pi - \sum_{i=1}^{n} \log\bigl(1 + (x_i - \theta)^2\bigr).$$
Note that as $\theta \to \pm\infty$, $l(\theta; x) \to -\infty$.

The score for $\theta$ is given by
$$\frac{dl(\theta; x)}{d\theta} = \sum_{i=1}^{n} \frac{2(x_i - \theta)}{1 + (x_i - \theta)^2}.$$
As the figure below demonstrates, there can be multiple roots to the score equations.

[Figure: the score function of a Cauchy sample plotted against $\theta$, crossing zero at several points]
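The sketch below recreates the point of the figure numerically (the sample values are made up; a far-out observation tends to create extra sign changes in the score): it tabulates the score on a grid, brackets each sign change, and refines the roots with Brent's method.

```python
import numpy as np
from scipy.optimize import brentq

# Score function of an i.i.d. Cauchy(theta, 1) sample (Example 8.4).
def score(theta, x):
    return np.sum(2 * (x - theta) / (1 + (x - theta) ** 2))

# Made-up sample with one far-out observation.
x = np.array([-5.0, -1.0, 0.0, 1.5, 20.0])

# Bracket sign changes of the score on a grid, then refine with brentq.
grid = np.linspace(-10.0, 25.0, 2001)
vals = np.array([score(t, x) for t in grid])
roots = [brentq(score, a, b, args=(x,))
         for a, b, va, vb in zip(grid[:-1], grid[1:], vals[:-1], vals[1:])
         if np.sign(va) != np.sign(vb)]
print("roots of the score equation:", np.round(roots, 3))
# For this sample there are three roots: two local maxima of the
# likelihood separated by a local minimum.
```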

We can verify the conditions of Theorem 8.2 to show that the MLE is consistent. First, $Q_0(\theta)$ is uniquely maximized at $\theta_0$ because $\theta_0$ is identified: does there exist $\theta \neq \theta_0$ so that $p(x; \theta) = p(x; \theta_0)$? If so, it must be the case that $(x - \theta)^2 = (x - \theta_0)^2$ for all $x$; expanding both sides gives $-2x\theta + \theta^2 = -2x\theta_0 + \theta_0^2$ for all $x$, which can only happen if $\theta = \theta_0$. Thus, $\theta_0$ is identified. By assumption, $\Theta$ is compact. To show continuity of $Q_0(\theta)$ and uniform convergence in probability of $Q(\theta; X_n)$ to $Q_0(\theta)$, we appeal to the conditions of Lemma 8.3. We have to show that $\log p(x; \theta)$ is continuous in $\theta$ for $\theta \in \Theta$ and all $x \in \mathcal{X}$; this function clearly satisfies the continuity condition. Finally, we have to show that there exists a function $d(x)$ such that $|\log p(x; \theta)| \leq d(x)$ for all $\theta \in \Theta$ and $x \in \mathcal{X}$, and $E_{\theta_0}[d(X)] < \infty$.

Note that there exist positive constants $C_1$, $C_2 > 1$, and $C_3$ so that
$$|\log p(x; \theta)| = \bigl|-\log\pi - \log(1 + (x - \theta)^2)\bigr| = \log\pi + \log\bigl(1 + (x - \theta)^2\bigr) \leq C_1 + \log(C_2 + C_3 x^2) = d(x).$$
(Such constants exist because $\Theta$ is compact: if $|\theta| \leq M$ on $\Theta$, then $(x - \theta)^2 \leq 2x^2 + 2M^2$.) It remains to show that $E_{\theta_0}[d(X)] < \infty$. Note that
$$
\begin{aligned}
E_{\theta_0}[d(X)] &= \int_{-\infty}^{\infty} \bigl\{C_1 + \log(C_2 + C_3 x^2)\bigr\} \frac{1}{\pi(1 + (x - \theta_0)^2)}\, dx \\
&= C_1 + \int_{-\infty}^{\infty} \log(C_2 + C_3 x^2)\, \frac{1}{\pi(1 + (x - \theta_0)^2)}\, dx \\
&= C_1 + \int_{-\infty}^{\infty} \log\bigl(C_2 + C_3 (x + \theta_0)^2\bigr) \frac{1}{\pi(1 + x^2)}\, dx \\
&= C_1 + \int_{-\infty}^{-\theta_0} \log\bigl(C_2 + C_3 (x + \theta_0)^2\bigr) \frac{1}{\pi(1 + x^2)}\, dx + \int_{-\theta_0}^{\infty} \log\bigl(C_2 + C_3 (x + \theta_0)^2\bigr) \frac{1}{\pi(1 + x^2)}\, dx,
\end{aligned}
$$
where the third equality recentres the Cauchy density at zero by a change of variables.

Now, $\int_{-\infty}^{-\theta_0} \log\bigl(C_2 + C_3(x + \theta_0)^2\bigr) \frac{1}{\pi(1+x^2)}\, dx$ is equal to
$$\int_{x(\theta_0)}^{-\theta_0} \log\bigl(C_2 + C_3(x + \theta_0)^2\bigr) \frac{1}{\pi(1 + x^2)}\, dx + \int_{-\infty}^{x(\theta_0)} \log\bigl(C_2 + C_3(x + \theta_0)^2\bigr) \frac{1}{\pi(1 + x^2)}\, dx,$$
which, for $x(\theta_0)$ negative enough, is less than
$$\int_{x(\theta_0)}^{-\theta_0} \log\bigl(C_2 + C_3(x + \theta_0)^2\bigr) \frac{1}{\pi(1 + x^2)}\, dx + \int_{-\infty}^{x(\theta_0)} \frac{\sqrt{|x|}}{\pi(1 + x^2)}\, dx,$$
since the logarithm eventually grows more slowly than $\sqrt{|x|}$. Both of the integrals in the sum are bounded: the first is the integral of a continuous function over a compact interval, and the second converges because $\sqrt{|x|}/(1 + x^2) = O(|x|^{-3/2})$ as $x \to -\infty$. A similar argument applies to $\int_{-\theta_0}^{\infty} \log\bigl(C_2 + C_3(x + \theta_0)^2\bigr) \frac{1}{\pi(1+x^2)}\, dx$. Thus, we know that $E_{\theta_0}[d(X)] < \infty$.
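As a sanity check, $E_{\theta_0}[d(X)]$ can also be evaluated numerically; the constants and $\theta_0$ in the sketch below are arbitrary illustrative choices, not values fixed by the notes.

```python
import numpy as np
from scipy.integrate import quad

# Numerically verify E[d(X)] < infinity for d(x) = C1 + log(C2 + C3 x^2)
# when X ~ Cauchy(theta0, 1). All constants are arbitrary illustrative choices.
C1, C2, C3, theta0 = np.log(np.pi), 3.0, 2.0, 1.0

def integrand(x):
    return (C1 + np.log(C2 + C3 * x**2)) / (np.pi * (1.0 + (x - theta0) ** 2))

value, abserr = quad(integrand, -np.inf, np.inf)
print(f"E[d(X)] ≈ {value:.4f}  (estimated absolute error {abserr:.1e})")
```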

Number of Parameters Increases with the Sample Size

Up to now, we have implicitly assumed that the number of parameters is a fixed constant $k$. In some cases the number of parameters increases naturally with the number of observations. In such cases, the MLE may
i. no longer converge;
ii. converge to a parameter value different from $\theta_0$; or
iii. still converge to $\theta_0$.
In general, the outcome depends on how fast the number of parameters grows relative to the number of observations.

Example 8.5 (Neyman–Scott, Econometrica, 1948): Suppose that $X_n = (X_1, \ldots, X_n)$, where the $X_i$'s are independent with $X_i = (X_{i1}, X_{i2})$, $X_{i1}$ independent of $X_{i2}$, and $X_{ip} \sim N(\mu_i, \sigma^2)$ for $p = 1, 2$. We are interested in estimating the $\mu_i$'s and $\sigma^2$. In this problem, we have $n + 1$ parameters. The likelihood function is
$$L(\mu_1, \ldots, \mu_n, \sigma^2; x_n) = \prod_{i=1}^{n} \frac{1}{2\pi\sigma^2} \exp\Bigl(-\frac{1}{2\sigma^2} \sum_{p=1}^{2} (x_{ip} - \mu_i)^2\Bigr).$$
It is easy to show that the MLEs are
$$\hat{\mu}_i = \tfrac{1}{2}(X_{i1} + X_{i2}) \quad \text{for } i = 1, \ldots, n, \qquad \hat{\sigma}^2 = \frac{1}{2n} \sum_{i=1}^{n} \sum_{p=1}^{2} (X_{ip} - \hat{\mu}_i)^2.$$
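The well-known conclusion of this example is that $\hat{\sigma}^2$ is inconsistent: $E[\hat{\sigma}^2] = \sigma^2/2$, so it converges to $\sigma^2/2$ rather than $\sigma^2$, however large $n$ is. A minimal simulation sketch (all parameter values are arbitrary choices):

```python
import numpy as np

# Neyman-Scott: n pairs, n + 1 parameters. The MLE of sigma^2
# converges to sigma^2 / 2 instead of sigma^2.
rng = np.random.default_rng(1)
sigma2 = 4.0
for n in (100, 10_000, 1_000_000):
    mu = rng.normal(size=n)                              # arbitrary nuisance means
    x = mu[:, None] + np.sqrt(sigma2) * rng.normal(size=(n, 2))
    mu_hat = x.mean(axis=1)                              # (X_i1 + X_i2) / 2
    sigma2_hat = np.sum((x - mu_hat[:, None]) ** 2) / (2 * n)
    print(f"n = {n}: sigma2_hat = {sigma2_hat:.3f}  (true sigma^2 = {sigma2})")
```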