《实用非参数统计》 (Practical Nonparametric Statistics) course reading material (some theory): Bootstrap and Jackknife Estimation of Sampling Distributions

Chapter 8 Bootstrap and Jackknife Estimation of Sampling Distributions

1. A General View of the Bootstrap
2. Bootstrap Methods
3. The Jackknife
4. Some limit theory for bootstrap methods
5. The bootstrap and the delta method
6. Bootstrap Tests and Bootstrap Confidence Intervals
7. M-Estimators and the Bootstrap


Chapter 8 Bootstrap and Jackknife Estimation of Sampling Distributions

1 A General View of the Bootstrap

We begin with a general approach to bootstrap methods. The goal is to formulate the ideas in a context which is free of particular model assumptions. Suppose that the data $X \sim P_\theta \in \mathcal{P} = \{P_\theta : \theta \in \Theta\}$. The parameter space $\Theta$ is allowed to be very general; it could be a subset of $\mathbb{R}^k$ (in which case the model $\mathcal{P}$ is a parametric model), or it could be the set of all distributions of i.i.d. sequences on some measurable space $(\mathcal{X}, \mathcal{A})$ (in which case the model $\mathcal{P}$ is the "nonparametric i.i.d." model).

Suppose that we have an estimator $\hat\theta$ of $\theta \in \Theta$, and thereby an estimator $P_{\hat\theta}$ of $P_\theta$. Consider estimation of:

A. The distribution of $\hat\theta$: e.g. $P_\theta(\hat\theta \in A) = P_\theta(\hat\theta(X) \in A)$ for a measurable subset $A$ of $\Theta$;

B. If $\Theta \subset \mathbb{R}^k$, $\mathrm{Var}_\theta(a^T\hat\theta(X))$ for a fixed vector $a \in \mathbb{R}^k$.

Natural (ideal) bootstrap estimators of these parameters are provided by:

A′. $P_{\hat\theta}(\hat\theta(X^*) \in A)$;

B′. $\mathrm{Var}_{\hat\theta}(a^T\hat\theta(X^*))$.

While these ideal bootstrap estimators are often difficult to compute exactly, we can often obtain Monte-Carlo estimates thereof by sampling from $P_{\hat\theta}$: let $X^*_1, \ldots, X^*_B$ be i.i.d. with common distribution $P_{\hat\theta}$, and calculate $\hat\theta(X^*_j)$ for $j = 1, \ldots, B$. Then Monte-Carlo approximations (or implementations) of the bootstrap estimators in A′ and B′ are given by

A″. $B^{-1}\sum_{j=1}^B 1\{\hat\theta(X^*_j) \in A\}$;

B″. $B^{-1}\sum_{j=1}^B \bigl(a^T\hat\theta(X^*_j) - B^{-1}\sum_{k=1}^B a^T\hat\theta(X^*_k)\bigr)^2$.

If $\mathcal{P}$ is a parametric model, the above approach yields a parametric bootstrap. If $\mathcal{P}$ is a nonparametric model, then this yields a nonparametric bootstrap. In the following section we try to make these ideas more concrete, first in the context of $X = (X_1, \ldots, X_n)$ i.i.d. $F$ or $P$ with $\mathcal{P}$ nonparametric, so that $P_\theta = F \times \cdots \times F$ and $P_{\hat\theta} = F_n \times \cdots \times F_n$; or, if the basic underlying sample space for each $X_i$ is not $\mathbb{R}$, $P_\theta = P \times \cdots \times P$ and $P_{\hat\theta} = P_n \times \cdots \times P_n$.
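To make the Monte-Carlo recipe in A″ and B″ concrete, here is a minimal Python sketch (not part of the original notes). The names `theta_hat` and `sample_from_fitted` are hypothetical placeholders: the first maps a sample to $\hat\theta$, the second draws a fresh sample of size $n$ from $P_{\hat\theta}$, so the same code covers the parametric or the nonparametric bootstrap depending on which sampler is supplied.

```python
import numpy as np

def bootstrap_monte_carlo(x, theta_hat, sample_from_fitted, B=1000, seed=None):
    """Monte-Carlo implementation of the ideal bootstrap estimators A'' and B''.

    x                  : observed data, array of shape (n, ...)
    theta_hat          : function mapping a sample to the real-valued estimate theta-hat
    sample_from_fitted : function (x, rng) -> bootstrap sample X* of size n from P_{theta-hat}
    """
    rng = np.random.default_rng(seed)
    # theta-hat(X*_j) for j = 1, ..., B
    theta_star = np.array([theta_hat(sample_from_fitted(x, rng)) for _ in range(B)])
    # B'': Monte-Carlo approximation of the bootstrap variance of theta-hat
    var_boot = theta_star.var()
    return theta_star, var_boot

# Usage sketch: nonparametric bootstrap of the sample mean (sampling from the empirical measure)
rng0 = np.random.default_rng(0)
x = rng0.exponential(size=50)
resample = lambda data, rng: rng.choice(data, size=len(data), replace=True)
theta_star, var_boot = bootstrap_monte_carlo(x, np.mean, resample, B=2000, seed=1)
print(var_boot, x.var(ddof=1) / len(x))   # bootstrap vs. classical estimate of Var(mean)
```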

2 Bootstrap Methods

We begin with a discussion of Efron's nonparametric bootstrap; we will then discuss some of the many alternatives.

Efron's nonparametric bootstrap

Suppose that $T(F)$ is some (real-valued) functional of $F$. If $X_1, \ldots, X_n$ are i.i.d. with distribution function $F$, then we estimate $T(F)$ by $T(F_n) \equiv T_n$, where $F_n$ is the empirical d.f. $F_n(x) \equiv n^{-1}\sum_{i=1}^n 1\{X_i \le x\}$. More generally, if $T(P)$ is some functional of $P$ and $X_1, \ldots, X_n$ are i.i.d. $P$, then a natural estimator of $T(P)$ is just $T(P_n)$, where $P_n$ is the empirical measure $P_n = n^{-1}\sum_{i=1}^n \delta_{X_i}$.

Consider estimation of:

A. $b_n(F) \equiv n\{E_F(T_n) - T(F)\}$.

B. $n\sigma_n^2(F) \equiv n\,\mathrm{Var}_F(T_n)$.

C. $\kappa_{3,n}(F) \equiv E_F[T_n - E_F(T_n)]^3/\sigma_n^3(F)$.

D. $H_n(x, F) \equiv P_F(\sqrt{n}(T_n - T(F)) \le x)$.

E. $K_n(x, F) \equiv P_F(\sqrt{n}\,\|F_n - F\|_\infty \le x)$.

F. $L_n(x, P) \equiv \Pr_P(\sqrt{n}\,\|P_n - P\|_{\mathcal{F}} \le x)$, where $\mathcal{F}$ is a class of functions for which the central limit theorem holds uniformly over $\mathcal{F}$ (i.e. a Donsker class).

The (ideal) nonparametric bootstrap estimates of these quantities are obtained simply via the substitution principle: if $F$ (or $P$) is unknown, estimate it by the empirical distribution function $F_n$ (or the empirical measure $P_n$). This yields the following nonparametric bootstrap estimates in examples A-F:

A′. $b_n(F_n) \equiv n\{E_{F_n}(T_n) - T(F_n)\}$.

B′. $n\sigma_n^2(F_n) \equiv n\,\mathrm{Var}_{F_n}(T_n)$.

C′. $\kappa_{3,n}(F_n) \equiv E_{F_n}[T_n - E_{F_n}(T_n)]^3/\sigma_n^3(F_n)$.

D′. $H_n(x, F_n) \equiv P_{F_n}(\sqrt{n}(T_n - T(F_n)) \le x)$.

E′. $K_n(x, F_n) \equiv P_{F_n}(\sqrt{n}\,\|F_n^* - F_n\|_\infty \le x)$.

F′. $L_n(x, P_n) \equiv \Pr_{P_n}(\sqrt{n}\,\|P_n^* - P_n\|_{\mathcal{F}} \le x)$, where $\mathcal{F}$ is a class of functions for which the central limit theorem holds uniformly over $\mathcal{F}$ (i.e. a Donsker class).

Because we usually lack closed-form expressions for the ideal bootstrap estimators in A′-F′, evaluation of A′-F′ is usually indirect. Since the empirical d.f. $F_n$ is discrete (with all its mass at the data), we could, in principle, enumerate all possible samples of size $n$ drawn from $F_n$ (or $P_n$) with replacement. If $n$ is large, however, this is a large number: $n^n$. [Problem: show that the number of distinct bootstrap samples is $\binom{2n-1}{n}$.]

On the other hand, Monte-Carlo approximations to A′-F′ are easy: let $(X^*_{j1}, \ldots, X^*_{jn})$, $j = 1, \ldots, B$, be $B$ independent samples of size $n$ drawn with replacement from $F_n$ (or $P_n$); let
$$F^*_{j,n}(x) \equiv n^{-1}\sum_{i=1}^n 1\{X^*_{j,i} \le x\}$$
be the empirical d.f. of the $j$-th sample, and let $T^*_{j,n} \equiv T(F^*_{j,n})$, $j = 1, \ldots, B$. Then approximations of A′-F′ are given by:

A″. $b^*_{n,B} \equiv n\bigl\{\frac{1}{B}\sum_{j=1}^B T^*_{j,n} - T_n\bigr\}$.

B″. $n\sigma^{*2}_{n,B} \equiv n\,\frac{1}{B}\sum_{j=1}^B (T^*_{j,n} - \bar T^*_n)^2$.

C″. $\kappa^*_{3,n,B} \equiv \frac{1}{B}\sum_{j=1}^B (T^*_{j,n} - \bar T^*_n)^3/\sigma^{*3}_{n,B}$.

D″. $H^*_{n,B}(x) \equiv \frac{1}{B}\sum_{j=1}^B 1\{\sqrt{n}(T^*_{j,n} - T_n) \le x\}$.

E″. $K^*_{n,B}(x) \equiv \frac{1}{B}\sum_{j=1}^B 1\{\sqrt{n}\,\|F^*_{j,n} - F_n\|_\infty \le x\}$.

F″. $L^*_{n,B}(x) \equiv \frac{1}{B}\sum_{j=1}^B 1\{\sqrt{n}\,\|P^*_{j,n} - P_n\|_{\mathcal{F}} \le x\}$.

Here $\bar T^*_n \equiv B^{-1}\sum_{j=1}^B T^*_{j,n}$.

For fixed sample size $n$ and data $F_n$, it follows from the Glivenko-Cantelli theorem (applied to the bootstrap sampling) that
$$\sup_x |H^*_{n,B}(x) - H_n(x, F_n)| \to_{a.s.} 0 \quad \text{as } B \to \infty,$$
and, by Donsker's theorem,
$$\sqrt{B}\,(H^*_{n,B}(x) - H_n(x, F_n)) \Rightarrow \mathbb{U}^{**}(H_n(x, F_n)) \quad \text{as } B \to \infty.$$
Moreover, by the Dvoretzky-Kiefer-Wolfowitz (1956) inequality ($P(\|\mathbb{U}_n\|_\infty \ge \lambda) \le 2\exp(-2\lambda^2)$ for all $n$ and $\lambda > 0$, where the constant $2$ in front of the exponential comes via Massart (1990)),
$$P\Bigl(\sup_x |H^*_{n,B}(x) - H_n(x, F_n)| \ge \epsilon\Bigr) \le 2\exp(-2B\epsilon^2).$$
For a given $\epsilon > 0$ we can make this probability as small as we please by choosing $B$ (over which we have complete control given sufficient computing power) sufficiently large. Since the deviations of $H^*_{n,B}$ from $H_n(x, F_n)$ are so well understood and controlled, much of our discussion below will focus on the differences between $H_n(x, F_n)$ and $H_n(x, F)$.

Sometimes it is possible to compute the distribution of the bootstrap estimator explicitly without resort to Monte-Carlo; here is an example of this kind.

Example 2.1 (The distribution of the bootstrap estimator of the median). Suppose that $T(F) = F^{-1}(1/2)$. Then $T(F_n) = F_n^{-1}(1/2) = X_{([n+1]/2)}$ and $T(F_n^*) = F_n^{*-1}(1/2) = X^*_{([n+1]/2)}$.

Let $m = [n+1]/2$, and let $M_j \equiv \#\{X^*_i = X_j(\omega) : i = 1, \ldots, n\}$, $j = 1, \ldots, n$, so that
$$M \equiv (M_1, \ldots, M_n) \sim \mathrm{Mult}_n\bigl(n, (1/n, \ldots, 1/n)\bigr).$$
Now $[X^*_{(m)} > X_{(k)}(\omega)] = [nF^*_n(X_{(k)}(\omega)) \le m - 1]$, and hence
$$P\bigl(T(F^*_n) = X^*_{(m)} > X_{(k)}(\omega) \,\big|\, F_n\bigr) = P\bigl(nF^*_n(X_{(k)}(\omega)) \le m-1 \,\big|\, F_n\bigr) = P\bigl(\mathrm{Binomial}(n, k/n) \le m-1\bigr) = \sum_{j=0}^{m-1}\binom{n}{j}(k/n)^j(1 - k/n)^{n-j},$$
while
$$P(T_n > x) = P(X_{(m)} > x) = P(nF_n(x) < m) = \sum_{j=0}^{m-1}\binom{n}{j} F(x)^j (1 - F(x))^{n-j}.$$
This implies that
$$P\bigl(T(F^*_n) = X_{(k)}(\omega) \,\big|\, F_n\bigr) = \sum_{j=0}^{m-1}\left\{\binom{n}{j}\Bigl(\frac{k-1}{n}\Bigr)^j\Bigl(1 - \frac{k-1}{n}\Bigr)^{n-j} - \binom{n}{j}\Bigl(\frac{k}{n}\Bigr)^j\Bigl(1 - \frac{k}{n}\Bigr)^{n-j}\right\}$$
for $k = 1, \ldots, n$.
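The last display gives the whole conditional distribution of the bootstrap median in closed form, so it can be tabulated without any resampling. The following Python sketch (not from the original notes; it assumes $n$ odd, so that $m = (n+1)/2$) evaluates the formula and checks it against a Monte-Carlo bootstrap.

```python
import math
import numpy as np

def binom_cdf(n, p, upper):
    """P(Binomial(n, p) <= upper), computed directly from the binomial pmf."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(upper + 1))

def exact_bootstrap_median_dist(x):
    """P(T(F_n^*) = X_(k) | F_n) for k = 1, ..., n, via the formula of Example 2.1."""
    n = len(x)
    assert n % 2 == 1, "take n odd so that the median is X_((n+1)/2)"
    m = (n + 1) // 2
    # G(k) = P(X*_(m) > X_(k) | F_n) = P(Binomial(n, k/n) <= m - 1); G(0) = 1, G(n) = 0
    G = [binom_cdf(n, k / n, m - 1) for k in range(n + 1)]
    return np.array([G[k - 1] - G[k] for k in range(1, n + 1)])

# Check against a Monte-Carlo bootstrap of the median
rng = np.random.default_rng(0)
x = np.sort(rng.normal(size=11))
exact = exact_bootstrap_median_dist(x)
boot_medians = np.array([np.median(rng.choice(x, size=len(x), replace=True))
                         for _ in range(20000)])
mc = np.array([np.mean(boot_medians == xi) for xi in x])
print(np.round(exact, 3))
print(np.round(mc, 3))
```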

Example 2.2 (Standard deviation of a correlation coefficient estimator). Let $T(F) = \rho(F)$, where $F$ is the bivariate distribution of a pair of random variables $(X, Y)$ with finite fourth moments. We know from Chapter 2 that the sample correlation coefficient $\hat\rho_n \equiv T(F_n)$ satisfies
$$\sqrt{n}(\hat\rho_n - \rho) \equiv \sqrt{n}(\rho(F_n) - \rho(F)) \to_d N(0, V^2)$$
where $V^2 = \mathrm{Var}[Z_1 - (\rho/2)(Z_2 + Z_3)]$ with $Z \equiv (Z_1, Z_2, Z_3) \sim N_3(0, \Sigma)$ and $\Sigma$ given by
$$\Sigma = E(X_s Y_s - \rho,\; X_s^2 - 1,\; Y_s^2 - 1)^{\otimes 2};$$
here $X_s \equiv (X - \mu_X)/\sigma_X$ and $Y_s \equiv (Y - \mu_Y)/\sigma_Y$ are the standardized variables. If $F$ is bivariate normal, then $V^2 = (1 - \rho^2)^2$.

Consider estimation of the standard deviation of $\hat\rho_n$:
$$\sigma_n(F) \equiv \{\mathrm{Var}_F(\hat\rho_n)\}^{1/2}.$$
The normal theory estimator of $\sigma_n(F)$ is $(1 - \hat\rho_n^2)/\sqrt{n-3}$. The delta-method estimate of $\sigma_n(F)$ is
$$\frac{\hat V_n}{\sqrt{n}} = \bigl\{\widehat{\mathrm{Var}}[Z_1 - (\rho/2)(Z_2 + Z_3)]\bigr\}^{1/2}\big/\sqrt{n}.$$
The (Monte-Carlo approximation to the) bootstrap estimate of $\sigma_n(F)$ is
$$\sqrt{B^{-1}\sum_{j=1}^B [\hat\rho^*_j - \bar\rho^*]^2},$$
where $\bar\rho^* \equiv B^{-1}\sum_{j=1}^B \hat\rho^*_j$. Finally, the jackknife estimate of $\sigma_n(F)$ is
$$\sqrt{\frac{n-1}{n}\sum_{i=1}^n [\hat\rho_{(i)} - \hat\rho_{(\cdot)}]^2};$$
see the beginning of Section 3 for the notation used here. We will discuss the jackknife further in Sections 3 and 4.
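As a numerical illustration of Example 2.2, the following Python sketch (not part of the original notes; the bivariate normal data, sample size, and number of replicates are made up for illustration) computes the normal-theory, nonparametric bootstrap, and delete-one jackknife estimates of the standard deviation of $\hat\rho_n$.

```python
import numpy as np

def corr(xy):
    """Sample correlation coefficient of an (n, 2) data array."""
    return np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]

rng = np.random.default_rng(0)
n, rho = 40, 0.6
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
rho_hat = corr(xy)

# Normal-theory estimate of sigma_n(F)
se_normal = (1 - rho_hat**2) / np.sqrt(n - 3)

# Nonparametric bootstrap estimate: resample rows of the paired data
B = 2000
rho_star = np.array([corr(xy[rng.integers(0, n, size=n)]) for _ in range(B)])
se_boot = np.sqrt(np.mean((rho_star - rho_star.mean())**2))

# Delete-one jackknife estimate: sqrt((n-1)/n * sum_i (rho_(i) - rho_(.))^2)
rho_loo = np.array([corr(np.delete(xy, i, axis=0)) for i in range(n)])
se_jack = np.sqrt((n - 1) / n * np.sum((rho_loo - rho_loo.mean())**2))

print(se_normal, se_boot, se_jack)
```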

Parametric Bootstrap Methods

Once the idea of nonparametric bootstrapping (sampling from the empirical measure $P_n$) becomes clear, it seems natural to consider sampling from other estimators of the unknown $P$. For example, if we are quite confident that some parametric model holds, then it seems that we should consider bootstrapping by sampling from an estimator of $P$ based on the parametric model. Here is a formal description of this type of model-based bootstrap procedure.

Let $(\mathcal{X}, \mathcal{A})$ be a measurable space, and let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ be a model: parametric, semiparametric, or nonparametric. We do not insist that $\Theta$ be finite-dimensional. For example, in a parametric extreme case $\mathcal{P}$ could be the family of all normal (Gaussian) distributions on $(\mathcal{X}, \mathcal{A}) = (\mathbb{R}^d, \mathcal{B}^d)$. Or, to give a nonparametric example with only a smoothness restriction, $\mathcal{P}$ could be the family of all distributions on $(\mathcal{X}, \mathcal{A}) = (\mathbb{R}^d, \mathcal{B}^d)$ with a density with respect to Lebesgue measure which is uniformly continuous.

Let $X_1, \ldots, X_n, \ldots$ be i.i.d. with distribution $P_\theta \in \mathcal{P}$. We assume that there exists an estimator $\hat\theta_n = \hat\theta_n(X_1, \ldots, X_n)$ of $\theta$. Then Efron's parametric (or model-based) bootstrap proceeds by sampling from the estimated or fitted model $P_{\hat\theta(\omega)} \equiv \hat P^\omega_n$: suppose that $X^*_{n,1}, \ldots, X^*_{n,n}$ are independent and identically distributed with distribution $\hat P^\omega_n$ on $(\mathcal{X}, \mathcal{A})$, and let
$$P^*_n \equiv n^{-1}\sum_{i=1}^n \delta_{X^*_{n,i}} \equiv \text{the parametric bootstrap empirical measure}. \qquad (1)$$
The key difference between this parametric bootstrap procedure and the nonparametric bootstrap discussed earlier in this section is that we are now sampling from the model-based estimator $\hat P_n = P_{\hat\theta_n}$ of $P$ rather than from the nonparametric estimator $P_n$.

Example 2.3 Suppose that $X_1, \ldots, X_n$ are i.i.d. $P_\theta = N(\mu, \sigma^2)$ where $\theta = (\mu, \sigma^2)$. Let $\hat\theta_n = (\hat\mu_n, \hat\sigma_n^2) = (\overline{X}_n, S_n^2)$ where $S_n^2$ is the usual unbiased estimator of $\sigma^2$, and hence
$$\frac{\sqrt{n}(\hat\mu_n - \mu)}{\hat\sigma_n} \sim t_{n-1}, \qquad \frac{(n-1)\hat\sigma_n^2}{\sigma^2} \sim \chi^2_{n-1}.$$
Now $P_{\hat\theta_n} = N(\hat\mu_n, \hat\sigma_n^2)$, and if $X^*_1, \ldots, X^*_n$ are i.i.d. $P_{\hat\theta_n}$, then the bootstrap estimators $\hat\theta^*_n = (\hat\mu^*_n, \hat\sigma^{*2}_n)$ satisfy, conditionally on $F_n$,
$$\frac{\sqrt{n}(\hat\mu^*_n - \hat\mu_n)}{\hat\sigma^*_n} \sim t_{n-1}, \qquad \frac{(n-1)\hat\sigma^{*2}_n}{\hat\sigma_n^2} \sim \chi^2_{n-1}.$$
Thus the bootstrap estimators have exactly the same distributions as the original estimators in this case.

Example 2.4 Suppose that $X_1, \ldots, X_n$ are i.i.d. $P_\theta = \text{exponential}(1/\theta)$: $P_\theta(X_1 > t) = \exp(-t/\theta)$ for $t \ge 0$. Then $\hat\theta_n = \overline{X}_n$ and $n\hat\theta_n/\theta \sim \mathrm{Gamma}(n, 1)$. Now $P_{\hat\theta_n} = \text{exponential}(1/\hat\theta_n)$, and if $X^*_1, \ldots, X^*_n$ are i.i.d. $P_{\hat\theta_n}$, then $\hat\theta^*_n = \overline{X}^*_n$ has $(n\hat\theta^*_n/\hat\theta_n \mid F_n) \sim \mathrm{Gamma}(n, 1)$, so the bootstrap distribution replicates the original estimator exactly.

Example 2.5 (Bootstrapping from a "smoothed empirical measure"; or the "smoothed bootstrap"). Suppose that
$$\mathcal{P} = \Bigl\{P \text{ on } (\mathbb{R}^d, \mathcal{B}^d) : p = \frac{dP}{d\lambda} \text{ exists and is uniformly continuous}\Bigr\}.$$
Then one way to estimate $P$ so that our estimator $\hat P_n \in \mathcal{P}$ is via a kernel estimator of the density $p$:
$$\hat p_n(x) = \frac{1}{b_n^d}\int k\Bigl(\frac{y - x}{b_n}\Bigr)\, dP_n(y),$$
where $k : \mathbb{R}^d \to \mathbb{R}$ is a uniformly continuous density function. Then $\hat P_n$ is defined for $C \in \mathcal{A}$ by
$$\hat P_n(C) = \int_C \hat p_n(x)\, dx,$$
and the model-based bootstrap proceeds by sampling from $\hat P_n$.

There are many other examples of this type involving nonparametric or semiparametric models $\mathcal{P}$. For some work on "smoothed bootstrap" methods see e.g. Silverman and Young (1987) and Hall, DiCiccio, and Romano (1989).
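Because $\hat p_n$ is a mixture of $n$ rescaled kernels centered (for a symmetric kernel) at the data points, sampling from $\hat P_n$ amounts to resampling a data point and adding scaled kernel noise. Here is a minimal one-dimensional Python sketch (not from the original notes), assuming a Gaussian kernel and a normal-reference bandwidth; both choices are illustrative.

```python
import numpy as np

def smoothed_bootstrap_sample(x, bn, rng):
    """One smoothed-bootstrap sample of size n from the kernel estimate of the density.

    Resample a data point and add bn * eps with eps drawn from the kernel k
    (here standard normal); for a symmetric kernel this is a draw from p-hat_n.
    """
    n = len(x)
    centers = rng.choice(x, size=n, replace=True)     # ordinary nonparametric resampling
    return centers + bn * rng.standard_normal(n)      # kernel smoothing of each resampled point

# Usage: smoothed bootstrap of the sample median with a normal-reference bandwidth
rng = np.random.default_rng(0)
x = rng.exponential(size=100)
bn = 1.06 * x.std(ddof=1) * len(x) ** (-1 / 5)        # Silverman-type reference bandwidth
med_star = np.array([np.median(smoothed_bootstrap_sample(x, bn, rng)) for _ in range(2000)])
print(med_star.std())                                 # smoothed-bootstrap SE of the sample median
```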

Exchangeably-weighted and "Bayesian" bootstrap methods

In the course of Example 2.1 we introduced the vector $M$ of counts of how many times the bootstrap variables $X^*_i$ equal the observations $X_j(\omega)$ in the underlying sample. Thinking about the process of sampling at random (with replacement) from the population described by the empirical measure $P_n$, it becomes clear that we can think of the bootstrap empirical measure $P^*_n$ as the empirical measure with multinomial random weights:
$$P^*_n = \frac{1}{n}\sum_{i=1}^n \delta_{X^*_i} = \frac{1}{n}\sum_{i=1}^n M_i\,\delta_{X_i(\omega)}.$$
This view of Efron's nonparametric bootstrap as the empirical measure with random weights suggests that we could obtain other random measures which would behave much the same way as Efron's nonparametric bootstrap, but without the same random sampling interpretation, by replacing the vector of multinomial weights by some other random vector $W$. One of the possible deficiencies of the nonparametric bootstrap involves its "discreteness" via missing observations in the original sample: note that the number of points of the original sample which are missed (or not given any bootstrap weight) is $N_n \equiv \#\{j \le n : M_j = 0\} = \sum_{j=1}^n 1\{M_j = 0\}$. Hence the proportion of observations missed by the bootstrap is $n^{-1}N_n$, and the expected proportion of missed observations is
$$E(n^{-1}N_n) = P(M_1 = 0) = (1 - 1/n)^n \to e^{-1} \doteq .36787\ldots.$$
[Moreover, from occupancy theory for urn models,
$$\sqrt{n}\bigl(n^{-1}N_n - (1 - 1/n)^n\bigr) \to_d N\bigl(0, e^{-1}(1 - 2e^{-1})\bigr) = N(0, .09720887\ldots);$$
see e.g. Johnson and Kotz (1977), page 317, with $r = 0$.] By using some other vector of exchangeable weights $W$ rather than $M_n \sim \mathrm{Mult}_n(n, (1/n, \ldots, 1/n))$, we might be able to avoid some of this discreteness caused by multinomial weights.

Since the resulting measure should be a probability measure, it seems reasonable to require that the components of $W$ should sum to $n$. Since the multinomial random vector with cell probabilities all equal to $1/n$ is exchangeable, it seems reasonable to require that the vector $W$ have an exchangeable distribution: i.e. $\pi W \equiv (W_{\pi(1)}, \ldots, W_{\pi(n)}) \stackrel{d}{=} W$ for all permutations $\pi$ of $\{1, \ldots, n\}$. Then
$$P^W_n \equiv \frac{1}{n}\sum_{i=1}^n W_{ni}\,\delta_{X_i(\omega)}$$
is called the exchangeably weighted bootstrap empirical measure corresponding to the weight vector $W$. Here are several examples.

Example 2.6 (Dirichlet weights). Suppose that $Y_1, Y_2, \ldots$ are i.i.d. exponential(1) random variables, and set
$$W_{ni} \equiv \frac{nY_i}{Y_1 + \cdots + Y_n}, \qquad i = 1, \ldots, n.$$
The resulting random vector $W/n$ has a Dirichlet$(1, \ldots, 1)$ distribution; i.e. $n^{-1}W \stackrel{d}{=} D$ where the $D_i$'s are the spacings of a random sample of $n - 1$ Uniform$(0, 1)$ random variables.

Example 2.7 (More general continuous weights). Other weights $W$ of the same form as in Example 2.6 are obtained by replacing the exponential distribution of the $Y$'s by some other distribution on $\mathbb{R}^+$. It will turn out that the limit theory can be established for any of these weights as long as the $Y_i$'s satisfy $Y_i \in L_{2,1}$, i.e. $\int_0^\infty \sqrt{P(|Y| > t)}\, dt < \infty$.

Other weights $W$ based on various urn schemes are also possible; see Praestgaard and Wellner (1993) for some of these.
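Example 2.6 is straightforward to implement: draw i.i.d. exponential(1) variables, normalize them to sum to $n$, and use the result as weights on the original observations, so every data point receives positive weight. Here is a small Python sketch (not from the original notes), applied to the mean as the functional and compared with Efron's multinomial-weight bootstrap.

```python
import numpy as np

def dirichlet_weights(n, rng):
    """Exchangeable weights W with W/n ~ Dirichlet(1, ..., 1), as in Example 2.6."""
    y = rng.exponential(size=n)          # i.i.d. exponential(1) variables Y_1, ..., Y_n
    return n * y / y.sum()               # W_ni = n Y_i / (Y_1 + ... + Y_n); sums to n

rng = np.random.default_rng(0)
x = rng.normal(size=50)
n, B = len(x), 2000

# "Bayesian bootstrap" replicates of the mean: T(P_n^W) = n^{-1} sum_i W_ni x_i
t_w = np.array([np.mean(dirichlet_weights(n, rng) * x) for _ in range(B)])

# Efron multinomial-weight replicates for comparison
t_m = np.array([np.mean(rng.choice(x, size=n, replace=True)) for _ in range(B)])

print(t_w.std(), t_m.std(), x.std(ddof=1) / np.sqrt(n))
```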

3 The Jackknife

The jackknife preceded the bootstrap, mostly due to its simplicity and relative ease of computation. The original work on the "delete-one" jackknife is due to Quenouille (1949) and Tukey (1958). Here is how it works.

Suppose that $T(F_n)$ estimates $T(F)$. Let $T_{n,i} \equiv T(F_{n-1,i})$ where
$$F_{n-1,i}(x) \equiv \frac{1}{n-1}\sum_{j \ne i} 1_{(-\infty, x]}(X_j);$$
thus $T_{n,i}$ is the estimator based on the data with $X_i$ deleted or left out. Let
$$T_{n,\cdot} \equiv \frac{1}{n}\sum_{i=1}^n T_{n,i}.$$
We also set
$$T^*_{n,i} \equiv nT_n - (n-1)T_{n,i} \equiv i\text{th pseudo-value}$$
and $T^*_n \equiv n^{-1}\sum_{i=1}^n T^*_{n,i} = nT_n - (n-1)T_{n,\cdot}$.

The jackknife estimator of bias, and the jackknife estimator of $T(F)$

Now let $E_n \equiv E_F T_n = E_F T(F_n)$, and suppose that we can expand $E_n$ in powers of $n^{-1}$ as follows:
$$E_n \equiv E_F T_n = T(F) + \frac{a_1(F)}{n} + \frac{a_2(F)}{n^2} + \cdots.$$
Then the bias of the estimator $T_n = T(F_n)$ is
$$\mathrm{bias}_n(F) \equiv E_F(T_n) - T(F) = \frac{a_1(F)}{n} + \frac{a_2(F)}{n^2} + \cdots.$$
We can also write $T(F) = E_F(T_n) - \mathrm{bias}_n(F)$. Note that
$$E_F T_{n,\cdot} = E_{n-1} = T(F) + \frac{a_1(F)}{n-1} + \frac{a_2(F)}{(n-1)^2} + \cdots.$$
Hence it follows that
$$E_F(T^*_n) = nE_n - (n-1)E_{n-1} = T(F) + a_2(F)\Bigl\{\frac{1}{n} - \frac{1}{n-1}\Bigr\} + a_3(F)\Bigl\{\frac{1}{n^2} - \frac{1}{(n-1)^2}\Bigr\} + \cdots = T(F) - \frac{a_2(F)}{n(n-1)} + \cdots.$$
Thus $T^*_n$ has bias $O(n^{-2})$, whereas $T_n$ has bias of the order $O(n^{-1})$ if $a_1(F) \ne 0$. We call $T^*_n$ the jackknife estimator of $T(F)$; similarly, by writing
$$T^*_n = T_n - \widehat{\mathrm{bias}}_n,$$
we find that
$$\widehat{\mathrm{bias}}_n = T_n - T^*_n = (n-1)\{T_{n,\cdot} - T_n\}.$$
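The delete-one quantities above translate directly into code. The following Python sketch (not part of the original notes) computes the $T_{n,i}$, the pseudo-values, the jackknife bias estimate $(n-1)(T_{n,\cdot} - T_n)$, the bias-corrected estimator $T^*_n$, and the jackknife standard error from Example 2.2. The plug-in variance is used as the functional because its $O(n^{-1})$ bias (here $a_1(F) = -\sigma^2$ and higher-order terms vanish) is removed exactly by the jackknife, recovering the usual unbiased sample variance.

```python
import numpy as np

def jackknife(x, stat):
    """Delete-one jackknife: bias-corrected estimator, bias estimate, and standard error."""
    n = len(x)
    t_n = stat(x)
    # T_{n,i}: the statistic computed with observation i left out
    t_loo = np.array([stat(np.delete(x, i)) for i in range(n)])
    t_dot = t_loo.mean()                          # T_{n,.}
    pseudo = n * t_n - (n - 1) * t_loo            # pseudo-values T*_{n,i}
    t_jack = pseudo.mean()                        # T*_n = n T_n - (n-1) T_{n,.}
    bias_hat = (n - 1) * (t_dot - t_n)            # jackknife bias estimate, = T_n - T*_n
    se_jack = np.sqrt((n - 1) / n * np.sum((t_loo - t_dot) ** 2))  # jackknife SE (Example 2.2)
    return t_jack, bias_hat, se_jack

# The plug-in variance has bias -sigma^2/n; the jackknife correction reproduces the unbiased estimator
rng = np.random.default_rng(0)
x = rng.normal(size=30)
plug_in_var = lambda z: np.mean((z - z.mean()) ** 2)
t_jack, bias_hat, se_jack = jackknife(x, plug_in_var)
print(t_jack, x.var(ddof=1))              # identical: the jackknife exactly debiases the plug-in variance
print(bias_hat, -x.var(ddof=1) / len(x))  # the estimated bias matches -s^2/n
```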