The '''delta method''' is a result concerning the approximate probability distribution for a function of an [[wikipedia:Asymptotic distribution|asymptotically normal]] statistical estimator from knowledge of the limiting variance of that estimator.
==Method==
While the delta method generalizes easily to a multivariate setting, careful motivation of the technique is more easily demonstrated in univariate terms. Roughly, if there is a sequence of random variables <math>X_n</math> satisfying
<math display="block">{\sqrt{n}[X_n-\theta]\,\xrightarrow{D}\,\mathcal{N}(0,\sigma^2)},</math>
where <math>\theta</math> and <math>\sigma^2</math> are finite-valued constants and <math>\xrightarrow{D}</math> denotes [[wikipedia:convergence in distribution|convergence in distribution]], then
<math display="block" >
{\sqrt{n}[g(X_n)-g(\theta)]\,\xrightarrow{D}\,\mathcal{N}(0,\sigma^2[g'(\theta)]^2)}
</math>
for any function <math>g</math> such that <math>g'(\theta)</math> exists and is non-zero.
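As a quick numerical check of the univariate statement, the following is a minimal Monte Carlo sketch (assuming NumPy is available; the Exponential(1) distribution, the choice <math>g(x)=x^2</math>, and the sample sizes are illustrative assumptions, not part of the result itself). It compares the empirical variance of <math>\sqrt{n}[g(X_n)-g(\theta)]</math>, where <math>X_n</math> is a sample mean, with the delta-method value <math>\sigma^2[g'(\theta)]^2</math>.

<syntaxhighlight lang="python">
# Monte Carlo sketch of the univariate delta method (illustrative values).
# X_n is the sample mean of n i.i.d. Exponential(1) draws, so theta = 1 and
# sigma^2 = 1; with g(x) = x^2 the delta method predicts limiting variance 4.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1_000, 5_000
theta, sigma2 = 1.0, 1.0
g = lambda x: x ** 2
g_prime = lambda x: 2 * x

# Draws of sqrt(n) * (g(X_n) - g(theta)) across many replications
X_n = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
scaled = np.sqrt(n) * (g(X_n) - g(theta))

print("empirical variance :", scaled.var())                  # roughly 4
print("delta-method value :", sigma2 * g_prime(theta) ** 2)  # exactly 4
</syntaxhighlight>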
The method extends to the multivariate case. By definition, a [[wikipedia:consistency (statistics)|consistent]] estimator <math>B</math> converges in probability to its true value <math>\beta</math>, and often a [[wikipedia:central limit theorem|central limit theorem]] can be applied to obtain [[wikipedia:Estimator#Asymptotic normality|asymptotic normality]]:<math display="block">\sqrt{n} (B-\beta )\,\xrightarrow{D}\,\mathcal{N}(0, \Sigma ),</math>
where ''n'' is the number of observations and <math>\Sigma</math> is a covariance matrix. The ''multivariate delta method'' yields the following asymptotic property of a function <math>h</math> of the estimator <math>B</math>, under the assumption that the gradient <math>\nabla h(\beta)</math> is non-zero:
<div class="card mb-4"><div class="card-header"> Proposition (Multivariate delta method) </div><div class="card-body">
<p class="card-text">
<math display="block">\sqrt{n}(h(B)-h(\beta))\,\xrightarrow{D}\,\mathcal{N}(0, \nabla h(\beta)^T \cdot \Sigma \cdot \nabla h(\beta)).</math>
</p>
<span class="mw-customtoggle-theo.deltamethod btn btn-primary" >Show Proof</span><div class="mw-collapsible mw-collapsed" id="mw-customcollapsible-theo.deltamethod"><div class="mw-collapsible-content p-3">
We start with the univariate case. Demonstration of this result is fairly straightforward under the assumption that <math>g'</math> is [[wikipedia:Continuous function|continuous]]. To begin, we use the [[wikipedia:mean value theorem|mean value theorem]]:
:<math display="block">g(X_n)=g(\theta)+g'(\tilde{\theta})(X_n-\theta),</math>
where <math>\tilde{\theta}</math> lies between <math>X_n</math> and <math>\theta</math>. Since <math>\tilde{\theta}</math> lies between <math>X_n</math> and <math>\theta</math>, we have <math>|\tilde{\theta}-\theta| \le |X_n-\theta|</math>, and since <math>X_n\,\xrightarrow{P}\,\theta</math> it must be that <math>\tilde{\theta}\,\xrightarrow{P}\,\theta</math>. Because <math>g'</math> is continuous, applying the [[wikipedia:continuous mapping theorem|continuous mapping theorem]] yields
:<math display="block">g'(\tilde{\theta})\,\xrightarrow{P}\,g'(\theta),</math>
where <math>\xrightarrow{P}</math> denotes [[wikipedia:convergence in probability|convergence in probability]]. Rearranging the terms and multiplying by <math>\sqrt{n}</math> gives
:<math display="block">\sqrt{n}[g(X_n)-g(\theta)]=g'(\tilde{\theta})\sqrt{n}[X_n-\theta].</math> Since
:<math display="block">{\sqrt{n}[X_n-\theta] \xrightarrow{D} \mathcal{N}(0,\sigma^2)}</math>
by assumption, it follows immediately from appeal to [[wikipedia:Slutsky's theorem|Slutsky's Theorem]] that
:<math display="block">{\sqrt{n}[g(X_n)-g(\theta)] \xrightarrow{D} \mathcal{N}(0,\sigma^2[g'(\theta)]^2)}.</math>
Now we proceed to the multivariate case. Keeping only the first two terms of the [[wikipedia:Taylor series|Taylor series]], and using vector notation for the [[wikipedia:gradient|gradient]], we can estimate <math>h(B)</math> as
<math display="block">h(B) \approx h(\beta) + \nabla h(\beta)^T \cdot (B-\beta)</math>
which implies the variance of ''h(B)'' is approximately
<math display="block">\begin{align*}
\operatorname{Var}(h(B)) & \approx \operatorname{Var}(h(\beta) + \nabla h(\beta)^T \cdot (B-\beta)) \\
& = \operatorname{Var}(h(\beta) + \nabla h(\beta)^T \cdot B - \nabla h(\beta)^T \cdot \beta) \\
& = \operatorname{Var}(\nabla h(\beta)^T \cdot B) \\
& = \nabla h(\beta)^T \cdot \operatorname{Cov}(B) \cdot \nabla h(\beta) \\
& = \nabla h(\beta)^T \cdot (\Sigma / n) \cdot \nabla h(\beta)
\end{align*}
</math>
One can use the [[wikipedia:mean value theorem|mean value theorem]] (for real-valued functions of several variables) to see that the result does not rely on taking a first-order approximation.
The same argument as in the univariate case therefore gives
<math display="block">\sqrt{n}(h(B)-h(\beta))\,\xrightarrow{D}\,\mathcal{N}(0, \nabla h(\beta)^T \cdot \Sigma \cdot \nabla h(\beta)).</math> <div class="text-end">&#x25A0;</div></div></div></div></div>
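The multivariate statement can be checked numerically in the same spirit. Below is a minimal Monte Carlo sketch (again assuming NumPy; the bivariate normal model, the ratio <math>h(\beta_1,\beta_2)=\beta_1/\beta_2</math>, and all numerical values are illustrative assumptions). Here <math>B</math> is the vector of sample means, so <math>\sqrt{n}(B-\beta)\,\xrightarrow{D}\,\mathcal{N}(0,\Sigma)</math> holds by the central limit theorem, and the empirical variance of <math>\sqrt{n}(h(B)-h(\beta))</math> is compared with <math>\nabla h(\beta)^T \cdot \Sigma \cdot \nabla h(\beta)</math>.

<syntaxhighlight lang="python">
# Monte Carlo sketch of the multivariate delta method (illustrative values).
# B is the vector of sample means of n i.i.d. N(beta, Sigma) draws, and
# h(b) = b1/b2 has gradient (1/b2, -b1/b2**2), evaluated here at beta.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 1_000, 5_000
beta = np.array([2.0, 4.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])

def h(b):
    return b[..., 0] / b[..., 1]

grad = np.array([1.0 / beta[1], -beta[0] / beta[1] ** 2])

draws = rng.multivariate_normal(beta, Sigma, size=(reps, n))  # shape (reps, n, 2)
B = draws.mean(axis=1)                                        # shape (reps, 2)
scaled = np.sqrt(n) * (h(B) - h(beta))

print("empirical variance :", scaled.var())
print("delta-method value :", grad @ Sigma @ grad)
</syntaxhighlight>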
==References==
*{{cite web |url = https://en.wikipedia.org/w/index.php?title=Delta_method&oldid=885377245 | title=  Delta method | author = Wikipedia contributors | website= Wikipedia |publisher= Wikipedia |access-date = 30 May 2019}}
