The '''delta method''' is a result concerning the approximate probability distribution for a function of an [[wikipedia:Asymptotic distribution|asymptotically normal]] statistical estimator from knowledge of the limiting variance of that estimator.
==Method==
While the delta method generalizes easily to a multivariate setting, careful motivation of the technique is more easily demonstrated in univariate terms. Roughly, if there is a sequence of random variables <math>X_n</math> satisfying
<math display="block">{\sqrt{n}[X_n-\theta]\,\xrightarrow{D}\,\mathcal{N}(0,\sigma^2)},</math>
where <math>\theta</math> and <math>\sigma^2</math> are finite-valued constants and <math>\xrightarrow{D}</math> denotes [[wikipedia:convergence in distribution|convergence in distribution]], then
<math display="block" >
{\sqrt{n}[g(X_n)-g(\theta)]\,\xrightarrow{D}\,\mathcal{N}(0,\sigma^2[g'(\theta)]^2)}
</math>
for any function <math>g</math> such that <math>g'(\theta)</math> exists and is non-zero.
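As a quick numerical check of the univariate statement, the following is a minimal Monte Carlo sketch (assuming NumPy is available; the Exponential(1) distribution, the choice <math>g(x)=x^2</math>, and the sample sizes are illustrative assumptions, not part of the result itself). It compares the empirical variance of <math>\sqrt{n}[g(X_n)-g(\theta)]</math>, where <math>X_n</math> is a sample mean, with the delta-method value <math>\sigma^2[g'(\theta)]^2</math>.

<syntaxhighlight lang="python">
# Monte Carlo sketch of the univariate delta method (illustrative values).
# X_n is the sample mean of n i.i.d. Exponential(1) draws, so theta = 1 and
# sigma^2 = 1; with g(x) = x^2 the delta method predicts limiting variance 4.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1_000, 5_000
theta, sigma2 = 1.0, 1.0
g = lambda x: x ** 2
g_prime = lambda x: 2 * x

# Draws of sqrt(n) * (g(X_n) - g(theta)) across many replications
X_n = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
scaled = np.sqrt(n) * (g(X_n) - g(theta))

print("empirical variance :", scaled.var())                  # roughly 4
print("delta-method value :", sigma2 * g_prime(theta) ** 2)  # exactly 4
</syntaxhighlight>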
The method extends to the multivariate case. By definition, a [[wikipedia:consistency (statistics)|consistent]] estimator <math>B</math> converges in probability to its true value <math>\beta</math>, and often a [[wikipedia:central limit theorem|central limit theorem]] can be applied to obtain [[wikipedia:Estimator#Asymptotic normality|asymptotic normality]]:<math display="block">\sqrt{n} (B-\beta )\,\xrightarrow{D}\,\mathcal{N}(0, \Sigma ),</math>
where ''n'' is the number of observations and <math>\Sigma</math> is a covariance matrix. The ''multivariate delta method'' yields the following asymptotic property of a function <math>h</math> of the estimator <math>B</math>, under the assumption that the gradient <math>\nabla h(\beta)</math> is non-zero:
<div class="card mb-4"><div class="card-header"> Proposition (Multivariate delta method) </div><div class="card-body">
<p class="card-text">
<math display="block">\sqrt{n}(h(B)-h(\beta))\,\xrightarrow{D}\,\mathcal{N}(0, \nabla h(\beta)^T \cdot \Sigma \cdot \nabla h(\beta)).</math>
</p>
<span class="mw-customtoggle-theo.deltamethod btn btn-primary" >Show Proof</span><div class="mw-collapsible mw-collapsed" id="mw-customcollapsible-theo.deltamethod"><div class="mw-collapsible-content p-3">
We start with the univariate case. Demonstration of this result is fairly straightforward under the assumption that <math>g'</math> is [[wikipedia:Continuous function|continuous]]. To begin, we use the [[wikipedia:mean value theorem|mean value theorem]]:
:<math display="block">g(X_n)=g(\theta)+g'(\tilde{\theta})(X_n-\theta),</math>
where <math>\tilde{\theta}</math> lies between <math>X_n</math> and <math>\theta</math>. Since <math>\tilde{\theta}</math> lies between <math>X_n</math> and <math>\theta</math>, we have <math>|\tilde{\theta}-\theta| \le |X_n-\theta|</math>, and since <math>X_n\,\xrightarrow{P}\,\theta</math> it must be that <math>\tilde{\theta}\,\xrightarrow{P}\,\theta</math>. Because <math>g'</math> is continuous, applying the [[wikipedia:continuous mapping theorem|continuous mapping theorem]] yields
:<math display="block">g'(\tilde{\theta})\,\xrightarrow{P}\,g'(\theta),</math>
where <math>\xrightarrow{P}</math> denotes [[wikipedia:convergence in probability|convergence in probability]]. Rearranging the terms and multiplying by <math>\sqrt{n}</math> gives
:<math display="block">\sqrt{n}[g(X_n)-g(\theta)]=g'(\tilde{\theta})\sqrt{n}[X_n-\theta].</math> Since
:<math display="block">{\sqrt{n}[X_n-\theta] \xrightarrow{D} \mathcal{N}(0,\sigma^2)}</math>
by assumption, it follows immediately from appeal to [[wikipedia:Slutsky's theorem|Slutsky's Theorem]] that
:<math display="block">{\sqrt{n}[g(X_n)-g(\theta)] \xrightarrow{D} \mathcal{N}(0,\sigma^2[g'(\theta)]^2)}.</math>
Now we proceed to the multivariate case. Keeping only the first two terms of the [[wikipedia:Taylor series|Taylor series]], and using vector notation for the [[wikipedia:gradient|gradient]], we can estimate <math>h(B)</math> as
<math display="block">h(B) \approx h(\beta) + \nabla h(\beta)^T \cdot (B-\beta)</math>
which implies the variance of ''h(B)'' is approximately
<math display="block">\begin{align*}
\operatorname{Var}(h(B)) & \approx \operatorname{Var}(h(\beta) + \nabla h(\beta)^T \cdot (B-\beta)) \\
& = \operatorname{Var}(h(\beta) + \nabla h(\beta)^T \cdot B - \nabla h(\beta)^T \cdot \beta) \\
& = \operatorname{Var}(\nabla h(\beta)^T \cdot B) \\
& = \nabla h(\beta)^T \cdot \operatorname{Cov}(B) \cdot \nabla h(\beta) \\
& = \nabla h(\beta)^T \cdot (\Sigma / n) \cdot \nabla h(\beta)
\end{align*}
</math>
One can use the [[wikipedia:mean value theorem|mean value theorem]] (for real-valued functions of several variables) to see that the result does not rely on taking a first-order approximation.
The same argument as in the univariate case therefore gives
<math display="block">\sqrt{n}(h(B)-h(\beta))\,\xrightarrow{D}\,\mathcal{N}(0, \nabla h(\beta)^T \cdot \Sigma \cdot \nabla h(\beta)).</math> <div class="text-end">&#x25A0;</div></div></div></div></div>
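The multivariate statement can be checked numerically in the same spirit. Below is a minimal Monte Carlo sketch (again assuming NumPy; the bivariate normal model, the ratio <math>h(\beta_1,\beta_2)=\beta_1/\beta_2</math>, and all numerical values are illustrative assumptions). Here <math>B</math> is the vector of sample means, so <math>\sqrt{n}(B-\beta)\,\xrightarrow{D}\,\mathcal{N}(0,\Sigma)</math> holds by the central limit theorem, and the empirical variance of <math>\sqrt{n}(h(B)-h(\beta))</math> is compared with <math>\nabla h(\beta)^T \cdot \Sigma \cdot \nabla h(\beta)</math>.

<syntaxhighlight lang="python">
# Monte Carlo sketch of the multivariate delta method (illustrative values).
# B is the vector of sample means of n i.i.d. N(beta, Sigma) draws, and
# h(b) = b1/b2 has gradient (1/b2, -b1/b2**2), evaluated here at beta.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 1_000, 5_000
beta = np.array([2.0, 4.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])

def h(b):
    return b[..., 0] / b[..., 1]

grad = np.array([1.0 / beta[1], -beta[0] / beta[1] ** 2])

draws = rng.multivariate_normal(beta, Sigma, size=(reps, n))  # shape (reps, n, 2)
B = draws.mean(axis=1)                                        # shape (reps, 2)
scaled = np.sqrt(n) * (h(B) - h(beta))

print("empirical variance :", scaled.var())
print("delta-method value :", grad @ Sigma @ grad)
</syntaxhighlight>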
==References==
*{{cite web |url = https://en.wikipedia.org/w/index.php?title=Delta_method&oldid=885377245 | title=  Delta method | author = Wikipedia contributors | website= Wikipedia |publisher= Wikipedia |access-date = 30 May 2019}}
