guide:7b0105e1fc: Difference between revisions

From Stochiki
No edit summary
mNo edit summary
 
(2 intermediate revisions by the same user not shown)
Line 147: Line 147:
</math>
</math>
</div>
</div>


Although partial identification often results from reducing the number of assumptions maintained in counterpart point identified models, care still needs to be taken in assessing the possible consequences of misspecification.
This section's goal is to discuss the existing literature on the topic, and to provide some additional observations.
To keep the notation light, I refer to the functional of interest as <math>\theta</math> throughout, without explicitly distinguishing whether it belongs to an infinite dimensional parameter space (as in the nonparametric analysis in [[guide:Ec36399528#sec:prob:distr |Section]]), or to a finite dimensional one (as in the semiparametric analysis in [[guide:521939d27a#sec:structural |Section]]).
To keep the notation light, I refer to the functional of interest as <math>\theta</math> throughout, without explicitly distinguishing whether it belongs to an infinite dimensional parameter space (as in the nonparametric analysis in [[guide:Ec36399528#sec:prob:distr |Section]]), or to a finite dimensional one (as in the semiparametric analysis in [[guide:8d94784544 |Section]]).
The original nonparametric ‘`worst-case" bounds proposed by <ref name="man89"></ref> for the analysis of selectively observed data and discussed in [[guide:Ec36399528#sec:prob:distr |Section]] are not subject to the risk of misspecification, because they are based on the empirical evidence alone.
 
The original nonparametric “worst-case” bounds proposed by <ref name="man89"><span style="font-variant-caps:small-caps">Manski, C.F.</span>  (1989): “Anatomy of the Selection Problem” ''The  Journal of Human Resources'', 24(3), 343--360.</ref> for the analysis of selectively observed data and discussed in [[guide:Ec36399528#sec:prob:distr |Section]] are not subject to the risk of misspecification, because they are based on the empirical evidence alone.
However, often researchers are willing and eager to maintain additional assumptions that can help shrink the bounds, so that one can learn more from the available data.
Indeed, early on <ref name="man90"></ref> proposed the use of exclusion restrictions in the form of mean independence assumptions.
Indeed, early on <ref name="man90"><span style="font-variant-caps:small-caps">Manski, C.F.</span>  (1990): “Nonparametric Bounds on Treatment Effects”  ''The American Economic Review Papers and Proceedings'', 80(2), 319--323.</ref> proposed the use of exclusion restrictions in the form of mean independence assumptions.
Section [[guide:Ec36399528#subsec:programme:eval |Treatment Effects with and without Instrumental Variables]] discusses related ideas within the context of nonparametric bounds on treatment effects, and <ref name="man03"></ref>{{rp|at=Chapter 2}} provides a thorough treatment of other types of exclusion restriction.
Section [[guide:Ec36399528#subsec:programme:eval |Treatment Effects with and without Instrumental Variables]] discusses related ideas within the context of nonparametric bounds on treatment effects, and <ref name="man03"><span style="font-variant-caps:small-caps">Manski, C.F.</span>  (2003): ''Partial Identification of Probability  Distributions'', Springer Series in Statistics. Springer.</ref>{{rp|at=Chapter 2}} provides a thorough treatment of other types of exclusion restriction.
The literature reviewed throughout this chapter provides many more examples of assumptions that have proven useful for empirical research.
Broadly speaking, assumptions can be classified in two types <ref name="man03"></ref>{{rp|at=Chapter 2}}.
 
Broadly speaking, assumptions can be classified in two types <ref name="man03"/>{{rp|at=Chapter 2}}.
The first type is ''non-refutable'': it may reduce the size of <math>\idr{\theta}</math>, but cannot lead to it being empty.
An example in the context of selectively observed data is that of exogenous selection, or data missing at random conditional on covariates and instruments (see Section [[guide:Ec36399528#subsec:missing_data |Selectively Observed Data]], p.~\pageref{subsec:missing_data}): under this assumption <math>\idr{\theta}</math> is a singleton, but the assumption cannot be refuted because it poses a distributional (independence) assumption on unobservables.
An example in the context of selectively observed data is that of exogenous selection, or data missing at random conditional on covariates and instruments (see Section [[guide:Ec36399528#subsec:missing_data |Selectively Observed Data]]): under this assumption <math>\idr{\theta}</math> is a singleton, but the assumption cannot be refuted because it poses a distributional (independence) assumption on unobservables.
 
The second type is ''refutable'': it may reduce the size of <math>\idr{\theta}</math>, and it may result in <math>\idr{\theta}=\emptyset</math> if it does not hold in the DGP.
An example in the context of treatment effects is the assumption of mean independence between response function at treatment <math>t</math> and instrumental variable <math>\ez</math>, see [[guide:Ec36399528#eq:ass:MI |eq:ass:MI]] in Section [[guide:Ec36399528#subsec:programme:eval |Treatment Effects with and without Instrumental Variables]].
There the sharp bounds on <math>\E_\sQ(\ey(t)|\ex=x)</math> are intersection bounds as in [[guide:Ec36399528#eq:intersection:bounds |eq:intersection:bounds]].
If the instrument is invalid, the bounds can be empty.
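The refutability of mean independence can be illustrated with a minimal numerical sketch. It assumes an outcome bounded in a known interval and a discrete instrument, and it treats the inputs as known population quantities rather than estimates. The function name, argument names, and the two example populations are hypothetical; the worst-case construction is the standard one for selectively observed outcomes and is not meant to reproduce the referenced equations term by term.

```python
def intersection_bounds(cells, y_min=0.0, y_max=1.0):
    """Bounds on E[y(t)] under mean independence from a discrete instrument.

    cells maps each instrument value z to a pair
    (E[y * 1(d = t) | z], P(d = t | z)); both are assumed known here.
    Returns (lower, upper), or None when the intersection is empty.
    """
    lowers, uppers = [], []
    for mean_observed, prob_treated in cells.values():
        prob_missing = 1.0 - prob_treated
        # Worst case: unobserved responses all equal y_min or all equal y_max.
        lowers.append(mean_observed + y_min * prob_missing)
        uppers.append(mean_observed + y_max * prob_missing)
    lower, upper = max(lowers), min(uppers)  # intersect across z
    return (lower, upper) if lower <= upper else None


# Hypothetical populations, chosen only for illustration.
compatible = {0: (0.10, 0.20), 1: (0.60, 0.90)}
incompatible = {0: (0.05, 0.90), 1: (0.80, 0.90)}

print(intersection_bounds(compatible))    # a non-empty interval
print(intersection_bounds(incompatible))  # None: the restriction is refuted
```

In the second population the largest lower bound exceeds the smallest upper bound, so no value of the mean response is compatible with both the data and the exclusion restriction.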
<ref name="pon:tam11"></ref> consider the impact of misspecification on semiparametric partially identified models.
 
<ref name="pon:tam11"><span style="font-variant-caps:small-caps">Ponomareva, M.,  <span style="font-variant-caps:normal">and</span> E.Tamer</span>  (2011): “Misspecification in  moment inequality models: back to moment equalities?” ''The  Econometrics Journal'', 14(2), 186--203.</ref> consider the impact of misspecification on semiparametric partially identified models.
One of their examples concerns a linear regression model of the form <math>\E_\sQ(\ey|\ex)=\theta^\top\ex</math> when only interval data is available for <math>\ey</math> (as in Section [[guide:Ec36399528#subsec:interval_data |Interval Data]]).
In this context, <math>\idr{\theta}=\{\vartheta\in\Theta:\E_\sP(\yL|\ex)\le \vartheta^\top\ex \le\E_\sP(\yU|\ex),~\ex\text{-a.s.}\}</math>.
In this context, <math>\idr{\theta}=\{\vartheta\in\Theta:\E_\sP(\yL|\ex)\le \vartheta^\top\ex \le\E_\sP(\yU|\ex),\ex\text{-a.s.}\}</math>.
The concern is that the conditional expectation might not be linear.
<ref name="pon:tam11"></ref> make two important observations.
<ref name="pon:tam11"/> make two important observations.
First, they argue that the set <math>\idr{\theta}</math> is difficult to interpret when the model is misspecified.
When <math>\ey</math> is perfectly observed, if the conditional expectation is not linear, the output of ordinary least squares can be readily interpreted as the best linear approximation to <math>\E_\sQ(\ey|\ex)</math>.
This is not the case for <math>\idr{\theta}</math> when only the interval data <math>[\yL,\yU]</math> is observed.
They therefore propose to work with the set of best linear predictors for <math>\ey|\ex</math> even in the partially identified case (rather than fully exploit the linearity assumption).
The resulting set is the one derived by <ref name="ber:mol08"></ref> and reported in Theorem [[guide:Ec36399528#SIR:BLP_intervalY |SIR-]].
The resulting set is the one derived by <ref name="ber:mol08"><span style="font-variant-caps:small-caps">Beresteanu, A.,  <span style="font-variant-caps:normal">and</span> F.Molinari</span>  (2008): “Asymptotic  Properties for a Class of Partially Identified Models” ''Econometrica'',  76(4), 763--814.</ref> and reported in Theorem [[guide:Ec36399528#SIR:BLP_intervalY |SIR-]].
<ref name="pon:tam11"></ref> work with projections of this set, which coincide with the bounds in <ref name="sto07"></ref>.
<ref name="pon:tam11"/> work with projections of this set, which coincide with the bounds in <ref name="sto07"><span style="font-variant-caps:small-caps">Stoye, J.</span>  (2007): “Bounds on Generalized Linear Predictors with  Incomplete Outcome Data” ''Reliable Computing'', 13(3), 293--302.</ref>.
<ref name="pon:tam11"></ref> also point out that depending on the DGP, misspecification can cause <math>\idr{\theta}</math> to be spuriously tight.
<ref name="pon:tam11"/> also point out that depending on the DGP, misspecification can cause <math>\idr{\theta}</math> to be spuriously tight.
This can happen, for example, if <math>\E_\sP(\yL|\ex)</math> and <math>\E_\sP(\yU|\ex)</math> are sufficiently nonlinear, even if they are relatively far from each other (e.g., <ref name="pon:tam11"></ref>{{rp|at=Figure 1}}).
This can happen, for example, if <math>\E_\sP(\yL|\ex)</math> and <math>\E_\sP(\yU|\ex)</math> are sufficiently nonlinear, even if they are relatively far from each other (e.g., <ref name="pon:tam11"/>{{rp|at=Figure 1}}).
Hence, caution should be taken when interpreting very tight partial identification results as indicative of a highly informative model and empirical evidence, as the possibility of model misspecification has to be taken into account.
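The possibility of a spuriously tight set can be seen in a small sketch. It assumes a scalar covariate with three support points and a model with intercept <math>a</math> and slope <math>b</math>, and it approximates the set by grid search. The function name, the grid settings, and the three designs are hypothetical choices made for illustration; they are not taken from the cited paper.

```python
def intercept_range(lower, upper, step=0.05, half_width=3.0, tol=1e-9):
    """Grid search for the projection of the set onto the intercept.

    lower and upper map each support point x to E[yL | x] and E[yU | x].
    A pair (a, b) belongs to the set when lower[x] <= a + b * x <= upper[x]
    at every support point. Returns (min a, max a), or None if no grid
    point qualifies.
    """
    k = round(half_width / step)
    grid = [i * step for i in range(-k, k + 1)]
    feasible = [
        a
        for a in grid
        for b in grid
        if all(lower[x] - tol <= a + b * x <= upper[x] + tol for x in lower)
    ]
    return (min(feasible), max(feasible)) if feasible else None


# Design 1: linear bounds, interval width 2 at every support point.
linear_lower = {0: -1.0, 1: 0.0, 2: 1.0}
linear_upper = {0: 1.0, 1: 2.0, 2: 3.0}

# Design 2: nonlinear bounds, interval width at least 2 at every support point.
pinched_lower = {0: -2.0, 1: -0.05, 2: -2.0}
pinched_upper = {0: 0.0, 1: 2.0, 2: 0.0}

# Design 3: same as design 2 with a slightly higher lower bound at x = 1.
empty_lower = {0: -2.0, 1: 0.1, 2: -2.0}

print(intercept_range(linear_lower, linear_upper))    # wide range for a
print(intercept_range(pinched_lower, pinched_upper))  # narrow range for a
print(intercept_range(empty_lower, pinched_upper))    # None: empty set
```

In the second design the outcome intervals are no narrower than in the first, yet the admissible intercepts shrink to a short interval because a line must pass below the upper bound at the two outer support points and above the lower bound at the middle one. A small further change in the bounds, as in the third design, leaves no admissible line at all.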
These observations naturally lead to the questions of how to test for model misspecification in the presence of partial identification, and of what the consequences of misspecification are for the confidence sets discussed in Section [[guide:6d1a428897#subsec:CS |Confidence Sets Satisfying Various Coverage Notions]].
With partial identification, a null hypothesis of correct model specification (and its alternative) can be expressed as
Line 187: Line 191:
</math>
</math>
Tests for this hypothesis have been proposed for both nonparametric and semiparametric partially identified models.
I refer to <ref name="san12"></ref> for specification tests in a partially identified nonparametric instrumental variable model; to <ref name="kit:sto18"></ref> for a nonparametric test in random utility models that checks whether a repeated cross section of demand data might have been generated by a population of rational consumers (thereby testing for the Axiom of Revealed Stochastic Preference); and to <ref name="gug:hah:kim08"></ref> and <ref name="bon:mag:mau12"></ref> for specification tests in linear moment (in)equality models.  
I refer to <ref name="san12"><span style="font-variant-caps:small-caps">Santos, A.</span>  (2012): “Inference in nonparametric instrumental  variables with partial identification” ''Econometrica'', 80(1),  213--275.</ref> for specification tests in a partially identified nonparametric instrumental variable model; to <ref name="kit:sto18"><span style="font-variant-caps:small-caps">Kitamura, Y.,  <span style="font-variant-caps:normal">and</span> J.Stoye</span>  (2018): “Nonparametric Analysis  of Random Utility Models” ''Econometrica'', 86(6), 1883--1909.</ref> for a nonparametric test in random utility models that checks whether a repeated cross section of demand data might have been generated by a population of rational consumers (thereby testing for the Axiom of Revealed Stochastic Preference); and to <ref name="gug:hah:kim08"><span style="font-variant-caps:small-caps">Guggenberger, P., J.Hahn,  <span style="font-variant-caps:normal">and</span> K.Kim</span>  (2008):  “Specification testing under moment inequalities” ''Economics  Letters'', 99(2), 375 -- 378.</ref> and <ref name="bon:mag:mau12"><span style="font-variant-caps:small-caps">Bontemps, C., T.Magnac,  <span style="font-variant-caps:normal">and</span> E.Maurin</span>  (2012): “Set  identified linear models” ''Econometrica'', 80(3), 1129--1155.</ref> for specification tests in linear moment (in)equality models.  
For the general class of moment inequality models discussed in [[guide:6d1a428897#sec:inference |Section]], <ref name="rom:sha08"></ref>, <ref name="and:gug09b"></ref>, <ref name="gal:hen09"></ref>, and <ref name="and:soa10"></ref> propose a specification test that rejects the model if <math>\CS</math> in [[guide:6d1a428897#eq:CS |eq:CS]] is empty, where <math>\CS</math> is defined with <math>c_{1-\alpha}(\vartheta)</math> determined so as to satisfy [[guide:6d1a428897#eq:CS_coverage:point |eq:CS_coverage:point]] and approximated according to the methods proposed in the respective papers.
 
For the general class of moment inequality models discussed in [[guide:6d1a428897#sec:inference |Section]], <ref name="rom:sha08"><span style="font-variant-caps:small-caps">Romano, J.P.,  <span style="font-variant-caps:normal">and</span> A.M. Shaikh</span>  (2008): “Inference for  identifiable parameters in partially identified econometric models”  ''Journal of Statistical Planning and Inference'', 138(9), 2786 -- 2807.</ref>, <ref name="and:gug09b"><span style="font-variant-caps:small-caps">Andrews, D. W.K.,  <span style="font-variant-caps:normal">and</span> P.Guggenberger</span>  (2009): “Validity  of Subsampling and `Plug-in Asymptotic' Inference for Parameters Defined by  Moment Inequalities” ''Econometric Theory'', 25(3), 669--709.</ref>, <ref name="gal:hen09"><span style="font-variant-caps:small-caps">Galichon, A.,  <span style="font-variant-caps:normal">and</span> M.Henry</span>  (2009): “A test of non-identifying restrictions and  confidence regions for partially identified parameters” ''Journal of  Econometrics'', 152(2), 186 -- 196.</ref>, and <ref name="and:soa10"><span style="font-variant-caps:small-caps">Andrews, D. W.K.,  <span style="font-variant-caps:normal">and</span> G.Soares</span>  (2010): “Inference for  Parameters Defined by Moment Inequalities Using Generalized Moment  Selection” ''Econometrica'', 78(1), 119--157.</ref> propose a specification test that rejects the model if <math>\CS</math> in [[guide:6d1a428897#eq:CS |eq:CS]] is empty, where <math>\CS</math> is defined with <math>c_{1-\alpha}(\vartheta)</math> determined so as to satisfy [[guide:6d1a428897#eq:CS_coverage:point |eq:CS_coverage:point]] and approximated according to the methods proposed in the respective papers.
The resulting test, commonly referred to as the ''by-product'' test because it is obtained as a by-product of constructing a confidence set, takes the form


Line 196: Line 201:
\end{align*}
\end{align*}
</math>
</math>
Denoting by <math>\cP_0</math> the collection of <math>\sP\in\cP</math> such that <math>\idr{\theta}\neq\emptyset</math>, one has that the by-product test achieves uniform size control <ref name="bug:can:shi15"></ref>{{rp|at=Theorem C.2}}:
Denoting by <math>\cP_0</math> the collection of <math>\sP\in\cP</math> such that <math>\idr{\theta}\neq\emptyset</math>, one has that the by-product test achieves uniform size control <ref name="bug:can:shi15"><span style="font-variant-caps:small-caps">Bugni, F.A., I.A. Canay,  <span style="font-variant-caps:normal">and</span> X.Shi</span>  (2015):  “Specification tests for partially identified models defined by moment  inequalities” ''Journal of Econometrics'', 185(1), 259 -- 282.</ref>{{rp|at=Theorem C.2}}:


<math display="block">
<math display="block">
Line 203: Line 208:
\end{align}
\end{align}
</math>
</math>
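The decision rule of the by-product test (reject when the confidence set is empty) can be sketched for a scalar parameter bounded by two means. Everything specific here is an assumption made for illustration: the criterion is taken to be the sum of squared positive parts of studentized sample moments, the parameter space is a finite grid, and the critical value is an arbitrary placeholder constant that is not derived from any of the cited procedures. The sketch therefore shows only the mechanics of the rule and carries no size guarantee.

```python
from statistics import fmean, pstdev


def n_criterion(theta, y_lower, y_upper):
    """n * Q_n(theta) for the moments E[yL] - theta <= 0 and theta - E[yU] <= 0."""
    n = len(y_lower)
    total = 0.0
    for sample, sign in ((y_lower, 1.0), (y_upper, -1.0)):
        # Studentized sample moment; only violations (positive parts) count.
        moment = sign * (fmean(sample) - theta) / pstdev(sample)
        total += max(moment, 0.0) ** 2
    return n * total


def by_product_test(y_lower, y_upper, grid, critical_value):
    """Return (confidence_set, reject); reject is True when the set is empty."""
    confidence_set = [
        theta
        for theta in grid
        if n_criterion(theta, y_lower, y_upper) <= critical_value(theta)
    ]
    return confidence_set, not confidence_set


def placeholder_critical_value(theta):
    """Hypothetical constant; a real procedure would compute c_{1-alpha}(theta)."""
    return 4.0


# Hypothetical data: the second upper bound lies below the lower bound.
y_lower = [0.0, 0.2, 0.1, 0.3] * 25
y_upper_ok = [v + 1.0 for v in y_lower]
y_upper_bad = [v - 1.0 for v in y_lower]
grid = [i / 100 for i in range(-200, 201)]

cs_ok, reject_ok = by_product_test(y_lower, y_upper_ok, grid, placeholder_critical_value)
cs_bad, reject_bad = by_product_test(y_lower, y_upper_bad, grid, placeholder_critical_value)
print(len(cs_ok), reject_ok)
print(len(cs_bad), reject_bad)
```

Because the critical value is evaluated point by point in <math>\vartheta</math>, the test rejects only when every grid point fails its own comparison, which is the sense in which the test comes for free once the confidence set has been computed.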


An important feature of the by-product test is that the critical value <math>c_{1-\alpha}(\vartheta)</math> is not obtained to test for model misspecification; rather, it is obtained to ensure the coverage requirement in [[guide:6d1a428897#eq:CS_coverage:point |eq:CS_coverage:point]], and hence by working with the asymptotic distribution of <math>n\crit_n(\vartheta)</math>.
<ref name="bug:can:shi15"></ref> propose more powerful model specification tests, using a critical value <math>c_{1-\alpha}</math> that they obtain to ensure that \eqref{eq:misp:test:uniform:size}, rather than [[guide:6d1a428897#eq:CS_coverage:point |eq:CS_coverage:point]], holds.
<ref name="bug:can:shi15"/> propose more powerful model specification tests, using a critical value <math>c_{1-\alpha}</math> that they obtain to ensure that \eqref{eq:misp:test:uniform:size}, rather than [[guide:6d1a428897#eq:CS_coverage:point |eq:CS_coverage:point]], holds.
In particular, they show that their tests dominate the by-product test in terms of power in any finite sample and in the asymptotic limit.
Their critical value is obtained by working with the asymptotic distribution of <math>\inf_{\vartheta\in\Theta}n\crit_n(\vartheta)</math>.
As such, their proposal resembles the classic approach to model specification testing (<math>J</math>-test) in point identified generalized method of moments models.

While it is possible to test for misspecification also in partially identified models, a word of caution is due on what might be the effects of misspecification on confidence sets constructed as in [[guide:6d1a428897#eq:CS |eq:CS]] with <math>c_{1-\alpha}</math> determined to ensure [[guide:6d1a428897#eq:CS_coverage:point |eq:CS_coverage:point]], as is often done in empirical work.
<ref name="bug:can:gug12"></ref> show that in the presence of local misspecification, confidence sets <math>\CS</math> designed to satisfy [[guide:6d1a428897#eq:CS_coverage:point |eq:CS_coverage:point]] fail to do so.
<ref name="bug:can:gug12"><span style="font-variant-caps:small-caps">Bugni, F.A., I.A. Canay,  <span style="font-variant-caps:normal">and</span> P.Guggenberger</span>  (2012):  “Distortions of Asymptotic Confidence Size in Locally Misspecified Moment  Inequality Models” ''Econometrica'', 80(4), 1741--1768.</ref> show that in the presence of local misspecification, confidence sets <math>\CS</math> designed to satisfy [[guide:6d1a428897#eq:CS_coverage:point |eq:CS_coverage:point]] fail to do so.
In practice, the concern is that when the model is misspecified <math>\CS</math> might be spuriously small.
Indeed, we have seen that it can be empty if the misspecification is sufficiently severe.
If it is less severe but still present, it may lead to inference that is erroneously interpreted as precise.
It is natural to wonder how this compares to the effect of misspecification on inference in point identified models.<ref group="Notes" >The considerations that I report here are based on conversations with Joachim Freyberger and notes that he shared with me, for which I thank him.</ref>
In that case, the rich set of tools available for inference allows one to avoid this problem.
Line 226: Line 232:
</math>
</math>
with <math>\hat\Xi</math> a consistent estimator of <math>\Xi=\E_\sP[m(\ew;\theta) m(\ew;\theta)^\top]</math> and <math>\bar{m}_n(\vartheta)</math> the sample analog of <math>\E_\sP(m(\ew;\vartheta))</math>.
As shown by <ref name="han82"></ref> for correctly specified models, the distribution of <math>\sqrt{n}(\hat{\theta}_n-\theta)</math> converges to a Normal with mean vector equal to zero and covariance matrix <math>\Sigma</math>.
As shown by <ref name="han82"><span style="font-variant-caps:small-caps">Hansen, L.P.</span>  (1982b): “Large Sample Properties of Generalized Method of  Moments Estimators” ''Econometrica'', 50(4), 1029--1054.</ref> for correctly specified models, the distribution of <math>\sqrt{n}(\hat{\theta}_n-\theta)</math> converges to a Normal with mean vector equal to zero and covariance matrix <math>\Sigma</math>.
<ref name="hal:ino03"></ref> show that when the model is subject to non-local misspecification, <math>\sqrt{n}(\hat{\theta}_n-\theta_*)</math> converges to a Normal with mean vector equal to zero and covariance matrix <math>\Sigma_*</math>, where <math>\theta_*</math> is the pseudo-true vector (the probability limit of \eqref{eq:GMM:estimator}) and where <math>\Sigma_*</math> equals <math>\Sigma</math> if the model is correctly specified, and differs from it otherwise.
<ref name="hal:ino03"><span style="font-variant-caps:small-caps">Hall, A.R.,  <span style="font-variant-caps:normal">and</span> A.Inoue</span>  (2003): “The large sample  behaviour of the generalized method of moments estimator in misspecified  models” ''Journal of Econometrics'', 114(2), 361 -- 394.</ref> show that when the model is subject to non-local misspecification, <math>\sqrt{n}(\hat{\theta}_n-\theta_*)</math> converges to a Normal with mean vector equal to zero and covariance matrix <math>\Sigma_*</math>, where <math>\theta_*</math> is the pseudo-true vector (the probability limit of \eqref{eq:GMM:estimator}) and where <math>\Sigma_*</math> equals <math>\Sigma</math> if the model is correctly specified, and differs from it otherwise.
Let <math>\hat{\Sigma}_*</math> be a consistent estimator of <math>\Sigma_*</math> as in <ref name="hal:ino03"></ref>.
Let <math>\hat{\Sigma}_*</math> be a consistent estimator of <math>\Sigma_*</math> as in <ref name="hal:ino03"/>.
Define the Wald-statistic based confidence ellipsoid


Line 237: Line 243:
</math>
</math>
with <math>c_{d,1-\alpha}</math> the <math>1-\alpha</math> critical value of a <math>\chi_d^2</math> (chi-squared random variable with <math>d</math> degrees of freedom).
Under standard regularity conditions (see <ref name="hal:ino03"></ref>) the confidence set in \eqref{eq:CS:Wald:point:id} covers with asymptotic probability <math>1-\alpha</math> the true vector <math>\theta</math> if the model is correctly specified, and the pseudo-true vector <math>\theta_*</math> if the model is incorrectly specified.
Under standard regularity conditions (see <ref name="hal:ino03"/>) the confidence set in \eqref{eq:CS:Wald:point:id} covers with asymptotic probability <math>1-\alpha</math> the true vector <math>\theta</math> if the model is correctly specified, and the pseudo-true vector <math>\theta_*</math> if the model is incorrectly specified.
In either case, \eqref{eq:CS:Wald:point:id} is never empty and its volume depends on <math>\hat{\Sigma}_*</math>.<ref group="Notes" >The effect of misspecification for maximum likelihood, least squares, and GMM estimators in “point identified” models (by which I mean models where the population criterion function has a unique optimizer) has been studied in the literature; see, e.g., <ref name="whi82"></ref>, <ref name="gal:whi88"></ref>, <ref name="hal:ino03"></ref>, <ref name="han:lee19"></ref>, and references therein. These estimators have been shown to converge in probability to pseudo-true values, and it has been established that tests of hypotheses and confidence sets based on these estimators have correct asymptotic level with respect to the pseudo-true parameters, provided standard errors are computed appropriately. In the specific case of GMM discussed here, the pseudo-true value <math>\theta_*</math> depends on the choice of weighting matrix in \eqref{eq:GMM:estimator}: I have used <math>\hat\Xi</math>, but other choices are possible. I do not discuss this aspect of the problem here, but refer to <ref name="hal:ino03"></ref>.</ref>
In either case, \eqref{eq:CS:Wald:point:id} is never empty and its volume depends on <math>\hat{\Sigma}_*</math>.<ref group="Notes" >The effect of misspecification for maximum likelihood, least squares, and GMM estimators in “point identified” models (by which I mean models where the population criterion function has a unique optimizer) has been studied in the literature; see, e.g., {{ref|name=whi82}}, {{ref|name=gal:whi88}}, {{ref|name=hal:ino03}}, {{ref|name=han:lee19}}, and references therein. These estimators have been shown to converge in probability to pseudo-true values, and it has been established that tests of hypotheses and confidence sets based on these estimators have correct asymptotic level with respect to the pseudo-true parameters, provided standard errors are computed appropriately. In the specific case of GMM discussed here, the pseudo-true value <math>\theta_*</math> depends on the choice of weighting matrix in \eqref{eq:GMM:estimator}: I have used <math>\hat\Xi</math>, but other choices are possible. I do not discuss this aspect of the problem here, but refer to {{ref|name=hal:ino03}}.</ref>
 
Even in the point identified case a confidence set constructed similarly to [[guide:6d1a428897#eq:CS |eq:CS]], i.e.,


Line 251: Line 258:
However, this confidence set is empty with asymptotic probability <math>\P(\chi^2_{|\cJ|-d} > c_{|\cJ|,1-\alpha})</math>, due to the facts that <math>\P(\CS=\emptyset)=\P(\hat{\theta}_n\notin\CS)</math> and that <math>n\bar{m}_n(\hat{\theta}_n)^\top\hat\Xi^{-1}\bar{m}_n(\hat{\theta}_n)\Rightarrow\chi^2_{|\cJ|-d}</math>.
Hence, it can be arbitrarily small.
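The asymptotic probability of an empty set stated above is easy to evaluate numerically. The sketch below uses only the standard library: it computes the chi-squared distribution function from the series for the regularized lower incomplete gamma function and inverts it by bisection. The function names are hypothetical, and the code assumes an overidentified model, that is, more moments than parameters.

```python
import math


def chi2_cdf(x, df, terms=500):
    """Chi-squared CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for k in range(1, terms):
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_quantile(p, df):
    """Invert chi2_cdf by bisection (assumes the quantile lies below df + 100)."""
    lo, hi = 0.0, df + 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def prob_empty(num_moments, dim_theta, alpha=0.05):
    """Asymptotic P(chi2_{J-d} > c_{J,1-alpha}); requires num_moments > dim_theta."""
    critical = chi2_quantile(1.0 - alpha, num_moments)
    return 1.0 - chi2_cdf(critical, num_moments - dim_theta)


print(prob_empty(num_moments=5, dim_theta=2))
```

With five moments, two parameters, and <math>\alpha=0.05</math>, the probability is roughly 0.011: even under correct specification, this confidence set is empty in about one sample out of a hundred asymptotically.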
In the very special case of a linear regression model with interval outcome data studied by <ref name="pon:tam11"></ref>, the procedure proposed by <ref name="ber:mol08"></ref> yields confidence sets that are always non-empty and whose volume depends on a covariance function that they derive (see <ref name="ber:mol08"></ref>{{rp|at=Theorem 4.3}}).
 
If the linear regression model is correctly specified, and hence <math>\{\vartheta\in\Theta:\E_\sP(\yL|\ex)\le \vartheta^\top\ex \le\E_\sP(\yU|\ex),~\ex\text{-a.s.}\}\neq\emptyset</math>, these confidence sets cover <math>\{\vartheta\in\Theta:\E_\sP(\yL|\ex)\le \vartheta^\top\ex \le\E_\sP(\yU|\ex),~\ex\text{-a.s.}\}</math> with asymptotic probability at least equal to <math>1-\alpha</math>, as in [[guide:6d1a428897#eq:CS_coverage:set:pw |eq:CS_coverage:set:pw]].
In the very special case of a linear regression model with interval outcome data studied by <ref name="pon:tam11"/>, the procedure proposed by <ref name="ber:mol08"/> yields confidence sets that are always non-empty and whose volume depends on a covariance function that they derive (see <ref name="ber:mol08"/>{{rp|at=Theorem 4.3}}).
Even if the model is misspecified and <math>\{\vartheta\in\Theta:\E_\sP(\yL|\ex)\le \vartheta^\top\ex \le\E_\sP(\yU|\ex),~\ex\text{-a.s.}\}=\emptyset</math>, the confidence sets cover the sharp identification region for the parameters of the best linear predictor of <math>\ey|\ex</math>, which can be viewed as a pseudo-true set, with probability exactly equal to <math>1-\alpha</math>.  
If the linear regression model is correctly specified, and hence <math>\{\vartheta\in\Theta:\E_\sP(\yL|\ex)\le \vartheta^\top\ex \le\E_\sP(\yU|\ex),\ex\text{-a.s.}\}\neq\emptyset</math>, these confidence sets cover <math>\{\vartheta\in\Theta:\E_\sP(\yL|\ex)\le \vartheta^\top\ex \le\E_\sP(\yU|\ex),\ex\text{-a.s.}\}</math> with asymptotic probability at least equal to <math>1-\alpha</math>, as in [[guide:6d1a428897#eq:CS_coverage:set:pw |eq:CS_coverage:set:pw]].
The test statistic that <ref name="ber:mol08"></ref> use is based on the Hausdorff distance between the estimator and the hypothesized set, and as such is a generalization of the standard Wald-statistic to the set-valued case.
Even if the model is misspecified and <math>\{\vartheta\in\Theta:\E_\sP(\yL|\ex)\le \vartheta^\top\ex \le\E_\sP(\yU|\ex),\ex\text{-a.s.}\}=\emptyset</math>, the confidence sets cover the sharp identification region for the parameters of the best linear predictor of <math>\ey|\ex</math>, which can be viewed as a pseudo-true set, with probability exactly equal to <math>1-\alpha</math>.  
The test statistic that <ref name="ber:mol08"/> use is based on the Hausdorff distance between the estimator and the hypothesized set, and as such is a generalization of the standard Wald-statistic to the set-valued case.
These considerations can be extended to other models.
For example, <ref name="lee:bha19"></ref> study empirical measurement of Hicksian consumer welfare with interval data on income.
For example, <ref name="lee:bha19"><span style="font-variant-caps:small-caps">Lee, Y.-Y.,  <span style="font-variant-caps:normal">and</span> D.Bhattacharya</span>  (2019): “Applied welfare  analysis for discrete choice with interval-data on income” ''Journal of  Econometrics'', 211(2), 361--387.</ref> study empirical measurement of Hicksian consumer welfare with interval data on income.
When the model is misspecified, they provide a best parametric approximation to demand and welfare based on the support function method, and inference procedures for this approximation.
For other moment inequality models, <ref name="kai:whi13"></ref> propose to build a pseudo-true set <math>\mathcal{H}_\sP^*[\theta]</math> that is obtained through a two-step procedure.
For other moment inequality models, <ref name="kai:whi13"><span style="font-variant-caps:small-caps">Kaido, H.,  <span style="font-variant-caps:normal">and</span> H.White</span>  (2013): “Estimating Misspecified  Moment Inequality Models” in ''Recent Advances and Future Directions in  Causality, Prediction, and Specification Analysis: Essays in Honor of Halbert  L. White Jr'', ed. by X.Chen,  <span style="font-variant-caps:normal">and</span> N.R. Swanson, pp. 331--361,  Springer, New York, NY.</ref> propose to build a pseudo-true set <math>\mathcal{H}_\sP^*[\theta]</math> that is obtained through a two-step procedure.
In the first step one obtains a nonparametric estimator of the function(s) for which the researcher wants to impose a parametric structure.
In the second step one obtains the set <math>\mathcal{H}_\sP^*[\theta]</math> as the collection of least squares projections of the set in the first step, on the parametric class imposed.
<ref name="kai:whi13"></ref> show that under regularity conditions the pseudo-true set can be consistently estimated, and derive rates of convergence for the estimator; however, they do not provide methods to obtain confidence sets.
<ref name="kai:whi13"/> show that under regularity conditions the pseudo-true set can be consistently estimated, and derive rates of convergence for the estimator; however, they do not provide methods to obtain confidence sets.
While conceptually valuable, their construction appears to be computationally difficult.
<ref name="mas:poi18"></ref> propose that when a model is falsified (in the sense that <math>\idr{\theta}</math> is empty) one should report the ''falsification frontier'': the boundary between the set of assumptions which falsify the model and those which do not, obtained through continuous relaxations of the baseline assumptions of concern.  
<ref name="mas:poi18"><span style="font-variant-caps:small-caps">Masten, M.A.,  <span style="font-variant-caps:normal">and</span> A.Poirier</span>  (2018): “Salvaging Falsified  Instrumental Variable Models” available at  [https://arxiv.org/abs/1812.11598 https://arxiv.org/abs/1812.11598].</ref> propose that when a model is falsified (in the sense that <math>\idr{\theta}</math> is empty) one should report the ''falsification frontier'': the boundary between the set of assumptions which falsify the model and those which do not, obtained through continuous relaxations of the baseline assumptions of concern.  
The researcher can then present the set <math>\idr{\theta}</math> that results if the true model lies somewhere on this frontier.  
This set can be interpreted as a pseudo-true set.
However, <ref name="mas:poi18"></ref> do not provide methods for inference.
However, <ref name="mas:poi18"/> do not provide methods for inference.
 
The implications of misspecification in partially identified models remain an open and important question in the literature.
For example, it would be useful to have notions of pseudo-true set that parallel those of pseudo-true value in the point identified case.
It would also be important to provide methods for the construction of confidence sets in general moment inequality models that do not exhibit spurious precision (i.e., are arbitrarily small) when the model is misspecified.
It would also be important to provide methods for the construction of confidence sets in general moment inequality models that do not exhibit spurious precision (i.e., are arbitrarily small) when the model is misspecified. Recent work by <ref name="and:kwo19"><span style="font-variant-caps:small-caps">Andrews, D. W.K.,  <span style="font-variant-caps:normal">and</span> S.Kwon</span>  (2019): “Inference in  Moment Inequality Models That Is Robust to Spurious Precision under Model  Misspecification” available at  [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3416831 https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3416831].</ref> addresses some of these questions.
Recent work by <ref name="and:kwo19"></ref> addresses some of these questions.
 
==General references==
{{cite arXiv|last1=Molinari|first1=Francesca|year=2020|title=Microeconometrics with Partial Identification|eprint=2004.11751|class=econ.EM}}

Latest revision as of 23:04, 19 June 2024

[math] \newcommand{\edis}{\stackrel{d}{=}} \newcommand{\fd}{\stackrel{f.d.}{\rightarrow}} \newcommand{\dom}{\operatorname{dom}} \newcommand{\eig}{\operatorname{eig}} \newcommand{\epi}{\operatorname{epi}} \newcommand{\lev}{\operatorname{lev}} \newcommand{\card}{\operatorname{card}} \newcommand{\comment}{\textcolor{Green}} \newcommand{\B}{\mathbb{B}} \newcommand{\C}{\mathbb{C}} \newcommand{\G}{\mathbb{G}} \newcommand{\M}{\mathbb{M}} \newcommand{\N}{\mathbb{N}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\T}{\mathbb{T}} \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\W}{\mathbb{W}} \newcommand{\bU}{\mathfrak{U}} \newcommand{\bu}{\mathfrak{u}} \newcommand{\bI}{\mathfrak{I}} \newcommand{\cA}{\mathcal{A}} \newcommand{\cB}{\mathcal{B}} \newcommand{\cC}{\mathcal{C}} \newcommand{\cD}{\mathcal{D}} \newcommand{\cE}{\mathcal{E}} \newcommand{\cF}{\mathcal{F}} \newcommand{\cG}{\mathcal{G}} \newcommand{\cg}{\mathcal{g}} \newcommand{\cH}{\mathcal{H}} \newcommand{\cI}{\mathcal{I}} \newcommand{\cJ}{\mathcal{J}} \newcommand{\cK}{\mathcal{K}} \newcommand{\cL}{\mathcal{L}} \newcommand{\cM}{\mathcal{M}} \newcommand{\cN}{\mathcal{N}} \newcommand{\cO}{\mathcal{O}} \newcommand{\cP}{\mathcal{P}} \newcommand{\cQ}{\mathcal{Q}} \newcommand{\cR}{\mathcal{R}} \newcommand{\cS}{\mathcal{S}} \newcommand{\cT}{\mathcal{T}} \newcommand{\cU}{\mathcal{U}} \newcommand{\cu}{\mathcal{u}} \newcommand{\cV}{\mathcal{V}} \newcommand{\cW}{\mathcal{W}} \newcommand{\cX}{\mathcal{X}} \newcommand{\cY}{\mathcal{Y}} \newcommand{\cZ}{\mathcal{Z}} \newcommand{\sF}{\mathsf{F}} \newcommand{\sM}{\mathsf{M}} \newcommand{\sG}{\mathsf{G}} \newcommand{\sT}{\mathsf{T}} \newcommand{\sB}{\mathsf{B}} \newcommand{\sC}{\mathsf{C}} \newcommand{\sP}{\mathsf{P}} \newcommand{\sQ}{\mathsf{Q}} \newcommand{\sq}{\mathsf{q}} \newcommand{\sR}{\mathsf{R}} \newcommand{\sS}{\mathsf{S}} \newcommand{\sd}{\mathsf{d}} \newcommand{\cp}{\mathsf{p}} \newcommand{\cc}{\mathsf{c}} \newcommand{\cf}{\mathsf{f}} 
\newcommand{\eU}{{\boldsymbol{U}}} \newcommand{\eb}{{\boldsymbol{b}}} \newcommand{\ed}{{\boldsymbol{d}}} \newcommand{\eu}{{\boldsymbol{u}}} \newcommand{\ew}{{\boldsymbol{w}}} \newcommand{\ep}{{\boldsymbol{p}}} \newcommand{\eX}{{\boldsymbol{X}}} \newcommand{\ex}{{\boldsymbol{x}}} \newcommand{\eY}{{\boldsymbol{Y}}} \newcommand{\eB}{{\boldsymbol{B}}} \newcommand{\eC}{{\boldsymbol{C}}} \newcommand{\eD}{{\boldsymbol{D}}} \newcommand{\eW}{{\boldsymbol{W}}} \newcommand{\eR}{{\boldsymbol{R}}} \newcommand{\eQ}{{\boldsymbol{Q}}} \newcommand{\eS}{{\boldsymbol{S}}} \newcommand{\eT}{{\boldsymbol{T}}} \newcommand{\eA}{{\boldsymbol{A}}} \newcommand{\eH}{{\boldsymbol{H}}} \newcommand{\ea}{{\boldsymbol{a}}} \newcommand{\ey}{{\boldsymbol{y}}} \newcommand{\eZ}{{\boldsymbol{Z}}} \newcommand{\eG}{{\boldsymbol{G}}} \newcommand{\ez}{{\boldsymbol{z}}} \newcommand{\es}{{\boldsymbol{s}}} \newcommand{\et}{{\boldsymbol{t}}} \newcommand{\ev}{{\boldsymbol{v}}} \newcommand{\ee}{{\boldsymbol{e}}} \newcommand{\eq}{{\boldsymbol{q}}} \newcommand{\bnu}{{\boldsymbol{\nu}}} \newcommand{\barX}{\overline{\eX}} \newcommand{\eps}{\varepsilon} \newcommand{\Eps}{\mathcal{E}} \newcommand{\carrier}{{\mathfrak{X}}} \newcommand{\Ball}{{\mathbb{B}}^{d}} \newcommand{\Sphere}{{\mathbb{S}}^{d-1}} \newcommand{\salg}{\mathfrak{F}} \newcommand{\ssalg}{\mathfrak{B}} \newcommand{\one}{\mathbf{1}} \newcommand{\Prob}[1]{\P\{#1\}} \newcommand{\yL}{\ey_{\mathrm{L}}} \newcommand{\yU}{\ey_{\mathrm{U}}} \newcommand{\yLi}{\ey_{\mathrm{L}i}} \newcommand{\yUi}{\ey_{\mathrm{U}i}} \newcommand{\xL}{\ex_{\mathrm{L}}} \newcommand{\xU}{\ex_{\mathrm{U}}} \newcommand{\vL}{\ev_{\mathrm{L}}} \newcommand{\vU}{\ev_{\mathrm{U}}} \newcommand{\dist}{\mathbf{d}} \newcommand{\rhoH}{\dist_{\mathrm{H}}} \newcommand{\ti}{\to\infty} \newcommand{\comp}[1]{#1^\mathrm{c}} \newcommand{\ThetaI}{\Theta_{\mathrm{I}}} \newcommand{\crit}{q} \newcommand{\CS}{CS_n} \newcommand{\CI}{CI_n} \newcommand{\cv}[1]{\hat{c}_{n,1-\alpha}(#1)} 
\newcommand{\idr}[1]{\mathcal{H}_\sP[#1]} \newcommand{\outr}[1]{\mathcal{O}_\sP[#1]} \newcommand{\idrn}[1]{\hat{\mathcal{H}}_{\sP_n}[#1]} \newcommand{\outrn}[1]{\mathcal{O}_{\sP_n}[#1]} \newcommand{\email}[1]{\texttt{#1}} \newcommand{\possessivecite}[1]{\ltref name="#1"\gt\lt/ref\gt's \citeyear{#1}} \newcommand\xqed[1]{% \leavevmode\unskip\penalty9999 \hbox{}\nobreak\hfill \quad\hbox{#1}} \newcommand\qedex{\xqed{$\triangle$}} \newcommand\independent{\protect\mathpalette{\protect\independenT}{\perp}} \DeclareMathOperator{\Int}{Int} \DeclareMathOperator{\conv}{conv} \DeclareMathOperator{\cov}{Cov} \DeclareMathOperator{\var}{Var} \DeclareMathOperator{\Sel}{Sel} \DeclareMathOperator{\Bel}{Bel} \DeclareMathOperator{\cl}{cl} \DeclareMathOperator{\sgn}{sgn} \DeclareMathOperator{\essinf}{essinf} \DeclareMathOperator{\esssup}{esssup} \newcommand{\mathds}{\mathbb} \renewcommand{\P}{\mathbb{P}} [/math]

Although partial identification often results from reducing the number of assumptions maintained in counterpart point identified models, care still needs to be taken in assessing the possible consequences of misspecification. This section's goal is to discuss the existing literature on the topic, and to provide some additional observations. To keep the notation light, I refer to the functional of interest as [math]\theta[/math] throughout, without explicitly distinguishing whether it belongs to an infinite dimensional parameter space (as in the nonparametric analysis in Section), or to a finite dimensional one (as in the semiparametric analysis in Section).

The original nonparametric “worst-case” bounds proposed by [1] for the analysis of selectively observed data and discussed in Section are not subject to the risk of misspecification, because they are based on the empirical evidence alone. However, often researchers are willing and eager to maintain additional assumptions that can help shrink the bounds, so that one can learn more from the available data. Indeed, early on [2] proposed the use of exclusion restrictions in the form of mean independence assumptions. Section Treatment Effects with and without Instrumental Variables discusses related ideas within the context of nonparametric bounds on treatment effects, and [3](Chapter 2) provides a thorough treatment of other types of exclusion restriction. The literature reviewed throughout this chapter provides many more examples of assumptions that have proven useful for empirical research.

Broadly speaking, assumptions can be classified into two types [3](Chapter 2). The first type is non-refutable: it may reduce the size of [math]\idr{\theta}[/math], but cannot lead to it being empty. An example in the context of selectively observed data is that of exogenous selection, or data missing at random conditional on covariates and instruments (see Section Selectively Observed Data): under this assumption [math]\idr{\theta}[/math] is a singleton, but the assumption cannot be refuted because it imposes a distributional (independence) assumption on unobservables.

The second type is refutable: it may reduce the size of [math]\idr{\theta}[/math], and it may result in [math]\idr{\theta}=\emptyset[/math] if it does not hold in the DGP. An example in the context of treatment effects is the assumption of mean independence between response function at treatment [math]t[/math] and instrumental variable [math]\ez[/math], see eq:ass:MI in Section Treatment Effects with and without Instrumental Variables. There the sharp bounds on [math]\E_\sQ(\ey(t)|\ex=x)[/math] are intersection bounds as in eq:intersection:bounds. If the instrument is invalid, the bounds can be empty.

[4] consider the impact of misspecification on semiparametric partially identified models. One of their examples concerns a linear regression model of the form [math]\E_\sQ(\ey|\ex)=\theta^\top\ex[/math] when only interval data is available for [math]\ey[/math] (as in Section Interval Data). In this context, [math]\idr{\theta}=\{\vartheta\in\Theta:\E_\sP(\yL|\ex)\le \vartheta^\top\ex \le\E_\sP(\yU|\ex),\ex\text{-a.s.}\}[/math]. The concern is that the conditional expectation might not be linear. [4] make two important observations. First, they argue that the set [math]\idr{\theta}[/math] is difficult to interpret when the model is misspecified. When [math]\ey[/math] is perfectly observed, if the conditional expectation is not linear, the output of ordinary least squares can be readily interpreted as the best linear approximation to [math]\E_\sQ(\ey|\ex)[/math]. This is not the case for [math]\idr{\theta}[/math] when only the interval data [math][\yL,\yU][/math] is observed. They therefore propose to work with the set of best linear predictors for [math]\ey|\ex[/math] even in the partially identified case (rather than fully exploiting the linearity assumption). The resulting set is the one derived by [5] and reported in Theorem SIR-. [4] work with projections of this set, which coincide with the bounds in [6]. Second, [4] point out that depending on the DGP, misspecification can cause [math]\idr{\theta}[/math] to be spuriously tight. This can happen, for example, if [math]\E_\sP(\yL|\ex)[/math] and [math]\E_\sP(\yU|\ex)[/math] are sufficiently nonlinear, even if they are relatively far from each other (e.g., [4](Figure 1)). Hence, caution should be taken when interpreting very tight partial identification results as indicative of a highly informative model and empirical evidence, as the possibility of model misspecification has to be taken into account.

These observations naturally lead to the questions of how to test for model misspecification in the presence of partial identification, and of what the consequences of misspecification are for the confidence sets discussed in Section Confidence Sets Satisfying Various Coverage Notions. With partial identification, a null hypothesis of correct model specification (and its alternative) can be expressed as

[[math]] \begin{align*} H_0:\idr{\theta}\neq\emptyset;\quad H_1:\idr{\theta}=\emptyset. \end{align*} [[/math]]

Tests for this hypothesis have been proposed for both nonparametric and semiparametric partially identified models. I refer to [7] for specification tests in a partially identified nonparametric instrumental variable model; to [8] for a nonparametric test in random utility models that checks whether a repeated cross section of demand data might have been generated by a population of rational consumers (thereby testing for the Axiom of Revealed Stochastic Preference); and to [9] and [10] for specification tests in linear moment (in)equality models.

For the general class of moment inequality models discussed in Section, [11], [12], [13], and [14] propose a specification test that rejects the model if [math]\CS[/math] in eq:CS is empty, where [math]\CS[/math] is defined with [math]c_{1-\alpha}(\vartheta)[/math] determined so as to satisfy eq:CS_coverage:point and approximated according to the methods proposed in the respective papers. The resulting test, commonly referred to as the by-product test because it is obtained as a by-product of the construction of a confidence set, takes the form

[[math]] \begin{align*} \phi=\one(\CS=\emptyset)=\one\left(\inf_{\vartheta\in\Theta}n\crit_n(\vartheta) \gt c_{1-\alpha}(\vartheta)\right). \end{align*} [[/math]]

Denoting by [math]\cP_0[/math] the collection of [math]\sP\in\cP[/math] such that [math]\idr{\theta}\neq\emptyset[/math], one has that the by-product test achieves uniform size control [15](Theorem C.2):

[[math]] \begin{align} \limsup_{n\to\infty}\sup_{\sP\in\cP_0}\E_\sP(\phi)\le\alpha.\label{eq:misp:test:uniform:size} \end{align} [[/math]]
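
The mechanics of the by-product test can be illustrated with a minimal sketch in Python. Everything specific in it is an assumption made for illustration and is not taken from the papers cited above: the scalar parameter, the two moment inequalities, the criterion function (a sum of squared standardized violations), the evaluation on a grid, and the constant placeholder critical value, which stands in for the data-driven [math]c_{1-\alpha}(\vartheta)[/math] that each cited paper approximates in its own way.

```python
import math

def by_product_test(lower, upper, grid, crit):
    """Sketch of the by-product specification test on a parameter grid.

    Hypothetical setup (not from the text): a scalar theta restricted by
    two moment inequalities, theta - E[lower] >= 0 and E[upper] - theta >= 0.
    `crit` stands in for c_{1-alpha}(theta) and is taken constant here.
    """
    n = len(lower)
    assert len(upper) == n, "this sketch assumes equal sample sizes"

    def mean(v):
        return sum(v) / len(v)

    def sd(v):
        m = mean(v)
        return math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))

    m_lo, m_up, s_lo, s_up = mean(lower), mean(upper), sd(lower), sd(upper)

    def n_q(theta):
        # n * q_n(theta): sum of squared violations of the standardized moments
        v1 = min(math.sqrt(n) * (theta - m_lo) / s_lo, 0.0)
        v2 = min(math.sqrt(n) * (m_up - theta) / s_up, 0.0)
        return v1 ** 2 + v2 ** 2

    cs = [t for t in grid if n_q(t) <= crit]  # confidence set on the grid
    phi = int(len(cs) == 0)                   # reject iff the set is empty
    return cs, phi

grid = [i / 10 for i in range(-10, 31)]
crit = -2.0 * math.log(0.05)  # 0.95 quantile of chi2_2, a placeholder choice

# Compatible inequalities: mean(lower) = 0.5 <= mean(upper) = 1.5
cs_ok, phi_ok = by_product_test([0.0, 1.0] * 50, [1.0, 2.0] * 50, grid, crit)
# Incompatible inequalities: mean(lower) = 2.5 > mean(upper) = 0.5
cs_bad, phi_bad = by_product_test([2.0, 3.0] * 50, [0.0, 1.0] * 50, grid, crit)
print(phi_ok, phi_bad)
```

In the first call the two inequalities are compatible, the grid confidence set is non-empty, and the test does not reject. In the second call no grid point comes close to satisfying both inequalities, so the confidence set is empty and the test rejects.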

An important feature of the by-product test is that the critical value [math]c_{1-\alpha}(\vartheta)[/math] is not obtained to test for model misspecification, but to ensure the coverage requirement in eq:CS_coverage:point; hence, it is obtained by working with the asymptotic distribution of [math]n\crit_n(\vartheta)[/math]. [15] propose more powerful model specification tests, using a critical value [math]c_{1-\alpha}[/math] that they obtain to ensure that \eqref{eq:misp:test:uniform:size}, rather than eq:CS_coverage:point, holds. In particular, they show that their tests dominate the by-product test in terms of power in any finite sample and in the asymptotic limit. Their critical value is obtained by working with the asymptotic distribution of [math]\inf_{\vartheta\in\Theta}n\crit_n(\vartheta)[/math]. As such, their proposal resembles the classic approach to model specification testing ([math]J[/math]-test) in point identified generalized method of moments models.

While it is possible to test for misspecification also in partially identified models, a word of caution is due on what might be the effects of misspecification on confidence sets constructed as in eq:CS with [math]c_{1-\alpha}[/math] determined to ensure eq:CS_coverage:point, as is often done in empirical work. [16] show that in the presence of local misspecification, confidence sets [math]\CS[/math] designed to satisfy eq:CS_coverage:point fail to do so. In practice, the concern is that when the model is misspecified [math]\CS[/math] might be spuriously small. Indeed, we have seen that it can be empty if the misspecification is sufficiently severe. If it is less severe but still present, it may lead to inference that is erroneously interpreted as precise.

It is natural to wonder how this compares to the effect of misspecification on inference in point identified models.[Notes 1] In that case, the rich set of tools available for inference allows one to avoid this problem. Consider for example a point identified generalized method of moments model with moment conditions [math]\E_\sP(m_j(\ew;\theta))=0[/math], [math]j=1,\dots,|\cJ|[/math], and [math]|\cJ| \gt d[/math]. Let [math]m[/math] denote the vector that stacks each of the [math]m_j[/math] functions, and let the estimator of [math]\theta[/math] be

[[math]] \begin{align} \hat{\theta}_n=\argmin_{\vartheta\in\Theta}n\bar{m}_n(\vartheta)^\top\hat\Xi^{-1}\bar{m}_n(\vartheta),\label{eq:GMM:estimator} \end{align} [[/math]]

with [math]\hat\Xi[/math] a consistent estimator of [math]\Xi=\E_\sP[m(\ew;\theta) m(\ew;\theta)^\top][/math] and [math]\bar{m}_n(\vartheta)[/math] the sample analog of [math]\E_\sP(m(\ew;\vartheta))[/math]. As shown by [17] for correctly specified models, the distribution of [math]\sqrt{n}(\hat{\theta}_n-\theta)[/math] converges to a Normal with mean vector equal to zero and covariance matrix [math]\Sigma[/math]. [18] show that when the model is subject to non-local misspecification, [math]\sqrt{n}(\hat{\theta}_n-\theta_*)[/math] converges to a Normal with mean vector equal to zero and covariance matrix [math]\Sigma_*[/math], where [math]\theta_*[/math] is the pseudo-true vector (the probability limit of \eqref{eq:GMM:estimator}) and where [math]\Sigma_*[/math] equals [math]\Sigma[/math] if the model is correctly specified, and differs from it otherwise. Let [math]\hat{\Sigma}_*[/math] be a consistent estimator of [math]\Sigma_*[/math] as in [18]. Define the Wald-statistic based confidence ellipsoid

[[math]] \begin{align} \{\vartheta\in\Theta:n(\hat{\theta}_n-\vartheta)^\top\hat{\Sigma}_*^{-1}(\hat{\theta}_n-\vartheta)\le c_{d,1-\alpha}\},\label{eq:CS:Wald:point:id} \end{align} [[/math]]

with [math]c_{d,1-\alpha}[/math] the [math]1-\alpha[/math] critical value of a [math]\chi_d^2[/math] (chi-squared random variable with [math]d[/math] degrees of freedom). Under standard regularity conditions (see [18]) the confidence set in \eqref{eq:CS:Wald:point:id} covers with asymptotic probability [math]1-\alpha[/math] the true vector [math]\theta[/math] if the model is correctly specified, and the pseudo-true vector [math]\theta_*[/math] if the model is incorrectly specified. In either case, \eqref{eq:CS:Wald:point:id} is never empty, because it always contains [math]\hat{\theta}_n[/math] (at which the quadratic form equals zero), and its volume depends on [math]\hat{\Sigma}_*[/math].[Notes 2]

Even in the point identified case a confidence set constructed similarly to eq:CS, i.e.,

[[math]] \begin{align} \{\vartheta\in\Theta:n\bar{m}_n(\vartheta)^\top\hat\Xi^{-1}\bar{m}_n(\vartheta)\le c_{|\cJ|,1-\alpha}\},\label{eq:CS:AR:point:id} \end{align} [[/math]]

where [math]c_{|\cJ|,1-\alpha}[/math] is the [math]1-\alpha[/math] critical value of a [math]\chi^2_{|\cJ|}[/math], incurs the same problems as its partial identification counterpart. Under standard regularity conditions, if the model is correctly specified, the confidence set in \eqref{eq:CS:AR:point:id} covers [math]\theta[/math] with asymptotic probability [math]1-\alpha[/math], because [math]n\bar{m}_n(\theta)^\top\hat\Xi^{-1}\bar{m}_n(\theta)\Rightarrow\chi^2_{|\cJ|}[/math]. However, this confidence set is empty with asymptotic probability [math]\P(\chi^2_{|\cJ|-d} \gt c_{|\cJ|,1-\alpha})[/math], due to the facts that [math]\P(\CS=\emptyset)=\P(\hat{\theta}_n\notin\CS)[/math] and that [math]n\bar{m}_n(\hat{\theta}_n)^\top\hat\Xi^{-1}\bar{m}_n(\hat{\theta}_n)\Rightarrow\chi^2_{|\cJ|-d}[/math]. Hence, it can be arbitrarily small.
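
To get a sense of magnitudes, the asymptotic probability [math]\P(\chi^2_{|\cJ|-d} \gt c_{|\cJ|,1-\alpha})[/math] that the set in \eqref{eq:CS:AR:point:id} is empty under correct specification can be evaluated numerically. The following is a minimal sketch in Python using only the standard library. The function names (`chi2_cdf`, `chi2_quantile`, `prob_empty`) and the example values of [math]|\cJ|[/math], [math]d[/math], and [math]\alpha[/math] are illustrative assumptions and do not come from the text.

```python
import math

def chi2_cdf(x, df):
    """CDF of a chi-squared variable with df degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma
    function, which is adequate for the moderate arguments used here.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_quantile(p, df):
    """Quantile by bisection on the CDF (illustrative, not optimized)."""
    lo, hi = 0.0, 10.0 * df + 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def prob_empty(num_moments, dim, alpha):
    """Asymptotic P(chi2_{|J|-d} > c_{|J|,1-alpha}) under correct specification."""
    if num_moments <= dim:
        raise ValueError("requires |J| > d (overidentification)")
    c = chi2_quantile(1.0 - alpha, num_moments)
    return 1.0 - chi2_cdf(c, num_moments - dim)

# Hypothetical (|J|, d) pairs at alpha = 0.05
for J, d in [(3, 1), (5, 2), (10, 2)]:
    print(J, d, round(prob_empty(J, d, 0.05), 4))
```

As a check on the first pair, a [math]\chi^2_2[/math] variable has survival function [math]e^{-x/2}[/math], so for [math]|\cJ|=3[/math] and [math]d=1[/math] the probability equals [math]\exp(-c_{3,0.95}/2)[/math], which is approximately 0.02. The set is therefore empty with positive probability even when the model is correctly specified.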

In the very special case of a linear regression model with interval outcome data studied by [4], the procedure proposed by [5] yields confidence sets that are always non-empty and whose volume depends on a covariance function that they derive (see [5](Theorem 4.3)). If the linear regression model is correctly specified, and hence [math]\{\vartheta\in\Theta:\E_\sP(\yL|\ex)\le \vartheta^\top\ex \le\E_\sP(\yU|\ex),\ex\text{-a.s.}\}\neq\emptyset[/math], these confidence sets cover [math]\{\vartheta\in\Theta:\E_\sP(\yL|\ex)\le \vartheta^\top\ex \le\E_\sP(\yU|\ex),\ex\text{-a.s.}\}[/math] with asymptotic probability at least equal to [math]1-\alpha[/math], as in eq:CS_coverage:set:pw. Even if the model is misspecified and [math]\{\vartheta\in\Theta:\E_\sP(\yL|\ex)\le \vartheta^\top\ex \le\E_\sP(\yU|\ex),\ex\text{-a.s.}\}=\emptyset[/math], the confidence sets cover the sharp identification region for the parameters of the best linear predictor of [math]\ey|\ex[/math], which can be viewed as a pseudo-true set, with probability exactly equal to [math]1-\alpha[/math]. The test statistic that [5] use is based on the Hausdorff distance between the estimator and the hypothesized set, and as such is a generalization of the standard Wald-statistic to the set-valued case. These considerations can be extended to other models. For example, [19] study empirical measurement of Hicksian consumer welfare with interval data on income. When the model is misspecified, they provide a best parametric approximation to demand and welfare based on the support function method, and inference procedures for this approximation. For other moment inequality models, [20] propose to build a pseudo-true set [math]\mathcal{H}_\sP^*[\theta][/math] that is obtained through a two-step procedure. In the first step one obtains a nonparametric estimator of the function(s) for which the researcher wants to impose a parametric structure. 
In the second step one obtains the set [math]\mathcal{H}_\sP^*[\theta][/math] as the collection of least squares projections of the set in the first step, on the parametric class imposed. [20] show that under regularity conditions the pseudo-true set can be consistently estimated, and derive rates of convergence for the estimator; however, they do not provide methods to obtain confidence sets. While conceptually valuable, their construction appears to be computationally difficult. [21] propose that when a model is falsified (in the sense that [math]\idr{\theta}[/math] is empty) one should report the falsification frontier: the boundary between the set of assumptions which falsify the model and those which do not, obtained through continuous relaxations of the baseline assumptions of concern. The researcher can then present the set [math]\idr{\theta}[/math] that results if the true model lies somewhere on this frontier. This set can be interpreted as a pseudo-true set. However, [21] do not provide methods for inference.

The implications of misspecification in partially identified models remain an open and important question in the literature. For example, it would be useful to have notions of pseudo-true set that parallel those of pseudo-true value in the point identified case. It would also be important to provide methods for the construction of confidence sets in general moment inequality models that do not exhibit spurious precision (i.e., are arbitrarily small) when the model is misspecified. Recent work by [22] addresses some of these questions.

General references

Molinari, Francesca (2020). "Microeconometrics with Partial Identification". arXiv:2004.11751 [econ.EM].

Notes

  1. The considerations that I report here are based on conversations with Joachim Freyberger and notes that he shared with me, for which I thank him.
  2. The effect of misspecification for maximum likelihood, least squares, and GMM estimators in “point identified” models (by which I mean models where the population criterion function has a unique optimizer) has been studied in the literature; see, e.g., [1], [2], [3], [4], and references therein. These estimators have been shown to converge in probability to pseudo-true values, and it has been established that tests of hypotheses and confidence sets based on these estimators have correct asymptotic level with respect to the pseudo-true parameters, provided standard errors are computed appropriately. In the specific case of GMM discussed here, the pseudo-true value [math]\theta_*[/math] depends on the choice of weighting matrix in \eqref{eq:GMM:estimator}: I have used [math]\hat\Xi[/math], but other choices are possible. I do not discuss this aspect of the problem here, but refer to [5].

References

  1. Manski, C.F. (1989): “Anatomy of the Selection Problem” The Journal of Human Resources, 24(3), 343--360.
  2. Manski, C.F. (1990): “Nonparametric Bounds on Treatment Effects” The American Economic Review Papers and Proceedings, 80(2), 319--323.
  3. 3.0 3.1 Manski, C.F. (2003): Partial Identification of Probability Distributions, Springer Series in Statistics. Springer.
  4. 4.0 4.1 4.2 4.3 4.4 4.5 Ponomareva, M., and E.Tamer (2011): “Misspecification in moment inequality models: back to moment equalities?” The Econometrics Journal, 14(2), 186--203.
  5. 5.0 5.1 5.2 5.3 Beresteanu, A., and F.Molinari (2008): “Asymptotic Properties for a Class of Partially Identified Models” Econometrica, 76(4), 763--814.
  6. Stoye, J. (2007): “Bounds on Generalized Linear Predictors with Incomplete Outcome Data” Reliable Computing, 13(3), 293--302.
  7. Santos, A. (2012): “Inference in nonparametric instrumental variables with partial identification” Econometrica, 80(1), 213--275.
  8. Kitamura, Y., and J.Stoye (2018): “Nonparametric Analysis of Random Utility Models” Econometrica, 86(6), 1883--1909.
  9. Guggenberger, P., J.Hahn, and K.Kim (2008): “Specification testing under moment inequalities” Economics Letters, 99(2), 375 -- 378.
  10. Bontemps, C., T.Magnac, and E.Maurin (2012): “Set identified linear models” Econometrica, 80(3), 1129--1155.
  11. Romano, J.P., and A.M. Shaikh (2008): “Inference for identifiable parameters in partially identified econometric models” Journal of Statistical Planning and Inference, 138(9), 2786 -- 2807.
  12. Andrews, D. W.K., and P.Guggenberger (2009): “Validity of Subsampling and `Plug-in Asymptotic' Inference for Parameters Defined by Moment Inequalities” Econometric Theory, 25(3), 669--709.
  13. Galichon, A., and M.Henry (2009): “A test of non-identifying restrictions and confidence regions for partially identified parameters” Journal of Econometrics, 152(2), 186 -- 196.
  14. Andrews, D. W.K., and G.Soares (2010): “Inference for Parameters Defined by Moment Inequalities Using Generalized Moment Selection” Econometrica, 78(1), 119--157.
  15. 15.0 15.1 Bugni, F.A., I.A. Canay, and X.Shi (2015): “Specification tests for partially identified models defined by moment inequalities” Journal of Econometrics, 185(1), 259 -- 282.
  16. Bugni, F.A., I.A. Canay, and P.Guggenberger (2012): “Distortions of Asymptotic Confidence Size in Locally Misspecified Moment Inequality Models” Econometrica, 80(4), 1741--1768.
  17. Hansen, L.P. (1982b): “Large Sample Properties of Generalized Method of Moments Estimators” Econometrica, 50(4), 1029--1054.
  18. 18.0 18.1 18.2 Hall, A.R., and A.Inoue (2003): “The large sample behaviour of the generalized method of moments estimator in misspecified models” Journal of Econometrics, 114(2), 361 -- 394.
  19. Lee, Y.-Y., and D.Bhattacharya (2019): “Applied welfare analysis for discrete choice with interval-data on income” Journal of Econometrics, 211(2), 361--387.
  20. 20.0 20.1 Kaido, H., and H.White (2013): “Estimating Misspecified Moment Inequality Models” in Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis: Essays in Honor of Halbert L. White Jr, ed. by X.Chen, and N.R. Swanson, pp. 331--361, Springer, New York, NY.
  21. 21.0 21.1 Masten, M.A., and A.Poirier (2018): “Salvaging Falsified Instrumental Variable Models” available at https://arxiv.org/abs/1812.11598.
  22. Andrews, D. W.K., and S.Kwon (2019): “Inference in Moment Inequality Models That Is Robust to Spurious Precision under Model Misspecification” available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3416831.