</math>
</div>
\label{sec:misspec}


Although partial identification often results from reducing the number of assumptions maintained in counterpart point identified models, care still needs to be taken in assessing the possible consequences of misspecification.
This section's goal is to discuss the existing literature on the topic, and to provide some additional observations.
To keep the notation light, I refer to the functional of interest as <math>\theta</math> throughout, without explicitly distinguishing whether it belongs to an infinite dimensional parameter space (as in the nonparametric analysis in [[guide:Ec36399528#sec:prob:distr |Section]]), or to a finite dimensional one (as in the semiparametric analysis in [[guide:521939d27a#sec:structural |Section]]).
The original nonparametric “worst-case” bounds proposed by <ref name="man89"></ref> for the analysis of selectively observed data and discussed in [[guide:Ec36399528#sec:prob:distr |Section]] are not subject to the risk of misspecification, because they are based on the empirical evidence alone.
However, often researchers are willing and eager to maintain additional assumptions that can help shrink the bounds, so that one can learn more from the available data.
Indeed, early on <ref name="man90"></ref> proposed the use of exclusion restrictions in the form of mean independence assumptions.
Section [[guide:Ec36399528#subsec:programme:eval |Treatment Effects with and without Instrumental Variables]] discusses related ideas within the context of nonparametric bounds on treatment effects, and <ref name="man03"></ref>{{rp|at=Chapter 2}} provides a thorough treatment of other types of exclusion restriction.
The literature reviewed throughout this chapter provides many more examples of assumptions that have proven useful for empirical research.
Broadly speaking, assumptions can be classified into two types <ref name="man03"></ref>{{rp|at=Chapter 2}}.
The first type is ''non-refutable'': it may reduce the size of <math>\idr{\theta}</math>, but cannot lead to it being empty.
An example in the context of selectively observed data is that of exogenous selection, or data missing at random conditional on covariates and instruments (see Section [[guide:Ec36399528#subsec:missing_data |Selectively Observed Data]]): under this assumption <math>\idr{\theta}</math> is a singleton, but the assumption cannot be refuted because it poses a distributional (independence) assumption on unobservables.
 
The second type is ''refutable'': it may reduce the size of <math>\idr{\theta}</math>, and it may result in <math>\idr{\theta}=\emptyset</math> if it does not hold in the DGP.
An example in the context of treatment effects is the assumption of mean independence between the response function at treatment <math>t</math> and the instrumental variable <math>\ez</math>; see [[guide:Ec36399528#eq:ass:MI |eq:ass:MI]] in Section [[guide:Ec36399528#subsec:programme:eval |Treatment Effects with and without Instrumental Variables]].
There the sharp bounds on <math>\E_\sQ(\ey(t)|\ex=x)</math> are intersection bounds as in [[guide:Ec36399528#eq:intersection:bounds |eq:intersection:bounds]].
If the instrument is invalid, the bounds can be empty.
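To see how refutability operates here, consider a minimal numerical sketch (hypothetical values, not part of the original analysis): with finitely many instrument values, the intersection bound is the interval from the largest lower bound to the smallest upper bound across instrument values, and an invalid instrument can make that interval empty.

```python
# Illustrative sketch: intersection bounds across instrument values z.
# Under mean independence, the bound on E[y(t)] is the intersection of the
# per-z bounds; an invalid instrument can make the intersection empty.

def intersection_bounds(per_z_bounds):
    """per_z_bounds: list of (lower, upper) bounds, one per value of z.
    Returns [max of lowers, min of uppers], or None if the intersection is empty."""
    lo = max(b[0] for b in per_z_bounds)
    hi = min(b[1] for b in per_z_bounds)
    return (lo, hi) if lo <= hi else None

# Bounds compatible with a valid instrument: the intersection is non-empty.
print(intersection_bounds([(0.2, 0.9), (0.3, 0.8)]))  # (0.3, 0.8)

# Disjoint per-z bounds, as can arise with an invalid instrument:
# the identification region is empty, i.e. the assumption is refuted.
print(intersection_bounds([(0.2, 0.4), (0.6, 0.9)]))  # None
```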
<ref name="pon:tam11"></ref> consider the impact of misspecification on semiparametric partially identified models.
One of their examples concerns a linear regression model of the form <math>\E_\sQ(\ey|\ex)=\theta^\top\ex</math> when only interval data is available for <math>\ey</math> (as in Section [[guide:Ec36399528#subsec:interval_data |Interval Data]]).
In this context, <math>\idr{\theta}=\{\vartheta\in\Theta:\E_\sP(\yL|\ex)\le \vartheta^\top\ex \le\E_\sP(\yU|\ex),\ex\text{-a.s.}\}</math>.
The concern is that the conditional expectation might not be linear.
<ref name="pon:tam11"></ref> make two important observations.
First, they argue that the set <math>\idr{\theta}</math> is of difficult interpretation when the model is misspecified.
When <math>\ey</math> is perfectly observed, if the conditional expectation is not linear, the output of ordinary least squares can be readily interpreted as the best linear approximation to <math>\E_\sQ(\ey|\ex)</math>.
This is not the case for <math>\idr{\theta}</math> when only the interval data <math>[\yL,\yU]</math> is observed.
They therefore propose to work with the set of best linear predictors for <math>\ey|\ex</math> even in the partially identified case (rather than fully exploit the linearity assumption).
The resulting set is the one derived by <ref name="ber:mol08"></ref> and reported in Theorem SIR-.
<ref name="pon:tam11"></ref> work with projections of this set.
<ref name="pon:tam11"></ref> also point out that depending on the DGP, misspecification can cause <math>\idr{\theta}</math> to be spuriously tight.
This can happen, for example, if <math>\E_\sP(\yL|\ex)</math> and <math>\E_\sP(\yU|\ex)</math> are sufficiently nonlinear, even if they are relatively far from each other (e.g., <ref name="pon:tam11"></ref>{{rp|at=Figure 1}}).
Hence, caution should be exercised when interpreting very tight partial identification results as indicative of a highly informative model and empirical evidence, as the possibility of model misspecification has to be taken into account.
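When the covariates have finite support, membership in the identification region above reduces to checking the two conditional-mean inequalities at each support point. A minimal sketch, with hypothetical numbers (the function name and values are illustrative, not from the text):

```python
# Illustrative sketch: check whether a candidate parameter vector lies in
# H_P[theta] = {vartheta : E[yL|x] <= vartheta'x <= E[yU|x] a.s.}
# when x has finitely many support points.

def in_identified_set(vartheta, x_support, e_yL, e_yU):
    """x_support: list of covariate vectors; e_yL, e_yU: the corresponding
    values of E[yL|x] and E[yU|x].  Returns True iff both inequalities hold
    at every support point."""
    for x, lo, hi in zip(x_support, e_yL, e_yU):
        fitted = sum(t * xi for t, xi in zip(vartheta, x))
        if not (lo <= fitted <= hi):
            return False
    return True

# Intercept plus one regressor, two support points (hypothetical values).
x_support = [[1.0, 0.0], [1.0, 1.0]]
e_yL = [0.0, 1.0]
e_yU = [1.0, 2.0]
print(in_identified_set([0.5, 1.0], x_support, e_yL, e_yU))  # True
print(in_identified_set([2.0, 1.0], x_support, e_yL, e_yU))  # False
```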
These observations naturally lead to the questions of how to test for model misspecification in the presence of partial identification, and of what the consequences of misspecification are for the confidence sets discussed in Section [[guide:6d1a428897#subsec:CS |Confidence Sets Satisfying Various Coverage Notions]].
With partial identification, a null hypothesis of correct model specification (and its alternative) can be expressed as
<math>
H_0:\ \idr{\theta}\neq\emptyset \qquad \text{vs.} \qquad H_1:\ \idr{\theta}=\emptyset.
</math>
Tests for this hypothesis have been proposed both for the case of nonparametric as well as semiparametric partially identified models.
I refer to <ref name="san12"></ref> for specification tests in a partially identified nonparametric instrumental variable model; to <ref name="kit:sto18"></ref> for a nonparametric test in random utility models that checks whether a repeated cross section of demand data might have been generated by a population of rational consumers (thereby testing for the Axiom of Revealed Stochastic Preference); and to <ref name="gug:hah:kim08"></ref> and <ref name="bon:mag:mau12"></ref> for specification tests in linear moment (in)equality models.
For the general class of moment inequality models discussed in [[guide:6d1a428897#sec:inference |Section]], <ref name="rom:sha08"></ref>, <ref name="and:gug09b"></ref>, <ref name="gal:hen09"></ref>, and <ref name="and:soa10"></ref> propose a specification test that rejects the model if <math>\CS</math> in [[guide:6d1a428897#eq:CS |eq:CS]] is empty, where <math>\CS</math> is defined with <math>c_{1-\alpha}(\vartheta)</math> determined so as to satisfy [[guide:6d1a428897#eq:CS_coverage:point |eq:CS_coverage:point]] and approximated according to the methods proposed in the respective papers.
The resulting test, commonly referred to as ''by-product'' test because obtained as a by-product to the construction of a confidence set, takes the form
<math>
\begin{align}
\text{reject } H_0 \text{ if and only if } \CS=\emptyset, \text{ i.e., if } n\crit_n(\vartheta) > c_{1-\alpha}(\vartheta) \text{ for all } \vartheta\in\Theta.
\end{align}
</math>
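In a finite-dimensional setting the by-product test can be caricatured on a parameter grid: the model is rejected exactly when no grid point survives the confidence-set inequality. A minimal sketch (the function names, criterion, and critical value below are hypothetical):

```python
# Illustrative sketch: grid-based by-product specification test.
# Reject H0 (correct specification) iff the confidence set is empty,
# i.e. iff every candidate vartheta violates n * q_n(vartheta) <= c_{1-alpha}(vartheta).

def by_product_reject(theta_grid, scaled_criterion, critical_value):
    """scaled_criterion(vartheta) plays the role of n * q_n(vartheta);
    critical_value(vartheta) plays the role of c_{1-alpha}(vartheta)."""
    cs = [v for v in theta_grid if scaled_criterion(v) <= critical_value(v)]
    return len(cs) == 0  # reject iff the confidence set is empty

grid = [i / 10 for i in range(11)]

# Criterion attains 0 at v = 0.5: the confidence set is non-empty, no rejection.
print(by_product_reject(grid, lambda v: 50 * (v - 0.5) ** 2, lambda v: 2.7))  # False

# Criterion bounded away from zero (as under severe misspecification):
# every point violates the inequality, so the model is rejected.
print(by_product_reject(grid, lambda v: 50 * (v - 0.5) ** 2 + 5, lambda v: 2.7))  # True
```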


An important feature of the by-product test is that the critical value <math>c_{1-\alpha}(\vartheta)</math> is not obtained to test for model misspecification, but to ensure the coverage requirement in [[guide:6d1a428897#eq:CS_coverage:point |eq:CS_coverage:point]]; hence, it is obtained by working with the asymptotic distribution of <math>n\crit_n(\vartheta)</math>.
Their critical value is obtained by working with the asymptotic distribution of <math>\inf_{\vartheta\in\Theta}n\crit_n(\vartheta)</math>.
As such, their proposal resembles the classic approach to model specification testing (<math>J</math>-test) in point identified generalized method of moments models.
While it is possible to test for misspecification also in partially identified models, a word of caution is due regarding the possible effects of misspecification on confidence sets constructed as in [[guide:6d1a428897#eq:CS |eq:CS]] with <math>c_{1-\alpha}</math> determined to ensure [[guide:6d1a428897#eq:CS_coverage:point |eq:CS_coverage:point]], as is often done in empirical work.
<ref name="bug:can:gug12"></ref> show that in the presence of local misspecification, confidence sets <math>\CS</math> designed to satisfy [[guide:6d1a428897#eq:CS_coverage:point |eq:CS_coverage:point]] fail to do so.
Indeed, we have seen that it can be empty if the misspecification is sufficiently severe.
If it is less severe but still present, it may lead to inference that is erroneously interpreted as precise.
It is natural to wonder how this compares to the effect of misspecification on inference in point identified models.<ref group="Notes" >The considerations that I report here are based on conversations with Joachim Freyberger and notes that he shared with me, for which I thank him.</ref>
In that case, the rich set of tools available for inference allows one to avoid this problem.
with <math>c_{d,1-\alpha}</math> the <math>1-\alpha</math> critical value of a <math>\chi_d^2</math> (chi-squared random variable with <math>d</math> degrees of freedom).
Under standard regularity conditions (see <ref name="hal:ino03"></ref>) the confidence set in \eqref{eq:CS:Wald:point:id} covers with asymptotic probability <math>1-\alpha</math> the true vector <math>\theta</math> if the model is correctly specified, and the pseudo-true vector <math>\theta_*</math> if the model is incorrectly specified.
In either case, \eqref{eq:CS:Wald:point:id} is never empty and its volume depends on <math>\hat{\Sigma}_*</math>.<ref group="Notes" >The effect of misspecification for maximum likelihood, least squares, and GMM estimators in “point identified” models (by which I mean models where the population criterion function has a unique optimizer) has been studied in the literature; see, e.g., {{ref|name=whi82}}, {{ref|name=gal:whi88}}, {{ref|name=hal:ino03}}, {{ref|name=han:lee19}}, and references therein. These estimators have been shown to converge in probability to pseudo-true values, and it has been established that tests of hypotheses and confidence sets based on these estimators have correct asymptotic level with respect to the pseudo-true parameters, provided standard errors are computed appropriately. In the specific case of GMM discussed here, the pseudo-true value <math>\theta_*</math> depends on the choice of weighting matrix in \eqref{eq:GMM:estimator}: I have used <math>\hat\Xi</math>, but other choices are possible. I do not discuss this aspect of the problem here, but refer to {{ref|name=hal:ino03}}.</ref>
 
Even in the point identified case a confidence set constructed similarly to [[guide:6d1a428897#eq:CS |eq:CS]], i.e.,


<math>
\CS=\left\{\vartheta\in\Theta:\, n\bar{m}_n(\vartheta)^\top\hat\Xi^{-1}\bar{m}_n(\vartheta)\le c_{|\cJ|,1-\alpha}\right\},
</math>
However, this confidence set is empty with asymptotic probability <math>\P(\chi^2_{|\cJ|-d} > c_{|\cJ|,1-\alpha})</math>, due to the facts that <math>\P(\CS=\emptyset)=\P(\hat{\theta}_n\notin\CS)</math> and that <math>n\bar{m}_n(\hat{\theta}_n)^\top\hat\Xi^{-1}\bar{m}_n(\hat{\theta}_n)\Rightarrow\chi^2_{|\cJ|-d}</math>.
Hence, it can be arbitrarily small.
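The emptiness probability <math>\P(\chi^2_{|\cJ|-d} > c_{|\cJ|,1-\alpha})</math> is easy to evaluate numerically. A small self-contained sketch, for hypothetical values of the number of moments and parameters (even degrees of freedom are used so the chi-squared survival function has a closed form):

```python
import math

def chi2_sf(x, df):
    # Survival function of a chi-squared with EVEN df (closed form):
    # P(chi2_df > x) = exp(-x/2) * sum_{i<df/2} (x/2)^i / i!
    assert df % 2 == 0
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i) for i in range(df // 2))

def chi2_crit(df, alpha):
    # 1-alpha critical value via bisection on the (decreasing) survival function.
    lo, hi = 0.0, 200.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def prob_empty(n_moments, n_params, alpha=0.05):
    # P(chi2_{|J|-d} > c_{|J|,1-alpha}): asymptotic probability the set is empty.
    crit = chi2_crit(n_moments, alpha)
    return chi2_sf(crit, n_moments - n_params)

# Hypothetical |J| = 6 moments, d = 2 parameters: the probability is small,
# because the critical value uses |J| rather than |J| - d degrees of freedom.
print(round(prob_empty(6, 2, 0.05), 4))
```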
In the very special case of a linear regression model with interval outcome data studied by <ref name="pon:tam11"></ref>, the procedure proposed by <ref name="ber:mol08"></ref> yields confidence sets that are always non-empty and whose volume depends on a covariance function that they derive (see <ref name="ber:mol08"></ref>{{rp|at=Theorem 4.3}}).
If the linear regression model is correctly specified, and hence <math>\{\vartheta\in\Theta:\E_\sP(\yL|\ex)\le \vartheta^\top\ex \le\E_\sP(\yU|\ex),\ex\text{-a.s.}\}\neq\emptyset</math>, these confidence sets cover <math>\{\vartheta\in\Theta:\E_\sP(\yL|\ex)\le \vartheta^\top\ex \le\E_\sP(\yU|\ex),\ex\text{-a.s.}\}</math> with asymptotic probability at least equal to <math>1-\alpha</math>, as in [[guide:6d1a428897#eq:CS_coverage:set:pw |eq:CS_coverage:set:pw]].
Even if the model is misspecified and <math>\{\vartheta\in\Theta:\E_\sP(\yL|\ex)\le \vartheta^\top\ex \le\E_\sP(\yU|\ex),\ex\text{-a.s.}\}=\emptyset</math>, the confidence sets cover the sharp identification region for the parameters of the best linear predictor of <math>\ey|\ex</math>, which can be viewed as a pseudo-true set, with probability exactly equal to <math>1-\alpha</math>.
The test statistic that <ref name="ber:mol08"></ref> use is based on the Hausdorff distance between the estimator and the hypothesized set, and as such is a generalization of the standard Wald statistic to the set-valued case.
These considerations can be extended to other models.
This set can be interpreted as a pseudo-true set.
However, <ref name="mas:poi18"></ref> do not provide methods for inference.
The implications of misspecification in partially identified models remain an open and important question in the literature.
For example, it would be useful to have notions of pseudo-true set that parallel those of pseudo-true value in the point identified case.
It would also be important to provide methods for the construction of confidence sets in general moment inequality models that do not exhibit spurious precision (i.e., are arbitrarily small) when the model is misspecified. Recent work by <ref name="and:kwo19"></ref> addresses some of these questions.
 
==General references==
{{cite arXiv|last1=Molinari|first1=Francesca|year=2020|title=Microeconometrics with Partial Identification|eprint=2004.11751|class=econ.EM}}

Although partial identification often results from reducing the number of assumptions maintained in counterpart point identified models, care still needs to be taken in assessing the possible consequences of misspecification. This section's goal is to discuss the existing literature on the topic, and to provide some additional observations. To keep the notation light, I refer to the functional of interest as [math]\theta[/math] throughout, without explicitly distinguishing whether it belongs to an infinite dimensional parameter space (as in the nonparametric analysis in Section), or to a finite dimensional one (as in the semiparametric analysis in Section).

The original nonparametric “worst-case” bounds proposed by [1] for the analysis of selectively observed data and discussed in Section are not subject to the risk of misspecification, because they are based on the empirical evidence alone. However, often researchers are willing and eager to maintain additional assumptions that can help shrink the bounds, so that one can learn more from the available data. Indeed, early on [2] proposed the use of exclusion restrictions in the form of mean independence assumptions. Section Treatment Effects with and without Instrumental Variables discusses related ideas within the context of nonparametric bounds on treatment effects, and [3](Chapter 2) provides a thorough treatment of other types of exclusion restriction. The literature reviewed throughout this chapter provides many more examples of assumptions that have proven useful for empirical research.

Broadly speaking, assumptions can be classified in two types [3](Chapter 2). The first type is ’'non-refutable: it may reduce the size of [math]\idr{\theta}[/math], but cannot lead to it being empty. An example in the context of selectively observed data is that of exogenous selection, or data missing at random conditional on covariates and instruments (see Section Selectively Observed Data): under this assumption [math]\idr{\theta}[/math] is a singleton, but the assumption cannot be refuted because it poses a distributional (independence) assumption on unobservables.

The second type is ''refutable'': it may reduce the size of [math]\idr{\theta}[/math], and it may result in [math]\idr{\theta}=\emptyset[/math] if it does not hold in the DGP. An example in the context of treatment effects is the assumption of mean independence between the response function at treatment [math]t[/math] and the instrumental variable [math]\ez[/math], see eq:ass:MI in Section Treatment Effects with and without Instrumental Variables. There the sharp bounds on [math]\E_\sQ(\ey(t)|\ex=x)[/math] are intersection bounds as in eq:intersection:bounds. If the instrument is invalid, the bounds can be empty.
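To fix ideas, the refutability of this assumption can be illustrated with a small numerical sketch (all numbers below are hypothetical). For each instrument value [math]z[/math] one has worst-case bounds [math][L(z), U(z)][/math]; under mean independence the sharp bound is their intersection across [math]z[/math], which an invalid instrument can render empty:

```python
# Hypothetical illustration of intersection bounds: for each instrument
# value z, worst-case bounds [L(z), U(z)] are computed; under mean
# independence the sharp bound is their intersection, which an invalid
# instrument can render empty (thereby refuting the assumption).

def intersection_bounds(lower, upper):
    """Intersection of per-z bounds [L(z), U(z)]; None if empty."""
    lo, hi = max(lower), min(upper)
    return (lo, hi) if lo <= hi else None

# Overlapping per-z bounds: the intersection is [0.3, 0.6].
print(intersection_bounds([0.2, 0.3, 0.1], [0.8, 0.6, 0.9]))
# Disjoint per-z bounds: empty intersection, so the assumption is refuted.
print(intersection_bounds([0.1, 0.5], [0.3, 0.9]))
```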

[4] consider the impact of misspecification on semiparametric partially identified models. One of their examples concerns a linear regression model of the form [math]\E_\sQ(\ey|\ex)=\theta^\top\ex[/math] when only interval data are available for [math]\ey[/math] (as in Section Interval Data). In this context, [math]\idr{\theta}=\{\vartheta\in\Theta:\E_\sP(\yL|\ex)\le \vartheta^\top\ex \le\E_\sP(\yU|\ex),\ex\text{-a.s.}\}[/math]. The concern is that the conditional expectation might not be linear. [4] make two important observations. First, they argue that the set [math]\idr{\theta}[/math] is difficult to interpret when the model is misspecified. When [math]\ey[/math] is perfectly observed, if the conditional expectation is not linear, the output of ordinary least squares can be readily interpreted as the best linear approximation to [math]\E_\sQ(\ey|\ex)[/math]. This is not the case for [math]\idr{\theta}[/math] when only the interval data [math][\yL,\yU][/math] are observed. They therefore propose to work with the set of best linear predictors for [math]\ey|\ex[/math] even in the partially identified case (rather than fully exploit the linearity assumption). The resulting set is the one derived by [5] and reported in Theorem SIR-. [4] work with projections of this set, which coincide with the bounds in [6]. Second, [4] point out that, depending on the DGP, misspecification can cause [math]\idr{\theta}[/math] to be spuriously tight. This can happen, for example, if [math]\E_\sP(\yL|\ex)[/math] and [math]\E_\sP(\yU|\ex)[/math] are sufficiently nonlinear, even if they are relatively far from each other (e.g., [4](Figure 1)). Hence, caution should be exercised when interpreting very tight partial identification results as indicative of a highly informative model and empirical evidence, as the possibility of model misspecification has to be taken into account.
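The mechanics can be sketched numerically. The following Python snippet (a hypothetical example with a scalar regressor supported on two points) traces a sample analog of [math]\idr{\theta}[/math] on a grid, and shows how sufficiently nonlinear conditional expectations of [math]\yL[/math] and [math]\yU[/math] can empty the set:

```python
import numpy as np

# Sketch: sample analog of {θ : E(yL|x) ≤ θx ≤ E(yU|x) for all x} for a
# scalar regressor with discrete support (an assumption made here purely
# to keep the example simple).  If E(yL|x), E(yU|x) are sufficiently
# nonlinear, no line fits between them and the set is empty.

def identified_set_1d(x_support, EyL, EyU, theta_grid):
    """Return grid points θ with EyL(x) <= θ*x <= EyU(x) for all x."""
    keep = [th for th in theta_grid
            if all(lo <= th * x <= hi
                   for x, lo, hi in zip(x_support, EyL, EyU))]
    return np.array(keep)

grid = np.linspace(-2, 2, 4001)
# Conditional expectations roughly consistent with slope 1: set ≈ [0.9, 1.1].
print(identified_set_1d([1.0, 2.0], [0.8, 1.8], [1.2, 2.2], grid))
# Strongly nonlinear bounds: no line fits between them, so the set is empty.
print(identified_set_1d([1.0, 2.0], [0.8, 3.0], [1.2, 3.5], grid))
```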

These observations naturally lead to the questions of how to test for model misspecification in the presence of partial identification, and of what the consequences of misspecification are for the confidence sets discussed in Section Confidence Sets Satisfying Various Coverage Notions. With partial identification, a null hypothesis of correct model specification (and its alternative) can be expressed as

[[math]] \begin{align*} H_0:\idr{\theta}\neq\emptyset;\quad H_1:\idr{\theta}=\emptyset. \end{align*} [[/math]]

Tests for this hypothesis have been proposed both for the case of nonparametric as well as semiparametric partially identified models. I refer to [7] for specification tests in a partially identified nonparametric instrumental variable model; to [8] for a nonparametric test in random utility models that checks whether a repeated cross section of demand data might have been generated by a population of rational consumers (thereby testing for the Axiom of Revealed Stochastic Preference); and to [9] and [10] for specification tests in linear moment (in)equality models.

For the general class of moment inequality models discussed in Section, [11], [12], [13], and [14] propose a specification test that rejects the model if [math]\CS[/math] in eq:CS is empty, where [math]\CS[/math] is defined with [math]c_{1-\alpha}(\vartheta)[/math] determined so as to satisfy eq:CS_coverage:point and approximated according to the methods proposed in the respective papers. The resulting test, commonly referred to as a by-product test because it is obtained as a by-product of the construction of a confidence set, takes the form

[[math]] \begin{align*} \phi=\one(\CS=\emptyset)=\one\left(\inf_{\vartheta\in\Theta}n\crit_n(\vartheta) \gt c_{1-\alpha}(\vartheta)\right). \end{align*} [[/math]]

Denoting by [math]\cP_0[/math] the collection of [math]\sP\in\cP[/math] such that [math]\idr{\theta}\neq\emptyset[/math], one has that the by-product test achieves uniform size control [15](Theorem C.2):

[[math]] \begin{align} \limsup_{n\to\infty}\sup_{\sP\in\cP_0}\E_\sP(\phi)\le\alpha.\label{eq:misp:test:uniform:size} \end{align} [[/math]]

An important feature of the by-product test is that the critical value [math]c_{1-\alpha}(\vartheta)[/math] is not designed to test for model misspecification; rather, it is designed to ensure the coverage requirement in eq:CS_coverage:point, and hence it is obtained by working with the asymptotic distribution of [math]n\crit_n(\vartheta)[/math]. [15] propose more powerful model specification tests, using a critical value [math]c_{1-\alpha}[/math] that they obtain to ensure that \eqref{eq:misp:test:uniform:size}, rather than eq:CS_coverage:point, holds. In particular, they show that their tests dominate the by-product test in terms of power in any finite sample and in the asymptotic limit. Their critical value is obtained by working with the asymptotic distribution of [math]\inf_{\vartheta\in\Theta}n\crit_n(\vartheta)[/math]. As such, their proposal resembles the classic approach to model specification testing (the [math]J[/math]-test) in point identified generalized method of moments models.
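The logic of the by-product test can be sketched in code. The Python snippet below uses a hypothetical two-moment design ([math]\E_\sP[w_1]\le\theta\le\E_\sP[w_2][/math], misspecified when [math]\E_\sP[w_1] \gt \E_\sP[w_2][/math]); the naive recentred bootstrap it uses for the critical value is purely illustrative and lacks the uniformity properties of the procedures cited in the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def criterion(mbar, sd):
    """Modified-method-of-moments criterion: only violations contribute."""
    t = np.maximum(mbar / sd, 0.0)
    return np.sum(t ** 2)

def by_product_test(w, theta_grid, alpha=0.05, B=200):
    """Reject the model iff no θ on the grid enters the confidence set."""
    n = w.shape[0]
    for th in theta_grid:
        mw = np.column_stack([w[:, 0] - th, th - w[:, 1]])  # n x 2 moments
        mbar, sd = mw.mean(0), mw.std(0) + 1e-12
        stat = n * criterion(mbar, sd)
        boot = np.empty(B)
        for b in range(B):  # naive recentred nonparametric bootstrap
            mb = mw[rng.integers(0, n, n)]
            boot[b] = n * criterion(mb.mean(0) - mbar, mb.std(0) + 1e-12)
        if stat <= np.quantile(boot, 1 - alpha):
            return False  # this θ is in CS, so CS is nonempty: do not reject
    return True  # CS is empty: model rejected

n, grid = 500, np.linspace(-2.0, 2.0, 81)
w_ok = np.column_stack([rng.normal(0, 1, n), rng.normal(1, 1, n)])    # E w1 < E w2
w_bad = np.column_stack([rng.normal(0, 1, n), rng.normal(-1, 1, n)])  # empty set
print(by_product_test(w_ok, grid), by_product_test(w_bad, grid))      # False True
```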

While it is possible to test for misspecification also in partially identified models, a word of caution is due on the effects of misspecification on confidence sets constructed as in eq:CS with [math]c_{1-\alpha}[/math] determined to ensure eq:CS_coverage:point, as is often done in empirical work. [16] show that in the presence of local misspecification, confidence sets [math]\CS[/math] designed to satisfy eq:CS_coverage:point fail to do so. In practice, the concern is that when the model is misspecified [math]\CS[/math] might be spuriously small. Indeed, we have seen that it can be empty if the misspecification is sufficiently severe. If the misspecification is less severe but still present, it may lead to inference that is erroneously interpreted as precise.

It is natural to wonder how this compares to the effect of misspecification on inference in point identified models.[Notes 1] In that case, the rich set of tools available for inference allows one to avoid this problem. Consider for example a point identified generalized method of moments model with moment conditions [math]\E_\sP(m_j(\ew;\theta))=0[/math], [math]j=1,\dots,|\cJ|[/math], and [math]|\cJ| \gt d[/math]. Let [math]m[/math] denote the vector that stacks each of the [math]m_j[/math] functions, and let the estimator of [math]\theta[/math] be

[[math]] \begin{align} \hat{\theta}_n=\argmin_{\vartheta\in\Theta}n\bar{m}_n(\vartheta)^\top\hat\Xi^{-1}\bar{m}_n(\vartheta),\label{eq:GMM:estimator} \end{align} [[/math]]

with [math]\hat\Xi[/math] a consistent estimator of [math]\Xi=\E_\sP[m(\ew;\theta) m(\ew;\theta)^\top][/math] and [math]\bar{m}_n(\vartheta)[/math] the sample analog of [math]\E_\sP(m(\ew;\vartheta))[/math]. As shown by [17] for correctly specified models, the distribution of [math]\sqrt{n}(\hat{\theta}_n-\theta)[/math] converges to a Normal with mean vector equal to zero and covariance matrix [math]\Sigma[/math]. [18] show that when the model is subject to non-local misspecification, [math]\sqrt{n}(\hat{\theta}_n-\theta_*)[/math] converges to a Normal with mean vector equal to zero and covariance matrix [math]\Sigma_*[/math], where [math]\theta_*[/math] is the pseudo-true vector (the probability limit of \eqref{eq:GMM:estimator}) and where [math]\Sigma_*[/math] equals [math]\Sigma[/math] if the model is correctly specified, and differs from it otherwise. Let [math]\hat{\Sigma}_*[/math] be a consistent estimator of [math]\Sigma_*[/math] as in [18]. Define the Wald-statistic based confidence ellipsoid

[[math]] \begin{align} \{\vartheta\in\Theta:n(\hat{\theta}_n-\vartheta)^\top\hat{\Sigma}_*^{-1}(\hat{\theta}_n-\vartheta)\le c_{d,1-\alpha}\},\label{eq:CS:Wald:point:id} \end{align} [[/math]]

with [math]c_{d,1-\alpha}[/math] the [math]1-\alpha[/math] critical value of a [math]\chi_d^2[/math] (chi-squared random variable with [math]d[/math] degrees of freedom). Under standard regularity conditions (see [18]) the confidence set in \eqref{eq:CS:Wald:point:id} covers with asymptotic probability [math]1-\alpha[/math] the true vector [math]\theta[/math] if the model is correctly specified, and the pseudo-true vector [math]\theta_*[/math] if the model is incorrectly specified. In either case, \eqref{eq:CS:Wald:point:id} is never empty and its volume depends on [math]\hat{\Sigma}_*[/math].[Notes 2]
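The point identified contrast can be sketched numerically. The Python snippet below (a hypothetical design with two moments [math]\E(w_1-\theta)=0[/math] and [math]\E(w_2-\theta)=0[/math], so [math]|\cJ|=2 \gt d=1[/math], deliberately misspecified by setting [math]\E w_1 \neq \E w_2[/math]) computes a one-step GMM estimate with a robust standard error; the resulting Wald interval centers on a pseudo-true value and is never empty. For simplicity the weight matrix is held fixed when computing the influence function:

```python
import numpy as np

rng = np.random.default_rng(1)

def gmm_mean(w):
    """One-step GMM for a scalar mean θ from J moments, with a robust
    (sandwich-style) standard error; the estimated weight matrix is
    treated as fixed in the influence-function calculation."""
    n, J = w.shape
    ones = np.ones(J)
    W = np.linalg.inv(np.cov(w, rowvar=False))  # weight = inverse of Ξ̂
    a = W @ ones / (ones @ W @ ones)            # θ̂ = a'w̄, with 1'a = 1
    theta = a @ w.mean(0)
    se = (w @ a).std(ddof=1) / np.sqrt(n)       # robust std. error of a'w̄
    return theta, se

n = 2000
# Misspecified: E w1 = 0 but E w2 = 0.5, so no θ sets both moments to zero.
w = np.column_stack([rng.normal(0.0, 1, n), rng.normal(0.5, 1, n)])
theta, se = gmm_mean(w)
print(theta, (theta - 1.96 * se, theta + 1.96 * se))  # interval never empty
```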

Even in the point identified case, a confidence set constructed similarly to eq:CS, i.e.,

[[math]] \begin{align} \{\vartheta\in\Theta:n\bar{m}_n(\vartheta)^\top\hat\Xi^{-1}\bar{m}_n(\vartheta)\le c_{|\cJ|,1-\alpha}\},\label{eq:CS:AR:point:id} \end{align} [[/math]]

where [math]c_{|\cJ|,1-\alpha}[/math] is the [math]1-\alpha[/math] critical value of a [math]\chi^2_{|\cJ|}[/math], incurs the same problems as its partial identification counterpart. Under standard regularity conditions, if the model is correctly specified, the confidence set in \eqref{eq:CS:AR:point:id} covers [math]\theta[/math] with asymptotic probability [math]1-\alpha[/math], because [math]n\bar{m}_n(\vartheta)^\top\hat\Xi^{-1}\bar{m}_n(\vartheta)\Rightarrow\chi^2_{|\cJ|}[/math]. However, this confidence set is empty with asymptotic probability [math]\P(\chi^2_{|\cJ|-d} \gt c_{|\cJ|,1-\alpha})[/math], due to the facts that [math]\P(\CS=\emptyset)=\P(\hat{\theta}_n\notin\CS)[/math] and that [math]n\bar{m}_n(\hat{\theta}_n)^\top\hat\Xi^{-1}\bar{m}_n(\hat{\theta}_n)\Rightarrow\chi^2_{|\cJ|-d}[/math]. Hence, it can be arbitrarily small.
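The asymptotic emptiness probability can be checked by simulation. The Python sketch below uses a hypothetical correctly specified design with [math]|\cJ|=2[/math] equality moments [math]\E(w-\theta\mathbf{1})=0[/math] and [math]d=1[/math]: the set is empty exactly when the minimized statistic (approximately [math]\chi^2_1[/math]) exceeds the [math]\chi^2_{2}[/math] critical value 5.991, an event of probability [math]\P(\chi^2_1 \gt 5.991)=\mathrm{erfc}(\sqrt{5.991/2})\approx 0.014[/math]:

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(2)

def min_stat(w):
    """inf over θ of n(w̄ - θ1)'Ξ̂⁻¹(w̄ - θ1), computed in closed form
    via the minimizing θ of the quadratic."""
    n = w.shape[0]
    wbar, ones = w.mean(0), np.ones(2)
    W = np.linalg.inv(np.cov(w, rowvar=False))
    th = ones @ W @ wbar / (ones @ W @ ones)
    m = wbar - th * ones
    return n * m @ W @ m

R, n, crit = 2000, 200, 5.991  # crit = 95th percentile of chi-squared(2)
empty = sum(min_stat(rng.standard_normal((n, 2))) > crit for _ in range(R))
# Empirical emptiness frequency vs. P(chi-squared(1) > crit): both ≈ 0.014.
print(empty / R, erfc(sqrt(crit / 2)))
```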

In the very special case of the linear regression model with interval outcome data studied by [4], the procedure proposed by [5] yields confidence sets that are always non-empty and whose volume depends on a covariance function that they derive (see [5](Theorem 4.3)). If the linear regression model is correctly specified, and hence [math]\{\vartheta\in\Theta:\E_\sP(\yL|\ex)\le \vartheta^\top\ex \le\E_\sP(\yU|\ex),\ex\text{-a.s.}\}\neq\emptyset[/math], these confidence sets cover this set with asymptotic probability at least equal to [math]1-\alpha[/math], as in eq:CS_coverage:set:pw. Even if the model is misspecified and the set is empty, the confidence sets cover the sharp identification region for the parameters of the best linear predictor of [math]\ey|\ex[/math], which can be viewed as a pseudo-true set, with probability exactly equal to [math]1-\alpha[/math]. The test statistic that [5] use is based on the Hausdorff distance between the estimator and the hypothesized set, and as such is a generalization of the standard Wald-statistic to the set-valued case. These considerations can be extended to other models. For example, [19] study empirical measurement of Hicksian consumer welfare with interval data on income. When the model is misspecified, they provide a best parametric approximation to demand and welfare based on the support function method, as well as inference procedures for this approximation. For other moment inequality models, [20] propose to build a pseudo-true set [math]\mathcal{H}_\sP^*[\theta][/math] through a two-step procedure. In the first step, one obtains a nonparametric estimator of the function(s) on which the researcher wants to impose a parametric structure. In the second step, one obtains [math]\mathcal{H}_\sP^*[\theta][/math] as the collection of least squares projections of the first-step set onto the parametric class imposed. [20] show that under regularity conditions the pseudo-true set can be consistently estimated, and they derive rates of convergence for the estimator; however, they do not provide methods to obtain confidence sets. While conceptually valuable, their construction appears to be computationally difficult. [21] propose that when a model is falsified (in the sense that [math]\idr{\theta}[/math] is empty) one should report the ''falsification frontier'': the boundary between the set of assumptions that falsify the model and those that do not, obtained through continuous relaxations of the baseline assumptions of concern. The researcher can then present the set [math]\idr{\theta}[/math] that results if the true model lies somewhere on this frontier. This set can be interpreted as a pseudo-true set. However, [21] do not provide methods for inference.
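The best-linear-predictor construction lends itself to a simple sample-analog sketch. In the Python snippet below (a hypothetical scalar-regressor design with interval outcomes of known width), the OLS slope [math]\sum_i(x_i-\bar x)y_i/\sum_i(x_i-\bar x)^2[/math] is linear in [math](y_1,\dots,y_n)[/math], so over the box [math]\prod_i[\yL_i,\yU_i][/math] it is extremized at a vertex; this yields the sharp slope bounds for the set of best linear predictors:

```python
import numpy as np

rng = np.random.default_rng(3)

def blp_slope_bounds(x, yL, yU):
    """Sample-analog sharp bounds on the BLP slope of y on scalar x when
    only yL <= y <= yU is observed: the slope is linear in y, so it is
    extremized at a vertex of the box of admissible outcome vectors."""
    dx = x - x.mean()
    hi = np.where(dx > 0, yU, yL)  # slope-maximizing selection of y
    lo = np.where(dx > 0, yL, yU)  # slope-minimizing selection of y
    denom = dx @ dx
    return (dx @ lo) / denom, (dx @ hi) / denom

x = rng.normal(size=500)
y = 1.0 + 2.0 * x + rng.normal(size=500)          # latent outcome, slope 2
lo, hi = blp_slope_bounds(x, y - 0.5, y + 0.5)    # intervals of width 1
print(lo, hi)                                      # brackets the true slope 2
```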

The implications of misspecification in partially identified models remain an open and important question in the literature. For example, it would be useful to have notions of a pseudo-true set that parallel those of a pseudo-true value in the point identified case. It would also be important to provide methods for the construction of confidence sets in general moment inequality models that do not exhibit spurious precision (i.e., that are not arbitrarily small) when the model is misspecified. Recent work by [22] addresses some of these questions.

General references

Molinari, Francesca (2020). "Microeconometrics with Partial Identification". arXiv:2004.11751 [econ.EM].

Notes

  1. The considerations that I report here are based on conversations with Joachim Freyberger and notes that he shared with me, for which I thank him.
  2. The effect of misspecification for maximum likelihood, least squares, and GMM estimators in “point identified” models (by which I mean models where the population criterion function has a unique optimizer) has been studied in the literature; see, e.g., [1], [2], [3], [4], and references therein. These estimators have been shown to converge in probability to pseudo-true values, and it has been established that tests of hypotheses and confidence sets based on these estimators have correct asymptotic level with respect to the pseudo-true parameters, provided standard errors are computed appropriately. In the specific case of GMM discussed here, the pseudo-true value [math]\theta_*[/math] depends on the choice of weighting matrix in \eqref{eq:GMM:estimator}: I have used [math]\hat\Xi[/math], but other choices are possible. I do not discuss this aspect of the problem here, but refer to [5].

References

  1. man89
  2. man90
  3. man03
  4. pon:tam11
  5. ber:mol08
  6. sto07
  7. san12
  8. kit:sto18
  9. gug:hah:kim08
  10. bon:mag:mau12
  11. rom:sha08
  12. and:gug09b
  13. gal:hen09
  14. and:soa10
  15. bug:can:shi15
  16. bug:can:gug12
  17. han82
  18. hal:ino03
  19. lee:bha19
  20. kai:whi13
  21. mas:poi18
  22. and:kwo19