guide:6d1a428897: Difference between revisions

From Stochiki
mNo edit summary
mNo edit summary
 
(2 intermediate revisions by the same user not shown)
Line 150: Line 150:
==<span id="subsec:framework:inference"></span>Framework and Scope of the Discussion==
==<span id="subsec:framework:inference"></span>Framework and Scope of the Discussion==


The identification analysis carried out in [[guide:Ec36399528#sec:prob:distr |Section-]] [[guide:521939d27a#sec:structural |Section]] presumes knowledge of the joint distribution <math>\sP</math> of the observable variables.
The identification analysis carried out in [[guide:Ec36399528#sec:prob:distr |Section-]] [[guide:8d94784544#sec:structural |Section]] presumes knowledge of the joint distribution <math>\sP</math> of the observable variables.
That is, it presumes that <math>\sP</math> can be learned with certainty from observation of the entire population.
That is, it presumes that <math>\sP</math> can be learned with certainty from observation of the entire population.
In practice, one observes a sample of size <math>n</math> drawn from <math>\sP</math>.
In practice, one observes a sample of size <math>n</math> drawn from <math>\sP</math>.
For simplicity I assume it to be a random sample.<ref group="Notes" >This assumption is often maintained in the literature. See, e.g., {{ref|name=and:soa10}} for a treatment of inference with dependent observations. {{ref|name=eps:kai:seo16}} study inference in games of complete information as in Identification [[guide:521939d27a#IP:entry_game |Problem]], imposing the i.i.d. assumption on the unobserved payoff shifters <math>\{\eps_{i1},\eps_{i2}\}_{i=1}^n</math>. The authors note that because the selection mechanism picking the equilibrium played in the regions of multiplicity (see Section [[guide:521939d27a#subsec:multiple:eq |Static, Simultaneous-Move Finite Games with Multiple Equilibria]]) is left completely unspecified and may be arbitrarily correlated across markets, the resulting observed variables <math>\{\ew_i\}_{i=1}^n</math> may not be independent and identically distributed, and they propose an inference method to address this issue.</ref>
For simplicity I assume it to be a random sample.<ref group="Notes" >This assumption is often maintained in the literature. See, e.g., {{ref|name=and:soa10}} for a treatment of inference with dependent observations. {{ref|name=eps:kai:seo16}} study inference in games of complete information as in Identification [[guide:D084086519#IP:entry_game |Problem]], imposing the i.i.d. assumption on the unobserved payoff shifters <math>\{\eps_{i1},\eps_{i2}\}_{i=1}^n</math>. The authors note that because the selection mechanism picking the equilibrium played in the regions of multiplicity (see Section [[guide:D084086519#subsec:multiple:eq |Static, Simultaneous-Move Finite Games with Multiple Equilibria]]) is left completely unspecified and may be arbitrarily correlated across markets, the resulting observed variables <math>\{\ew_i\}_{i=1}^n</math> may not be independent and identically distributed, and they propose an inference method to address this issue.</ref>
Statistical inference on <math>\idr{\theta}</math> needs to be conducted using knowledge of <math>\sP_n</math>, the empirical distribution of the observable outcomes and covariates.
Statistical inference on <math>\idr{\theta}</math> needs to be conducted using knowledge of <math>\sP_n</math>, the empirical distribution of the observable outcomes and covariates.
Because <math>\idr{\theta}</math> is not a singleton, this task is particularly delicate.  
Because <math>\idr{\theta}</math> is not a singleton, this task is particularly delicate.  
To start, care is required to choose a proper notion of consistency for a set estimator <math>\idrn{\theta}</math> and to obtain palatable conditions under which such consistency attains.
To start, care is required to choose a proper notion of consistency for a set estimator <math>\idrn{\theta}</math> and to obtain palatable conditions under which such consistency attains.
Next, the asymptotic behavior of statistics designed to test hypotheses or build confidence sets for <math>\idr{\theta}</math> or for <math>\vartheta\in\idr{\theta}</math> might change with <math>\vartheta</math>, creating technical challenges for the construction of confidence sets that are not encountered when <math>\theta</math> is point identified.
Next, the asymptotic behavior of statistics designed to test hypotheses or build confidence sets for <math>\idr{\theta}</math> or for <math>\vartheta\in\idr{\theta}</math> might change with <math>\vartheta</math>, creating technical challenges for the construction of confidence sets that are not encountered when <math>\theta</math> is point identified.
Many of the sharp identification regions derived in [[guide:Ec36399528#sec:prob:distr|Section-]] [[guide:521939d27a#sec:structural |Section]] can be written as collections of vectors <math>\vartheta\in\Theta</math> that satisfy conditional or unconditional moment (in)equalities.
Many of the sharp identification regions derived in [[guide:Ec36399528#sec:prob:distr|Section-]] [[guide:8d94784544#sec:structural |Section]] can be written as collections of vectors <math>\vartheta\in\Theta</math> that satisfy conditional or unconditional moment (in)equalities.
For simplicity, I assume that <math>\Theta</math> is a compact and convex subset of <math>\R^d</math>, and I use the formalization for the case of a finite number of unconditional moment (in)equalities:
For simplicity, I assume that <math>\Theta</math> is a compact and convex subset of <math>\R^d</math>, and I use the formalization for the case of a finite number of unconditional moment (in)equalities:


Line 167: Line 167:
\end{align}
\end{align}
</math>
</math>
In \eqref{eq:sharp_id_for_inference}, <math>\ew_i\in\cW\subseteq\R^{d_\cW}</math> is a random vector collecting all observable variables, with <math>\ew\sim\sP</math>; <math>m_j:\cW\times\Theta\to\R</math>, <math>j\in\cJ\equiv\cJ_1\cup\cJ_2</math>, are known measurable functions characterizing the model; and <math>\cJ</math> is a finite set equal to <math>\{1,\dots,|\cJ|\}</math>.<ref group="Notes" >Examples where the set <math>\cJ</math> is a compact set (e.g., a unit ball) rather than a finite set include the case of best linear prediction with interval outcome and covariate data, see characterization [[guide:Ec36399528#eq:ThetaI:BLP |eq:ThetaI:BLP]], and the case of entry games with multiple mixed strategy Nash equilibria, see characterization [[guide:521939d27a#eq:SIR_sharp_mixed_sup |eq:SIR_sharp_mixed_sup]].
In \eqref{eq:sharp_id_for_inference}, <math>\ew_i\in\cW\subseteq\R^{d_\cW}</math> is a random vector collecting all observable variables, with <math>\ew\sim\sP</math>; <math>m_j:\cW\times\Theta\to\R</math>, <math>j\in\cJ\equiv\cJ_1\cup\cJ_2</math>, are known measurable functions characterizing the model; and <math>\cJ</math> is a finite set equal to <math>\{1,\dots,|\cJ|\}</math>.<ref group="Notes" >Examples where the set <math>\cJ</math> is a compact set (e.g., a unit ball) rather than a finite set include the case of best linear prediction with interval outcome and covariate data, see characterization [[guide:Ec36399528#eq:ThetaI:BLP |eq:ThetaI:BLP]], and the case of entry games with multiple mixed strategy Nash equilibria, see characterization [[guide:D084086519#eq:SIR_sharp_mixed_sup |eq:SIR_sharp_mixed_sup]].
A more general continuum of inequalities is also possible, as in the case of discrete choice with endogenous explanatory variables, see characterization [[guide:521939d27a#eq:SIR:discrete:choice:endogenous |eq:SIR:discrete:choice:endogenous]].
A more general continuum of inequalities is also possible, as in the case of discrete choice with endogenous explanatory variables, see characterization [[guide:8d94784544#eq:SIR:discrete:choice:endogenous |eq:SIR:discrete:choice:endogenous]].
I refer to {{ref|name=and:shi17}} and {{ref|name=ber:mol:mol11}}{{rp|at=Supplementary Appendix B}} for inference methods in the presence of a continuum of conditional moment (in)equalities.</ref>
I refer to {{ref|name=and:shi17}} and {{ref|name=ber:mol:mol11}}{{rp|at=Supplementary Appendix B}} for inference methods in the presence of a continuum of conditional moment (in)equalities.</ref>
Instances where <math>\idr{\theta}</math> is characterized through a finite number of conditional moment (in)equalities and the conditioning variables have finite support can easily be recast as in \eqref{eq:sharp_id_for_inference}.<ref group="Notes" >I refer to {{ref|name=kha:tam09}}, {{ref|name=and:shi13}}, {{ref|name=che:lee:ros13}}, {{ref|name=lee:son:wha13}}, {{ref|name=arm14b}}{{ref|name=arm15}}, {{ref|name=arm:cha16}}, {{ref|name=che:che:kat18}}, and {{ref|name=che18}}, for inference methods in the case that the conditioning variables have a continuous distribution.</ref>
Instances where <math>\idr{\theta}</math> is characterized through a finite number of conditional moment (in)equalities and the conditioning variables have finite support can easily be recast as in \eqref{eq:sharp_id_for_inference}.<ref group="Notes" >I refer to {{ref|name=kha:tam09}}, {{ref|name=and:shi13}}, {{ref|name=che:lee:ros13}}, {{ref|name=lee:son:wha13}}, {{ref|name=arm14b}}{{ref|name=arm15}}, {{ref|name=arm:cha16}}, {{ref|name=che:che:kat18}}, and {{ref|name=che18}}, for inference methods in the case that the conditioning variables have a continuous distribution.</ref>
Consider, for example, the two player entry game model in Identification [[guide:521939d27a#IP:entry_game |Problem]] on p.\pageref{IP:entry_game}, where <math>\ew=(\ey_1,\ey_2,\ex_1,\ex_2)</math>.
Consider, for example, the two player entry game model in Identification [[guide:D084086519#IP:entry_game |Problem]], where <math>\ew=(\ey_1,\ey_2,\ex_1,\ex_2)</math>.
Using (in)equalities [[guide:521939d27a#eq:CT_00 |eq:CT_00]]-[[guide:521939d27a#eq:CT_01L |eq:CT_01L]] and assuming that the distribution of <math>(\ex_1,\ex_2)</math> has <math>\bar{k}</math> points of support, denoted <math>(x_{1,k},x_{2,k}),k=1,\dots,\bar{k}</math>, we have <math>|\cJ|=4\bar{k}</math> and for <math>k=1,\dots,\bar{k}</math>,<ref group="Notes" >In these expressions an index of the form <math>jk</math> not separated by a comma equals the product of <math>j</math> with <math>k</math>.</ref>
Using (in)equalities [[guide:D084086519#eq:CT_00 |eq:CT_00]]-[[guide:D084086519#eq:CT_01L |eq:CT_01L]] and assuming that the distribution of <math>(\ex_1,\ex_2)</math> has <math>\bar{k}</math> points of support, denoted <math>(x_{1,k},x_{2,k}),k=1,\dots,\bar{k}</math>, we have <math>|\cJ|=4\bar{k}</math> and for <math>k=1,\dots,\bar{k}</math>,<ref group="Notes" >In these expressions an index of the form <math>jk</math> not separated by a comma equals the product of <math>j</math> with <math>k</math>.</ref>


<math display="block">
<math display="block">
Line 278: Line 278:
</math>
</math>
i.e., that each point in <math>\idr{\theta}</math> is arbitrarily close to a point in <math>\idrn{\theta}</math>, or more formally that <math>\sP(\idr{\theta}\subseteq\idrn{\theta})\to 1</math>.
i.e., that each point in <math>\idr{\theta}</math> is arbitrarily close to a point in <math>\idrn{\theta}</math>, or more formally that <math>\sP(\idr{\theta}\subseteq\idrn{\theta})\to 1</math>.
To establish this result for the sharp identification regions in Theorem [[guide:Ec36399528#SIR:man:tam02_param |SIR-]] (parametric regression with interval covariate) and Theorem [[guide:521939d27a#SIR:man:tam02_binary |SIR-]] (semiparametric binary model with interval covariate), <ref name="man:tam02"/>{{rp|at=Propositions 3 and 5}} require the rate at which <math>\tau_n\stackrel{p}{\rightarrow} 0</math> to be slower than the rate at which <math>\crit_n</math> converges uniformly to <math>\crit_\sP</math> over <math>\Theta</math>.
To establish this result for the sharp identification regions in Theorem [[guide:Ec36399528#SIR:man:tam02_param |SIR-]] (parametric regression with interval covariate) and Theorem [[guide:8d94784544#SIR:man:tam02_binary |SIR-(semiparametric binary model with interval covariate)]], <ref name="man:tam02"/>{{rp|at=Propositions 3 and 5}} require the rate at which <math>\tau_n\stackrel{p}{\rightarrow} 0</math> to be slower than the rate at which <math>\crit_n</math> converges uniformly to <math>\crit_\sP</math> over <math>\Theta</math>.
What might go wrong in the absence of such a restriction?
What might go wrong in the absence of such a restriction?
A simple example can help understand the issue.
A simple example can help understand the issue.
Line 295: Line 295:
However, with positive probability in any finite sample <math>\crit_n(\vartheta)=0</math> for <math>\vartheta</math> in a random region (e.g., a triangle if <math>\crit_n</math> is the sample analog of \eqref{eq:criterion_fn_max}) that only includes points that are close to a subset of the points in <math>\idr{\theta}</math>.
However, with positive probability in any finite sample <math>\crit_n(\vartheta)=0</math> for <math>\vartheta</math> in a random region (e.g., a triangle if <math>\crit_n</math> is the sample analog of \eqref{eq:criterion_fn_max}) that only includes points that are close to a subset of the points in <math>\idr{\theta}</math>.
Hence, with positive probability the minimizer of <math>\crit_n</math> cycles between consistent estimators of subsets of <math>\idr{\theta}</math>, but does not estimate the entire set.
Hence, with positive probability the minimizer of <math>\crit_n</math> cycles between consistent estimators of subsets of <math>\idr{\theta}</math>, but does not estimate the entire set.
Enlarging the estimator to include all points that are close to minimizing <math>\crit_n</math> up to a tolerance that converges to zero sufficiently slowly removes this problem.\medskip
Enlarging the estimator to include all points that are close to minimizing <math>\crit_n</math> up to a tolerance that converges to zero sufficiently slowly removes this problem.
 
<ref name="che:hon:tam07"/> significantly generalize the consistency results in <ref name="man:tam02"/>.
<ref name="che:hon:tam07"/> significantly generalize the consistency results in <ref name="man:tam02"/>.
They work with a normalized criterion function equal to <math>\crit_n(\vartheta)-\inf_{\tilde\vartheta\in\Theta}\crit_n(\tilde\vartheta)</math>, but to keep notation light I simply refer to it as <math>\crit_n</math>.<ref group="Notes" >Using this normalized criterion function is especially important in light of possible model misspecification, see [[guide:7b0105e1fc#sec:misspec |Section]].</ref>
They work with a normalized criterion function equal to <math>\crit_n(\vartheta)-\inf_{\tilde\vartheta\in\Theta}\crit_n(\tilde\vartheta)</math>, but to keep notation light I simply refer to it as <math>\crit_n</math>.<ref group="Notes" >Using this normalized criterion function is especially important in light of possible model misspecification, see [[guide:7b0105e1fc#sec:misspec |Section]].</ref>
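
The enlargement of the minimizer set by a tolerance, together with the normalization of the criterion by its minimum, can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: it uses a hypothetical interval model in which <math>\vartheta</math> must lie between the means of a lower and an upper bound variable, it replaces the infimum over <math>\Theta</math> with a minimum over a finite grid, and it takes <math>\tau_n=\log(n)/n</math> as an arbitrary illustrative tolerance rather than a recommended choice. The function names <code>sample_criterion</code> and <code>level_set_estimator</code> are introduced here for illustration.

```python
import math
import random


def sample_criterion(theta, lower, upper):
    """Sum of squared violations of the two sample moment inequalities
    mean(lower) - theta <= 0 and theta - mean(upper) <= 0."""
    n = len(lower)
    mean_lower = sum(lower) / n
    mean_upper = sum(upper) / n
    viol_low = max(mean_lower - theta, 0.0)  # positive when theta is below the lower bound
    viol_up = max(theta - mean_upper, 0.0)   # positive when theta is above the upper bound
    return viol_low ** 2 + viol_up ** 2


def level_set_estimator(grid, lower, upper, tau_n):
    """Grid points whose normalized criterion is at most tau_n."""
    values = [sample_criterion(t, lower, upper) for t in grid]
    q_min = min(values)  # normalization: subtract the minimum over the grid
    return [t for t, q in zip(grid, values) if q - q_min <= tau_n]


random.seed(0)
n = 500
lower = [random.uniform(0.0, 1.0) for _ in range(n)]  # hypothetical lower bounds
upper = [lo + 1.0 for lo in lower]                     # hypothetical upper bounds
grid = [i / 100 for i in range(-100, 301)]             # grid on [-1, 3]

tau_n = math.log(n) / n  # illustrative tolerance (an assumption of this sketch)
minimizers = level_set_estimator(grid, lower, upper, 0.0)
estimate = level_set_estimator(grid, lower, upper, tau_n)
print(min(minimizers), max(minimizers))
print(min(estimate), max(estimate))
```

In this simple interval model the set of minimizers is already well behaved, so the example only shows the mechanics: the tolerance adds a band of grid points around the set of exact minimizers, and the band shrinks as <math>n</math> grows.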
Line 339: Line 340:
On the other hand, with positive probability <math>\crit_n(\vartheta_n)=(\bar{\ew}_3-\vartheta_{1n}\vartheta_{2n})^2=O_p\left(n^{-1}\right)</math>, so that for <math>n</math> large enough <math>\crit_n(\vartheta_n) < [\dist(\vartheta_n,\idr{\theta})]^\gamma</math>, violating the assumption.
On the other hand, with positive probability <math>\crit_n(\vartheta_n)=(\bar{\ew}_3-\vartheta_{1n}\vartheta_{2n})^2=O_p\left(n^{-1}\right)</math>, so that for <math>n</math> large enough <math>\crit_n(\vartheta_n) < [\dist(\vartheta_n,\idr{\theta})]^\gamma</math>, violating the assumption.
This occurs because the gradient of the moment equality vanishes as <math>\vartheta</math> approaches zero, rendering the criterion function flat in a neighborhood of <math>\idr{\theta}</math>.
This occurs because the gradient of the moment equality vanishes as <math>\vartheta</math> approaches zero, rendering the criterion function flat in a neighborhood of <math>\idr{\theta}</math>.
As intuition would suggest, rates of convergence are slower the flatter <math>\crit_n</math> is outside <math>\idr{\theta}</math>.
As intuition would suggest, rates of convergence are slower the flatter <math>\crit_n</math> is outside <math>\idr{\theta}</math>.
<ref name="kai:mol:sto19CQ"><span style="font-variant-caps:small-caps">Kaido, H., F. Molinari, <span style="font-variant-caps:normal">and</span> J. Stoye</span> (2019b): “Constraint Qualifications in Partial Identification” working paper, available at [https://arxiv.org/pdf/1908.09103.pdf https://arxiv.org/pdf/1908.09103.pdf].</ref> show that in moment inequality models with smooth moment conditions, the polynomial minorant assumption with <math>\gamma=2</math> implies the Abadie constraint qualification (ACQ); see, e.g., <ref name="baz:she:she06"><span style="font-variant-caps:small-caps">Bazaraa, M.S., H.D. Sherali, <span style="font-variant-caps:normal">and</span> C. Shetty</span> (2006): ''Nonlinear programming: theory and algorithms''. Hoboken, N.J.: Wiley-Interscience, 3rd edn.</ref>{{rp|at=Chapter 5}} for a definition and discussion of ACQ.
<ref name="kai:mol:sto19CQ"><span style="font-variant-caps:small-caps">Kaido, H., F. Molinari, <span style="font-variant-caps:normal">and</span> J. Stoye</span> (2019b): “Constraint Qualifications in Partial Identification” working paper, available at [https://arxiv.org/pdf/1908.09103.pdf https://arxiv.org/pdf/1908.09103.pdf].</ref> show that in moment inequality models with smooth moment conditions, the polynomial minorant assumption with <math>\gamma=2</math> implies the Abadie constraint qualification (ACQ); see, e.g., <ref name="baz:she:she06"><span style="font-variant-caps:small-caps">Bazaraa, M.S., H.D. Sherali, <span style="font-variant-caps:normal">and</span> C. Shetty</span> (2006): ''Nonlinear programming: theory and algorithms''. Hoboken, N.J.: Wiley-Interscience, 3rd edn.</ref>{{rp|at=Chapter 5}} for a definition and discussion of ACQ.
The example just given to discuss failures of the polynomial minorant condition is in fact a known example where ACQ fails at <math>\vartheta=[0\ 0]^\top</math>.
The example just given to discuss failures of the polynomial minorant condition is in fact a known example where ACQ fails at <math>\vartheta=[0\ 0]^\top</math>.
<ref name="che:hon:tam07"/>{{rp|at=Condition C.3, referred to as ''degeneracy''}} also consider the case that <math>\crit_n</math> vanishes on subsets of <math>\Theta</math> that converge in Hausdorff distance to <math>\idr{\theta}</math> at rate <math>a_n^{-1/\gamma}</math>.
<ref name="che:hon:tam07"/>{{rp|at=Condition C.3, referred to as ''degeneracy''}} also consider the case that <math>\crit_n</math> vanishes on subsets of <math>\Theta</math> that converge in Hausdorff distance to <math>\idr{\theta}</math> at rate <math>a_n^{-1/\gamma}</math>.
Line 346: Line 349:
<ref name="yil12"><span style="font-variant-caps:small-caps">Yildiz, N.</span>  (2012): “Consistency of plug-in estimators of upper  contour and level sets” ''Econometric Theory'', 28(2), 309--327.</ref> provides conditions on the moment functions, which are closely related to constraint qualifications (as discussed in <ref name="kai:mol:sto19CQ"/>) under which it is possible to set <math>\tau_n=0</math>.
<ref name="yil12"><span style="font-variant-caps:small-caps">Yildiz, N.</span>  (2012): “Consistency of plug-in estimators of upper  contour and level sets” ''Econometric Theory'', 28(2), 309--327.</ref> provides conditions on the moment functions, which are closely related to constraint qualifications (as discussed in <ref name="kai:mol:sto19CQ"/>) under which it is possible to set <math>\tau_n=0</math>.
<ref name="men14"><span style="font-variant-caps:small-caps">Menzel, K.</span>  (2014): “Consistent estimation with many moment  inequalities” ''Journal of Econometrics'', 182(2), 329 -- 350.</ref> studies estimation of <math>\idr{\theta}</math> when the number of moment inequalities is large relative to sample size (possibly infinite).
<ref name="men14"><span style="font-variant-caps:small-caps">Menzel, K.</span>  (2014): “Consistent estimation with many moment  inequalities” ''Journal of Econometrics'', 182(2), 329 -- 350.</ref> studies estimation of <math>\idr{\theta}</math> when the number of moment inequalities is large relative to sample size (possibly infinite).
He provides a consistency result for criterion-based estimators that use a number of unconditional moment inequalities that grows with sample size.
He provides a consistency result for criterion-based estimators that use a number of unconditional moment inequalities that grows with sample size.
He also considers estimators based on conditional moment inequalities, and derives the fastest possible rate for estimating <math>\idr{\theta}</math> under smoothness conditions on the conditional moment functions.  
He also considers estimators based on conditional moment inequalities, and derives the fastest possible rate for estimating <math>\idr{\theta}</math> under smoothness conditions on the conditional moment functions.  
He shows that the rates achieved by the procedures in <ref name="arm14b"><span style="font-variant-caps:small-caps">Armstrong, T.B.</span>  (2014): “Weighted KS statistics for inference on  conditional moment inequalities” ''Journal of Econometrics'', 181(2), 92  -- 116.</ref><ref name="arm15"><span style="font-variant-caps:small-caps">Armstrong, T.B.</span>  (2015): “Asymptotically exact inference in conditional  moment inequality models” ''Journal of Econometrics'', 186(1), 51 -- 65.</ref> are (minimax) optimal, and cannot be improved upon.
He shows that the rates achieved by the procedures in <ref name="arm14b"><span style="font-variant-caps:small-caps">Armstrong, T.B.</span>  (2014): “Weighted KS statistics for inference on  conditional moment inequalities” ''Journal of Econometrics'', 181(2), 92  -- 116.</ref><ref name="arm15"><span style="font-variant-caps:small-caps">Armstrong, T.B.</span>  (2015): “Asymptotically exact inference in conditional  moment inequality models” ''Journal of Econometrics'', 186(1), 51 -- 65.</ref> are (minimax) optimal, and cannot be improved upon.


Line 404: Line 409:
The lower and upper functions defining the band are allowed to be any functions, including ones carrying an index, and can be estimated parametrically or nonparametrically.  
The lower and upper functions defining the band are allowed to be any functions, including ones carrying an index, and can be estimated parametrically or nonparametrically.  
The method allows for estimation of the parameters of the best linear approximations to the set identified functions in many of the identification problems described in [[guide:Ec36399528#sec:prob:distr |Section]].
The method allows for estimation of the parameters of the best linear approximations to the set identified functions in many of the identification problems described in [[guide:Ec36399528#sec:prob:distr |Section]].
It can also be used to estimate the sharp identification region for the parameters of a binary choice model with interval or discrete regressors under the assumptions of <ref name="mag:mau08"><span style="font-variant-caps:small-caps">Magnac, T.,  <span style="font-variant-caps:normal">and</span> E.Maurin</span>  (2008): “Partial Identification  in Monotone Binary Models: Discrete Regressors and Interval Data” ''The  Review of Economic Studies'', 75(3), 835--864.</ref>, characterized in [[guide:521939d27a#eq:SIR:mag:mau |eq:SIR:mag:mau]] in Section [[guide:521939d27a#subsubsec:man:tam02 |Semiparametric Binary Choice Models with Interval Valued Covariates]].  
It can also be used to estimate the sharp identification region for the parameters of a binary choice model with interval or discrete regressors under the assumptions of <ref name="mag:mau08"><span style="font-variant-caps:small-caps">Magnac, T.,  <span style="font-variant-caps:normal">and</span> E.Maurin</span>  (2008): “Partial Identification  in Monotone Binary Models: Discrete Regressors and Interval Data” ''The  Review of Economic Studies'', 75(3), 835--864.</ref>, characterized in [[guide:8d94784544#eq:SIR:mag:mau |eq:SIR:mag:mau]] in Section [[guide:8d94784544#subsubsec:man:tam02 |Semiparametric Binary Choice Models with Interval Valued Covariates]].  
<ref name="kai:san14"><span style="font-variant-caps:small-caps">Kaido, H.,  <span style="font-variant-caps:normal">and</span> A.Santos</span>  (2014): “Asymptotically efficient  estimation of models defined by convex moment inequalities”  ''Econometrica'', 82(1), 387--413.</ref> develop a theory of efficiency for estimators of sets <math>\idr{\theta}</math> as in \eqref{eq:sharp_id_for_inference} under the additional requirements that the inequalities <math>\E_\sP(m_j(\ew,\vartheta))</math> are convex in <math>\vartheta\in\Theta</math> and smooth as functionals of the distribution of the data.
<ref name="kai:san14"><span style="font-variant-caps:small-caps">Kaido, H.,  <span style="font-variant-caps:normal">and</span> A.Santos</span>  (2014): “Asymptotically efficient  estimation of models defined by convex moment inequalities”  ''Econometrica'', 82(1), 387--413.</ref> develop a theory of efficiency for estimators of sets <math>\idr{\theta}</math> as in \eqref{eq:sharp_id_for_inference} under the additional requirements that the inequalities <math>\E_\sP(m_j(\ew,\vartheta))</math> are convex in <math>\vartheta\in\Theta</math> and smooth as functionals of the distribution of the data.
Because of the convexity of the moment inequalities, <math>\idr{\theta}</math> is convex and can be represented through its support function.   
Because of the convexity of the moment inequalities, <math>\idr{\theta}</math> is convex and can be represented through its support function.   
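
To make the support function representation concrete, the following sketch computes <math>h(u)=\max_{\vartheta}u^\top\vartheta</math> for a convex polygon in <math>\R^2</math> and checks membership through the half-space inequalities <math>u^\top\vartheta\le h(u)</math>. It is a toy illustration, not the estimator studied in the cited paper: the polygon is assumed to be given by its vertices, only finitely many directions are checked (so the membership test describes an outer approximation in general), and the names <code>support_function</code> and <code>in_set_via_support</code> are introduced here for illustration.

```python
import math


def support_function(vertices, u):
    """Support function of the convex hull of `vertices` in direction u.

    For a polytope the maximum of u'theta is attained at a vertex.
    """
    return max(u[0] * v[0] + u[1] * v[1] for v in vertices)


def in_set_via_support(point, vertices, n_directions=360):
    """Check u'point <= h(u) on a grid of unit directions u.

    A closed convex set equals the intersection of these half-spaces over
    all directions; a finite grid of directions gives an outer approximation.
    """
    for k in range(n_directions):
        angle = 2.0 * math.pi * k / n_directions
        u = (math.cos(angle), math.sin(angle))
        if u[0] * point[0] + u[1] * point[1] > support_function(vertices, u) + 1e-12:
            return False
    return True


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # hypothetical convex set
print(support_function(square, (1.0, 0.0)))
print(in_set_via_support((0.5, 0.5), square))
print(in_set_via_support((1.5, 0.5), square))
```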
Line 475: Line 480:
It is chosen so that <math>\CS</math> satisfies (asymptotically) a certain coverage property with respect to either <math>\idr{\theta}</math> or each <math>\vartheta\in\idr{\theta}</math>.
It is chosen so that <math>\CS</math> satisfies (asymptotically) a certain coverage property with respect to either <math>\idr{\theta}</math> or each <math>\vartheta\in\idr{\theta}</math>.
Correspondingly, different appearances of <math>c_{1-\alpha}(\vartheta)</math> may refer to different critical values associated with different coverage notions.
Correspondingly, different appearances of <math>c_{1-\alpha}(\vartheta)</math> may refer to different critical values associated with different coverage notions.
The challenging theoretical aspect of inference in partial identification is the determination of <math>c_{1-\alpha}</math> and of methods to approximate it.\medskip
The challenging theoretical aspect of inference in partial identification is the determination of <math>c_{1-\alpha}</math> and of methods to approximate it.
 
A first classification of coverage notions pertains to whether the confidence set should cover <math>\idr{\theta}</math> or each of its elements with a prespecified asymptotic probability.
A first classification of coverage notions pertains to whether the confidence set should cover <math>\idr{\theta}</math> or each of its elements with a prespecified asymptotic probability.
Early on, within the study of interval-identified parameters, <ref name="hor:man98"><span style="font-variant-caps:small-caps">Horowitz, J.L.,  <span style="font-variant-caps:normal">and</span> C.F. Manski</span>  (1998): “Censoring of outcomes and regressors due to  survey nonresponse: Identification and estimation using weights and  imputations” ''Journal of Econometrics'', 84(1), 37 -- 58.</ref><ref name="hor:man00"><span style="font-variant-caps:small-caps">Horowitz, J.L.,  <span style="font-variant-caps:normal">and</span> C.F. Manski</span>  (2000): “Nonparametric Analysis of Randomized Experiments  with Missing Covariate and Outcome Data” ''Journal of the American  Statistical Association'', 95(449), 77--84.</ref> put forward a confidence interval that expands each of the sample analogs of the extreme points of the population bounds by an amount designed so that the confidence interval asymptotically covers the population bounds with prespecified probability.
Early on, within the study of interval-identified parameters, <ref name="hor:man98"><span style="font-variant-caps:small-caps">Horowitz, J.L.,  <span style="font-variant-caps:normal">and</span> C.F. Manski</span>  (1998): “Censoring of outcomes and regressors due to  survey nonresponse: Identification and estimation using weights and  imputations” ''Journal of Econometrics'', 84(1), 37 -- 58.</ref><ref name="hor:man00"><span style="font-variant-caps:small-caps">Horowitz, J.L.,  <span style="font-variant-caps:normal">and</span> C.F. Manski</span>  (2000): “Nonparametric Analysis of Randomized Experiments  with Missing Covariate and Outcome Data” ''Journal of the American  Statistical Association'', 95(449), 77--84.</ref> put forward a confidence interval that expands each of the sample analogs of the extreme points of the population bounds by an amount designed so that the confidence interval asymptotically covers the population bounds with prespecified probability.
Line 497: Line 503:
<ref name="adu:ots16"><span style="font-variant-caps:small-caps">Adusumilli, K., <span style="font-variant-caps:normal">and</span> T. Otsu</span> (2017): “Empirical Likelihood for Random Sets” ''Journal of the American Statistical Association'', 112(519), 1064--1075.</ref> provide empirical likelihood based inference methods for the support function approach.
<ref name="adu:ots16"><span style="font-variant-caps:small-caps">Adusumilli, K., <span style="font-variant-caps:normal">and</span> T. Otsu</span> (2017): “Empirical Likelihood for Random Sets” ''Journal of the American Statistical Association'', 112(519), 1064--1075.</ref> provide empirical likelihood based inference methods for the support function approach.
The test statistics employed in the criterion function approach and in the support function approach are asymptotically equivalent in specific moment inequality models <ref name="ber:mol08"/><ref name="kai16"/>, but the criterion function approach is more broadly applicable.
The test statistics employed in the criterion function approach and in the support function approach are asymptotically equivalent in specific moment inequality models <ref name="ber:mol08"/><ref name="kai16"/>, but the criterion function approach is more broadly applicable.
\medskip
 
 
The field's interest changed to a different notion of coverage when <ref name="imb:man04"><span style="font-variant-caps:small-caps">Imbens, G.W.,  <span style="font-variant-caps:normal">and</span> C.F. Manski</span>  (2004): “Confidence  Intervals for Partially Identified Parameters” ''Econometrica'', 72(6),  1845--1857.</ref> pointed out that often there is one “true” data generating <math>\theta</math>, even if it is only partially identified.
The field's interest changed to a different notion of coverage when <ref name="imb:man04"><span style="font-variant-caps:small-caps">Imbens, G.W.,  <span style="font-variant-caps:normal">and</span> C.F. Manski</span>  (2004): “Confidence  Intervals for Partially Identified Parameters” ''Econometrica'', 72(6),  1845--1857.</ref> pointed out that often there is one “true” data generating <math>\theta</math>, even if it is only partially identified.
Hence, they proposed confidence sets that cover each <math>\vartheta\in\idr{\theta}</math> with a prespecified probability.
The extent of the conservatism increases with the dimension of <math>\theta</math> and is easily appreciated in the case of a point identified parameter.
Consider, for example, a linear regression in <math>\R^{10}</math>, and suppose for simplicity that the limiting covariance matrix of the estimator is the identity matrix.  
Then a 95% confidence interval for <math>u^\top\theta</math> is obtained by adding and subtracting <math>1.96</math> to that component's estimate.  
In contrast, projection of a 95% confidence ellipsoid for <math>\theta</math> on each component amounts to adding and subtracting <math>4.28</math> to that component's estimate.  
It is therefore desirable to provide confidence intervals <math>\CI</math> specifically designed to cover <math>u^\top\theta</math> rather than the entire <math>\theta</math>.
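The two half-widths quoted for the ten-dimensional regression can be checked numerically. The sketch below is only an illustration: it assumes the identity limiting covariance stated above, so that projecting the 95% ellipsoid on a coordinate gives a half-width equal to the square root of the 0.95 quantile of a chi-square distribution with 10 degrees of freedom. The helper names are hypothetical, and the closed-form CDF used here is valid only for an even number of degrees of freedom.

```python
import math

def chi2_cdf_even_df(x, df):
    """Chi-square CDF for an even number of degrees of freedom (closed form)."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    if x <= 0:
        return 0.0
    half = x / 2.0
    # Survival function: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
    tail = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    return 1.0 - tail

def chi2_quantile_even_df(p, df, lo=0.0, hi=200.0, tol=1e-12):
    """Invert the CDF by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even_df(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

d = 10
half_width_projection = math.sqrt(chi2_quantile_even_df(0.95, d))
half_width_direct = 1.96  # two-sided 95% normal critical value
print(round(half_width_projection, 2), half_width_direct)
```

The ratio of the two half-widths grows with the dimension of the parameter, which is the conservatism of projection discussed above.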
Natural counterparts to \eqref{eq:CS_coverage:set}-\eqref{eq:CS_coverage:point} are

Latest revision as of 23:22, 19 June 2024

[math] \newcommand{\edis}{\stackrel{d}{=}} \newcommand{\fd}{\stackrel{f.d.}{\rightarrow}} \newcommand{\dom}{\operatorname{dom}} \newcommand{\eig}{\operatorname{eig}} \newcommand{\epi}{\operatorname{epi}} \newcommand{\lev}{\operatorname{lev}} \newcommand{\card}{\operatorname{card}} \newcommand{\comment}{\textcolor{Green}} \newcommand{\B}{\mathbb{B}} \newcommand{\C}{\mathbb{C}} \newcommand{\G}{\mathbb{G}} \newcommand{\M}{\mathbb{M}} \newcommand{\N}{\mathbb{N}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\T}{\mathbb{T}} \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\W}{\mathbb{W}} \newcommand{\bU}{\mathfrak{U}} \newcommand{\bu}{\mathfrak{u}} \newcommand{\bI}{\mathfrak{I}} \newcommand{\cA}{\mathcal{A}} \newcommand{\cB}{\mathcal{B}} \newcommand{\cC}{\mathcal{C}} \newcommand{\cD}{\mathcal{D}} \newcommand{\cE}{\mathcal{E}} \newcommand{\cF}{\mathcal{F}} \newcommand{\cG}{\mathcal{G}} \newcommand{\cg}{\mathcal{g}} \newcommand{\cH}{\mathcal{H}} \newcommand{\cI}{\mathcal{I}} \newcommand{\cJ}{\mathcal{J}} \newcommand{\cK}{\mathcal{K}} \newcommand{\cL}{\mathcal{L}} \newcommand{\cM}{\mathcal{M}} \newcommand{\cN}{\mathcal{N}} \newcommand{\cO}{\mathcal{O}} \newcommand{\cP}{\mathcal{P}} \newcommand{\cQ}{\mathcal{Q}} \newcommand{\cR}{\mathcal{R}} \newcommand{\cS}{\mathcal{S}} \newcommand{\cT}{\mathcal{T}} \newcommand{\cU}{\mathcal{U}} \newcommand{\cu}{\mathcal{u}} \newcommand{\cV}{\mathcal{V}} \newcommand{\cW}{\mathcal{W}} \newcommand{\cX}{\mathcal{X}} \newcommand{\cY}{\mathcal{Y}} \newcommand{\cZ}{\mathcal{Z}} \newcommand{\sF}{\mathsf{F}} \newcommand{\sM}{\mathsf{M}} \newcommand{\sG}{\mathsf{G}} \newcommand{\sT}{\mathsf{T}} \newcommand{\sB}{\mathsf{B}} \newcommand{\sC}{\mathsf{C}} \newcommand{\sP}{\mathsf{P}} \newcommand{\sQ}{\mathsf{Q}} \newcommand{\sq}{\mathsf{q}} \newcommand{\sR}{\mathsf{R}} \newcommand{\sS}{\mathsf{S}} \newcommand{\sd}{\mathsf{d}} \newcommand{\cp}{\mathsf{p}} \newcommand{\cc}{\mathsf{c}} \newcommand{\cf}{\mathsf{f}} 
\newcommand{\eU}{{\boldsymbol{U}}} \newcommand{\eb}{{\boldsymbol{b}}} \newcommand{\ed}{{\boldsymbol{d}}} \newcommand{\eu}{{\boldsymbol{u}}} \newcommand{\ew}{{\boldsymbol{w}}} \newcommand{\ep}{{\boldsymbol{p}}} \newcommand{\eX}{{\boldsymbol{X}}} \newcommand{\ex}{{\boldsymbol{x}}} \newcommand{\eY}{{\boldsymbol{Y}}} \newcommand{\eB}{{\boldsymbol{B}}} \newcommand{\eC}{{\boldsymbol{C}}} \newcommand{\eD}{{\boldsymbol{D}}} \newcommand{\eW}{{\boldsymbol{W}}} \newcommand{\eR}{{\boldsymbol{R}}} \newcommand{\eQ}{{\boldsymbol{Q}}} \newcommand{\eS}{{\boldsymbol{S}}} \newcommand{\eT}{{\boldsymbol{T}}} \newcommand{\eA}{{\boldsymbol{A}}} \newcommand{\eH}{{\boldsymbol{H}}} \newcommand{\ea}{{\boldsymbol{a}}} \newcommand{\ey}{{\boldsymbol{y}}} \newcommand{\eZ}{{\boldsymbol{Z}}} \newcommand{\eG}{{\boldsymbol{G}}} \newcommand{\ez}{{\boldsymbol{z}}} \newcommand{\es}{{\boldsymbol{s}}} \newcommand{\et}{{\boldsymbol{t}}} \newcommand{\ev}{{\boldsymbol{v}}} \newcommand{\ee}{{\boldsymbol{e}}} \newcommand{\eq}{{\boldsymbol{q}}} \newcommand{\bnu}{{\boldsymbol{\nu}}} \newcommand{\barX}{\overline{\eX}} \newcommand{\eps}{\varepsilon} \newcommand{\Eps}{\mathcal{E}} \newcommand{\carrier}{{\mathfrak{X}}} \newcommand{\Ball}{{\mathbb{B}}^{d}} \newcommand{\Sphere}{{\mathbb{S}}^{d-1}} \newcommand{\salg}{\mathfrak{F}} \newcommand{\ssalg}{\mathfrak{B}} \newcommand{\one}{\mathbf{1}} \newcommand{\Prob}[1]{\P\{#1\}} \newcommand{\yL}{\ey_{\mathrm{L}}} \newcommand{\yU}{\ey_{\mathrm{U}}} \newcommand{\yLi}{\ey_{\mathrm{L}i}} \newcommand{\yUi}{\ey_{\mathrm{U}i}} \newcommand{\xL}{\ex_{\mathrm{L}}} \newcommand{\xU}{\ex_{\mathrm{U}}} \newcommand{\vL}{\ev_{\mathrm{L}}} \newcommand{\vU}{\ev_{\mathrm{U}}} \newcommand{\dist}{\mathbf{d}} \newcommand{\rhoH}{\dist_{\mathrm{H}}} \newcommand{\ti}{\to\infty} \newcommand{\comp}[1]{#1^\mathrm{c}} \newcommand{\ThetaI}{\Theta_{\mathrm{I}}} \newcommand{\crit}{q} \newcommand{\CS}{CS_n} \newcommand{\CI}{CI_n} \newcommand{\cv}[1]{\hat{c}_{n,1-\alpha}(#1)} 
\newcommand{\idr}[1]{\mathcal{H}_\sP[#1]} \newcommand{\outr}[1]{\mathcal{O}_\sP[#1]} \newcommand{\idrn}[1]{\hat{\mathcal{H}}_{\sP_n}[#1]} \newcommand{\outrn}[1]{\mathcal{O}_{\sP_n}[#1]} \newcommand{\email}[1]{\texttt{#1}} \newcommand{\possessivecite}[1]{\ltref name="#1"\gt\lt/ref\gt's \citeyear{#1}} \newcommand\xqed[1]{% \leavevmode\unskip\penalty9999 \hbox{}\nobreak\hfill \quad\hbox{#1}} \newcommand\qedex{\xqed{$\triangle$}} \newcommand\independent{\protect\mathpalette{\protect\independenT}{\perp}} \DeclareMathOperator{\Int}{Int} \DeclareMathOperator{\conv}{conv} \DeclareMathOperator{\cov}{Cov} \DeclareMathOperator{\var}{Var} \DeclareMathOperator{\Sel}{Sel} \DeclareMathOperator{\Bel}{Bel} \DeclareMathOperator{\cl}{cl} \DeclareMathOperator{\sgn}{sgn} \DeclareMathOperator{\essinf}{essinf} \DeclareMathOperator{\esssup}{esssup} \newcommand{\mathds}{\mathbb} \renewcommand{\P}{\mathbb{P}} [/math]

Framework and Scope of the Discussion

The identification analysis carried out in Section- Section presumes knowledge of the joint distribution [math]\sP[/math] of the observable variables. That is, it presumes that [math]\sP[/math] can be learned with certainty from observation of the entire population. In practice, one observes a sample of size [math]n[/math] drawn from [math]\sP[/math]. For simplicity I assume it to be a random sample.[Notes 1] Statistical inference on [math]\idr{\theta}[/math] needs to be conducted using knowledge of [math]\sP_n[/math], the empirical distribution of the observable outcomes and covariates. Because [math]\idr{\theta}[/math] is not a singleton, this task is particularly delicate. To start, care is required to choose a proper notion of consistency for a set estimator [math]\idrn{\theta}[/math] and to obtain palatable conditions under which such consistency attains. Next, the asymptotic behavior of statistics designed to test hypotheses or build confidence sets for [math]\idr{\theta}[/math] or for [math]\vartheta\in\idr{\theta}[/math] might change with [math]\vartheta[/math], creating technical challenges for the construction of confidence sets that are not encountered when [math]\theta[/math] is point identified. Many of the sharp identification regions derived in Section- Section can be written as collections of vectors [math]\vartheta\in\Theta[/math] that satisfy conditional or unconditional moment (in)equalities. For simplicity, I assume that [math]\Theta[/math] is a compact and convex subset of [math]\R^d[/math], and I use the formalization for the case of a finite number of unconditional moment (in)equalities:

[[math]] \begin{align} \idr{\theta}=\{\vartheta\in\Theta: \E_\sP(m_j(\ew_i;\vartheta))&\le 0\;\forall j\in\cJ_1,\; \E_\sP(m_j(\ew_i;\vartheta))=0\;\forall j\in\cJ_2\}.\label{eq:sharp_id_for_inference} \end{align} [[/math]]

In \eqref{eq:sharp_id_for_inference}, [math]\ew_i\in\cW\subseteq\R^{d_\cW}[/math] is a random vector collecting all observable variables, with [math]\ew\sim\sP[/math]; [math]m_j:\cW\times\Theta\to\R[/math], [math]j\in\cJ\equiv\cJ_1\cup\cJ_2[/math], are known measurable functions characterizing the model; and [math]\cJ[/math] is a finite set equal to [math]\{1,\dots,|\cJ|\}[/math].[Notes 2] Instances where [math]\idr{\theta}[/math] is characterized through a finite number of conditional moment (in)equalities and the conditioning variables have finite support can easily be recast as in \eqref{eq:sharp_id_for_inference}.[Notes 3] Consider, for example, the two player entry game model in Identification Problem, where [math]\ew=(\ey_1,\ey_2,\ex_1,\ex_2)[/math]. Using (in)equalities eq:CT_00-eq:CT_01L and assuming that the distribution of [math](\ex_1,\ex_2)[/math] has [math]\bar{k}[/math] points of support, denoted [math](x_{1,k},x_{2,k}),k=1,\dots,\bar{k}[/math], we have [math]|\cJ|=4\bar{k}[/math] and for [math]k=1,\dots,\bar{k}[/math],[Notes 4]

[[math]] \begin{align*} m_{4k-3}(\ew_i;\vartheta)&=[\one((\ey_1,\ey_2)=(0,0))-\Phi((-\infty,-\ex_1b_1),(-\infty,-\ex_2b_2);r)]\one((\ex_1,\ex_2)=(x_{1,k},x_{2,k}))\\ m_{4k-2}(\ew_i;\vartheta)&=[\one((\ey_1,\ey_2)=(1,1))-\Phi([-\ex_1b_1-d_1,\infty),[-\ex_2b_2-d_2,\infty);r)]\one((\ex_1,\ex_2)=(x_{1,k},x_{2,k}))\\ m_{4k-1}(\ew_i;\vartheta)&=[\one((\ey_1,\ey_2)=(0,1))-\Phi((-\infty,-\ex_1b_1-d_1),(-\ex_2b_2,\infty);r)]\one((\ex_1,\ex_2)=(x_{1,k},x_{2,k}))\\ m_{4k}(\ew_i;\vartheta)&=\Big[\one((\ey_1,\ey_2)=(0,1))-\Big\{\Phi((-\infty,-\ex_1b_1-d_1),(-\ex_2b_2,\infty);r)\notag\\ &\quad\quad-\Phi((-\ex_1b_1,-\ex_1b_1-d_1),(-\ex_2b_2,-\ex_2b_2-d_2);r)\Big\}\Big]\one((\ex_1,\ex_2)=(x_{1,k},x_{2,k})). \end{align*} [[/math]]


In point identified moment equality models it has been common to conduct estimation and inference using a criterion function that aggregates moment violations [1]. [2] adapt this idea to the partially identified case, through a criterion function [math]\crit_\sP:\Theta\to\R_+[/math] such that [math]\crit_\sP(\vartheta)=0[/math] if and only if [math]\vartheta\in\idr{\theta}[/math]. Many criterion functions can be used (see, e.g. [2][3][4][5][6][7][8][9][10]). Some simple and commonly employed ones include

[[math]] \begin{align} \crit_{\sP,\mathrm{sum}}(\vartheta) &= \sum_{j\in\cJ_1}\left[\frac{\E_\sP(m_j(\ew_i;\vartheta))}{\sigma_{\sP,j}(\vartheta)}\right]_+^2 + \sum_{j\in\cJ_2}\left[\frac{\E_\sP(m_j(\ew_i;\vartheta))}{\sigma_{\sP,j}(\vartheta)}\right]^2,\label{eq:criterion_fn_sum}\\ \crit_{\sP,\mathrm{max}}(\vartheta) &= \max\left\{\max_{j\in\cJ_1}\left[\frac{\E_\sP(m_j(\ew_i;\vartheta))}{\sigma_{\sP,j}(\vartheta)}\right]_+,\max_{j\in\cJ_2}\left|\frac{\E_\sP(m_j(\ew_i;\vartheta))}{\sigma_{\sP,j}(\vartheta)}\right|\right\}^2,\label{eq:criterion_fn_max} \end{align} [[/math]]

where [math][x]_+=\max\{x,0\}[/math] and [math]\sigma_{\sP,j}(\vartheta)[/math] is the population standard deviation of [math]m_j(\ew_i;\vartheta)[/math]. In \eqref{eq:criterion_fn_sum}-\eqref{eq:criterion_fn_max} the moment functions are standardized, as doing so is important for statistical power (see, e.g., [8](p. 127)). To simplify notation, I omit the label and simply use [math]\crit_\sP(\vartheta)[/math]. Given the criterion function, one can rewrite \eqref{eq:sharp_id_for_inference} as

[[math]] \begin{align} \label{eq:define:idr} \idr{\theta}=\{\vartheta\in\Theta:\crit_\sP(\vartheta)=0\}.\end{align} [[/math]]
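To make the two criterion functions concrete, the following minimal sketch evaluates them from standardized moments. It assumes the ratios [math]\E_\sP(m_j(\ew_i;\vartheta))/\sigma_{\sP,j}(\vartheta)[/math] have already been computed at a candidate [math]\vartheta[/math]; the function names and the numerical inputs are hypothetical and serve only to show that both criteria vanish exactly when all inequalities and equalities hold.

```python
def crit_sum(t_ineq, t_eq):
    """Sum-type criterion: squared positive parts of standardized inequality
    moments plus squared standardized equality moments."""
    return sum(max(t, 0.0) ** 2 for t in t_ineq) + sum(t ** 2 for t in t_eq)

def crit_max(t_ineq, t_eq):
    """Max-type criterion: square of the largest standardized violation."""
    terms = [max(t, 0.0) for t in t_ineq] + [abs(t) for t in t_eq]
    return max(terms, default=0.0) ** 2

# Hypothetical standardized moments at two candidate parameter values.
inside = (crit_sum([-0.5, -0.1], [0.0]), crit_max([-0.5, -0.1], [0.0]))
outside = (crit_sum([0.3, -0.1], [0.4]), crit_max([0.3, -0.1], [0.4]))
print(inside, outside)
```

Slack inequalities (negative standardized moments) contribute nothing to either criterion, so only violated restrictions move a candidate value away from the zero-level set.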


To keep this chapter to a manageable length, I focus my discussion of statistical inference exclusively on consistent estimation and on different notions of coverage that a confidence set may be required to satisfy and that have proven useful in the literature.[Notes 5] The topics of test of hypotheses and construction of confidence sets in partially identified models are covered in [11], who provide a comprehensive survey devoted entirely to them in the context of moment inequality models. [12](Chapters 4 and 5) provide a thorough discussion of related methods based on the use of random set theory.

Consistent Estimation

When the identified object is a set, it is natural that its estimator is also a set. In order to discuss statistical properties of a set-valued estimator [math]\idrn{\theta}[/math] (to be defined below), and in particular its consistency, one needs to specify how to measure the distance between [math]\idrn{\theta}[/math] and [math]\idr{\theta}[/math]. Several distance measures among sets exist (see, e.g., [13](Appendix D)). A natural generalization of the commonly used Euclidean distance is the Hausdorff distance, see Definition, which for given [math]A,B\subset\R^d[/math] can be written as

[[math]] \begin{align*} \dist_H(A,B) = \inf\Big\{r \gt 0:\; A\subseteq B^r,\; B\subseteq A^r\Big\}=\max\left\{\sup_{a \in A} \dist(a,B), \sup_{b \in B} \dist(b,A) \right\},\end{align*} [[/math]]

with [math]\dist(a,B)\equiv\inf_{b\in B}\Vert a-b\Vert[/math].[Notes 6] In words, the Hausdorff distance between two sets measures the furthest distance from an arbitrary point in one of the sets to its closest neighbor in the other set. It is easy to verify that [math]\dist_H[/math] metrizes the family of non-empty compact sets; in particular, given non-empty compact sets [math]A,B\subset\R^d[/math], [math]\dist_H(A,B) =0[/math] if and only if [math]A=B[/math]. If either [math]A[/math] or [math]B[/math] is empty, [math]\dist_H(A,B) =\infty[/math]. The use of the Hausdorff distance to conceptualize consistency of set valued estimators in econometrics was proposed by [14](Section 2.4) and [2](Section 3.2).[Notes 7]
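For finite sets of points the second expression for the Hausdorff distance can be computed directly. The sketch below is an illustration under that finiteness assumption (for general compact sets one would work with a discretization); the function names are hypothetical.

```python
import math

def dist_point_set(a, B):
    """Euclidean distance from point a to the nearest point of the finite set B."""
    return min(math.dist(a, b) for b in B)

def hausdorff(A, B):
    """Hausdorff distance between two finite point sets in R^d."""
    if not A or not B:
        return math.inf  # convention used in the text when either set is empty
    directed_ab = max(dist_point_set(a, B) for a in A)
    directed_ba = max(dist_point_set(b, A) for b in B)
    return max(directed_ab, directed_ba)

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0)]
print(hausdorff(A, B), hausdorff(A, A), hausdorff(A, []))
```

In this example every point of B is in A, so the directed distance from B to A is zero, yet the Hausdorff distance equals one because the point (1, 0) of A has no close neighbor in B. Both directed distances matter, which is why consistency in the Hausdorff sense is a two-sided requirement.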

Definition (Hausdorff Consistency)

An estimator [math]\idrn{\theta}[/math] is consistent for [math]\idr{\theta}[/math] if

[[math]] \begin{align*} \dist_H(\idrn{\theta},\idr{\theta}) \stackrel{p}{\rightarrow} 0 \text{ as } n\to \infty. \end{align*} [[/math]]

[15] establishes Hausdorff consistency of a plug-in estimator of the set [math]\{\vartheta\in\Theta:g_\sP(\vartheta)\le 0\}[/math], with [math]g_\sP:\cW\times\Theta \to \R[/math] a lower semicontinuous function of [math]\vartheta\in\Theta[/math] that can be consistently estimated by a lower semicontinuous function [math]g_n[/math] uniformly over [math]\Theta[/math]. The set estimator is [math]\{\vartheta\in\Theta:g_n(\vartheta)\le 0\}[/math]. The fundamental assumption in [15] is that [math]\{\vartheta\in\Theta:g_\sP(\vartheta)\le 0\}\subseteq\cl(\{\vartheta\in\Theta:g_\sP(\vartheta) \lt 0\})[/math], see [12](Section 5.2) for a discussion. There are important applications where this condition holds. [16] provide results related to [15], as well as important extensions for the construction of confidence sets, and show that these can be applied to carry out statistical inference on the Hansen–Jagannathan sets of admissible stochastic discount factors [17], the Markowitz–Fama mean–variance sets for asset portfolio returns [18], and the set of structural elasticities in [19]'s analysis of demand with optimization frictions. However, these methods are not broadly applicable in the general moment (in)equalities framework of this section, as [15]'s key condition generally fails for the set [math]\idr{\theta}[/math] in \eqref{eq:define:idr}.

Criterion Function Based Estimators

[2] extend the standard theory of extremum estimation of point identified parameters to partial identification, and propose to estimate [math]\idr{\theta}[/math] using the collection of values [math]\vartheta\in\Theta[/math] that approximately minimize a sample analog of [math]\crit_\sP[/math]:

[[math]] \begin{align} \idrn{\theta}=\left\{\vartheta\in\Theta:\crit_n(\vartheta)\le \inf_{\tilde\vartheta\in\Theta}\crit_n(\tilde\vartheta)+\tau_n\right\},\label{eq:define:idrn} \end{align} [[/math]]

with [math]\tau_n[/math] a sequence of non-negative random variables such that [math]\tau_n\stackrel{p}{\rightarrow} 0[/math]. In \eqref{eq:define:idrn}, [math]\crit_n(\vartheta)[/math] is a sample analog of [math]\crit_\sP(\vartheta)[/math] that replaces [math]\E_\sP(m_j(\ew_i;\vartheta))[/math] and [math]\sigma_{\sP,j}(\vartheta)[/math] in \eqref{eq:criterion_fn_sum}-\eqref{eq:criterion_fn_max} with properly chosen estimators, e.g.,

[[math]] \begin{align*} \bar m_{n,j}(\vartheta) &\equiv {\frac{1}{n}\sum_{i=1}^n m_j(\ew_i;\vartheta)},\quad j=1,\dots, |\cJ| \\ \hat{\sigma}_{n,j}(\vartheta) &\equiv {\left(\frac{1}{n}\sum_{i=1}^n [m_j(\ew_i;\vartheta)]^2-[\bar m_{n,j}(\vartheta)]^2\right)^{1/2}},\quad j=1,\dots, |\cJ|. \end{align*} [[/math]]
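A minimal sketch of the level-set estimator in \eqref{eq:define:idrn} on a parameter grid follows. It is not taken from [2]: it assumes a hypothetical scalar model with the two moment inequalities [math]\E_\sP(\yL)-\vartheta\le 0[/math] and [math]\vartheta-\E_\sP(\yU)\le 0[/math], uses the sample analog of the sum-type criterion with the estimators just displayed, and treats the data values, the grid and the two choices of [math]\tau_n[/math] as illustrative.

```python
import math

def crit_n(theta, y_lo, y_hi):
    """Sample sum-type criterion for the inequalities
    E(y_lo) - theta <= 0 and theta - E(y_hi) <= 0 (hypothetical model)."""
    total = 0.0
    for m in ([v - theta for v in y_lo], [theta - v for v in y_hi]):
        n = len(m)
        m_bar = sum(m) / n
        sigma_hat = math.sqrt(sum(v * v for v in m) / n - m_bar ** 2)
        total += max(m_bar / sigma_hat, 0.0) ** 2
    return total

def set_estimator(grid, y_lo, y_hi, tau):
    """Grid points whose criterion is within tau of the minimum over the grid."""
    values = [crit_n(t, y_lo, y_hi) for t in grid]
    cutoff = min(values) + tau
    return [t for t, q in zip(grid, values) if q <= cutoff]

y_lo = [0.0, 1.0, 2.0, 3.0]          # illustrative lower bounds, mean 1.5
y_hi = [2.0, 3.0, 4.0, 5.0]          # illustrative upper bounds, mean 3.5
grid = [0.5 * k for k in range(11)]  # 0.0, 0.5, ..., 5.0
tight = set_estimator(grid, y_lo, y_hi, tau=0.0)
loose = set_estimator(grid, y_lo, y_hi, tau=0.25)
print(tight)
print(loose)
```

With a tolerance of zero the estimator collects the grid points between the two sample means; a positive tolerance enlarges it by the points whose standardized violation is small.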


It can be shown that as long as [math]\tau_n=o_p(1)[/math], under the same assumptions used to prove consistency of extremum estimators of point identified parameters (e.g., with uniform convergence of [math]\crit_n[/math] to [math]\crit_\sP[/math] and continuity of [math]\crit_\sP[/math] on [math]\Theta[/math]),

[[math]] \begin{align} \sup_{\vartheta \in \idrn{\theta}} \inf_{\tilde\vartheta \in \idr{\theta}} \Vert \vartheta-\tilde\vartheta \Vert\stackrel{p}{\rightarrow} 0\text{ as } n\to \infty.\label{eq:inner_consistent} \end{align} [[/math]]

This yields that asymptotically each point in [math]\idrn{\theta}[/math] is arbitrarily close to a point in [math]\idr{\theta}[/math], or more formally that [math]\sP(\idrn{\theta}\subseteq\idr{\theta})\to 1[/math]. I refer to \eqref{eq:inner_consistent} as inner consistency henceforth.[Notes 8] [20] provides an early contribution establishing this type of inner consistency for maximum likelihood estimators when the true parameter is not point identified. However, Hausdorff consistency requires also that

[[math]] \begin{align*} \sup_{\vartheta \in \idr{\theta}} \inf_{\tilde\vartheta \in \idrn{\theta}} \Vert \vartheta-\tilde\vartheta \Vert\stackrel{p}{\rightarrow} 0\text{ as } n\to \infty, \end{align*} [[/math]]

i.e., that each point in [math]\idr{\theta}[/math] is arbitrarily close to a point in [math]\idrn{\theta}[/math], or more formally that [math]\sP(\idr{\theta}\subseteq\idrn{\theta})\to 1[/math]. To establish this result for the sharp identification regions in Theorem SIR- (parametric regression with interval covariate) and Theorem SIR-(semiparametric binary model with interval covariate), [2](Propositions 3 and 5) require the rate at which [math]\tau_n\stackrel{p}{\rightarrow} 0[/math] to be slower than the rate at which [math]\crit_n[/math] converges uniformly to [math]\crit_\sP[/math] over [math]\Theta[/math]. What might go wrong in the absence of such a restriction? A simple example can help understand the issue. Consider a model with linear inequalities of the form

[[math]] \begin{align*} \theta_1 &\le \E_\sP(\ew_1),\\ -\theta_1 &\le \E_\sP(\ew_2),\\ \theta_2 &\le \E_\sP(\ew_3)+ \E_\sP(\ew_4)\theta_1,\\ -\theta_2 &\le \E_\sP(\ew_5)+ \E_\sP(\ew_6)\theta_1. \end{align*} [[/math]]

Suppose [math]\ew\equiv(\ew_1,\dots,\ew_6)[/math] is distributed multivariate normal, with [math]\E_\sP(\ew)=[6\;0\;2\;0\;{-2}\;0]^\top[/math] and [math]\cov_\sP(\ew)[/math] equal to the identity matrix. Then [math]\idr{\theta}=\{\vartheta=[\vartheta_1\;\vartheta_2]^\top\in\Theta:\vartheta_1\in[0,6]\text{ and }\vartheta_2=2\}[/math]. However, with positive probability in any finite sample [math]\crit_n(\vartheta)=0[/math] for [math]\vartheta[/math] in a random region (e.g., a triangle if [math]\crit_n[/math] is the sample analog of \eqref{eq:criterion_fn_max}) that only includes points that are close to a subset of the points in [math]\idr{\theta}[/math]. Hence, with positive probability the minimizer of [math]\crit_n[/math] cycles between consistent estimators of subsets of [math]\idr{\theta}[/math], but does not estimate the entire set. Enlarging the estimator to include all points that are close to minimizing [math]\crit_n[/math] up to a tolerance that converges to zero sufficiently slowly removes this problem.
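The mechanism in this example can be reproduced with a small deterministic sketch. The assumptions are mine and purely illustrative: the vector of sample means is fixed by hand at a value close to [math]\E_\sP(\ew)[/math] rather than drawn, the standard deviations in the max-type criterion are set to one, and the criterion is profiled over [math]\vartheta_2[/math] so that only the [math]\vartheta_1[/math] coordinates of the level set are reported. The tolerance [math]\log(n)/n[/math] with [math]n=1000[/math] is one hypothetical choice of a slowly vanishing [math]\tau_n[/math].

```python
import math

def profile_crit(theta1, w_bar):
    """Max-type criterion minimized over theta2 for a given theta1.
    Standard deviations are set to 1 (an assumption made for simplicity)."""
    w1, w2, w3, w4, w5, w6 = w_bar
    upper = w3 + w4 * theta1    # theta2 <= upper
    lower = -w5 - w6 * theta1   # theta2 >= lower
    violations = [
        max(theta1 - w1, 0.0),
        max(-theta1 - w2, 0.0),
        # if lower > upper, the midpoint splits the violation equally
        max(lower - upper, 0.0) / 2.0,
    ]
    return max(violations) ** 2

def theta1_projection(w_bar, tau, grid):
    """theta1 grid points that belong to the level-set estimator for some theta2."""
    values = [profile_crit(t, w_bar) for t in grid]
    cutoff = min(values) + tau
    return [t for t, q in zip(grid, values) if q <= cutoff]

# Hypothetical sample means, close to the population values (6, 0, 2, 0, -2, 0).
w_bar = (6.0, 0.0, 1.95, 0.02, -2.0, 0.0)
grid = [float(k) for k in range(7)]  # theta1 = 0, 1, ..., 6
n = 1000
no_slack = theta1_projection(w_bar, 0.0, grid)
with_slack = theta1_projection(w_bar, math.log(n) / n, grid)
print(no_slack)
print(with_slack)
```

With these sample means the zero-level set of the sample criterion is a thin triangle that starts at [math]\vartheta_1=2.5[/math], so the estimator with zero tolerance misses the part of [math]\idr{\theta}[/math] with [math]\vartheta_1 \lt 2.5[/math], while the enlarged estimator reaches all of [math][0,6][/math].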

[3] significantly generalize the consistency results in [2]. They work with a normalized criterion function equal to [math]\crit_n(\vartheta)-\inf_{\tilde\vartheta\in\Theta}\crit_n(\tilde\vartheta)[/math], but to keep notation light I simply refer to it as [math]\crit_n[/math].[Notes 9] Under suitable regularity conditions, they establish consistency of an estimator that can be a smaller set than the one proposed by [2], and derive its convergence rate. Some of the key conditions required by [3](Conditions C1 and C2) to study convergence rates include that [math]\crit_n[/math] is lower semicontinuous in [math]\vartheta[/math], satisfies various convergence properties among which [math]\sup_{\vartheta\in\idr{\theta}}\crit_n=O_p(1/a_n)[/math] for a sequence of normalizing constants [math]a_n\to\infty[/math], that [math]\tau_n\ge \sup_{\vartheta\in\idr{\theta}}\crit_n(\vartheta)[/math] with probability approaching one, and that [math]\tau_n\to 0[/math]. They also require that there exist positive constants [math](\delta,\kappa,\gamma)[/math] such that for any [math]\epsilon\in(0,1)[/math] there are [math](d_\epsilon,n_\epsilon)[/math] such that

[[math]] \begin{align*} \forall n\ge n_\epsilon, \, \crit_n(\vartheta)\ge\kappa[\min\{\delta,\dist(\vartheta,\idr{\theta})\}]^\gamma \end{align*} [[/math]]

uniformly on [math]\{\vartheta\in\Theta:\dist(\vartheta,\idr{\theta})\ge(d_\epsilon/a_n)^{1/\gamma}\}[/math] with probability at least [math]1-\epsilon[/math]. In words, the assumption, referred to as polynomial minorant condition, rules out that [math]\crit_n[/math] can be arbitrarily close to zero outside [math]\idr{\theta}[/math]. It posits that [math]\crit_n[/math] changes as at least a polynomial of degree [math]\gamma[/math] in the distance of [math]\vartheta[/math] from [math]\idr{\theta}[/math]. Under some additional regularity conditions, [3] establish that

[[math]] \begin{align} \dist_H(\idrn{\theta},\idr{\theta})=O_p(\max\{1/a_n,\tau_n\})^{1/\gamma}.\label{eq:CHT_rate} \end{align} [[/math]]


What is the role played by the polynomial minorant condition for the result in \eqref{eq:CHT_rate}? Under the maintained assumptions [math]\tau_n\ge \sup_{\vartheta\in\idr{\theta}}\crit_n(\vartheta)\ge\kappa[\min\{\delta,\dist(\vartheta,\idr{\theta})\}]^\gamma[/math], and the latter part of the inequality is used to obtain \eqref{eq:CHT_rate}. When could the polynomial minorant condition be violated? In moment (in)equalities models, [3] require [math]\gamma=2[/math].[Notes 10] Consider a simple stylized example with (in)equalities of the form

[[math]] \begin{align*} -\theta_1 &\le \E_\sP(\ew_1),\\ -\theta_2 &\le \E_\sP(\ew_2),\\ \theta_1\theta_2 &= \E_\sP(\ew_3), \end{align*} [[/math]]

with [math]\E_\sP(\ew_1)=\E_\sP(\ew_2)=\E_\sP(\ew_3)=0[/math], and note that the sample means [math](\bar{\ew}_1,\bar{\ew}_2,\bar{\ew}_3)[/math] are [math]\sqrt{n}[/math]-consistent estimators of [math](\E_\sP(\ew_1),\E_\sP(\ew_2),\E_\sP(\ew_3))[/math]. Suppose [math](\ew_1,\ew_2,\ew_3)[/math] are distributed multivariate standard normal. Consider a sequence [math]\vartheta_n=[\vartheta_{1n}\;\vartheta_{2n}]^\top=[n^{-1/4}\;n^{-1/4}]^\top[/math]. Then [math][\dist(\vartheta_n,\idr{\theta})]^\gamma=O_p(n^{-1/2})[/math]. On the other hand, with positive probability [math]\crit_n(\vartheta_n)=(\bar{\ew}_3-\vartheta_{1n}\vartheta_{2n})^2=O_p\left(n^{-1}\right)[/math], so that for [math]n[/math] large enough [math]\crit_n(\vartheta_n) \lt [\dist(\vartheta_n,\idr{\theta})]^\gamma[/math], violating the assumption. This occurs because the gradient of the moment equality vanishes as [math]\vartheta[/math] approaches zero, rendering the criterion function flat in a neighborhood of [math]\idr{\theta}[/math].
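The orders of magnitude in this example can be spelled out step by step. The derivation below is a sketch under simplifying assumptions of mine: the sample standard deviations are replaced by their population value of one, attention is restricted to the event on which the two sample inequalities are not violated at [math]\vartheta_n[/math], and the normalizing sequence is taken to be [math]a_n=n[/math].

```latex
\begin{align*}
\dist(\vartheta_n,\idr{\theta}) &= n^{-1/4}
  && \text{since } \idr{\theta} \text{ is the union of the two non-negative half-axes},\\
[\dist(\vartheta_n,\idr{\theta})]^{\gamma} &= n^{-1/2}
  && \text{for } \gamma=2,\\
\crit_n(\vartheta_n) &= (\bar{\ew}_3-n^{-1/2})^2=\frac{(Z_n-1)^2}{n}
  && \text{with } Z_n\equiv\sqrt{n}\,\bar{\ew}_3\sim N(0,1),\\
\frac{\crit_n(\vartheta_n)}{[\dist(\vartheta_n,\idr{\theta})]^{\gamma}} &= \frac{(Z_n-1)^2}{\sqrt{n}}\stackrel{p}{\rightarrow}0
  && \text{as } n\to\infty.
\end{align*}
```

Because the ratio in the last line converges to zero in probability, no fixed [math]\kappa \gt 0[/math] can satisfy [math]\crit_n(\vartheta_n)\ge\kappa[\dist(\vartheta_n,\idr{\theta})]^\gamma[/math] along this sequence, even though [math]\dist(\vartheta_n,\idr{\theta})=n^{-1/4}[/math] eventually exceeds [math](d_\epsilon/a_n)^{1/\gamma}[/math] under the assumed [math]a_n=n[/math].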

As intuition would suggest, rates of convergence are slower the flatter [math]\crit_n[/math] is outside [math]\idr{\theta}[/math]. [21] show that in moment inequality models with smooth moment conditions, the polynomial minorant assumption with [math]\gamma=2[/math] implies the Abadie constraint qualification (ACQ); see, e.g., [22](Chapter 5) for a definition and discussion of ACQ.

The example just given to discuss failures of the polynomial minorant condition is in fact a known example where ACQ fails at [math]\vartheta=[0\;0]^\top[/math]. [3](Condition C.3, referred to as degeneracy) also consider the case that [math]\crit_n[/math] vanishes on subsets of [math]\Theta[/math] that converge in Hausdorff distance to [math]\idr{\theta}[/math] at rate [math]a_n^{-1/\gamma}[/math]. While degeneracy might be difficult to verify in practice, [3] show that if it holds, [math]\tau_n[/math] can be set to zero. [23] provides conditions on the moment functions, which are closely related to constraint qualifications (as discussed in [21]) under which it is possible to set [math]\tau_n=0[/math]. [24] studies estimation of [math]\idr{\theta}[/math] when the number of moment inequalities is large relative to sample size (possibly infinite).

He provides a consistency result for criterion-based estimators that use a number of unconditional moment inequalities that grows with sample size. He also considers estimators based on conditional moment inequalities, and derives the fastest possible rate for estimating [math]\idr{\theta}[/math] under smoothness conditions on the conditional moment functions.

He shows that the rates achieved by the procedures in [25][26] are (minimax) optimal, and cannot be improved upon.

Key Insight: [2] extend the notion of extremum estimation from point identified to partially identified models. They do so by putting forward a generalized criterion function whose zero-level set can be used to define [math]\idr{\theta}[/math] in partially identified structural semiparametric models. It is then natural to define the set valued estimator [math]\idrn{\theta}[/math] as the collection of approximate minimizers of the sample analog of this criterion function. [2]'s analysis of statistical inference focuses exclusively on providing consistent estimators. [3] substantially generalize the analysis of consistency of criterion function-based set estimators. They provide a comprehensive study of convergence rates in partially identified models. Their work highlights the challenges a researcher faces in this context, and puts forward possible solutions in the form of assumptions under which specific rates of convergence attain.

Support Function Based Estimators

[27] introduce to the econometrics literature inference methods for set valued estimators based on random set theory. They study the class of models where [math]\idr{\theta}[/math] is convex and can be written as the Aumann (or selection) expectation of a properly defined random closed set.[Notes 11] They propose to carry out estimation and inference leveraging the representation of convex sets through their support function (given in Definition), as it is done in random set theory; see [13](Chapter 3) and [12](Chapter 4). Because the support function fully characterizes the boundary of [math]\idr{\theta}[/math], it allows for a simple sample analog estimator, and for inference procedures with desirable properties. An example of a framework where the approach of [27] can be applied is that of best linear prediction with interval outcome data in Identification Problem.[Notes 12] Recall that in that case, the researcher observes random variables [math](\yL,\yU,\ex)[/math] and wishes to learn the best linear predictor of [math]\ey|\ex[/math], with [math]\ey[/math] unobserved and [math]\sR(\yL\le\ey\le\yU)=1[/math]. For simplicity let [math]\ex[/math] be a scalar. Given a random sample [math]\{\yLi,\yUi,\ex_i\}_{i=1}^n[/math] from [math]\sP[/math], the researcher can construct a random segment [math]\eG_i[/math] for each [math]i[/math] and a consistent estimator [math]\hat{\Sigma}_n[/math] of the random matrix [math]\Sigma_\sP[/math] in eq:G_and_Sigma as

[[math]] \begin{align*} \eG_i=\left\{ \begin{pmatrix} \ey_i\\ \ey_i\ex_i \end{pmatrix}  :\; \ey_i \in \Sel(\eY_i)\right\}\subset\R^2, \quad\text{and}\quad \hat\Sigma_n= \begin{pmatrix} 1 & \overline\ex\\ \overline\ex & \overline{\ex^2} \end{pmatrix},\end{align*} [[/math]]

where [math]\eY_i=[\yLi,\yUi][/math] and [math]\overline\ex,\overline{\ex^2}[/math] are the sample means of [math]\ex_i[/math] and [math]\ex^2_i[/math] respectively. Because in this problem [math]\idr{\theta}=\Sigma_\sP^{-1}\E_\sP\eG[/math] (see Theorem SIR-), a natural sample analog estimator replaces [math]\Sigma_\sP[/math] with [math]\hat{\Sigma}_n[/math], and [math]\E_\sP\eG[/math] with a Minkowski average of [math]\eG_i[/math] (see Appendix for a formal definition), yielding

[[math]] \begin{align} \idrn{\theta}=\hat\Sigma_n^{-1}\frac{1}{n}\sum_{i=1}^n\eG_i.\label{eq:BLP_estimator} \end{align} [[/math]]

The support function of [math]\idrn{\theta}[/math] is the sample analog of that of [math]\idr{\theta}[/math] provided in eq:supfun:BLP:

[[math]] \begin{align*} h_{\idrn{\theta}}(u)=\frac{1}{n}\sum_{i=1}^n[(\yLi\one(f(\ex_i,u) \lt 0)+\yUi\one(f(\ex_i,u)\ge 0))f(\ex_i,u)],\quad u\in\mathbb{S}, \end{align*} [[/math]]

where [math]f(\ex_i,u)=[1\;\ex_i]\hat\Sigma_n^{-1}u[/math]. [27] use the Law of Large Numbers for random sets reported in Theorem to show that [math]\idrn{\theta}[/math] in \eqref{eq:BLP_estimator} is [math]\sqrt{n}[/math]-consistent under standard conditions on the moments of [math](\yLi,\yUi,\ex_i)[/math]. [28] and [29] significantly expand the applicability of [27]'s estimator. [28] show that it can be used in a large class of partially identified linear models, including ones that allow for the availability of instrumental variables. [29] show that it can be used for best linear approximation of any function [math]f(x)[/math] that is known to lie within two identified bounding functions. The lower and upper functions defining the band are allowed to be any functions, including ones carrying an index, and can be estimated parametrically or nonparametrically. The method allows for estimation of the parameters of the best linear approximations to the set identified functions in many of the identification problems described in Section. It can also be used to estimate the sharp identification region for the parameters of a binary choice model with interval or discrete regressors under the assumptions of [30], characterized in eq:SIR:mag:mau in Section Semiparametric Binary Choice Models with Interval Valued Covariates. [31] develop a theory of efficiency for estimators of sets [math]\idr{\theta}[/math] as in \eqref{eq:sharp_id_for_inference} under the additional requirements that the inequalities [math]\E_\sP(m_j(\ew,\vartheta))[/math] are convex in [math]\vartheta\in\Theta[/math] and smooth as functionals of the distribution of the data. Because of the convexity of the moment inequalities, [math]\idr{\theta}[/math] is convex and can be represented through its support function. Using the classic results in [32], [31] show that under suitable regularity conditions, the support function admits [math]\sqrt{n}[/math]-consistent regular estimation. 
They also show that a simple plug-in estimator based on the support function attains the semiparametric efficiency bound, and the corresponding estimator of [math]\idr{\theta}[/math] minimizes a wide class of asymptotic loss functions based on the Hausdorff distance. As they establish, this efficiency result applies to the estimators proposed by [27], including that in \eqref{eq:BLP_estimator}, and by [28].
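The sample support function [math]h_{\idrn{\theta}}(u)[/math] of the best linear prediction estimator in \eqref{eq:BLP_estimator} is simple to compute. The sketch below assumes a scalar covariate, as in the text; the function name and the data are hypothetical, and the two-by-two matrix [math]\hat\Sigma_n[/math] is inverted by hand to keep the example self-contained.

```python
def blp_support_function(y_lo, y_hi, x, u):
    """Sample support function of the BLP set estimator in direction u = (u0, u1),
    for a scalar covariate x and interval outcomes [y_lo_i, y_hi_i]."""
    n = len(x)
    x_bar = sum(x) / n
    x2_bar = sum(v * v for v in x) / n
    det = x2_bar - x_bar * x_bar
    # c = Sigma_hat^{-1} u, with Sigma_hat = [[1, x_bar], [x_bar, x2_bar]]
    c0 = (x2_bar * u[0] - x_bar * u[1]) / det
    c1 = (-x_bar * u[0] + u[1]) / det
    total = 0.0
    for yl, yu, xi in zip(y_lo, y_hi, x):
        f = c0 + xi * c1                     # f(x_i, u) = [1 x_i] Sigma_hat^{-1} u
        total += (yl if f < 0 else yu) * f   # pick the endpoint that maximizes y * f
    return total / n

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0 + 2.0 * v for v in x]   # illustrative outcomes on the line 1 + 2x
y_lo = [v - 1.0 for v in y]      # interval data of width 2 around y
y_hi = [v + 1.0 for v in y]

point_intercept = blp_support_function(y, y, x, (1.0, 0.0))
upper_intercept = blp_support_function(y_lo, y_hi, x, (1.0, 0.0))
lower_intercept = -blp_support_function(y_lo, y_hi, x, (-1.0, 0.0))
print(point_intercept, lower_intercept, upper_intercept)
```

When the intervals collapse to points the support function reduces to [math]u^\top[/math] times the least squares estimate, which gives a quick sanity check. Evaluating [math]h_{\idrn{\theta}}[/math] at [math]u[/math] and [math]-u[/math] yields the upper and lower bounds on [math]u^\top\vartheta[/math] over the estimated set.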

[33] further enlarges the applicability of the support function approach by establishing its duality with the criterion function approach, for the case that [math]\crit_\sP[/math] is a convex function and [math]\crit_n[/math] is a convex function almost surely. This allows one to use the support function approach also when a representation of [math]\idr{\theta}[/math] as the Aumann expectation of a random closed set is not readily available.

[33] considers [math]\idr{\theta}[/math] and its level set estimator [math]\idrn{\theta}[/math] as defined, respectively, in \eqref{eq:define:idr} and \eqref{eq:define:idrn}, with [math]\Theta[/math] a convex subset of [math]\R^d[/math]. Because [math]\crit_\sP[/math] and [math]\crit_n[/math] are convex functions, [math]\idr{\theta}[/math] and [math]\idrn{\theta}[/math] are convex sets. Under the same assumptions as in [3], including the polynomial minorant and the degeneracy conditions, one can set [math]\tau_n=0[/math] and have [math]\dist_H(\idrn{\theta},\idr{\theta})=O_p(a_n^{-1/\gamma})[/math]. Moreover, due to its convexity, [math]\idr{\theta}[/math] is fully characterized by its support function, which in turn can be consistently estimated (at the same rate as [math]\idr{\theta}[/math]) using sample analogs as [math]h_{\idrn{\theta}}(u)=\max_{a_n\crit_n(\vartheta)\le 0}u^\top\vartheta[/math]. The latter can be computed via convex programming.
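The sample-analog support function above can be illustrated in a stylized case. The following is a minimal sketch, assuming that the constraint [math]a_n\crit_n(\vartheta)\le 0[/math] reduces to a finite system of linear inequalities [math]A\vartheta\le b[/math] in [math]\R^2[/math] describing a bounded polygon. The matrix `A`, the vector `b`, and the function name `support_function` are hypothetical and are not taken from [33]. The linear program is solved by brute-force vertex enumeration to keep the example dependency free; a convex programming solver would be used in practice.

```python
import itertools
import math


def support_function(A, b, u, tol=1e-9):
    """Return max u'theta subject to A theta <= b for a bounded polygon in R^2.

    A linear objective over a bounded polygon is maximized at a vertex, that is,
    at an intersection of two constraint boundaries that satisfies every
    remaining constraint. Returns -inf when the constraint set is empty.
    """
    best = -math.inf
    for (a1, b1), (a2, b2) in itertools.combinations(list(zip(A, b)), 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < tol:  # parallel boundaries do not define a vertex
            continue
        # Cramer's rule for the system a1'theta = b1, a2'theta = b2.
        x = (b1 * a2[1] - a1[1] * b2) / det
        y = (a1[0] * b2 - a2[0] * b1) / det
        feasible = all(r[0] * x + r[1] * y <= c + tol for r, c in zip(A, b))
        if feasible:
            best = max(best, u[0] * x + u[1] * y)
    return best


# Hypothetical estimated set: the unit square [0, 1] x [0, 1].
A = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
b = [1.0, 0.0, 1.0, 0.0]

upper_first_component = support_function(A, b, (1.0, 0.0))
lower_first_component = -support_function(A, b, (-1.0, 0.0))
```

Evaluating the function in directions [math]u[/math] and [math]-u[/math] gives the endpoints of the projection of the estimated set on [math]u^\top\vartheta[/math], which is how the support function characterizes a convex set direction by direction.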

[34] consider consistent estimation of [math]\idr{\theta}[/math] in the context of Bayesian inference. They focus on partially identified models where [math]\idr{\theta}[/math] depends on a “reduced form” parameter [math]\phi[/math] (e.g., a vector of moments of observable random variables). They recognize that while a prior on [math]\phi[/math] can be revised in light of the data, a prior on [math]\theta[/math] cannot, due to the lack of point identification. As such they propose to choose a single prior for the revisable parameters, and a set of priors for the unrevisable ones. The latter is the collection of priors such that the distribution of [math]\theta|\phi[/math] places probability one on [math]\idr{\theta}[/math]. A crucial observation in [34] is that once [math]\phi[/math] is viewed as a random vector, as in the Bayesian paradigm, under mild regularity conditions [math]\idr{\theta}[/math] is a random closed set, and Bayesian inference on it can be carried out using elements of random set theory. In particular, they show that the set of posterior means of [math]\theta|\ew[/math] equals the Aumann expectation of [math]\idr{\theta}[/math] (with the underlying probability measure of [math]\phi|\ew[/math]). They also show that this Aumann expectation converges in Hausdorff distance to the “true” identified set if the latter is convex, or otherwise to its convex hull. They apply their method to analyze impulse-response in set-identified Structural Vector Autoregressions, where standard Bayesian inference is otherwise sensitive to the choice of an unrevisable prior.[Notes 13]
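The link between posterior means and the Aumann expectation is easiest to see for an interval-identified scalar parameter. The sketch below assumes a hypothetical reduced form parameter [math]\phi=(\phi_l,\phi_u)[/math] whose identified set is the interval [math][\phi_l,\phi_u][/math], and a hypothetical list of posterior draws of [math]\phi[/math]. For a random interval, the Aumann expectation is the interval between the expected endpoints. The function name and the draws are illustrative only and do not reproduce the procedure in [34].

```python
from statistics import fmean


def aumann_expectation_interval(draws):
    """Aumann expectation of a random interval, approximated from draws.

    Each draw is a pair (lo, hi) describing the identified interval implied by
    one posterior draw of the reduced form parameter. The result is the
    interval between the average lower and the average upper endpoint.
    """
    draws = list(draws)
    if any(lo > hi for lo, hi in draws):
        raise ValueError("each draw must satisfy lo <= hi")
    return fmean(lo for lo, _ in draws), fmean(hi for _, hi in draws)


# Hypothetical posterior draws of (phi_l, phi_u).
posterior_draws = [(0.1, 0.5), (0.2, 0.6), (0.3, 0.7)]
set_of_posterior_means = aumann_expectation_interval(posterior_draws)
```

Every prior for [math]\theta|\phi[/math] supported on the interval yields a posterior mean inside the returned interval, and each point of that interval is attained by some such prior, which is the sense in which the interval collects the set of posterior means.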

Key Insight: [27] show that elements of random set theory can be employed to obtain inference methods for partially identified models that are easy to implement and have desirable statistical properties. Whereas they apply their findings to a specific class of models based on the Aumann expectation, the ensuing literature demonstrates that random set methods are widely applicable to obtain estimators of sharp identification regions and establish their consistency.

[35] propose an alternative to the notion of consistent estimator. Rather than asking that [math]\idrn{\theta}[/math] satisfies the requirement in Definition, they propose the notion of a half-median-unbiased estimator. This notion is easiest to explain in the case of interval identified scalar parameters. Take, e.g., the bound in Theorem SIR- for the conditional expectation of selectively observed data. Then an estimator of that interval is half-median-unbiased if the estimated upper bound exceeds the true upper bound, and the estimated lower bound falls below the true lower bound, each with probability at least [math]1/2[/math] asymptotically. More generally, one can obtain a half-median-unbiased estimator as

[[math]] \begin{align} \idrn{\theta}=\left\{\vartheta\in\Theta:a_n\crit_n(\vartheta)\le c_{1/2}(\vartheta)\right\},\label{eq:idrn:half:med:unb} \end{align} [[/math]]

where [math]c_{1/2}(\vartheta)[/math] is a critical value chosen so that [math]\idrn{\theta}[/math] asymptotically contains [math]\idr{\theta}[/math] (or any fixed element in [math]\idr{\theta}[/math]; see the discussion in Section Coverage of [math]\idr{\theta}[/math] vs. Coverage of [math]\theta[/math] below) with probability at least [math]1/2[/math]. As discussed in the next section, [math]c_{1/2}(\vartheta)[/math] can be further chosen so that this probability is uniform over [math]\sP\in\cP[/math].

The requirement of half-median unbiasedness has the virtue that, by construction, an estimator such as \eqref{eq:idrn:half:med:unb} is a subset of a [math]1-\alpha[/math] confidence set as defined in \eqref{eq:CS} below for any [math]\alpha \lt 1/2[/math], provided [math]c_{1-\alpha}(\vartheta)[/math] is chosen using the same criterion for all [math]\alpha\in(0,1)[/math]. In contrast, a consistent estimator satisfying the requirement in Definition need not be a subset of a confidence set. This is because the sequence [math]\tau_n[/math] in \eqref{eq:define:idrn} may be larger than the critical value used to obtain the confidence set, see equation \eqref{eq:CS} below, unless regularity conditions such as degeneracy or others allow one to set [math]\tau_n[/math] equal to zero. Moreover, the choice of the sequence [math]\tau_n[/math] is not data driven, and hence can be viewed as arbitrary. This raises a concern for the scope of consistent estimation in general settings.

However, reporting a set estimator together with a confidence set is arguably important to shed light on how much of the volume of the confidence set is due to statistical uncertainty and how much is due to a large identified set. One can do so by either using a half-median unbiased estimator as in \eqref{eq:idrn:half:med:unb}, or the set of minimizers of the criterion function in \eqref{eq:define:idrn} with [math]\tau_n=0[/math] (which, as previously discussed, satisfies the inner consistency requirement in \eqref{eq:inner_consistent} under weak conditions, and is Hausdorff consistent in some well behaved cases).

Confidence Sets Satisfying Various Coverage Notions

Coverage of [math]\idr{\theta}[/math] vs. Coverage of [math]\theta[/math]

I first discuss confidence sets [math]\CS\subset\R^d[/math] defined as level sets of a criterion function. To simplify notation, henceforth I assume [math]a_n=n[/math].

[[math]] \begin{align} \CS=\left\{\vartheta\in\Theta:n\crit_n(\vartheta)\le c_{1-\alpha}(\vartheta)\right\}.\label{eq:CS} \end{align} [[/math]]

In \eqref{eq:CS}, [math]c_{1-\alpha}(\vartheta)[/math] may be constant or vary in [math]\vartheta\in\Theta[/math]. It is chosen so that [math]\CS[/math] satisfies (asymptotically) a certain coverage property with respect to either [math]\idr{\theta}[/math] or each [math]\vartheta\in\idr{\theta}[/math]. Correspondingly, different appearances of [math]c_{1-\alpha}(\vartheta)[/math] may refer to different critical values associated with different coverage notions. The challenging theoretical aspect of inference in partial identification is the determination of [math]c_{1-\alpha}[/math] and of methods to approximate it.
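The mechanics of a level-set confidence set as in \eqref{eq:CS} can be sketched for a scalar parameter bounded by the means of two observed variables. The code below is a minimal illustration under stated assumptions: the criterion is a hypothetical sum of squared, studentized violations of the two sample moment inequalities, the parameter space is replaced by a finite grid, and the critical value is supplied by the user rather than calibrated. The names `criterion` and `confidence_set` and the value `2.71` are placeholders and do not correspond to any specific procedure cited in this section.

```python
from statistics import fmean, pstdev


def criterion(theta, y_lo, y_hi):
    """Sample criterion: sum of squared, studentized moment violations.

    The two moment inequalities are mean(y_lo) - theta <= 0 and
    theta - mean(y_hi) <= 0, so the criterion is zero between the sample bounds.
    """
    m_lower = fmean(y_lo) - theta  # positive when theta is below the lower bound
    m_upper = theta - fmean(y_hi)  # positive when theta is above the upper bound
    s_lower, s_upper = pstdev(y_lo), pstdev(y_hi)
    return max(m_lower / s_lower, 0.0) ** 2 + max(m_upper / s_upper, 0.0) ** 2


def confidence_set(grid, y_lo, y_hi, crit_value):
    """Grid points theta with n * criterion(theta) <= crit_value."""
    n = len(y_lo)
    return [t for t in grid if n * criterion(t, y_lo, y_hi) <= crit_value]


# Hypothetical interval data and grid for theta.
y_lo = [0.0, 0.2, 0.4, 0.6]
y_hi = [0.4, 0.6, 0.8, 1.0]
grid = [i / 100 for i in range(101)]

set_estimate = confidence_set(grid, y_lo, y_hi, crit_value=0.0)
level_set = confidence_set(grid, y_lo, y_hi, crit_value=2.71)  # placeholder cutoff
```

With a cutoff of zero the function returns the grid points between the sample bounds, and a positive cutoff enlarges this set. Which cutoff delivers a given coverage property is exactly the question addressed by the methods discussed in this section.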

A first classification of coverage notions pertains to whether the confidence set should cover [math]\idr{\theta}[/math] or each of its elements with a prespecified asymptotic probability. Early on, within the study of interval-identified parameters, [36][37] put forward a confidence interval that expands each of the sample analogs of the extreme points of the population bounds by an amount designed so that the confidence interval asymptotically covers the population bounds with prespecified probability. [3] study the general problem of inference for a set [math]\idr{\theta}[/math] defined as the zero-level set of a criterion function. The coverage notion that they propose is pointwise coverage of the set, whereby [math]c_{1-\alpha}[/math] is chosen so that:

[[math]] \begin{align} \liminf_{n\to\infty}\sP(\idr{\theta}\subseteq\CS)\ge 1-\alpha\text{ for all }\sP\in\cP.\label{eq:CS_coverage:set:pw} \end{align} [[/math]]

[3] provide conditions under which [math]\CS[/math] satisfies \eqref{eq:CS_coverage:set:pw} with [math]c_{1-\alpha}[/math] constant in [math]\vartheta[/math], yielding the so-called criterion function approach to statistical inference in partial identification. Under the same coverage requirement, [38] and [39] introduce novel bootstrap methods for inference in moment inequality models. [40] propose an inference method for finite games of complete information that exploits the structure of these models. [27] propose a method to test hypotheses and build confidence sets satisfying \eqref{eq:CS_coverage:set:pw} based on random set theory, the so-called support function approach, which yields simple to compute confidence sets with asymptotic coverage equal to [math]1-\alpha[/math] when [math]\idr{\theta}[/math] is strictly convex. The reason for the strict convexity requirement is that in its absence, the support function of [math]\idr{\theta}[/math] is not fully differentiable, but only directionally differentiable, complicating inference. Indeed, [41] show that standard bootstrap methods are consistent if and only if full differentiability holds, and they provide modified bootstrap methods that remain valid when only directional differentiability holds. [29] propose a data jittering method that enforces full differentiability at the price of a small conservative distortion. [31] extend the applicability of the support function approach to other moment inequality models and establish efficiency results. [16] show that a Hausdorff distance-based test statistic can be weighted to enforce either exact or first-order equivariance to transformations of parameters. [42] provide empirical likelihood based inference methods for the support function approach. 
The test statistics employed in the criterion function approach and in the support function approach are asymptotically equivalent in specific moment inequality models [27][33], but the criterion function approach is more broadly applicable.


The field's interest changed to a different notion of coverage when [43] pointed out that often there is one “true” data generating [math]\theta[/math], even if it is only partially identified. Hence, they proposed confidence sets that cover each [math]\vartheta\in\idr{\theta}[/math] with a prespecified probability. For pointwise coverage, this leads to choosing [math]c_{1-\alpha}[/math] so that:

[[math]] \begin{align} \liminf_{n\to\infty}\sP(\vartheta\in\CS)\ge 1-\alpha\text{ for all }\sP\in\cP\text{ and }\vartheta\in\idr{\theta}.\label{eq:CS_coverage:point:pw} \end{align} [[/math]]

If [math]\idr{\theta}[/math] is a singleton then \eqref{eq:CS_coverage:set:pw} and \eqref{eq:CS_coverage:point:pw} both coincide with the pointwise coverage requirement employed for point identified parameters. However, as shown in [43](Lemma 1), if [math]\idr{\theta}[/math] contains more than one element, the two notions differ, with confidence sets satisfying \eqref{eq:CS_coverage:point:pw} being weakly smaller than ones satisfying \eqref{eq:CS_coverage:set:pw}. [5] provides confidence sets for general moment (in)equalities models that satisfy \eqref{eq:CS_coverage:point:pw} and are easy to compute. Although confidence sets that take each [math]\vartheta\in\idr{\theta}[/math] as the object of interest (and which satisfy the uniform coverage requirements described in Section Pointwise vs. Uniform Coverage below) have received the most attention in the literature on inference in partially identified models, this choice merits some words of caution. First, [44] point out that if confidence sets are to be used for decision making, a policymaker concerned with robust decisions might prefer ones satisfying \eqref{eq:CS_coverage:set:pw} (respectively, \eqref{eq:CS_coverage:set} below once uniformity is taken into account) to ones satisfying \eqref{eq:CS_coverage:point:pw} (respectively, \eqref{eq:CS_coverage:point} below with uniformity). Second, while in many applications a “true” data generating [math]\theta[/math] exists, in others it does not. For example, [45] and [46] query survey respondents (in the American Life Panel and in the Health and Retirement Study, respectively) about their subjective beliefs on the probability chance of future events. A large fraction of these respondents, when given the possibility to do so, report imprecise beliefs in the form of intervals. In this case, there is no “true” point-valued belief: the “truth” is interval-valued. 
If one is interested in (say) average beliefs, the sharp identification region is the (Aumann) expectation of the reported intervals, and the appropriate coverage requirement for a confidence set is that in \eqref{eq:CS_coverage:set:pw} (respectively, \eqref{eq:CS_coverage:set} below with uniformity).

Pointwise vs. Uniform Coverage

In the context of interval identified parameters, such as, e.g., the mean with missing data in Theorem SIR- with [math]\theta\in\R[/math], [43] pointed out that extra care should be taken in the construction of confidence sets for partially identified parameters, as otherwise they may be asymptotically valid only pointwise (in the distribution of the observed data) over relevant classes of distributions.[Notes 14] For example, consider a confidence interval that expands each of the sample analogs of the extreme points of the population bounds by a one-sided critical value. This confidence interval controls the asymptotic coverage probability pointwise for any DGP at which the width of the population bounds is positive. This is because the sampling variation becomes asymptotically negligible relative to the (fixed) width of the bounds, making the inference problem essentially one-sided. However, for every [math]n[/math] one can find a distribution [math]\sP\in\cP[/math] and a parameter [math]\vartheta\in\idr{\theta}[/math] such that the width of the population bounds (under [math]\sP[/math]) is small relative to [math]n[/math] and the coverage probability for [math]\vartheta[/math] is below [math]1-\alpha[/math]. This happens because the proposed confidence interval does not take into account the fact that for some [math]\sP\in\cP[/math] the problem has a two-sided nature. This observation naturally leads to a more stringent requirement of uniform coverage, whereby \eqref{eq:CS_coverage:set:pw}-\eqref{eq:CS_coverage:point:pw} are replaced, respectively, by

[[math]] \begin{align} \liminf_{n\to\infty}\inf_{\sP\in\cP}\sP(\idr{\theta}\subseteq\CS)&\ge 1-\alpha,\label{eq:CS_coverage:set}\\ \liminf_{n\to\infty}\inf_{\sP\in\cP}\inf_{\vartheta\in\idr{\theta}}\sP(\vartheta\in\CS)&\ge 1-\alpha,\label{eq:CS_coverage:point} \end{align} [[/math]]

and [math]c_{1-\alpha}[/math] is chosen accordingly, to obtain either \eqref{eq:CS_coverage:set} or \eqref{eq:CS_coverage:point}. Sets satisfying \eqref{eq:CS_coverage:set} are referred to as confidence regions for [math]\idr{\theta}[/math] that are uniformly consistent in level (over [math]\sP\in\cP[/math]). [10] propose such confidence regions, study their properties, and provide a step-down procedure to obtain them. [47] propose confidence sets that are contour sets of criterion functions using cutoffs that are computed via Monte Carlo simulations from the quasi‐posterior distribution of the criterion and satisfy the coverage requirement in \eqref{eq:CS_coverage:set}. They recommend the use of a Sequential Monte Carlo algorithm that works well also when the quasi-posterior is irregular and multi-modal. They establish exact asymptotic coverage, non-trivial local power, and validity of their procedure in point identified and partially identified regular models, and validity in irregular models (e.g., in models where the reduced form parameters are on the boundary of the parameter space). They also establish efficiency of their procedure in regular models that happen to be point identified.
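The failure of uniform coverage for the interval that uses a one-sided critical value can be reproduced numerically in a stylized model. The sketch below assumes, purely for illustration, that the estimators of the lower and upper bounds share the same normally distributed sampling error with known standard error `se`, so that the estimated width equals the true width. Under this assumption the probability that the interval covers the lower endpoint of the identified interval is available in closed form. The function names are hypothetical, and the bisection step simply solves for the cutoff that restores nominal coverage in this stylized model.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def coverage_at_lower_bound(width, se, crit):
    """Coverage of the lower endpoint by [lo_hat - crit*se, hi_hat + crit*se].

    Stylized assumption: lo_hat = lo + se*Z and hi_hat = lo + width + se*Z with
    Z standard normal, so coverage requires -crit - width/se <= Z <= crit.
    """
    return STD_NORMAL.cdf(crit + width / se) - STD_NORMAL.cdf(-crit)


def calibrated_crit(width, se, alpha=0.05):
    """Smallest cutoff (by bisection) with coverage of at least 1 - alpha."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if coverage_at_lower_bound(width, se, mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return hi


one_sided = STD_NORMAL.inv_cdf(0.95)

# Width large relative to the standard error: the problem is one-sided.
coverage_wide = coverage_at_lower_bound(width=50.0, se=1.0, crit=one_sided)
# Width equal to zero: the problem is two-sided and the interval undercovers.
coverage_point = coverage_at_lower_bound(width=0.0, se=1.0, crit=one_sided)
```

Since `se` shrinks at rate [math]1/\sqrt{n}[/math], any fixed positive width eventually falls in the first case, while for each [math]n[/math] there are distributions with width close enough to zero to fall in the second. The calibrated cutoff moves from the two-sided to the one-sided normal quantile as the ratio of width to standard error grows.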

Sets satisfying \eqref{eq:CS_coverage:point} are referred to as confidence regions for points in [math]\idr{\theta}[/math] that are uniformly consistent in level (over [math]\sP\in\cP[/math]). Within the framework of [43], [48] shows that one can obtain a confidence interval satisfying \eqref{eq:CS_coverage:point} by pre-testing whether the lower and upper population bounds are sufficiently close to each other. If so, the confidence interval expands each of the sample analogs of the extreme points of the population bounds by a two-sided critical value; otherwise, by a one-sided one. [48] provides important insights clarifying the connection between superefficient (i.e., faster than [math]O_p(1/\sqrt{n})[/math]) estimation of the width of the population bounds when it equals zero, and certain challenges in [43]'s proposed method.[Notes 15] [28] leverage [48]'s results to obtain confidence sets satisfying \eqref{eq:CS_coverage:point} using the support function approach for set identified linear models. Obtaining confidence sets that satisfy the requirement in \eqref{eq:CS_coverage:point} becomes substantially more complex in the context of general moment (in)equalities models. One of the key challenges to uniform inference stems from the fact that the behavior of the limit distribution of the test statistic depends on [math]\sqrt{n}\E_\sP(m_j(\ew_i;\vartheta)),j=1,\dots,|\cJ|[/math], which cannot be consistently estimated. [4][7][8][9][49][50], among others, make significant contributions to circumvent these difficulties in the context of a finite number of unconditional moment (in)equalities. [51][35][52][25][26][53][54], among others, make significant contributions to circumvent these difficulties in the context of a finite number of conditional moment (in)equalities (with continuously distributed conditioning variables). 
[55] and [56] study, respectively, the challenging frameworks where the number of moment inequalities grows with sample size and where there is a continuum of conditional moment inequalities. I refer to [11](Section 4) for a thorough discussion of these methods and a comparison of their relative (de)merits (see also [57][58]).

Coverage of the Vector [math]\theta[/math] vs. Coverage of a Component of [math]\theta[/math]

The coverage requirements in \eqref{eq:CS_coverage:set}-\eqref{eq:CS_coverage:point} refer to confidence sets in [math]\R^d[/math] for the entire [math]\theta[/math] or [math]\idr{\theta}[/math]. Often empirical researchers are interested in inference on a specific component or (smooth) function of [math]\theta[/math] (e.g., the returns to education; the effect of market size on the probability of entry; the elasticity of demand for insurance to price, etc.). For simplicity, here I focus on the case of a component of [math]\theta[/math], which I represent as [math]u^\top\theta[/math], with [math]u[/math] a standard basis vector in [math]\R^d[/math]. In this case, the (sharp) identification region of interest is

[[math]] \begin{align*} \idr{u^\top\theta}=\{s\in[-h_\Theta(-u),h_\Theta(u)]:s=u^\top\vartheta\text{ and }\vartheta\in\idr{\theta}\}. \end{align*} [[/math]]

One could report as confidence interval for [math]u^\top\theta[/math] the projection of [math]\CS[/math] in direction [math]\pm u[/math]. The resulting confidence interval is asymptotically valid but typically conservative. The extent of the conservatism increases with the dimension of [math]\theta[/math] and is easily appreciated in the case of a point identified parameter. Consider, for example, a linear regression in [math]\R^{10}[/math], and suppose for simplicity that the limiting covariance matrix of the estimator is the identity matrix. Then a 95% confidence interval for [math]u^\top\theta[/math] is obtained by adding and subtracting [math]1.96[/math] to that component's estimate. In contrast, projection of a 95% confidence ellipsoid for [math]\theta[/math] on each component amounts to adding and subtracting [math]4.28[/math] to that component's estimate. It is therefore desirable to provide confidence intervals [math]\CI[/math] specifically designed to cover [math]u^\top\theta[/math] rather than the entire [math]\theta[/math]. Natural counterparts to \eqref{eq:CS_coverage:set}-\eqref{eq:CS_coverage:point} are

[[math]] \begin{align} \liminf_{n\to\infty}\inf_{\sP\in\cP}\sP(\idr{u^\top\theta} \subseteq \CI)&\ge 1-\alpha,\label{eq:CS_coverage:set:proj}\\ \liminf_{n\to\infty}\inf_{\sP\in\cP}\inf_{\vartheta\in\idr{\theta}}\sP(u^\top\vartheta\in \CI)&\ge 1-\alpha. \label{eq:CS_coverage:point:proj} \end{align} [[/math]]
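The half-widths [math]1.96[/math] and [math]4.28[/math] quoted above for the linear regression example can be checked directly. With an identity limiting covariance matrix, the 95% confidence ellipsoid is a ball whose radius is the square root of the 0.95 quantile of a Chi-squared distribution with [math]d[/math] degrees of freedom, and projecting the ball on a coordinate axis adds and subtracts that radius. The sketch below uses only the standard library; the helper names are hypothetical, and the closed-form Chi-squared distribution function used here is valid for an even number of degrees of freedom only.

```python
import math
from statistics import NormalDist


def chi2_cdf_even(x, dof):
    """Chi-squared CDF for an even number of degrees of freedom.

    Uses the closed form 1 - exp(-x/2) * sum_{j < dof/2} (x/2)^j / j!.
    """
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, dof // 2):
        term *= half / j
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even(p, dof):
    """Quantile of the Chi-squared distribution (even dof) by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return hi


def projection_half_width(dof, level=0.95):
    """Half-width obtained by projecting the confidence ball on one component."""
    return math.sqrt(chi2_quantile_even(level, dof))


componentwise_half_width = NormalDist().inv_cdf(0.975)
projected_half_width = projection_half_width(10)
```

Calling `projection_half_width` with larger even values of `dof` shows how the conservatism of projection grows with the dimension of [math]\theta[/math], while the componentwise half-width stays fixed.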

As shown in [27] and [33] for the case of pointwise coverage, obtaining asymptotically valid confidence intervals is simple if the identified set is convex and one uses the support function approach. This is because it suffices to base the test statistic on the support function in direction [math]u[/math], and it is often possible to easily characterize the limiting distribution of this test statistic. See [12](Chapters 4 and 5) for details. The task is significantly more complex in general moment inequality models when [math]\idr{\theta}[/math] is non-convex and one wants to satisfy the criterion in \eqref{eq:CS_coverage:set:proj} or that in \eqref{eq:CS_coverage:point:proj}. [4] and [59] propose confidence intervals of the form

[[math]] \begin{align} \CI = \left\{s\in[-h_\Theta(-u),h_\Theta(u)]:\inf_{\vartheta\in\Theta(s)}n\crit_n(\vartheta)\le c_{1-\alpha}(s)\right\},\label{eq:CI:BCS} \end{align} [[/math]]

where [math]\Theta(s)=\{\vartheta\in\Theta:u^\top\vartheta=s\}[/math] and [math]c_{1-\alpha}[/math] is such that \eqref{eq:CS_coverage:point:proj} holds. An important idea in this proposal is that of profiling the test statistic [math]n\crit_n(\vartheta)[/math] by minimizing it over all [math]\vartheta[/math]s such that [math]u^\top\vartheta=s[/math]. One then includes in the confidence interval all values [math]s[/math] for which the profiled test statistic's value is not too large. [4] propose the use of subsampling to obtain the critical value [math]c_{1-\alpha}(s)[/math] and provide high-level conditions ensuring that \eqref{eq:CS_coverage:point:proj} holds. [59] substantially extend and improve the profiling approach by providing a bootstrap-based method to obtain [math]c_{1-\alpha}[/math] so that \eqref{eq:CS_coverage:point:proj} holds. Their method is more powerful than subsampling (for reasonable choices of subsample size). [60] further enlarge the domain of applicability of the profiling approach by proposing a method based on this approach that is asymptotically uniformly valid when the number of moment conditions is large, and can grow with the sample size, possibly at exponential rates. [61] propose a bootstrap-based calibrated projection approach where

[[math]] \begin{align} \CI= [-h_{\eC_n(c_{1-\alpha})}(-u),h_{\eC_n(c_{1-\alpha})}(u)],\label{eq:def:CI} \end{align} [[/math]]

with

[[math]] \begin{align} h_{\eC_n(c_{1-\alpha})}(u)\equiv\sup_{\vartheta\in\Theta}u^\top\vartheta\quad\text{s.t.}\quad\frac{\sqrt{n}\bar{m}_{n,j}(\vartheta)}{\hat{\sigma}_{n,j}(\vartheta)}\leq c_{1-\alpha}(\vartheta),\quad j=1,\dots,|\cJ|\label{eq:KMS:proj} \end{align} [[/math]]

and [math]c_{1-\alpha}[/math] a critical level function calibrated so that \eqref{eq:CS_coverage:point:proj} holds. Compared to the simple projection of [math]\CS[/math] mentioned at the beginning of Section Coverage of the Vector [math]\theta[/math] vs. Coverage of a Component of [math]\theta[/math], calibrated projection (weakly) reduces the value of [math]c_{1-\alpha}[/math] so that the projection of [math]\theta[/math], rather than [math]\theta[/math] itself, is asymptotically covered with the desired probability uniformly. [47] provide methods to build confidence intervals and confidence sets on projections of [math]\idr{\theta}[/math] as contour sets of criterion functions using cutoffs that are computed via Monte Carlo simulations from the quasi‐posterior distribution of the criterion, and that satisfy the coverage requirement in \eqref{eq:CS_coverage:set:proj}. One of their procedures, designed specifically for scalar projections, delivers a confidence interval as the contour set of a profiled quasi-likelihood ratio with critical value equal to a quantile of the Chi-squared distribution with one degree of freedom.
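The profiling idea behind \eqref{eq:CI:BCS} can be sketched on a grid. The code below is a minimal illustration, assuming a two-dimensional parameter, [math]u[/math] equal to the first standard basis vector, a hypothetical criterion function whose zero-level set is the unit disc, and a user-supplied critical value function. The subsampling and bootstrap calibrations of [4] and [59] are not implemented, and all names and numerical values are placeholders.

```python
def profiled_interval(crit_fn, n, s_grid, nuisance_grid, crit_value):
    """Values s whose profiled statistic does not exceed crit_value(s).

    For each candidate s the statistic n * crit_fn is minimized over the
    remaining component of the parameter, holding the first component at s.
    """
    kept = []
    for s in s_grid:
        profiled = min(n * crit_fn((s, t)) for t in nuisance_grid)
        if profiled <= crit_value(s):
            kept.append(s)
    return kept


def disc_criterion(theta):
    """Hypothetical criterion: squared violation of theta1^2 + theta2^2 <= 1."""
    return max(theta[0] ** 2 + theta[1] ** 2 - 1.0, 0.0) ** 2


s_grid = [i / 20 for i in range(-30, 31)]
nuisance_grid = [j / 20 for j in range(-30, 31)]

kept_values = profiled_interval(
    disc_criterion,
    n=100,
    s_grid=s_grid,
    nuisance_grid=nuisance_grid,
    crit_value=lambda s: 1.0,  # placeholder cutoff, not a calibrated value
)
interval = (min(kept_values), max(kept_values))
```

The minimization over the nuisance component is what distinguishes profiling from projecting a confidence set for the whole vector: a value [math]s[/math] is retained whenever some parameter vector with first component [math]s[/math] has a sufficiently small test statistic.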

A Brief Note on Bayesian Methods

The confidence sets discussed in this section are based on the frequentist approach to inference. It is natural to ask whether in partially identified models, as in well behaved point identified models, one can build Bayesian credible sets that at least asymptotically coincide with frequentist confidence sets. This question was first addressed by [62], with a negative answer for the case that the coverage in \eqref{eq:CS_coverage:point} is sought out. In particular, they showed that the resulting Bayesian credible sets are a subset of [math]\idr{\theta}[/math], and hence too narrow from the frequentist perspective. This discrepancy can be ameliorated when inference is sought out for [math]\idr{\theta}[/math] rather than for each [math]\vartheta\in\idr{\theta}[/math]. [63], [64], [34], and [65] propose Bayesian credible regions that are valid for frequentist inference in the sense of \eqref{eq:CS_coverage:set:pw}, where the first two build on the criterion function approach and the second two on the support function approach. All these contributions rely on the model being separable, in the sense that it yields moment inequalities that can be written as the sum of a function of the data only, and a function of the model parameters only (as in, e.g., eq:CT_00-eq:CT_01L). In these models, the function of the data only (the reduced form parameter) is point identified, it is related to the structural parameters [math]\theta[/math] through a known mapping, and under standard regularity conditions it can be [math]\sqrt{n}[/math]-consistently estimated. The resulting estimator has an asymptotically Normal distribution. The various approaches place a prior on the reduced form parameter, and standard tools in Bayesian analysis are used to obtain a posterior. The known mapping from reduced form to structural parameters is then applied to this posterior to obtain a credible set for [math]\idr{\theta}[/math].

General references

Molinari, Francesca (2020). "Microeconometrics with Partial Identification". arXiv:2004.11751 [econ.EM].

Notes

  1. This assumption is often maintained in the literature. See, e.g., [1] for a treatment of inference with dependent observations. [2] study inference in games of complete information as in Identification Problem, imposing the i.i.d. assumption on the unobserved payoff shifters [math]\{\eps_{i1},\eps_{i2}\}_{i=1}^n[/math]. The authors note that because the selection mechanism picking the equilibrium played in the regions of multiplicity (see Section Static, Simultaneous-Move Finite Games with Multiple Equilibria) is left completely unspecified and may be arbitrarily correlated across markets, the resulting observed variables [math]\{\ew_i\}_{i=1}^n[/math] may not be independent and identically distributed, and they propose an inference method to address this issue.
  2. Examples where the set [math]\cJ[/math] is a compact set (e.g., a unit ball) rather than a finite set include the case of best linear prediction with interval outcome and covariate data, see characterization eq:ThetaI:BLP, and the case of entry games with multiple mixed strategy Nash equilibria, see characterization eq:SIR_sharp_mixed_sup. A more general continuum of inequalities is also possible, as in the case of discrete choice with endogenous explanatory variables, see characterization eq:SIR:discrete:choice:endogenous. I refer to [3] and [4](Supplementary Appendix B) for inference methods in the presence of a continuum of conditional moment (in)equalities.
  3. I refer to [5], [6], [7], [8], [9][10], [11], [12], and [13], for inference methods in the case that the conditioning variables have a continuous distribution.
  4. In these expressions an index of the form [math]jk[/math] not separated by a comma equals the product of [math]j[/math] with [math]k[/math].
  5. Using the well known duality between tests of hypotheses and confidence sets, the discussion could be re-framed in terms of size of the test.
  6. The definition of the Hausdorff distance can be generalized to an arbitrary metric space by replacing the Euclidean metric by the metric specified on that space.
  7. It was previously used in the mathematical literature on random set theory, for example to formalize laws of large numbers and central limit theorems for random sets such as the ones in Theorems and [14][15].
  8. See [16](Theorem 1) for a pedagogically helpful proof for a semiparametric binary model.
  9. Using this normalized criterion function is especially important in light of possible model misspecification, see Section.
  10. [17](equation (4.1) and equation (4.6)) set [math]\gamma=1[/math] because they report the assumption for a criterion function that does not square the moment violations.
  11. By Theorem, the Aumann expectation of a random closed set defined on a nonatomic probability space is convex. In this chapter I am assuming nonatomicity of the probability space. Even if I did not make this assumption, however, when working with a random sample the relevant probability space is the product space with [math]n\to\infty[/math], hence nonatomic [18]. If [math]\idr{\theta}[/math] is not convex, [19]'s analysis applies to its convex hull.
  12. [20](Supplementary Appendix F) establish that if [math]\ex[/math] has finite support, [math]\idr{\theta}[/math] in Theorem SIR- can be written as the collection of [math]\vartheta\in\Theta[/math] that satisfy a finite number of moment inequalities, as posited in this section.
  13. There is a large literature in macro-econometrics, pioneered by [21], [22], and [23], concerned with Bayesian inference with a non-informative prior for non-identified parameters. I refer to [24](Chapter 13) for a thorough review. Frequentist inference for impulse response functions in Structural Vector Autoregression models is carried out, e.g., in [25] and [26].
  14. This discussion draws on many conversations with Jörg Stoye, as well as on notes that he shared with me, for which I thank him.
  15. Indeed, the confidence interval proposed by [27] can be thought of as using a Hodges-type shrinkage estimator (see, e.g., [28]) for the width of the population bounds.

References

  1. Hansen, L.P. (1982b): “Large Sample Properties of Generalized Method of Moments Estimators” Econometrica, 50(4), 1029--1054.
  2. 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 Manski, C.F., and E.Tamer (2002): “Inference on Regressions with Interval Data on a Regressor or Outcome” Econometrica, 70(2), 519--546.
  3. 3.00 3.01 3.02 3.03 3.04 3.05 3.06 3.07 3.08 3.09 3.10 Chernozhukov, V., H.Hong, and E.Tamer (2007): “Estimation and Confidence Regions for Parameter Sets in Econometric Models” Econometrica, 75(5), 1243--1284.
  4. 4.0 4.1 4.2 4.3 Romano, J.P., and A.M. Shaikh (2008): “Inference for identifiable parameters in partially identified econometric models” Journal of Statistical Planning and Inference, 138(9), 2786 -- 2807.
  5. 5.0 5.1 Rosen, A.M. (2008): “Confidence sets for partially identified parameters that satisfy a finite number of moment inequalities” Journal of Econometrics, 146(1), 107 -- 117.
  6. Galichon, A., and M.Henry (2009): “A test of non-identifying restrictions and confidence regions for partially identified parameters” Journal of Econometrics, 152(2), 186 -- 196.
  7. Andrews, D. W.K., and P.Guggenberger (2009): “Validity of Subsampling and `Plug-in Asymptotic' Inference for Parameters Defined by Moment Inequalities” Econometric Theory, 25(3), 669--709.
  8. Andrews, D. W.K., and G.Soares (2010): “Inference for Parameters Defined by Moment Inequalities Using Generalized Moment Selection” Econometrica, 78(1), 119--157.
  9. Canay, I.A. (2010): “EL inference for partially identified models: Large deviations optimality and bootstrap validity” Journal of Econometrics, 156(2), 408 -- 425.
  10. Romano, J.P., and A.M. Shaikh (2010): “Inference for the Identified Set in Partially Identified Econometric Models” Econometrica, 78(1), 169--211.
  11. Canay, I.A., and A.M. Shaikh (2017): “Practical and Theoretical Advances in Inference for Partially Identified Models” in Advances in Economics and Econometrics: Eleventh World Congress, ed. by B.Honoré, A.Pakes, M.Piazzesi, and L.Samuelson, vol.2 of Econometric Society Monographs, p. 271–306. Cambridge University Press.
  12. Molchanov, I., and F.Molinari (2018): Random Sets in Econometrics. Econometric Society Monograph Series, Cambridge University Press, Cambridge UK.
  13. Molchanov, I. (2017): Theory of Random Sets. Springer, London, 2 edn.
  14. Hansen, L.P., J.Heaton, and E.G.J. Luttmer (1995): “Econometric Evaluation of Asset Pricing Models” The Review of Financial Studies, 8(2), 237--274.
  15. Molchanov, I. (1998): “A limit theorem for solutions of inequalities” Scandinavian Journal of Statistics, 25, 235--242.
  16. Chernozhukov, V., E.Kocatulum, and K.Menzel (2015): “Inference on sets in finance” Quantitative Economics, 6(2), 309--358.
  17. Hansen, L.P., and R.Jagannathan (1991): “Implications of Security Market Data for Models of Dynamic Economies” Journal of Political Economy, 99(2), 225--262.
  18. Markowitz, H. (1952): “Portfolio selection” Journal of Finance, 7, 77--91.
  19. Chetty, R. (2012): “Bounds on elasticities with optimization frictions: a synthesis of micro and macro evidence in labor supply” Econometrica, 80(3), 969--1018.
  20. Redner, R. (1981): “Note on the Consistency of the Maximum Likelihood Estimate for Nonidentifiable Distributions” The Annals of Statistics, 9(1), 225--228.
  21. Kaido, H., F.Molinari, and J.Stoye (2019b): “Constraint Qualifications in Partial Identification” working paper, available at https://arxiv.org/pdf/1908.09103.pdf.
  22. Bazaraa, M.S., H.D. Sherali, and C.Shetty (2006): Nonlinear programming: theory and algorithms. Hoboken, N.J. : Wiley-Interscience, 3rd edn.
  23. Yildiz, N. (2012): “Consistency of plug-in estimators of upper contour and level sets” Econometric Theory, 28(2), 309--327.
  24. Menzel, K. (2014): “Consistent estimation with many moment inequalities” Journal of Econometrics, 182(2), 329 -- 350.
  25. Armstrong, T.B. (2014): “Weighted KS statistics for inference on conditional moment inequalities” Journal of Econometrics, 181(2), 92 -- 116.
  26. Armstrong, T.B. (2015): “Asymptotically exact inference in conditional moment inequality models” Journal of Econometrics, 186(1), 51 -- 65.
  27. Beresteanu, A., and F.Molinari (2008): “Asymptotic Properties for a Class of Partially Identified Models” Econometrica, 76(4), 763--814.
  28. Bontemps, C., T.Magnac, and E.Maurin (2012): “Set identified linear models” Econometrica, 80(3), 1129--1155.
  29. Chandrasekhar, A., V.Chernozhukov, F.Molinari, and P.Schrimpf (2018): “Best linear approximations to set identified functions: with an application to the gender wage gap” CeMMAP working paper CWP09/19, available at https://www.cemmap.ac.uk/publication/id/13913.
  30. Magnac, T., and E.Maurin (2008): “Partial Identification in Monotone Binary Models: Discrete Regressors and Interval Data” The Review of Economic Studies, 75(3), 835--864.
  31. Kaido, H., and A.Santos (2014): “Asymptotically efficient estimation of models defined by convex moment inequalities” Econometrica, 82(1), 387--413.
  32. Bickel, P.J., C.A. Klaassen, Y.Ritov, and J.A. Wellner (1993): Efficient and Adaptive Estimation for Semiparametric Models. Springer, New York.
  33. Kaido, H. (2016): “A dual approach to inference for partially identified econometric models” Journal of Econometrics, 192(1), 269 -- 290.
  34. Kitagawa, T., and R.Giacomini (2018): “Robust Bayesian inference for set-identified models” CeMMAP working paper CWP61/18, available at https://www.cemmap.ac.uk/publication/id/13675.
  35. Chernozhukov, V., S.Lee, and A.M. Rosen (2013): “Intersection Bounds: estimation and inference” Econometrica, 81(2), 667--737.
  36. Horowitz, J.L., and C.F. Manski (1998): “Censoring of outcomes and regressors due to survey nonresponse: Identification and estimation using weights and imputations” Journal of Econometrics, 84(1), 37 -- 58.
  37. Horowitz, J.L., and C.F. Manski (2000): “Nonparametric Analysis of Randomized Experiments with Missing Covariate and Outcome Data” Journal of the American Statistical Association, 95(449), 77--84.
  38. Bugni, F.A. (2010): “Bootstrap inference in partially identified models defined by moment inequalities: coverage of the identified set” Econometrica, 78(2), 735--753.
  39. Galichon, A., and M.Henry (2013): “Dilation bootstrap” Journal of Econometrics, 177(1), 109 -- 115.
  40. Henry, M., R.Méango, and M.Queyranne (2015): “Combinatorial approach to inference in partially identified incomplete structural models” Quantitative Economics, 6(2), 499--529.
  41. Fang, Z., and A.Santos (2018): “Inference on Directionally Differentiable Functions” The Review of Economic Studies, 86(1), 377--412.
  42. Adusumilli, K., and T.Otsu (2017): “Empirical Likelihood for Random Sets” Journal of the American Statistical Association, 112(519), 1064--1075.
  43. Imbens, G.W., and C.F. Manski (2004): “Confidence Intervals for Partially Identified Parameters” Econometrica, 72(6), 1845--1857.
  44. Henry, M., and A.Onatski (2012): “Set coverage and robust policy” Economics Letters, 115(2), 256 -- 257.
  45. Manski, C.F., and F.Molinari (2010): “Rounding Probabilistic Expectations in Surveys” Journal of Business and Economic Statistics, 28(2), 219--231.
  46. Giustinelli, P., C.F. Manski, and F.Molinari (2019a): “Precise or Imprecise Probabilities? Evidence from survey response on dementia and long-term care” NBER Working Paper 26125, available at https://www.nber.org/papers/w26125.
  47. Chen, X., T.M. Christensen, and E.Tamer (2018): “MCMC Confidence Sets for Identified Sets” Econometrica, 86(6), 1965--2018.
  48. Stoye, J. (2009): “More on Confidence Intervals for Partially Identified Parameters” Econometrica, 77(4), 1299--1315.
  49. Andrews, D. W.K., and P.J. Barwick (2012): “Inference for parameters defined by moment inequalities: a recommended moment selection procedure” Econometrica, 80(6), 2805--2826.
  50. Romano, J.P., A.M. Shaikh, and M.Wolf (2014): “A practical two-step method for testing moment inequalities” Econometrica, 82(5), 1979--2002.
  51. Andrews, D. W.K., and X.Shi (2013): “Inference based on conditional moment inequalities” Econometrica, 81(2), 609--666.
  52. Lee, S., K.Song, and Y.-J. Whang (2013): “Testing functional inequalities” Journal of Econometrics, 172(1), 14 -- 32.
  53. Armstrong, T.B., and H.P. Chan (2016): “Multiscale adaptive inference on conditional moment inequalities” Journal of Econometrics, 194(1), 24 -- 43.
  54. Chetverikov, D. (2018): “Adaptive Test of Conditional Moment Inequalities” Econometric Theory, 34(1), 186–227.
  55. Chernozhukov, V., D.Chetverikov, and K.Kato (2018): “Inference on causal and structural parameters using many moment inequalities” Review of Economic Studies, forthcoming, available at https://doi.org/10.1093/restud/rdy065.
  56. Andrews, D. W.K., and X.Shi (2017): “Inference based on many conditional moment inequalities” Journal of Econometrics, 196(2), 275 -- 287.
  57. Bugni, F.A., I.A. Canay, and P.Guggenberger (2012): “Distortions of Asymptotic Confidence Size in Locally Misspecified Moment Inequality Models” Econometrica, 80(4), 1741--1768.
  58. Bugni, F.A. (2016): “Comparison of inferential methods in partially identified models in terms of error in coverage probability” Econometric Theory, 32(1), 187–242.
  59. Bugni, F.A., I.A. Canay, and X.Shi (2017): “Inference for subvectors and other functions of partially identified parameters in moment inequality models” Quantitative Economics, 8(1), 1--38.
  60. Belloni, A., F.A. Bugni, and V.Chernozhukov (2018): “Subvector inference in partially identified models with many moment inequalities” available at https://arxiv.org/abs/1806.11466.
  61. Kaido, H., F.Molinari, and J.Stoye (2019a): “Confidence Intervals for Projections of Partially Identified Parameters” Econometrica, 87(4), 1397--1432.
  62. Moon, H.R., and F.Schorfheide (2012): “Bayesian and frequentist inference in partially identified models” Econometrica, 80(2), 755--782.
  63. Norets, A., and X.Tang (2014): “Semiparametric Inference in Dynamic Binary Choice Models” The Review of Economic Studies, 81(3), 1229--1262.
  64. Kline, B., and E.Tamer (2016): “Bayesian inference in a class of partially identified models” Quantitative Economics, 7(2), 329--366.
  65. Liao, Y., and A.Simoni (2019): “Bayesian inference for partially identified smooth convex models” Journal of Econometrics, 211(2), 338 -- 360.