guide:6d1a428897

From Stochiki
</div>


==<span id="subsec:framework:inference"></span>Framework and Scope of the Discussion==


The identification analysis carried out in [[guide:Ec36399528#sec:prob:distr |Section]]-[[guide:8d94784544#sec:structural |Section]] presumes knowledge of the joint distribution <math>\sP</math> of the observable variables.
That is, it presumes that <math>\sP</math> can be learned with certainty from observation of the entire population.
In practice, one observes a sample of size <math>n</math> drawn from <math>\sP</math>.
For simplicity I assume it to be a random sample.<ref group="Notes" >This assumption is often maintained in the literature. See, e.g., {{ref|name=and:soa10}} for a treatment of inference with dependent observations. {{ref|name=eps:kai:seo16}} study inference in games of complete information as in Identification [[guide:D084086519#IP:entry_game |Problem]], imposing the i.i.d. assumption on the unobserved payoff shifters <math>\{\eps_{i1},\eps_{i2}\}_{i=1}^n</math>. The authors note that because the selection mechanism picking the equilibrium played in the regions of multiplicity (see Section [[guide:D084086519#subsec:multiple:eq |Static, Simultaneous-Move Finite Games with Multiple Equilibria]]) is left completely unspecified and may be arbitrarily correlated across markets, the resulting observed variables <math>\{\ew_i\}_{i=1}^n</math> may not be independent and identically distributed, and they propose an inference method to address this issue.</ref>
Statistical inference on <math>\idr{\theta}</math> needs to be conducted using knowledge of <math>\sP_n</math>, the empirical distribution of the observable outcomes and covariates.
Because <math>\idr{\theta}</math> is not a singleton, this task is particularly delicate.
To start, care is required to choose a proper notion of consistency for a set estimator <math>\idrn{\theta}</math> and to obtain palatable conditions under which such consistency attains.
Next, the asymptotic behavior of statistics designed to test hypotheses or build confidence sets for <math>\idr{\theta}</math> or for <math>\vartheta\in\idr{\theta}</math> might change with <math>\vartheta</math>, creating technical challenges for the construction of confidence sets that are not encountered when <math>\theta</math> is point identified.
Many of the sharp identification regions derived in [[guide:Ec36399528#sec:prob:distr|Section]]-[[guide:8d94784544#sec:structural |Section]] can be written as collections of vectors <math>\vartheta\in\Theta</math> that satisfy conditional or unconditional moment (in)equalities.
For simplicity, I assume that <math>\Theta</math> is a compact and convex subset of <math>\R^d</math>, and I use the formalization for the case of a finite number of unconditional moment (in)equalities:


\end{align}
</math>
In \eqref{eq:sharp_id_for_inference}, <math>\ew_i\in\cW\subseteq\R^{d_\cW}</math> is a random vector collecting all observable variables, with <math>\ew\sim\sP</math>; <math>m_j:\cW\times\Theta\to\R</math>, <math>j\in\cJ\equiv\cJ_1\cup\cJ_2</math>, are known measurable functions characterizing the model; and <math>\cJ</math> is a finite set equal to <math>\{1,\dots,|\cJ|\}</math>.<ref group="Notes" >Examples where the set <math>\cJ</math> is a compact set (e.g., a unit ball) rather than a finite set include the case of best linear prediction with interval outcome and covariate data, see characterization [[guide:Ec36399528#eq:ThetaI:BLP |eq:ThetaI:BLP]] on p.\pageref{eq:ThetaI:BLP}, and the case of entry games with multiple mixed strategy Nash equilibria, see characterization [[guide:D084086519#eq:SIR_sharp_mixed_sup |eq:SIR_sharp_mixed_sup]] on p.\pageref{eq:SIR_sharp_mixed_sup}.
A more general continuum of inequalities is also possible, as in the case of discrete choice with endogenous explanatory variables, see characterization [[guide:8d94784544#eq:SIR:discrete:choice:endogenous |eq:SIR:discrete:choice:endogenous]] on p.\pageref{eq:SIR:discrete:choice:endogenous}.
I refer to {{ref|name=and:shi17}} and {{ref|name=ber:mol:mol11}}{{rp|at=Supplementary Appendix B}} for inference methods in the presence of a continuum of conditional moment (in)equalities.</ref>
Instances where <math>\idr{\theta}</math> is characterized through a finite number of conditional moment (in)equalities and the conditioning variables have finite support can easily be recast as in \eqref{eq:sharp_id_for_inference}.<ref group="Notes" >I refer to {{ref|name=kha:tam09}}, {{ref|name=and:shi13}}, {{ref|name=che:lee:ros13}}, {{ref|name=lee:son:wha13}}, {{ref|name=arm14b}}{{ref|name=arm15}}, {{ref|name=arm:cha16}}, {{ref|name=che:che:kat18}}, and {{ref|name=che18}}, for inference methods in the case that the conditioning variables have a continuous distribution.</ref>
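The recasting is mechanical: when the conditioning variable <math>\ex</math> has finite support, each conditional restriction "E[m(w;ϑ)|x=x_k] ≤ 0" with P(x=x_k) > 0 is equivalent to the unconditional restriction "E[m(w;ϑ)·1{x=x_k}] ≤ 0". A minimal sketch (the function names are illustrative, not from the text):

```python
def to_unconditional(m_cond, support):
    """Recast a conditional moment restriction E[m(w; theta) | x = x_k] <= 0,
    for a conditioning variable x with finite support, as one unconditional
    restriction E[m(w; theta) * 1{x = x_k}] <= 0 per support point; the two
    are equivalent whenever P(x = x_k) > 0."""
    def make(x_k):
        # multiplying by the indicator 1{x = x_k} preserves the sign of the
        # conditional expectation, so the (in)equality direction is unchanged
        return lambda w, x, theta: m_cond(w, theta) * (x == x_k)
    return [make(x_k) for x_k in support]
```

With <math>\bar{k}</math> support points and <math>j</math> conditional moments per point, this produces the <math>j\bar{k}</math> unconditional moments used below.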
Consider, for example, the two-player entry game model in Identification [[guide:D084086519#IP:entry_game |Problem]], where <math>\ew=(\ey_1,\ey_2,\ex_1,\ex_2)</math>.
Using (in)equalities [[guide:D084086519#eq:CT_00 |eq:CT_00]]-[[guide:D084086519#eq:CT_01L |eq:CT_01L]] and assuming that the distribution of <math>(\ex_1,\ex_2)</math> has <math>\bar{k}</math> points of support, denoted <math>(x_{1,k},x_{2,k}),k=1,\dots,\bar{k}</math>, we have <math>|\cJ|=4\bar{k}</math> and for <math>k=1,\dots,\bar{k}</math>,<ref group="Notes" >In these expressions an index of the form <math>jk</math> not separated by a comma equals the product of <math>j</math> with <math>k</math>.</ref>


<math display="block">




In point identified moment equality models it has been common to conduct estimation and inference using a criterion function that aggregates moment violations <ref name="han82"><span style="font-variant-caps:small-caps">Hansen, L.P.</span>  (1982b): “Large Sample Properties of Generalized Method of  Moments Estimators” ''Econometrica'', 50(4), 1029--1054.</ref>.
<ref name="man:tam02"><span style="font-variant-caps:small-caps">Manski, C.F.,  <span style="font-variant-caps:normal">and</span> E.Tamer</span>  (2002): “Inference on  Regressions with Interval Data on a Regressor or Outcome”  ''Econometrica'', 70(2), 519--546.</ref> adapt this idea to the partially identified case, through a criterion function <math>\crit_\sP:\Theta\to\R_+</math> such that <math>\crit_\sP(\vartheta)=0</math> if and only if <math>\vartheta\in\idr{\theta}</math>.
Many criterion functions can be used (see, e.g., <ref name="man:tam02"/><ref name="che:hon:tam07"><span style="font-variant-caps:small-caps">Chernozhukov, V., H.Hong,  <span style="font-variant-caps:normal">and</span> E.Tamer</span>  (2007):  “Estimation and Confidence Regions for Parameter Sets in Econometric  Models” ''Econometrica'', 75(5), 1243--1284.</ref><ref name="rom:sha08"><span style="font-variant-caps:small-caps">Romano, J.P.,  <span style="font-variant-caps:normal">and</span> A.M. Shaikh</span>  (2008): “Inference for  identifiable parameters in partially identified econometric models”  ''Journal of Statistical Planning and Inference'', 138(9), 2786 -- 2807.</ref><ref name="ros08"><span style="font-variant-caps:small-caps">Rosen, A.M.</span>  (2008): “Confidence sets for partially identified  parameters that satisfy a finite number of moment inequalities”  ''Journal of Econometrics'', 146(1), 107 -- 117.</ref><ref name="gal:hen09"><span style="font-variant-caps:small-caps">Galichon, A.,  <span style="font-variant-caps:normal">and</span> M.Henry</span>  (2009): “A test of non-identifying restrictions and  confidence regions for partially identified parameters” ''Journal of  Econometrics'', 152(2), 186 -- 196.</ref><ref name="and:gug09b"><span style="font-variant-caps:small-caps">Andrews, D. W.K.,  <span style="font-variant-caps:normal">and</span> P.Guggenberger</span>  (2009): “Validity  of Subsampling and `Plug-in Asymptotic' Inference for Parameters Defined by  Moment Inequalities” ''Econometric Theory'', 25(3), 669--709.</ref><ref name="and:soa10"><span style="font-variant-caps:small-caps">Andrews, D. W.K.,  <span style="font-variant-caps:normal">and</span> G.Soares</span>  (2010): “Inference for  Parameters Defined by Moment Inequalities Using Generalized Moment  Selection” ''Econometrica'', 78(1), 119--157.</ref><ref name="can10"><span style="font-variant-caps:small-caps">Canay, I.A.</span>  (2010): “EL inference for partially identified models:  Large deviations optimality and bootstrap validity” ''Journal of  Econometrics'', 156(2), 408 -- 425.</ref><ref name="rom:sha10"><span style="font-variant-caps:small-caps">Romano, J.P.,  <span style="font-variant-caps:normal">and</span> A.M. Shaikh</span>  (2010): “Inference for the Identified Set in Partially  Identified Econometric Models” ''Econometrica'', 78(1), 169--211.</ref>).
Some simple and commonly employed ones include


</math>
where <math>[x]_+=\max\{x,0\}</math> and <math>\sigma_{\sP,j}(\vartheta)</math> is the population standard deviation of <math>m_j(\ew_i;\vartheta)</math>.
In \eqref{eq:criterion_fn_sum}-\eqref{eq:criterion_fn_max} the moment functions are standardized, as doing so is important for statistical power (see, e.g., <ref name="and:soa10"/>{{rp|at=p. 127}}).
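As an illustration, sample analogs of criteria of this kind can be coded directly. The sketch below is an assumption-laden illustration, not the chapter's own implementation: moments are stacked as a <math>|\cJ|\times n</math> array, the first rows are inequality moments with <math>\E[m_j]\le 0</math>, the remaining rows are equalities, and each moment is studentized by its sample standard deviation as recommended above.

```python
import numpy as np

def sample_criteria(m, n_ineq):
    """Sample 'sum' and 'max' criteria at a fixed theta.  m is a |J| x n
    array with entries m_j(w_i; theta); rows 0..n_ineq-1 are inequality
    moments (E[m_j] <= 0), the remaining rows are equality moments."""
    t = m.mean(axis=1) / m.std(axis=1)                    # studentized moments
    viol = np.concatenate([np.maximum(t[:n_ineq], 0.0),   # [x]_+ for inequalities
                           np.abs(t[n_ineq:])])           # |x| for equalities
    return float(np.sum(viol ** 2)), float(viol.max())
```

At any <math>\vartheta</math> that satisfies the sample moment restrictions both criteria equal zero; outside, they are strictly positive.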
To simplify notation, I omit the label and simply use <math>\crit_\sP(\vartheta)</math>.
Given the criterion function, one can rewrite \eqref{eq:sharp_id_for_inference} as


To keep this chapter to a manageable length, I focus my discussion of statistical inference ''exclusively'' on consistent estimation and on different notions of coverage that a confidence set may be required to satisfy and that have proven useful in the literature.<ref group="Notes" >Using the well known duality between tests of hypotheses and confidence sets, the discussion could be re-framed in terms of size of the test.</ref>
The topics of tests of hypotheses and construction of confidence sets in partially identified models are covered in <ref name="can:sha17"><span style="font-variant-caps:small-caps">Canay, I.A.,  <span style="font-variant-caps:normal">and</span> A.M. Shaikh</span>  (2017): “Practical and  Theoretical Advances in Inference for Partially Identified Models” in  ''Advances in Economics and Econometrics: Eleventh World Congress'', ed.  by B.Honoré, A.Pakes, M.Piazzesi,  <span style="font-variant-caps:normal">and</span> L.Samuelson, vol.2 of  ''Econometric Society Monographs'', p. 271–306. Cambridge University  Press.</ref>, who provide a comprehensive survey devoted entirely to them in the context of moment inequality models.
<ref name="mol:mol18"><span style="font-variant-caps:small-caps">Molchanov, I.,  <span style="font-variant-caps:normal">and</span> F.Molinari</span>  (2018): ''Random Sets in Econometrics''. Econometric  Society Monograph Series, Cambridge University Press, Cambridge UK.</ref>{{rp|at=Chapters 4 and 5}} provide a thorough discussion of related methods based on the use of random set theory.
==<span id="subsec:consistent"></span>Consistent Estimation==
When the identified object is a set, it is natural that its estimator is also a set.
In order to discuss statistical properties of a set-valued estimator <math>\idrn{\theta}</math> (to be defined below), and in particular its consistency, one needs to specify how to measure the distance between <math>\idrn{\theta}</math> and <math>\idr{\theta}</math>.
Several distance measures among sets exist (see, e.g., <ref name="mo1"><span style="font-variant-caps:small-caps">Molchanov, I.</span>  (2017): ''Theory of Random Sets''. Springer, London,  2 edn.</ref>{{rp|at=Appendix D}}).
A natural generalization of the commonly used Euclidean distance is the ''Hausdorff distance'', see [[guide:379e0dcd67#def:hausdorff |Definition]], which for given <math>A,B\subset\R^d</math> can be written as


It is easy to verify that <math>\dist_H</math> metrizes the family of non-empty compact sets; in particular, given non-empty compact sets <math>A,B\subset\R^d</math>, <math>\dist_H(A,B) =0</math> if and only if <math>A=B</math>.
If either <math>A</math> or <math>B</math> is empty, <math>\dist_H(A,B) =\infty</math>.
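For finite point sets (for example, grid approximations of <math>\idrn{\theta}</math> and <math>\idr{\theta}</math>), the Hausdorff distance can be computed directly from its definition. A minimal numpy sketch, not from the text:

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance between non-empty finite sets in R^d, given as
    (k, d) arrays: the larger of the two directed (one-sided) distances."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # pairwise dists
    return max(D.min(axis=1).max(),   # sup_{a in A} inf_{b in B} ||a - b||
               D.min(axis=0).max())   # sup_{b in B} inf_{a in A} ||a - b||
```

Consistent with the metric property noted above, the function returns 0 exactly when the two finite sets coincide.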
The use of the Hausdorff distance to conceptualize consistency of set valued estimators in econometrics was proposed by <ref name="han:hea:lut95"><span style="font-variant-caps:small-caps">Hansen, L.P., J.Heaton,  <span style="font-variant-caps:normal">and</span> E.G.J. Luttmer</span>  (1995):  “Econometric Evaluation of Asset Pricing Models” ''The Review of  Financial Studies'', 8(2), 237--274.</ref>{{rp|at=Section 2.4}} and <ref name="man:tam02"/>{{rp|at=Section 3.2}}.<ref group="Notes" >It was previously used in the mathematical literature on random set theory, for example to formalize laws of large numbers and central limit theorems for random sets such as the ones in [[guide:379e0dcd67#thr:SLLN-basic |Theorems]] [[guide:379e0dcd67#thr:clt |and]] {{ref|name=art:vit75}}{{ref|name=gin:hah:zin83}}.</ref>
{{defncard|label=Hausdorff Consistency|id=def:consistent_estimator|
An estimator <math>\idrn{\theta}</math> is consistent for <math>\idr{\theta}</math> if
</math>
}}
<ref name="mol98"><span style="font-variant-caps:small-caps">Molchanov, I.</span>  (1998): “A limit theorem for solutions of  inequalities” ''Scandinavian Journal of Statistics'', 25, 235--242.</ref> establishes Hausdorff consistency of a plug-in estimator of the set <math>\{\vartheta\in\Theta:g_\sP(\vartheta)\le 0\}</math>, with <math>g_\sP:\cW\times\Theta \to \R</math> a lower semicontinuous function of <math>\vartheta\in\Theta</math> that can be consistently estimated by a lower semicontinuous function <math>g_n</math> uniformly over <math>\Theta</math>.
The set estimator is <math>\{\vartheta\in\Theta:g_n(\vartheta)\le 0\}</math>.
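On a grid over <math>\Theta</math>, the plug-in construction is immediate. A sketch under illustrative assumptions (here <math>g_n</math> stands for any uniformly consistent estimate of <math>g_\sP</math>, and the quadratic example in the usage note is hypothetical):

```python
import numpy as np

def plugin_set(g_n, Theta_grid):
    """Plug-in analog of {theta in Theta : g_P(theta) <= 0}: replace the
    population function g_P by its estimate g_n and keep the sublevel set."""
    vals = np.array([g_n(th) for th in Theta_grid])
    return Theta_grid[vals <= 0]
```

For instance, with `g_n(th) = th**2 - 1` the estimated set is (up to grid resolution) the interval <math>[-1,1]</math>.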
The fundamental assumption in <ref name="mol98"/> is that <math>\{\vartheta\in\Theta:g_\sP(\vartheta)\le 0\}\subseteq\cl(\{\vartheta\in\Theta:g_\sP(\vartheta) <  0\})</math>; see <ref name="mol:mol18"/>{{rp|at=Section 5.2}} for a discussion.
There are important applications where this condition holds.
<ref name="che:koc:men15"><span style="font-variant-caps:small-caps">Chernozhukov, V., E.Kocatulum,  <span style="font-variant-caps:normal">and</span> K.Menzel</span>  (2015):  “Inference on sets in finance” ''Quantitative Economics'', 6(2),  309--358.</ref> provide results related to <ref name="mol98"/>, as well as important extensions for the construction of confidence sets, and show that these can be applied to carry out statistical inference on the Hansen–Jagannathan sets of admissible stochastic discount factors <ref name="han:jag91"><span style="font-variant-caps:small-caps">Hansen, L.P.,  <span style="font-variant-caps:normal">and</span> R.Jagannathan</span>  (1991): “Implications of  Security Market Data for Models of Dynamic Economies” ''Journal of  Political Economy'', 99(2), 225--262.</ref>, the Markowitz–Fama mean–variance sets for asset portfolio returns <ref name="mar52"><span style="font-variant-caps:small-caps">Markowitz, H.</span>  (1952): “Portfolio selection” ''Journal of  Finance'', 7, 77--91.</ref>, and the set of structural elasticities in <ref name="che12b"><span style="font-variant-caps:small-caps">Chetty, R.</span>  (2012): “Bounds on elasticities with optimization  frictions: a synthesis of micro and macro evidence in labor supply”  ''Econometrica'', 80(3), 969--1018.</ref>'s analysis of demand with optimization frictions.
However, these methods are not broadly applicable in the general moment (in)equalities framework of this section, as <ref name="mol98"/>'s key condition generally fails for the set <math>\idr{\theta}</math> in \eqref{eq:define:idr}.
===Criterion Function Based Estimators===
<ref name="man:tam02"/> extend the standard theory of extremum estimation of point identified parameters to partial identification, and propose to estimate <math>\idr{\theta}</math> using the collection of values <math>\vartheta\in\Theta</math> that approximately minimize a sample analog of <math>\crit_\sP</math>:


<math display="block">
This yields that asymptotically each point in <math>\idrn{\theta}</math> is arbitrarily close to a point in <math>\idr{\theta}</math>, or more formally that <math>\sP(\idrn{\theta}\subseteq\idr{\theta})\to 1</math>.
I refer to \eqref{eq:inner_consistent} as ''inner consistency'' henceforth.<ref group="Notes" >See {{ref|name=ble15}}{{rp|at=Theorem 1}} for a pedagogically helpful proof for a semiparametric binary model.</ref>
<ref name="red81"><span style="font-variant-caps:small-caps">Redner, R.</span>  (1981): “Note on the Consistency of the Maximum  Likelihood Estimate for Nonidentifiable Distributions” ''The Annals of  Statistics'', 9(1), 225--228.</ref> provides an early contribution establishing this type of inner consistency for maximum likelihood estimators when the true parameter is not point identified.
However, Hausdorff consistency requires also that


</math>
i.e., that each point in <math>\idr{\theta}</math> is arbitrarily close to a point in <math>\idrn{\theta}</math>, or more formally that <math>\sP(\idr{\theta}\subseteq\idrn{\theta})\to 1</math>.
To establish this result for the sharp identification regions in Theorem [[guide:Ec36399528#SIR:man:tam02_param |SIR-]] (parametric regression with interval covariate) and Theorem [[guide:8d94784544#SIR:man:tam02_binary |SIR-]] (semiparametric binary model with interval covariate), <ref name="man:tam02"/>{{rp|at=Propositions 3 and 5}} require the rate at which <math>\tau_n\stackrel{p}{\rightarrow} 0</math> to be slower than the rate at which <math>\crit_n</math> converges uniformly to <math>\crit_\sP</math> over <math>\Theta</math>.
What might go wrong in the absence of such a restriction?
A simple example can help understand the issue.
However, with positive probability in any finite sample <math>\crit_n(\vartheta)=0</math> for <math>\vartheta</math> in a random region (e.g., a triangle if <math>\crit_n</math> is the sample analog of \eqref{eq:criterion_fn_max}) that only includes points that are close to a subset of the points in <math>\idr{\theta}</math>.
Hence, with positive probability the minimizer of <math>\crit_n</math> cycles between consistent estimators of subsets of <math>\idr{\theta}</math>, but does not estimate the entire set.
Enlarging the estimator to include all points that are close to minimizing <math>\crit_n</math> up to a tolerance that converges to zero sufficiently slowly removes this problem.\medskip
Enlarging the estimator to include all points that are close to minimizing <math>\crit_n</math> up to a tolerance that converges to zero sufficiently slowly removes this problem.
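To make the role of the tolerance concrete, here is a minimal numerical sketch (a hypothetical one-dimensional interval-outcome example, not taken from the text) of a level-set estimator that collects every value of <math>\vartheta</math> whose criterion lies within <math>\tau_n</math> of the sample minimum:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical interval-outcome example: the outcome is only known to lie in
# [yL, yU], so the identified set for its mean is [E(yL), E(yU)] = [0, 1].
n = 2_000
yL = rng.normal(0.0, 1.0, n)           # E(yL) = 0
yU = yL + 1.0                          # E(yU) = 1

theta_grid = np.linspace(-2.0, 3.0, 1001)

def crit_n(t):
    """Sample criterion: squared violation of mean(yL) <= t <= mean(yU)."""
    return (np.maximum(yL.mean() - t, 0.0) + np.maximum(t - yU.mean(), 0.0)) ** 2

values = np.array([crit_n(t) for t in theta_grid])
tau_n = np.log(n) / n                  # tolerance shrinking slower than the O_p(1/n) criterion

# Level-set estimator: every grid point whose criterion is within tau_n of the minimum.
estimator = theta_grid[values <= values.min() + tau_n]

def hausdorff(a, b):
    """Hausdorff distance between two finite sets of reals."""
    d = np.abs(a[:, None] - b[None, :])
    return max(d.min(axis=1).max(), d.min(axis=0).max())

true_set = np.linspace(0.0, 1.0, 1001)
print(hausdorff(estimator, true_set))  # small, and shrinking as n grows
```

With <math>\tau_n=\log n/n</math> shrinking more slowly than the <math>O_p(1/n)</math> criterion, the estimator covers the whole interval <math>[0,1]</math> rather than only a subset of it.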
 
<ref name="che:hon:tam07"/> significantly generalize the consistency results in <ref name="man:tam02"/>.
They work with a normalized criterion function equal to <math>\crit_n(\vartheta)-\inf_{\tilde\vartheta\in\Theta}\crit_n(\tilde\vartheta)</math>, but to keep notation light I simply refer to it as <math>\crit_n</math>.<ref group="Notes" >Using this normalized criterion function is especially important in light of possible model misspecification, see [[guide:7b0105e1fc#sec:misspec |Section]].</ref>
Under suitable regularity conditions, they establish consistency of an estimator that can be a smaller set than the one proposed by <ref name="man:tam02"/>, and derive its convergence rate.
Some of the key conditions required by <ref name="che:hon:tam07"/>{{rp|at=Conditions C1 and C2}} to study convergence rates include that <math>\crit_n</math> is lower semicontinuous in <math>\vartheta</math>, satisfies various convergence properties among which <math>\sup_{\vartheta\in\idr{\theta}}\crit_n=O_p(1/a_n)</math> for a sequence of normalizing constants <math>a_n\to\infty</math>, that <math>\tau_n\ge \sup_{\vartheta\in\idr{\theta}}\crit_n(\vartheta)</math> with probability approaching one, and that <math>\tau_n\to 0</math>.
They also require that there exist positive constants <math>(\delta,\kappa,\gamma)</math> such that for any <math>\epsilon\in(0,1)</math> there are <math>(d_\epsilon,n_\epsilon)</math> such that


<math display="block">
\sP\left(\crit_n(\vartheta)\ge\kappa\left[\min\{\delta,\dist(\vartheta,\idr{\theta})\}\right]^\gamma~\forall \vartheta\in\Theta:\dist(\vartheta,\idr{\theta})\ge\left(\frac{d_\epsilon}{a_n}\right)^{1/\gamma}\right)\ge 1-\epsilon \quad \forall n\ge n_\epsilon.
</math>
In words, the assumption, referred to as ''polynomial minorant'' condition, rules out that <math>\crit_n</math> can be arbitrarily close to zero outside <math>\idr{\theta}</math>.
It posits that <math>\crit_n</math> changes as at least a polynomial of degree <math>\gamma</math> in the distance of <math>\vartheta</math> from <math>\idr{\theta}</math>.
Under some additional regularity conditions, <ref name="che:hon:tam07"/> establish that


<math display="block">
\dist_H(\idrn{\theta},\idr{\theta})=O_p\left(\left(\frac{\log n}{a_n}\right)^{1/\gamma}\right).
</math>
Under the maintained assumptions <math>\tau_n\ge \sup_{\vartheta\in\idr{\theta}}\crit_n(\vartheta)\ge\kappa[\min\{\delta,\dist(\vartheta,\idr{\theta})\}]^\gamma</math>, and the latter part of the inequality is used to obtain \eqref{eq:CHT_rate}.
When could the polynomial minorant condition be violated?
In moment (in)equalities models, <ref name="che:hon:tam07"/> require <math>\gamma=2</math>.<ref group="Notes" >{{ref|name=che:hon:tam07}}{{rp|at=equation (4.1) and equation (4.6)}} set <math>\gamma=1</math> because they report the assumption for a criterion function that does not square the moment violations.</ref>
Consider a simple stylized example with (in)equalities of the form

On the other hand, with positive probability <math>\crit_n(\vartheta_n)=(\bar{\ew}_3-\vartheta_{1n}\vartheta_{2n})^2=O_p\left(n^{-1}\right)</math>, so that for <math>n</math> large enough <math>\crit_n(\vartheta_n) < [\dist(\vartheta_n,\idr{\theta})]^\gamma</math>, violating the assumption.
This occurs because the gradient of the moment equality vanishes as <math>\vartheta</math> approaches zero, rendering the criterion function flat in a neighborhood of <math>\idr{\theta}</math>.
As intuition would suggest, rates of convergence are slower the flatter <math>\crit_n</math> is outside <math>\idr{\theta}</math>.
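A stylized numerical check of this failure can be run directly (a Python sketch; the data-generating assumptions — <math>\ew_3</math> mean-zero and <math>\vartheta=(0,0)</math> in the identified set — are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stylized check of the polynomial-minorant failure discussed above.
# Illustrative assumptions: the moment equality is E(w3) = theta1 * theta2
# with E(w3) = 0, and vartheta = (0, 0) lies in the identified set.
def ratio(n):
    w3 = rng.normal(0.0, 1.0, n)
    v = n ** (-0.25)                 # vartheta_n = (n^{-1/4}, n^{-1/4}) drifting to zero
    crit = (w3.mean() - v * v) ** 2  # sample criterion at vartheta_n: O_p(1/n)
    dist = np.hypot(v, v)            # distance of vartheta_n from (0, 0)
    return crit / dist ** 2          # minorant with gamma = 2 needs this bounded away from 0

print([round(ratio(n), 5) for n in (10**3, 10**4, 10**5)])
```

The printed ratios shrink roughly at rate <math>n^{-1/2}</math>, so no fixed <math>\kappa > 0</math> can satisfy the minorant condition with <math>\gamma=2</math> along this sequence.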
<ref name="kai:mol:sto19CQ"><span style="font-variant-caps:small-caps">{Kaido}, H., F.{Molinari},  <span style="font-variant-caps:normal">and</span> J.{Stoye}</span>  (2019b): “{Constraint Qualifications in Partial  Identification}” working paper, available at  [https://arxiv.org/pdf/1908.09103.pdf https://arxiv.org/pdf/1908.09103.pdf].</ref> show that in moment inequality models with smooth moment conditions, the polynomial minorant assumption with <math>\gamma=2</math> implies the Abadie constraint qualification (ACQ); see, e.g., <ref name="baz:she:she06"><span style="font-variant-caps:small-caps">Bazaraa, M.S., H.D. Sherali,  <span style="font-variant-caps:normal">and</span> C.Shetty</span>  (2006):  ''Nonlinear programming: theory and algorithms''. Hoboken, N.J. :  Wiley-Interscience, 3rd edn.</ref>{{rp|at=Chapter 5}} for a definition and discussion of ACQ.
 
The example just given to discuss failures of the polynomial minorant condition is in fact a known example where ACQ fails at <math>\vartheta=[0\;0]^\top</math>.
<ref name="che:hon:tam07"/>{{rp|at=Condition C.3, referred to as ''degeneracy''}} also consider the case that <math>\crit_n</math> vanishes on subsets of <math>\Theta</math> that converge in Hausdorff distance to <math>\idr{\theta}</math> at rate <math>a_n^{-1/\gamma}</math>.
While degeneracy might be difficult to verify in practice, <ref name="che:hon:tam07"/> show that if it holds, <math>\tau_n</math> can be set to zero.
<ref name="yil12"><span style="font-variant-caps:small-caps">Yildiz, N.</span>  (2012): “Consistency of plug-in estimators of upper  contour and level sets” ''Econometric Theory'', 28(2), 309--327.</ref> provides conditions on the moment functions, which are closely related to constraint qualifications (as discussed in <ref name="kai:mol:sto19CQ"/>) under which it is possible to set <math>\tau_n=0</math>.
<ref name="men14"><span style="font-variant-caps:small-caps">Menzel, K.</span>  (2014): “Consistent estimation with many moment  inequalities” ''Journal of Econometrics'', 182(2), 329 -- 350.</ref> studies estimation of <math>\idr{\theta}</math> when the number of moment inequalities is large relative to sample size (possibly infinite).
 
He provides a consistency result for criterion-based estimators that use a number of unconditional moment inequalities that grows with sample size.
He also considers estimators based on conditional moment inequalities, and derives the fastest possible rate for estimating <math>\idr{\theta}</math> under smoothness conditions on the conditional moment functions.
He shows that the rates achieved by the procedures in <ref name="arm14b"><span style="font-variant-caps:small-caps">Armstrong, T.B.</span>  (2014): “Weighted KS statistics for inference on  conditional moment inequalities” ''Journal of Econometrics'', 181(2), 92  -- 116.</ref><ref name="arm15"><span style="font-variant-caps:small-caps">Armstrong, T.B.</span>  (2015): “Asymptotically exact inference in conditional  moment inequality models” ''Journal of Econometrics'', 186(1), 51 -- 65.</ref> are (minimax) optimal, and cannot be improved upon.
<ref name="man:tam02"></ref> extend the notion of extremum estimation from point identified to partially identified models.
 
'''Key Insight:'''<i>
<ref name="man:tam02"/> extend the notion of extremum estimation from point identified to partially identified models.
They do so by putting forward a generalized criterion function whose zero-level set can be used to define <math>\idr{\theta}</math> in partially identified structural semiparametric models.
It is then natural to define the set valued estimator <math>\idrn{\theta}</math> as the collection of approximate minimizers of the sample analog of this criterion function.
<ref name="man:tam02"/>'s analysis of statistical inference focuses exclusively on providing consistent estimators.
<ref name="che:hon:tam07"/> substantially generalize the analysis of consistency of criterion function-based set estimators.
They provide a comprehensive study of convergence rates in partially identified models.
Their work highlights the challenges a researcher faces in this context, and puts forward possible solutions in the form of assumptions under which specific rates of convergence obtain.
</i>
===Support Function Based Estimators===
<ref name="ber:mol08"><span style="font-variant-caps:small-caps">Beresteanu, A.,  <span style="font-variant-caps:normal">and</span> F.Molinari</span>  (2008): “Asymptotic  Properties for a Class of Partially Identified Models” ''Econometrica'',  76(4), 763--814.</ref> introduce to the econometrics literature inference methods for set valued estimators based on random set theory.
They study the class of models where <math>\idr{\theta}</math> is convex and can be written as the Aumann (or selection) expectation of a properly defined random closed set.<ref group="Notes" >By [[guide:379e0dcd67#thr:exp-supp |Theorem]], the Aumann expectation of a random closed set defined on a nonatomic probability space is convex. In this chapter I am assuming nonatomicity of the probability space. Even if I did not make this assumption, however, when working with a random sample the relevant probability space is the product space with <math>n\to\infty</math>, hence nonatomic {{ref|name=art:vit75}}. If <math>\idr{\theta}</math> is not convex, {{ref|name=ber:mol08}}'s analysis applies to its convex hull.</ref>
They propose to carry out estimation and inference leveraging the representation of convex sets through their ''support function'' (given in [[guide:379e0dcd67#def:sup-fun |Definition]]), as it is done in random set theory; see <ref name="mo1"/>{{rp|at=Chapter 3}} and <ref name="mol:mol18"/>{{rp|at=Chapter 4}}.
Because the support function fully characterizes the boundary of <math>\idr{\theta}</math>, it allows for a simple sample analog estimator, and for inference procedures with desirable properties.
An example of a framework where the approach of <ref name="ber:mol08"/> can be applied is that of best linear prediction with interval outcome data in Identification [[guide:Ec36399528#IP:param_pred_interval |Problem]].<ref group="Notes" >{{ref|name=kai:mol:sto19}}{{rp|at=Supplementary Appendix F}} establish that if <math>\ex</math> has finite support, <math>\idr{\theta}</math> in Theorem [[guide:Ec36399528#SIR:BLP_intervalY |SIR-]] can be written as the collection of <math>\vartheta\in\Theta</math> that satisfy a finite number of moment inequalities, as posited in this section.</ref>
Recall that in that case, the researcher observes random variables <math>(\yL,\yU,\ex)</math> and wishes to learn the best linear predictor of <math>\ey|\ex</math>, with <math>\ey</math> unobserved and <math>\sR(\yL\le\ey\le\yU)=1</math>.
For simplicity let <math>\ex</math> be a scalar.
<math display="block">
h_{\idrn{\theta}}(u)=\frac{1}{n}\sum_{i=1}^n\max_{\ey_i\in[\yLi,\yUi]}\ey_i f(\ex_i,u),
</math>
where <math>f(\ex_i,u)=[1\;\ex_i]\hat\Sigma_n^{-1}u</math>.
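For concreteness, the support-function estimator can be sketched in a small simulation (hypothetical data-generating process; the max-over-selections form used here is consistent with the <math>f(\ex_i,u)</math> just defined, but the specific numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated best-linear-prediction problem with interval outcome data:
# only the brackets [yL, yU] around the latent outcome y are observed.
n = 5_000
x = rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)   # latent outcome (simulation only)
yL, yU = y - 0.5, y + 0.5                      # observed brackets

X = np.column_stack([np.ones(n), x])
Sigma_inv = np.linalg.inv(X.T @ X / n)         # sample analog of Sigma^{-1}

def support(u):
    """Sample support function of the BLP identified set in direction u."""
    f = X @ (Sigma_inv @ u)                    # f(x_i, u) = [1 x_i] Sigma^{-1} u
    sel = np.where(f >= 0.0, yU, yL)           # selection maximizing y_i * f(x_i, u)
    return np.mean(sel * f)

# Bounds on the intercept: [-h(-e1), h(e1)] bracket the true value 1.
e1 = np.array([1.0, 0.0])
lo, hi = -support(-e1), support(e1)
print(lo, hi)
```

With brackets of half-width 0.5, the estimated interval for the intercept is close to <math>[0.5, 1.5]</math> and contains the point-identified value under full observability.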
<ref name="ber:mol08"/> use the Law of Large Numbers for random sets reported in [[guide:379e0dcd67#thr:SLLN-basic |Theorem]] to show that <math>\idrn{\theta}</math> in \eqref{eq:BLP_estimator} is <math>\sqrt{n}</math>-consistent under standard conditions on the moments of <math>(\yLi,\yUi,\ex_i)</math>.
<ref name="bon:mag:mau12"><span style="font-variant-caps:small-caps">Bontemps, C., T.Magnac,  <span style="font-variant-caps:normal">and</span> E.Maurin</span>  (2012): “Set  identified linear models” ''Econometrica'', 80(3), 1129--1155.</ref> and <ref name="cha:che:mol:sch18"><span style="font-variant-caps:small-caps">Chandrasekhar, A., V.Chernozhukov, F.Molinari,  <span style="font-variant-caps:normal">and</span>  P.Schrimpf</span>  (2018): “Best linear approximations to set identified  functions: with an application to the gender wage gap” CeMMAP working paper  CWP09/19, available at [https://www.cemmap.ac.uk/publication/id/13913 https://www.cemmap.ac.uk/publication/id/13913].</ref> significantly expand the applicability of the estimator of <ref name="ber:mol08"/>.
<ref name="bon:mag:mau12"></ref> show that it can be used in a large class of partially identified linear models, including ones that allow for instrumental variables.
<ref name="cha:che:mol:sch18"/> show that it can be used for best linear approximation of any function <math>f(x)</math> that is known to lie within two identified bounding functions.  
The lower and upper functions defining the band are allowed to be any functions, including ones carrying an index, and can be estimated parametrically or nonparametrically.
The method allows for estimation of the parameters of the best linear approximations to the set identified functions in many of the identification problems described in [[guide:Ec36399528#sec:prob:distr |Section]].
It can also be used to estimate the sharp identification region for the parameters of a binary choice model with interval or discrete regressors under the assumptions of <ref name="mag:mau08"><span style="font-variant-caps:small-caps">Magnac, T.,  <span style="font-variant-caps:normal">and</span> E.Maurin</span>  (2008): “Partial Identification  in Monotone Binary Models: Discrete Regressors and Interval Data” ''The  Review of Economic Studies'', 75(3), 835--864.</ref>, characterized in [[guide:8d94784544#eq:SIR:mag:mau |eq:SIR:mag:mau]] in Section [[guide:8d94784544#subsubsec:man:tam02 |Semiparametric Binary Choice Models with Interval Valued Covariates]].  
<ref name="kai:san14"><span style="font-variant-caps:small-caps">Kaido, H.,  <span style="font-variant-caps:normal">and</span> A.Santos</span>  (2014): “Asymptotically efficient  estimation of models defined by convex moment inequalities”  ''Econometrica'', 82(1), 387--413.</ref> develop a theory of efficiency for estimators of sets <math>\idr{\theta}</math> as in \eqref{eq:sharp_id_for_inference} under the additional requirements that the inequalities <math>\E_\sP(m_j(\ew,\vartheta))</math> are convex in <math>\vartheta\in\Theta</math> and smooth as functionals of the distribution of the data.
Because of the convexity of the moment inequalities, <math>\idr{\theta}</math> is convex and can be represented through its support function.
Using the classic results in <ref name="bic:kla:rit:wel93"><span style="font-variant-caps:small-caps">Bickel, P.J., C.A. Klaassen, Y.Ritov,  <span style="font-variant-caps:normal">and</span> J.A. Wellner</span>  (1993): ''Efficient and Adaptive Estimation for Semiparametric Models''.  Springer, New York.</ref>, <ref name="kai:san14"/> show that under suitable regularity conditions, the support function allows for <math>\sqrt{n}</math>-consistent regular estimation.
They also show that a simple plug-in estimator based on the support function attains the semiparametric efficiency bound, and the corresponding estimator of <math>\idr{\theta}</math> minimizes a wide class of asymptotic loss functions based on the Hausdorff distance.
As they establish, this efficiency result applies to the estimators proposed by <ref name="ber:mol08"/>, including that in \eqref{eq:BLP_estimator}, and by <ref name="bon:mag:mau12"/>.
 
<ref name="kai16"><span style="font-variant-caps:small-caps">Kaido, H.</span>  (2016): “A dual approach to inference for partially  identified econometric models” ''Journal of Econometrics'', 192(1), 269  -- 290.</ref> further enlarges the applicability of the support function approach by establishing its duality with the criterion function approach, for the case that <math>\crit_\sP</math> is a convex function and <math>\crit_n</math> is a convex function almost surely.
This allows one to use the support function approach also when a representation of <math>\idr{\theta}</math> as the Aumann expectation of a random closed set is not readily available.
 
<ref name="kai16"/> considers <math>\idr{\theta}</math> and its level set estimator <math>\idrn{\theta}</math> as defined, respectively, in \eqref{eq:define:idr} and \eqref{eq:define:idrn}, with <math>\Theta</math> a convex subset of <math>\R^d</math>.
Because <math>\crit_\sP</math> and <math>\crit_n</math> are convex functions, <math>\idr{\theta}</math> and <math>\idrn{\theta}</math> are convex sets.
Under the same assumptions as in <ref name="che:hon:tam07"/>, including the polynomial minorant and the degeneracy conditions, one can set <math>\tau_n=0</math> and have <math>\dist_H(\idrn{\theta},\idr{\theta})=O_p(a_n^{-1/\gamma})</math>.
Moreover, due to its convexity, <math>\idr{\theta}</math> is fully characterized by its support function, which in turn can be consistently estimated (at the same rate as <math>\idr{\theta}</math>) by the sample analog <math>h_{\idrn{\theta}}(u)=\max_{a_n\crit_n(\vartheta)\le 0}u^\top\vartheta</math>.
The latter can be computed via convex programming.
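As an illustration, here is a brute-force sketch of this dual construction (hypothetical convex moment functions; a grid search stands in for the convex program one would use in practice):

```python
import numpy as np

# Minimal sketch: the support function of a convex level set
# {vartheta : crit_n(vartheta) <= 0}, evaluated by brute force on a grid.
# The moment functions below are hypothetical; a convex-programming
# solver would replace the grid search in applications.
def crit_n(t1, t2):
    # Convex criterion: squared violations of three linear moment inequalities,
    # whose zero-level set is the simplex {t1 >= 0, t2 >= 0, t1 + t2 <= 1}.
    g1 = t1 + t2 - 1.0
    g2 = -t1
    g3 = -t2
    return np.maximum(g1, 0) ** 2 + np.maximum(g2, 0) ** 2 + np.maximum(g3, 0) ** 2

grid = np.linspace(-0.5, 1.5, 401)
T1, T2 = np.meshgrid(grid, grid)
feasible = crit_n(T1, T2) <= 1e-12     # points in the zero-level set

def h(u):
    """Support function h(u) = max of u'vartheta over the level set."""
    vals = u[0] * T1 + u[1] * T2
    return vals[feasible].max()

print(h(np.array([1.0, 0.0])), h(np.array([1.0, 1.0])))
```

For this simplex, the support function equals 1 in the directions <math>(1,0)</math> and <math>(1,1)</math>, matching the geometry of the level set.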
 
<ref name="kit:gia18"><span style="font-variant-caps:small-caps">Kitagawa, T.,  <span style="font-variant-caps:normal">and</span> R.Giacomini</span>  (2018): “Robust Bayesian  inference for set-identified models” CeMMAP working paper CWP61/18,  available at [https://www.cemmap.ac.uk/publication/id/13675 https://www.cemmap.ac.uk/publication/id/13675].</ref> consider consistent estimation of <math>\idr{\theta}</math> in the context of Bayesian inference.
They focus on partially identified models where <math>\idr{\theta}</math> depends on a “reduced form” parameter <math>\phi</math> (e.g., a vector of moments of observable random variables).
They recognize that while a prior on <math>\phi</math> can be revised in light of the data, a prior on <math>\theta</math> cannot, due to the lack of point identification.
As such they propose to choose a single prior for the revisable parameters, and a set of priors for the unrevisable ones.
The latter is the collection of priors such that the distribution of <math>\theta|\phi</math> places probability one on <math>\idr{\theta}</math>.
A crucial observation in <ref name="kit:gia18"/> is that once <math>\phi</math> is viewed as a random vector, as in the Bayesian paradigm, under mild regularity conditions <math>\idr{\theta}</math> is a random closed set, and Bayesian inference on it can be carried out using elements of random set theory.
In particular, they show that the set of posterior means of <math>\theta|\ew</math> equals the Aumann expectation of <math>\idr{\theta}</math> (with the underlying probability measure of <math>\phi|\ew</math>).
They also show that this Aumann expectation converges in Hausdorff distance to the “true” identified set if the latter is convex, or otherwise to its convex hull.
Frequentist inference for impulse response functions in Structural Vector Autoregression models is carried out, e.g., in {{ref|name=gra:moo:sch18}} and {{ref|name=gaf:mei:mon18}}.
Frequentist inference for impulse response functions in Structural Vector Autoregression models is carried out, e.g., in {{ref|name=gra:moo:sch18}} and {{ref|name=gaf:mei:mon18}}.
</ref>
</ref>

'''Key Insight:'''
<i>
<ref name="ber:mol08"/> show that elements of random set theory can be employed to obtain inference methods for partially identified models that are easy to implement and have desirable statistical properties.
Whereas they apply their findings to a specific class of models based on the Aumann expectation, the ensuing literature demonstrates that random set methods are widely applicable to obtain estimators of sharp identification regions and establish their consistency.
</i>

<ref name="che:lee:ros13"><span style="font-variant-caps:small-caps">Chernozhukov, V., S.Lee,  <span style="font-variant-caps:normal">and</span> A.M. Rosen</span>  (2013):  “Intersection Bounds: estimation and inference” ''Econometrica'',  81(2), 667--737.</ref> propose an alternative to the notion of consistent estimator.
Rather than asking that <math>\idrn{\theta}</math> satisfy the requirement in [[#def:consistent_estimator |Definition]], they propose the notion of ''half-median-unbiased'' estimator.
This notion is easiest to explain in the case of interval identified scalar parameters.
In that case, the estimator takes the form

<math>
\begin{align}
\idrn{\theta}=\left[\hat\theta_l-\frac{c_{1/2}(\hat\theta_l)\hat\sigma_l}{\sqrt{n}},\ \hat\theta_u+\frac{c_{1/2}(\hat\theta_u)\hat\sigma_u}{\sqrt{n}}\right],\label{eq:idrn:half:med:unb}
\end{align}
</math>
where <math>c_{1/2}(\vartheta)</math> is a critical value chosen so that <math>\idrn{\theta}</math> asymptotically contains <math>\idr{\theta}</math> (or any fixed element in <math>\idr{\theta}</math>; see the discussion in Section [[#subsub:set:or:point:inference |Coverage of $\idr{\theta}$ vs. Coverage of $\theta$]] below) with at least probability <math>1/2</math>.
As discussed in the next section, <math>c_{1/2}(\vartheta)</math> can be further chosen so that this probability is uniform over <math>\sP\in\cP</math>.
The requirement of half-median unbiasedness has the virtue that, by construction, an estimator such as \eqref{eq:idrn:half:med:unb} is a subset of a <math>1-\alpha</math> confidence set as defined in \eqref{eq:CS} below for any <math>\alpha < 1/2</math>, provided <math>c_{1-\alpha}(\vartheta)</math> is chosen using the same criterion for all <math>\alpha\in(0,1)</math>.
In contrast, a consistent estimator satisfying the requirement in [[#def:consistent_estimator |Definition]] need not be a subset of a confidence set.
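For intuition, the half-median-unbiasedness requirement can be illustrated in the interval-identified mean-with-missing-data example. The sketch below is a stylized Monte Carlo, assuming jointly asymptotically normal sample bounds; the critical value <math>c_{1/2}=\Phi^{-1}(\sqrt{1/2})\approx 0.545</math> is an illustrative choice that yields joint coverage of the population bounds of roughly <math>1/2</math> when the two endpoint estimates are approximately independent, and is not the refined construction of the papers cited above.

```python
import random
from statistics import NormalDist, mean, pstdev

def half_median_unbiased_interval(y, obs):
    """Sample analog bounds for E[y], with y in [0,1] and missing data,
    expanded so that the interval covers the population bounds with
    asymptotic probability of about 1/2 (illustrative critical value)."""
    n = len(obs)
    lo_terms = [yi * oi for yi, oi in zip(y, obs)]             # y * 1{observed}
    hi_terms = [yi * oi + (1 - oi) for yi, oi in zip(y, obs)]  # worst case: y = 1 if missing
    # with (roughly) independent normal endpoint errors, Phi(c)^2 = 1/2
    c = NormalDist().inv_cdf(0.5 ** 0.5)
    return (mean(lo_terms) - c * pstdev(lo_terms) / n ** 0.5,
            mean(hi_terms) + c * pstdev(hi_terms) / n ** 0.5)

# Monte Carlo check: y ~ U[0,1], observed with probability 0.7, so the
# population bounds are [0.5 * 0.7, 0.5 * 0.7 + 0.3] = [0.35, 0.65].
random.seed(0)
n, reps, p_obs = 400, 1000, 0.7
theta_lo, theta_hi = 0.5 * p_obs, 0.5 * p_obs + (1 - p_obs)
cover = 0
for _ in range(reps):
    y = [random.random() for _ in range(n)]
    obs = [1 if random.random() < p_obs else 0 for _ in range(n)]
    lo, hi = half_median_unbiased_interval(y, obs)
    cover += (lo <= theta_lo) and (hi >= theta_hi)
print(cover / reps)  # close to 1/2
```

Replacing <math>c_{1/2}</math> with a larger <math>1-\alpha</math> critical value only enlarges the interval, which illustrates why a half-median-unbiased estimator is nested inside the corresponding confidence set.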
Moreover, choice of the sequence <math>\tau_n</math> is not data driven, and hence can be viewed as arbitrary.
This raises a concern for the scope of consistent estimation in general settings.
However, reporting a set estimator together with a confidence set is arguably important to shed light on how much of the volume of the confidence set is due to statistical uncertainty and how much is due to a large identified set.
One can do so by either using a half-median unbiased estimator as in \eqref{eq:idrn:half:med:unb}, or the set of minimizers of the criterion function in \eqref{eq:define:idrn} with <math>\tau_n=0</math> (which, as previously discussed, satisfies the inner consistency requirement in \eqref{eq:inner_consistent} under weak conditions, and is Hausdorff consistent in some well behaved cases).


==<span id="subsec:CS"></span>Confidence Sets Satisfying Various Coverage Notions==
===<span id="subsub:set:or:point:inference"></span>Coverage of <math>\idr{\theta}</math> vs. Coverage of <math>\theta</math>===
I first discuss confidence sets <math>\CS\subset\R^d</math> defined as level sets of a criterion function.
To simplify notation, henceforth I assume <math>a_n=n</math>.
A confidence set can then be constructed as a level set of the criterion function:

<math>
\begin{align}
\CS=\left\{\vartheta\in\Theta:\ nQ_n(\vartheta)\le c_{1-\alpha}(\vartheta)\right\},\label{eq:CS}
\end{align}
</math>
It is chosen so that <math>\CS</math> satisfies (asymptotically) a certain coverage property with respect to either <math>\idr{\theta}</math> or each <math>\vartheta\in\idr{\theta}</math>.
Correspondingly, different appearances of <math>c_{1-\alpha}(\vartheta)</math> may refer to different critical values associated with different coverage notions.
The challenging theoretical aspect of inference in partial identification is the determination of <math>c_{1-\alpha}</math> and of methods to approximate it.
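To fix ideas, the level-set construction itself can be sketched in the simplest interval-identified case, where the criterion measures violations of the two inequalities bounding the parameter. The sample bounds, grid, and value of <math>c</math> below are illustrative placeholders; as just noted, determining the critical value is precisely the difficult step, and it is left as an input here.

```python
def criterion(theta, lo_hat, hi_hat):
    """Sample criterion Q_n(theta): squared violations of the two
    inequalities lo_hat <= theta and theta <= hi_hat."""
    return max(lo_hat - theta, 0.0) ** 2 + max(theta - hi_hat, 0.0) ** 2

def level_set_cs(lo_hat, hi_hat, n, c, grid):
    """Confidence set {theta : n * Q_n(theta) <= c}, computed on a grid."""
    return [t for t in grid if n * criterion(t, lo_hat, hi_hat) <= c]

# Illustrative numbers: sample bounds [0.35, 0.65], n = 400, c = 2.0.
n, c = 400, 2.0
lo_hat, hi_hat = 0.35, 0.65
grid = [i / 1000 for i in range(1001)]
cs = level_set_cs(lo_hat, hi_hat, n, c, grid)
# With this criterion, the level set is an interval that expands the
# sample bounds by sqrt(c / n) on each side.
print(min(cs), max(cs))
```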
 
A first classification of coverage notions pertains to whether the confidence set should cover <math>\idr{\theta}</math> or each of its elements with a prespecified asymptotic probability.
Early on, within the study of interval-identified parameters, <ref name="hor:man98"><span style="font-variant-caps:small-caps">Horowitz, J.L.,  <span style="font-variant-caps:normal">and</span> C.F. Manski</span>  (1998): “Censoring of outcomes and regressors due to  survey nonresponse: Identification and estimation using weights and  imputations” ''Journal of Econometrics'', 84(1), 37 -- 58.</ref><ref name="hor:man00"><span style="font-variant-caps:small-caps">Horowitz, J.L.,  <span style="font-variant-caps:normal">and</span> C.F. Manski</span>  (2000): “Nonparametric Analysis of Randomized Experiments  with Missing Covariate and Outcome Data” ''Journal of the American  Statistical Association'', 95(449), 77--84.</ref> put forward a confidence interval that expands each of the sample analogs of the extreme points of the population bounds by an amount designed so that the confidence interval asymptotically covers the population bounds with prespecified probability.
<ref name="che:hon:tam07"/> study the general problem of inference for a set <math>\idr{\theta}</math> defined as the zero-level set of a criterion function.
The coverage notion that they propose is ''pointwise coverage of the set'', whereby <math>c_{1-\alpha}</math> is chosen so that:


<math>
\begin{align}
\liminf_{n\to\infty}\sP(\idr{\theta}\subseteq\CS)\ge 1-\alpha.\label{eq:CS_coverage:set:pw}
\end{align}
</math>
<ref name="che:hon:tam07"/> provide conditions under which <math>\CS</math> satisfies \eqref{eq:CS_coverage:set:pw} with <math>c_{1-\alpha}</math> constant in <math>\vartheta</math>, yielding the so-called ''criterion function approach'' to statistical inference in partial identification.
Under the same coverage requirement, <ref name="bug10"><span style="font-variant-caps:small-caps">Bugni, F.A.</span>  (2010): “Bootstrap inference in partially identified  models defined by moment inequalities: coverage of the identified set”  ''Econometrica'', 78(2), 735--753.</ref> and <ref name="gal:hen13"><span style="font-variant-caps:small-caps">Galichon, A.,  <span style="font-variant-caps:normal">and</span> M.Henry</span>  (2013): “Dilation bootstrap” ''Journal of  Econometrics'', 177(1), 109 -- 115.</ref> introduce novel bootstrap methods for inference in moment inequality models.
<ref name="hen:mea:que15"><span style="font-variant-caps:small-caps">Henry, M., R.Méango,  <span style="font-variant-caps:normal">and</span> M.Queyranne</span>  (2015):  “Combinatorial approach to inference in partially identified incomplete  structural models” ''Quantitative Economics'', 6(2), 499--529.</ref> propose an inference method for finite games of complete information that exploits the structure of these models.
<ref name="ber:mol08"/> propose a method to test hypotheses and build confidence sets satisfying \eqref{eq:CS_coverage:set:pw} based on random set theory, the so-called ''support function approach'', which yields simple-to-compute confidence sets with asymptotic coverage equal to <math>1-\alpha</math> when <math>\idr{\theta}</math> is strictly convex.
The reason for the strict convexity requirement is that in its absence, the support function of <math>\idr{\theta}</math> is not fully differentiable, but only directionally differentiable, complicating inference.
Indeed, <ref name="fan:san18"><span style="font-variant-caps:small-caps">Fang, Z.,  <span style="font-variant-caps:normal">and</span> A.Santos</span>  (2018): “{Inference on  Directionally Differentiable Functions}” ''The Review of Economic  Studies'', 86(1), 377--412.</ref> show that standard bootstrap methods are consistent if and only if full differentiability holds, and they provide modified bootstrap methods that remain valid when only directional differentiability holds.
<ref name="cha:che:mol:sch18"/> propose a data jittering method that enforces full differentiability at the price of a small conservative distortion.
<ref name="kai:san14"/> extend the applicability of the support function approach to other moment inequality models and establish efficiency results.
<ref name="che:koc:men15"/> show that a Hausdorff distance-based test statistic can be weighted to enforce either exact or first-order equivariance to transformations of parameters.
<ref name="adu:ots16"><span style="font-variant-caps:small-caps">Adusumilli, K.,  <span style="font-variant-caps:normal">and</span> T.Otsu</span>  (2017): “{Empirical Likelihood  for Random Sets}” ''Journal of the American Statistical Association'',  112(519), 1064--1075.</ref> provide empirical likelihood based inference methods for the support function approach.
The test statistics employed in the criterion function approach and in the support function approach are asymptotically equivalent in specific moment inequality models <ref name="ber:mol08"/><ref name="kai16"/>, but the criterion function approach is more broadly applicable.
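The support function machinery behind this approach is elementary to compute for polytopes. The sketch below evaluates <math>h_K(u)=\max_{k\in K}\langle u,k\rangle</math> from the vertices of a hypothetical two-dimensional identified set, and inflates it by <math>c/\sqrt{n}</math> in every direction in the spirit of the support function approach; the set, <math>c</math>, and <math>n</math> are illustrative only, and in practice a valid critical value must be calibrated to the limiting distribution of the support function process.

```python
from math import cos, sin, pi, sqrt

def support_function(vertices, u):
    """h_K(u) = max_{k in K} <u, k>; for a convex polytope the maximum
    is attained at a vertex, so checking vertices suffices."""
    return max(u[0] * x + u[1] * y for x, y in vertices)

def inflated_support(vertices, c, n, directions):
    """Support function of a region that expands the set by c/sqrt(n)
    in every direction (illustrative inflation rule)."""
    return [support_function(vertices, u) + c / sqrt(n) for u in directions]

# Hypothetical identified set: the unit square, described by its vertices.
K = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
directions = [(cos(2 * pi * j / 8), sin(2 * pi * j / 8)) for j in range(8)]
h = [support_function(K, u) for u in directions]
h_cs = inflated_support(K, 1.96, 400, directions)
print([round(v, 3) for v in h])  # e.g. h = 1.0 at (1,0) and sqrt(2) at 45 degrees
```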
 
The field's interest shifted to a different notion of coverage when <ref name="imb:man04"><span style="font-variant-caps:small-caps">Imbens, G.W.,  <span style="font-variant-caps:normal">and</span> C.F. Manski</span>  (2004): “Confidence  Intervals for Partially Identified Parameters” ''Econometrica'', 72(6),  1845--1857.</ref> pointed out that often there is one “true” data generating <math>\theta</math>, even if it is only partially identified.
Hence, they proposed confidence sets that cover each <math>\vartheta\in\idr{\theta}</math> with a prespecified probability.
For pointwise coverage, this leads to choosing <math>c_{1-\alpha}</math> so that:
<math>
\begin{align}
\liminf_{n\to\infty}\sP(\vartheta\in\CS)\ge 1-\alpha\ \text{ for all }\vartheta\in\idr{\theta}.\label{eq:CS_coverage:point:pw}
\end{align}
</math>
If <math>\idr{\theta}</math> is a singleton then \eqref{eq:CS_coverage:set:pw} and \eqref{eq:CS_coverage:point:pw} both coincide with the pointwise coverage requirement employed for point identified parameters.
However, as shown in <ref name="imb:man04"/>{{rp|at=Lemma 1}}, if <math>\idr{\theta}</math> contains more than one element, the two notions differ, with confidence sets satisfying \eqref{eq:CS_coverage:point:pw} being weakly smaller than ones satisfying \eqref{eq:CS_coverage:set:pw}.
<ref name="ros08"/> provides confidence sets for general moment (in)equalities models that satisfy \eqref{eq:CS_coverage:point:pw} and are easy to compute.
Although confidence sets that take each <math>\vartheta\in\idr{\theta}</math> as the object of interest (and which satisfy the ''uniform coverage'' requirements described in Section [[#subsub:uniform:inference |Pointwise vs. Uniform Coverage]] below) have received the most attention in the literature on inference in partially identified models, this choice merits some words of caution.
First, <ref name="hen:ona12"><span style="font-variant-caps:small-caps">Henry, M.,  <span style="font-variant-caps:normal">and</span> A.Onatski</span>  (2012): “Set coverage and robust  policy” ''Economics Letters'', 115(2), 256 -- 257.</ref> point out that if confidence sets are to be used for decision making, a policymaker concerned with robust decisions might prefer ones satisfying \eqref{eq:CS_coverage:set:pw} (respectively, \eqref{eq:CS_coverage:set} below once uniformity is taken into account) to ones satisfying \eqref{eq:CS_coverage:point:pw} (respectively, \eqref{eq:CS_coverage:point} below with uniformity).
Second, while in many applications a “true” data generating <math>\theta</math> exists, in others it does not.
For example, <ref name="man:mol10"><span style="font-variant-caps:small-caps">Manski, C.F.,  <span style="font-variant-caps:normal">and</span> F.Molinari</span>  (2010): “Rounding  Probabilistic Expectations in Surveys” ''Journal of Business and  Economic Statistics'', 28(2), 219--231.</ref> and <ref name="giu:man:mol19"><span style="font-variant-caps:small-caps">Giustinelli, P., C.F. Manski,  <span style="font-variant-caps:normal">and</span> F.Molinari</span>  (2019a):  “Precise or Imprecise Probabilities? Evidence from survey response on  dementia and long-term care” NBER Working Paper 26125, available at  [https://www.nber.org/papers/w26125 https://www.nber.org/papers/w26125].</ref> query survey respondents (in the American Life Panel and in the Health and Retirement Study, respectively) about their subjective beliefs on the probability of future events.
A large fraction of these respondents, when given the possibility to do so, report imprecise beliefs in the form of intervals.  
In this case, there is no “true” point-valued belief: the “truth” is interval-valued.
If one is interested in (say) average beliefs, the sharp identification region is the (Aumann) expectation of the reported intervals, and the appropriate coverage requirement for a confidence set is that in \eqref{eq:CS_coverage:set:pw} (respectively, \eqref{eq:CS_coverage:set} below with uniformity).


===<span id="subsub:uniform:inference"></span>Pointwise vs. Uniform Coverage===
In the context of interval identified parameters, such as, e.g., the mean with missing data in Theorem [[guide:Ec36399528#SIR:prob:E:md |SIR-]] with <math>\theta\in\R</math>, <ref name="imb:man04"/> pointed out that extra care should be taken in the construction of confidence sets for partially identified parameters, as otherwise they may be asymptotically valid only pointwise (in the distribution of the observed data) over relevant classes of distributions.<ref group="Notes" >This discussion draws on many conversations with J\"{o}rg Stoye, as well as on notes that he shared with me, for which I thank him.</ref>  
For example, consider a confidence interval that expands each of the sample analogs of the extreme points of the population bounds by a one-sided critical value.
This confidence interval controls the asymptotic coverage probability pointwise for any DGP at which the width of the population bounds is positive.
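This sensitivity can be quantified in a stylized normal approximation in which the two endpoint estimates share a common (perfectly correlated) limiting normal error with standard deviation <math>\sigma</math>: expanding each sample bound by the one-sided critical value <math>z_{1-\alpha}</math> yields asymptotic coverage <math>\Phi(z_{1-\alpha}+\delta)-\Phi(-z_{1-\alpha})</math> of the lower population bound, where <math>\delta=\sqrt{n}\Delta/\sigma</math> is the local width of the bounds. The sketch below, with illustrative <math>\alpha=0.05</math>, shows coverage falling from <math>1-\alpha</math> toward <math>1-2\alpha</math> as <math>\delta\to 0</math>.

```python
from statistics import NormalDist

Phi = NormalDist().cdf
z = NormalDist().inv_cdf

def one_sided_ci_coverage(delta, alpha=0.05):
    """Asymptotic coverage of the lower population bound by the interval
    [lo_hat - z_{1-alpha}*se, hi_hat + z_{1-alpha}*se], as a function of
    delta = sqrt(n) * Delta / sigma (stylized model with perfectly
    correlated normal endpoint errors)."""
    c = z(1 - alpha)
    return Phi(c + delta) - Phi(-c)

for d in (4.0, 1.0, 0.0):
    print(d, round(one_sided_ci_coverage(d), 3))
# coverage is ~0.95 when the bounds are wide (d = 4.0)
# but only ~0.90 at point identification (d = 0.0)
```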
To restore uniformity, the coverage requirements are strengthened to hold over <math>\sP\in\cP</math>:

<math>
\begin{align}
\liminf_{n\to\infty}\inf_{\sP\in\cP}\sP(\idr{\theta}\subseteq\CS)&\ge 1-\alpha,\label{eq:CS_coverage:set}\\
\liminf_{n\to\infty}\inf_{\sP\in\cP}\inf_{\vartheta\in\idr{\theta}}\sP(\vartheta\in\CS)&\ge 1-\alpha,\label{eq:CS_coverage:point}
\end{align}
</math>
and <math>c_{1-\alpha}</math> is chosen accordingly, to obtain either \eqref{eq:CS_coverage:set} or \eqref{eq:CS_coverage:point}.
Sets satisfying \eqref{eq:CS_coverage:set} are referred to as confidence regions for <math>\idr{\theta}</math> that are uniformly consistent in level (over <math>\sP\in\cP</math>).
<ref name="rom:sha10"/> propose such confidence regions, study their properties, and provide a step-down procedure to obtain them.
<ref name="che:chr:tam18"><span style="font-variant-caps:small-caps">Chen, X., T.M. Christensen,  <span style="font-variant-caps:normal">and</span> E.Tamer</span>  (2018): “MCMC  Confidence Sets for Identified Sets” ''Econometrica'', 86(6),  1965--2018.</ref> propose confidence sets that are contour sets of criterion functions using cutoffs that are computed via Monte Carlo simulations from the quasi‐posterior distribution of the criterion and satisfy the coverage requirement in \eqref{eq:CS_coverage:set}.
They recommend the use of a Sequential Monte Carlo algorithm that works well also when the quasi-posterior is irregular and multi-modal.
They establish exact asymptotic coverage, non-trivial local power, and validity of their procedure in point identified and partially identified regular models, and validity in irregular models (e.g., in models where the reduced form parameters are on the boundary of the parameter space).


Sets satisfying \eqref{eq:CS_coverage:point} are referred to as confidence regions for points in <math>\idr{\theta}</math> that are uniformly consistent in level (over <math>\sP\in\cP</math>).
Within the framework of <ref name="imb:man04"/>, <ref name="sto09"><span style="font-variant-caps:small-caps">Stoye, J.</span>  (2009): “More on Confidence Intervals for Partially  Identified Parameters” ''Econometrica'', 77(4), 1299--1315.</ref> shows that one can obtain a confidence interval satisfying \eqref{eq:CS_coverage:point} by pre-testing whether the lower and upper population bounds are sufficiently close to each other.
If so, the confidence interval expands each of the sample analogs of the extreme points of the population bounds by a two-sided critical value; otherwise, by a one-sided critical value.
<ref name="sto09"/> provides important insights clarifying the connection between superefficient (i.e., faster than <math>O_p(1/\sqrt{n})</math>) estimation of the width of the population bounds when it equals zero, and certain challenges in <ref name="imb:man04"/>'s proposed method.<ref group="Notes" >Indeed, the confidence interval proposed by {{ref|name=sto09}} can be thought of as using a Hodges-type shrinkage estimator (see, e.g., {{ref|name=van97}}) for the width of the population bounds.</ref>
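Under the same normal approximation, the interpolation at the heart of these constructions can be sketched as follows: the critical value solves <math>\Phi(c+\delta)-\Phi(-c)=1-\alpha</math>, where <math>\delta=\sqrt{n}\hat\Delta/\max(\hat\sigma_l,\hat\sigma_u)</math> is the estimated local width of the bounds, so that <math>c</math> moves from the two-sided value <math>z_{1-\alpha/2}</math> at <math>\delta=0</math> to the one-sided value <math>z_{1-\alpha}</math> as <math>\delta</math> grows. This is a bare-bones rendering of the Imbens and Manski (2004) proposal (see Stoye (2009) for the qualifications needed for uniform validity), using standard-library tools only.

```python
from statistics import NormalDist

Phi = NormalDist().cdf

def im_critical_value(delta, alpha=0.05, tol=1e-10):
    """Solve Phi(c + delta) - Phi(-c) = 1 - alpha for c by bisection,
    where delta = sqrt(n) * estimated width / max standard error."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if Phi(mid + delta) - Phi(-mid) < 1 - alpha:
            lo = mid  # coverage too low: enlarge c
        else:
            hi = mid
    return (lo + hi) / 2

print(round(im_critical_value(0.0), 3))  # ~1.960: two-sided value at point identification
print(round(im_critical_value(8.0), 3))  # ~1.645: one-sided value for wide bounds
```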
<ref name="bon:mag:mau12"/> leverage <ref name="sto09"/>'s results to obtain confidence sets satisfying \eqref{eq:CS_coverage:point} using the support function approach for set identified linear models.
Obtaining confidence sets that satisfy the requirement in \eqref{eq:CS_coverage:point} becomes substantially more complex in the context of general moment (in)equalities models.
One of the key challenges to uniform inference stems from the fact that the behavior of the limit distribution of the test statistic depends on <math>\sqrt{n}\E_\sP(m_j(\ew_i;\vartheta)),j=1,\dots,|\cJ|</math>, which cannot be consistently estimated.
<ref name="rom:sha08"/><ref name="and:gug09b"/><ref name="and:soa10"/><ref name="can10"/><ref name="and:bar12"><span style="font-variant-caps:small-caps">Andrews, D. W.K.,  <span style="font-variant-caps:normal">and</span> P.J. Barwick</span>  (2012): “Inference  for parameters defined by moment inequalities: a recommended moment selection  procedure” ''Econometrica'', 80(6), 2805--2826.</ref><ref name="rom:sha:wol14"><span style="font-variant-caps:small-caps">Romano, J.P., A.M. Shaikh,  <span style="font-variant-caps:normal">and</span> M.Wolf</span>  (2014): “A  practical two-step method for testing moment inequalities”  ''Econometrica'', 82(5), 1979--2002.</ref>, among others, make significant contributions to circumvent these difficulties in the context of a finite number of unconditional moment (in)equalities.
<ref name="and:shi13"><span style="font-variant-caps:small-caps">Andrews, D. W.K.,  <span style="font-variant-caps:normal">and</span> X.Shi</span>  (2013): “Inference based on  conditional moment inequalities” ''Econometrica'', 81(2), 609--666.</ref><ref name="che:lee:ros13"/><ref name="lee:son:wha13"><span style="font-variant-caps:small-caps">Lee, S., K.Song,  <span style="font-variant-caps:normal">and</span> Y.-J. Whang</span>  (2013): “Testing  functional inequalities” ''Journal of Econometrics'', 172(1), 14 -- 32.</ref><ref name="arm14b"/><ref name="arm15"/><ref name="arm:cha16"><span style="font-variant-caps:small-caps">Armstrong, T.B.,  <span style="font-variant-caps:normal">and</span> H.P. Chan</span>  (2016): “Multiscale  adaptive inference on conditional moment inequalities” ''Journal of  Econometrics'', 194(1), 24 -- 43.</ref><ref name="che18"><span style="font-variant-caps:small-caps">Chetverikov, D.</span>  (2018): “{Adaptive Test of Conditional Moment  Inequalities}” ''Econometric Theory'', 34(1), 186–227.</ref>, among others, make significant contributions to circumvent these difficulties in the context of a finite number of conditional moment (in)equalities (with continuously distributed conditioning variables).
<ref name="che:che:kat18"><span style="font-variant-caps:small-caps">Chernozhukov, V., D.Chetverikov,  <span style="font-variant-caps:normal">and</span> K.Kato</span>  (2018):  “Inference on causal and structural parameters using many moment  inequalities” ''Review of Economic Studies'', forthcoming, available at  [https://doi.org/10.1093/restud/rdy065 https://doi.org/10.1093/restud/rdy065].</ref> and <ref name="and:shi17"><span style="font-variant-caps:small-caps">Andrews, D. W.K.,  <span style="font-variant-caps:normal">and</span> X.Shi</span>  (2017): “Inference based on many conditional moment  inequalities” ''Journal of Econometrics'', 196(2), 275 -- 287.</ref> study, respectively, the challenging frameworks where the number of moment inequalities grows with sample size and where there is a continuum of conditional moment inequalities.
I refer to <ref name="can:sha17"/>{{rp|at=Section 4}} for a thorough discussion of these methods and a comparison of their relative (de)merits (see also <ref name="bug:can:gug12"><span style="font-variant-caps:small-caps">Bugni, F.A., I.A. Canay,  <span style="font-variant-caps:normal">and</span> P.Guggenberger</span>  (2012):  “Distortions of Asymptotic Confidence Size in Locally Misspecified Moment  Inequality Models” ''Econometrica'', 80(4), 1741--1768.</ref><ref name="bug16"><span style="font-variant-caps:small-caps">Bugni, F.A.</span>  (2016): “Comparison of inferential methods in partially  identified models in terms of error in coverage probability”  ''Econometric Theory'', 32(1), 187–242.</ref>).
===<span id="subsubsec:proj:inference"></span>Coverage of the Vector <math>\theta</math> vs. Coverage of a Component of <math>\theta</math>===
The coverage requirements in \eqref{eq:CS_coverage:set}-\eqref{eq:CS_coverage:point} refer to confidence sets in <math>\R^d</math> for the entire <math>\theta</math> or <math>\idr{\theta}</math>.
Often empirical researchers are interested in inference on a specific component or (smooth) function of <math>\theta</math> (e.g., the returns to education; the effect of market size on the probability of entry; the elasticity of demand for insurance to price, etc.).
The extent of the conservatism increases with the dimension of <math>\theta</math> and is easily appreciated in the case of a point identified parameter.
Consider, for example, a linear regression in <math>\R^{10}</math>, and suppose for simplicity that the limiting covariance matrix of the estimator is the identity matrix.  
Then a 95% confidence interval for <math>u^\top\theta</math> is obtained by adding and subtracting <math>1.96</math> to that component's estimate.  
In contrast, projection of a 95% confidence ellipsoid for <math>\theta</math> on each component amounts to adding and subtracting <math>4.28</math> to that component's estimate.  
It is therefore desirable to provide confidence intervals <math>\CI</math> specifically designed to cover <math>u^\top\theta</math> rather than the entire <math>\theta</math>.
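The <math>4.28</math> figure is easy to verify: with an identity limiting covariance matrix, projecting a 95% Wald ellipsoid in <math>\R^{10}</math> on a component gives a half-width of <math>\sqrt{\chi^2_{10,0.95}}\approx 4.28</math>. A stdlib-only numerical check (the even-degrees-of-freedom chi-square inversion by bisection is my own illustrative device, not part of the text):

```python
import math
from statistics import NormalDist

def chi2_cdf(x, df):
    # chi-square CDF, closed form for even df:
    # F(x) = 1 - exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
    k = df // 2
    return 1.0 - math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i) for i in range(k))

def chi2_quantile(p, df):
    # invert the CDF by bisection on [0, 200]
    lo, hi = 0.0, 200.0
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if chi2_cdf(mid, df) < p else (lo, mid)
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)            # per-component interval half-width: ~1.96
proj = math.sqrt(chi2_quantile(0.95, 10))  # projected ellipsoid half-width:    ~4.28
```

With <math>d=10</math> this reproduces the <math>1.96</math> versus <math>4.28</math> comparison in the text.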
Natural counterparts to \eqref{eq:CS_coverage:set}-\eqref{eq:CS_coverage:point} are
<math display="block">
\begin{align}
\liminf_{n\to\infty}\inf_{\sP\in\cP}\sP\left(\{u^\top\vartheta:\vartheta\in\idr{\theta}\}\subseteq\CI\right)&\ge 1-\alpha,\label{eq:CS_coverage:set:proj}\\
\liminf_{n\to\infty}\inf_{\sP\in\cP}\inf_{\vartheta\in\idr{\theta}}\sP\left(u^\top\vartheta\in\CI\right)&\ge 1-\alpha.\label{eq:CS_coverage:point:proj}
\end{align}
</math>
As shown in <ref name="ber:mol08"/> and <ref name="kai16"/> for the case of pointwise coverage, obtaining asymptotically valid confidence intervals is simple if the identified set is convex and one uses the support function approach.
This is because it suffices to base the test statistic on the support function in direction <math>u</math>, and it is often possible to easily characterize the limiting distribution of this test statistic.
See <ref name="mol:mol18"/>{{rp|at=Chapters 4 and 5}} for details.
The task is significantly more complex in general moment inequality models when <math>\idr{\theta}</math> is non-convex and one wants to satisfy the criterion in \eqref{eq:CS_coverage:set:proj} or that in \eqref{eq:CS_coverage:point:proj}.
<ref name="rom:sha08"/> and <ref name="bug:can:shi17"><span style="font-variant-caps:small-caps">Bugni, F.A., I.A. Canay,  <span style="font-variant-caps:normal">and</span> X.Shi</span>  (2017): “Inference for subvectors and other functions of  partially identified parameters in moment inequality models”  ''Quantitative Economics'', 8(1), 1--38.</ref> propose confidence intervals of the form

<math display="block">
\CI=\left\{s\in\R:\ \min_{\vartheta\in\Theta:\,u^\top\vartheta=s}n\crit_n(\vartheta)\le c_{1-\alpha}(s)\right\}.
</math>
An important idea in this proposal is that of ''profiling'' the test statistic <math>n\crit_n(\vartheta)</math> by minimizing it over all <math>\vartheta</math>s such that <math>u^\top\vartheta=s</math>.
One then includes in the confidence interval all values <math>s</math> for which the profiled test statistic's value is not too large.
<ref name="rom:sha08"/> propose the use of subsampling to obtain the critical value <math>c_{1-\alpha}(s)</math> and provide high-level conditions ensuring that \eqref{eq:CS_coverage:point:proj} holds.
<ref name="bug:can:shi17"/> substantially extend and improve the ''profiling approach'' by providing a bootstrap-based method to obtain <math>c_{1-\alpha}</math> so that \eqref{eq:CS_coverage:point:proj} holds.
Their method is more powerful than subsampling (for reasonable choices of subsample size).
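To fix ideas, the profiling construction can be illustrated numerically. Everything below is hypothetical: synthetic data, a box-shaped identified set defined by four moment inequalities on <math>\vartheta=(\vartheta_1,\vartheta_2)</math>, a grid in place of continuous optimization, and a fixed critical value standing in for the subsampling or bootstrap calibration just described:

```python
import math, random

random.seed(0)
n = 200
# hypothetical data: four variables whose means bound theta1 and theta2,
# via m(w; theta) = (w1 - t1, t1 - w2, w3 - t2, t2 - w4), each E[m_j] <= 0
w = [(random.gauss(0.0, 1.0), random.gauss(1.0, 1.0),
      random.gauss(0.0, 1.0), random.gauss(1.0, 1.0)) for _ in range(n)]

def Qn(theta):
    # sum criterion: squared positive parts of standardized sample moments
    t1, t2 = theta
    q = 0.0
    for moms in zip(*[(wi[0] - t1, t1 - wi[1], wi[2] - t2, t2 - wi[3]) for wi in w]):
        mbar = sum(moms) / n
        var = sum(m * m for m in moms) / n - mbar ** 2
        q += max(mbar / math.sqrt(var), 0.0) ** 2
    return q

def profiled(s, grid2):
    # profile n*Qn over theta2, holding u'theta = s fixed with u = (1, 0)
    return min(n * Qn((s, t2)) for t2 in grid2)

grid2 = [i / 10 for i in range(-10, 21)]
c = 2.7  # hypothetical critical value c_{1-alpha}(s); in practice calibrated
CI = [s for s in (i / 10 for i in range(-15, 26)) if profiled(s, grid2) <= c]
```

The resulting `CI` covers (a discretization of) the projection <math>[\E(\ew_1),\E(\ew_2)]=[0,1]</math> of the box, plus sampling slack.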
<ref name="bel:bug:che18"><span style="font-variant-caps:small-caps">Belloni, A., F.A. Bugni,  <span style="font-variant-caps:normal">and</span> V.Chernozhukov</span>  (2018):  “Subvector inference in partially identified models with many moment  inequalities” available at [https://arxiv.org/abs/1806.11466 https://arxiv.org/abs/1806.11466].</ref> further enlarge the domain of applicability of the profiling approach by proposing a method based on this approach that is asymptotically uniformly valid when the number of moment conditions is large, and can grow with the sample size, possibly at exponential rates.
<ref name="kai:mol:sto19"><span style="font-variant-caps:small-caps">{Kaido}, H., F.{Molinari},  <span style="font-variant-caps:normal">and</span> J.{Stoye}</span>  (2019a):  “{Confidence Intervals for Projections of Partially Identified  Parameters}” ''Econometrica'', 87(4), 1397--1432.</ref> propose a bootstrap-based ''calibrated projection approach'' where

<math display="block">
\CI=\left[\inf\left\{u^\top\vartheta:\vartheta\in\Theta,\ \frac{\sqrt{n}\,\bar m_{n,j}(\vartheta)}{\hat\sigma_{n,j}(\vartheta)}\le c_{1-\alpha}(\vartheta)\ \forall j\in\cJ\right\},\ \sup\left\{u^\top\vartheta:\vartheta\in\Theta,\ \frac{\sqrt{n}\,\bar m_{n,j}(\vartheta)}{\hat\sigma_{n,j}(\vartheta)}\le c_{1-\alpha}(\vartheta)\ \forall j\in\cJ\right\}\right],
</math>
and <math>c_{1-\alpha}</math> a critical level function calibrated so that \eqref{eq:CS_coverage:point:proj} holds.
Compared to the simple projection of <math>\CS</math> mentioned at the beginning of Section [[#subsubsec:proj:inference |Coverage of the Vector $\theta$ vs. Coverage of a Component of $\theta$]], calibrated projection (weakly) reduces the value of <math>c_{1-\alpha}</math> so that the projection of <math>\theta</math>, rather than <math>\theta</math> itself, is asymptotically covered with the desired probability uniformly.
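A stylized version of this construction, for a scalar parameter bounded by one pair of moments (synthetic data; the constant critical level `c` stands in for the calibrated, <math>\vartheta</math>-dependent one):

```python
import math, random, statistics

random.seed(2)
n = 300
# hypothetical model: E[w1] <= theta <= E[w2], via w1 - theta <= 0, theta - w2 <= 0
w1 = [random.gauss(0.0, 1.0) for _ in range(n)]
w2 = [random.gauss(1.0, 1.0) for _ in range(n)]
mb1, sd1 = statistics.fmean(w1), statistics.stdev(w1)
mb2, sd2 = statistics.fmean(w2), statistics.stdev(w2)

c = 1.8  # hypothetical critical level; in practice calibrated by bootstrap

def feasible(t):
    # relaxed moment constraints: sqrt(n) * mbar_j(t) / sigma_j(t) <= c
    return (math.sqrt(n) * (mb1 - t) / sd1 <= c
            and math.sqrt(n) * (t - mb2) / sd2 <= c)

grid = [i / 100 for i in range(-100, 201)]  # Theta = [-1, 2]
feas = [t for t in grid if feasible(t)]
CI = (min(feas), max(feas))  # [min u'theta, max u'theta] over the relaxed set
```

The interval is the projection of the relaxed constraint set, slightly wider than the sample analog of the identified interval <math>[0,1]</math>.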
<ref name="che:chr:tam18"/> provide methods to build confidence intervals and confidence sets on projections of <math>\idr{\theta}</math> as contour sets of criterion functions using cutoffs that are computed via Monte Carlo simulations from the quasi-posterior distribution of the criterion, and that satisfy the coverage requirement in \eqref{eq:CS_coverage:set:proj}.
One of their procedures, designed specifically for scalar projections, delivers a confidence interval as the contour set of a profiled quasi-likelihood ratio with critical value equal to a quantile of the Chi-squared distribution with one degree of freedom.
===A Brief Note on Bayesian Methods===
The confidence sets discussed in this section are based on the frequentist approach to inference.
It is natural to ask whether in partially identified models, as in well-behaved point identified models, one can build Bayesian credible sets that at least asymptotically coincide with frequentist confidence sets.
This question was first addressed by <ref name="moo:sch12"><span style="font-variant-caps:small-caps">Moon, H.R.,  <span style="font-variant-caps:normal">and</span> F.Schorfheide</span>  (2012): “Bayesian and  frequentist inference in partially identified models” ''Econometrica'',  80(2), 755--782.</ref>, with a negative answer for the case that the coverage in \eqref{eq:CS_coverage:point} is sought out.
In particular, they showed that the resulting Bayesian credible sets are a subset of <math>\idr{\theta}</math>, and hence too narrow from the frequentist perspective.
This discrepancy can be ameliorated when inference is sought out for <math>\idr{\theta}</math> rather than for each <math>\vartheta\in\idr{\theta}</math>.
<ref name="nor:tan14"><span style="font-variant-caps:small-caps">Norets, A.,  <span style="font-variant-caps:normal">and</span> X.Tang</span>  (2014): “{Semiparametric Inference  in Dynamic Binary Choice Models}” ''The Review of Economic Studies'',  81(3), 1229--1262.</ref>, <ref name="kli:tam16"><span style="font-variant-caps:small-caps">Kline, B.,  <span style="font-variant-caps:normal">and</span> E.Tamer</span>  (2016): “Bayesian inference in a class of partially  identified models” ''Quantitative Economics'', 7(2), 329--366.</ref>, <ref name="kit:gia18"/>, and <ref name="lia:sim19"><span style="font-variant-caps:small-caps">Liao, Y.,  <span style="font-variant-caps:normal">and</span> A.Simoni</span>  (2019): “Bayesian inference for  partially identified smooth convex models” ''Journal of Econometrics'',  211(2), 338 -- 360.</ref> propose Bayesian credible regions that are valid for frequentist inference in the sense of \eqref{eq:CS_coverage:set:pw}, where the first two build on the criterion function approach and the second two on the support function approach.
All these contributions rely on the model being separable, in the sense that it yields moment inequalities that can be written as the sum of a function of the data only, and a function of the model parameters only (as in, e.g., [[guide:521939d27a#eq:CT_00 |eq:CT_00]]-[[guide:521939d27a#eq:CT_01L |eq:CT_01L]]).
In these models, the function of the data only (the ''reduced form parameter'') is point identified, it is related to the structural parameters <math>\theta</math> through a known mapping, and under standard regularity conditions it can be <math>\sqrt{n}</math>-consistently estimated.

Latest revision as of 00:22, 20 June 2024

[math] \newcommand{\edis}{\stackrel{d}{=}} \newcommand{\fd}{\stackrel{f.d.}{\rightarrow}} \newcommand{\dom}{\operatorname{dom}} \newcommand{\eig}{\operatorname{eig}} \newcommand{\epi}{\operatorname{epi}} \newcommand{\lev}{\operatorname{lev}} \newcommand{\card}{\operatorname{card}} \newcommand{\comment}{\textcolor{Green}} \newcommand{\B}{\mathbb{B}} \newcommand{\C}{\mathbb{C}} \newcommand{\G}{\mathbb{G}} \newcommand{\M}{\mathbb{M}} \newcommand{\N}{\mathbb{N}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\T}{\mathbb{T}} \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\W}{\mathbb{W}} \newcommand{\bU}{\mathfrak{U}} \newcommand{\bu}{\mathfrak{u}} \newcommand{\bI}{\mathfrak{I}} \newcommand{\cA}{\mathcal{A}} \newcommand{\cB}{\mathcal{B}} \newcommand{\cC}{\mathcal{C}} \newcommand{\cD}{\mathcal{D}} \newcommand{\cE}{\mathcal{E}} \newcommand{\cF}{\mathcal{F}} \newcommand{\cG}{\mathcal{G}} \newcommand{\cg}{\mathcal{g}} \newcommand{\cH}{\mathcal{H}} \newcommand{\cI}{\mathcal{I}} \newcommand{\cJ}{\mathcal{J}} \newcommand{\cK}{\mathcal{K}} \newcommand{\cL}{\mathcal{L}} \newcommand{\cM}{\mathcal{M}} \newcommand{\cN}{\mathcal{N}} \newcommand{\cO}{\mathcal{O}} \newcommand{\cP}{\mathcal{P}} \newcommand{\cQ}{\mathcal{Q}} \newcommand{\cR}{\mathcal{R}} \newcommand{\cS}{\mathcal{S}} \newcommand{\cT}{\mathcal{T}} \newcommand{\cU}{\mathcal{U}} \newcommand{\cu}{\mathcal{u}} \newcommand{\cV}{\mathcal{V}} \newcommand{\cW}{\mathcal{W}} \newcommand{\cX}{\mathcal{X}} \newcommand{\cY}{\mathcal{Y}} \newcommand{\cZ}{\mathcal{Z}} \newcommand{\sF}{\mathsf{F}} \newcommand{\sM}{\mathsf{M}} \newcommand{\sG}{\mathsf{G}} \newcommand{\sT}{\mathsf{T}} \newcommand{\sB}{\mathsf{B}} \newcommand{\sC}{\mathsf{C}} \newcommand{\sP}{\mathsf{P}} \newcommand{\sQ}{\mathsf{Q}} \newcommand{\sq}{\mathsf{q}} \newcommand{\sR}{\mathsf{R}} \newcommand{\sS}{\mathsf{S}} \newcommand{\sd}{\mathsf{d}} \newcommand{\cp}{\mathsf{p}} \newcommand{\cc}{\mathsf{c}} \newcommand{\cf}{\mathsf{f}} 
\newcommand{\eU}{{\boldsymbol{U}}} \newcommand{\eb}{{\boldsymbol{b}}} \newcommand{\ed}{{\boldsymbol{d}}} \newcommand{\eu}{{\boldsymbol{u}}} \newcommand{\ew}{{\boldsymbol{w}}} \newcommand{\ep}{{\boldsymbol{p}}} \newcommand{\eX}{{\boldsymbol{X}}} \newcommand{\ex}{{\boldsymbol{x}}} \newcommand{\eY}{{\boldsymbol{Y}}} \newcommand{\eB}{{\boldsymbol{B}}} \newcommand{\eC}{{\boldsymbol{C}}} \newcommand{\eD}{{\boldsymbol{D}}} \newcommand{\eW}{{\boldsymbol{W}}} \newcommand{\eR}{{\boldsymbol{R}}} \newcommand{\eQ}{{\boldsymbol{Q}}} \newcommand{\eS}{{\boldsymbol{S}}} \newcommand{\eT}{{\boldsymbol{T}}} \newcommand{\eA}{{\boldsymbol{A}}} \newcommand{\eH}{{\boldsymbol{H}}} \newcommand{\ea}{{\boldsymbol{a}}} \newcommand{\ey}{{\boldsymbol{y}}} \newcommand{\eZ}{{\boldsymbol{Z}}} \newcommand{\eG}{{\boldsymbol{G}}} \newcommand{\ez}{{\boldsymbol{z}}} \newcommand{\es}{{\boldsymbol{s}}} \newcommand{\et}{{\boldsymbol{t}}} \newcommand{\ev}{{\boldsymbol{v}}} \newcommand{\ee}{{\boldsymbol{e}}} \newcommand{\eq}{{\boldsymbol{q}}} \newcommand{\bnu}{{\boldsymbol{\nu}}} \newcommand{\barX}{\overline{\eX}} \newcommand{\eps}{\varepsilon} \newcommand{\Eps}{\mathcal{E}} \newcommand{\carrier}{{\mathfrak{X}}} \newcommand{\Ball}{{\mathbb{B}}^{d}} \newcommand{\Sphere}{{\mathbb{S}}^{d-1}} \newcommand{\salg}{\mathfrak{F}} \newcommand{\ssalg}{\mathfrak{B}} \newcommand{\one}{\mathbf{1}} \newcommand{\Prob}[1]{\P\{#1\}} \newcommand{\yL}{\ey_{\mathrm{L}}} \newcommand{\yU}{\ey_{\mathrm{U}}} \newcommand{\yLi}{\ey_{\mathrm{L}i}} \newcommand{\yUi}{\ey_{\mathrm{U}i}} \newcommand{\xL}{\ex_{\mathrm{L}}} \newcommand{\xU}{\ex_{\mathrm{U}}} \newcommand{\vL}{\ev_{\mathrm{L}}} \newcommand{\vU}{\ev_{\mathrm{U}}} \newcommand{\dist}{\mathbf{d}} \newcommand{\rhoH}{\dist_{\mathrm{H}}} \newcommand{\ti}{\to\infty} \newcommand{\comp}[1]{#1^\mathrm{c}} \newcommand{\ThetaI}{\Theta_{\mathrm{I}}} \newcommand{\crit}{q} \newcommand{\CS}{CS_n} \newcommand{\CI}{CI_n} \newcommand{\cv}[1]{\hat{c}_{n,1-\alpha}(#1)} 
\newcommand{\idr}[1]{\mathcal{H}_\sP[#1]} \newcommand{\outr}[1]{\mathcal{O}_\sP[#1]} \newcommand{\idrn}[1]{\hat{\mathcal{H}}_{\sP_n}[#1]} \newcommand{\outrn}[1]{\mathcal{O}_{\sP_n}[#1]} \newcommand{\email}[1]{\texttt{#1}} \newcommand{\possessivecite}[1]{\ltref name="#1"\gt\lt/ref\gt's \citeyear{#1}} \newcommand\xqed[1]{% \leavevmode\unskip\penalty9999 \hbox{}\nobreak\hfill \quad\hbox{#1}} \newcommand\qedex{\xqed{$\triangle$}} \newcommand\independent{\protect\mathpalette{\protect\independenT}{\perp}} \DeclareMathOperator{\Int}{Int} \DeclareMathOperator{\conv}{conv} \DeclareMathOperator{\cov}{Cov} \DeclareMathOperator{\var}{Var} \DeclareMathOperator{\Sel}{Sel} \DeclareMathOperator{\Bel}{Bel} \DeclareMathOperator{\cl}{cl} \DeclareMathOperator{\sgn}{sgn} \DeclareMathOperator{\essinf}{essinf} \DeclareMathOperator{\esssup}{esssup} \newcommand{\mathds}{\mathbb} \renewcommand{\P}{\mathbb{P}} [/math]

Framework and Scope of the Discussion

The identification analysis carried out in the preceding sections presumes knowledge of the joint distribution [math]\sP[/math] of the observable variables. That is, it presumes that [math]\sP[/math] can be learned with certainty from observation of the entire population. In practice, one observes a sample of size [math]n[/math] drawn from [math]\sP[/math]. For simplicity I assume it to be a random sample.[Notes 1] Statistical inference on [math]\idr{\theta}[/math] needs to be conducted using knowledge of [math]\sP_n[/math], the empirical distribution of the observable outcomes and covariates. Because [math]\idr{\theta}[/math] is not a singleton, this task is particularly delicate.

To start, care is required to choose a proper notion of consistency for a set estimator [math]\idrn{\theta}[/math] and to obtain palatable conditions under which such consistency attains. Next, the asymptotic behavior of statistics designed to test hypotheses or build confidence sets for [math]\idr{\theta}[/math] or for [math]\vartheta\in\idr{\theta}[/math] might change with [math]\vartheta[/math], creating technical challenges for the construction of confidence sets that are not encountered when [math]\theta[/math] is point identified.

Many of the sharp identification regions derived in the preceding sections can be written as collections of vectors [math]\vartheta\in\Theta[/math] that satisfy conditional or unconditional moment (in)equalities. For simplicity, I assume that [math]\Theta[/math] is a compact and convex subset of [math]\R^d[/math], and I use the formalization for the case of a finite number of unconditional moment (in)equalities:

[[math]] \begin{align} \idr{\theta}=\{\vartheta\in\Theta: \E_\sP(m_j(\ew_i;\vartheta))&\le 0\;\forall j\in\cJ_1,\; \E_\sP(m_j(\ew_i;\vartheta))=0\;\forall j\in\cJ_2\}.\label{eq:sharp_id_for_inference} \end{align} [[/math]]

In \eqref{eq:sharp_id_for_inference}, [math]\ew_i\in\cW\subseteq\R^{d_\cW}[/math] is a random vector collecting all observable variables, with [math]\ew\sim\sP[/math]; [math]m_j:\cW\times\Theta\to\R[/math], [math]j\in\cJ\equiv\cJ_1\cup\cJ_2[/math], are known measurable functions characterizing the model; and [math]\cJ[/math] is a finite set equal to [math]\{1,\dots,|\cJ|\}[/math].[Notes 2] Instances where [math]\idr{\theta}[/math] is characterized through a finite number of conditional moment (in)equalities and the conditioning variables have finite support can easily be recast as in \eqref{eq:sharp_id_for_inference}.[Notes 3] Consider, for example, the two player entry game model in Identification Problem, where [math]\ew=(\ey_1,\ey_2,\ex_1,\ex_2)[/math]. Using (in)equalities eq:CT_00-eq:CT_01L and assuming that the distribution of [math](\ex_1,\ex_2)[/math] has [math]\bar{k}[/math] points of support, denoted [math](x_{1,k},x_{2,k}),k=1,\dots,\bar{k}[/math], we have [math]|\cJ|=4\bar{k}[/math] and for [math]k=1,\dots,\bar{k}[/math],[Notes 4]

[[math]] \begin{align*} m_{4k-3}(\ew_i;\vartheta)&=[\one((\ey_1,\ey_2)=(0,0))-\Phi((-\infty,-\ex_1b_1),(-\infty,-\ex_2b_2);r)]\one((\ex_1,\ex_2)=(x_{1,k},x_{2,k}))\\ m_{4k-2}(\ew_i;\vartheta)&=[\one((\ey_1,\ey_2)=(1,1))-\Phi([-\ex_1b_1-d_1,\infty),[-\ex_2b_2-d_2,\infty);r)]\one((\ex_1,\ex_2)=(x_{1,k},x_{2,k}))\\ m_{4k-1}(\ew_i;\vartheta)&=[\one((\ey_1,\ey_2)=(0,1))-\Phi((-\infty,-\ex_1b_1-d_1),(-\ex_2b_2,\infty);r)]\one((\ex_1,\ex_2)=(x_{1,k},x_{2,k}))\\ m_{4k}(\ew_i;\vartheta)&=\Big[\one((\ey_1,\ey_2)=(0,1))-\Big\{\Phi((-\infty,-\ex_1b_1-d_1),(-\ex_2b_2,\infty);r)\notag\\ &\quad\quad-\Phi((-\ex_1b_1,-\ex_1b_1-d_1),(-\ex_2b_2,-\ex_2b_2-d_2);r)\Big\}\Big]\one((\ex_1,\ex_2)=(x_{1,k},x_{2,k})). \end{align*} [[/math]]
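Evaluating these moment functions requires the bivariate normal rectangle probabilities [math]\Phi(\cdot,\cdot;r)[/math]. These can be computed by one-dimensional integration of the conditional-normal decomposition [math]\P(X\in[a_1,b_1],Y\in[a_2,b_2])=\int_{a_1}^{b_1}\phi(x)\{\Phi((b_2-rx)/\sqrt{1-r^2})-\Phi((a_2-rx)/\sqrt{1-r^2})\}\,dx[/math]. A stdlib sketch (the Simpson rule, the truncation at [math]\pm 8[/math], and the sample-moment helper `m_00` are illustrative choices, not from the text):

```python
import math
from statistics import NormalDist

N = NormalDist()

def bvn_rect(a, b, r, m=400):
    # P(a1 <= X <= b1, a2 <= Y <= b2) for standard bivariate normal with corr r,
    # via Simpson's rule on the conditional-normal decomposition
    x0, x1 = max(a[0], -8.0), min(b[0], 8.0)
    if x0 >= x1:
        return 0.0
    s = math.sqrt(1.0 - r * r)
    def f(x):
        return N.pdf(x) * (N.cdf((b[1] - r * x) / s) - N.cdf((a[1] - r * x) / s))
    h = (x1 - x0) / m
    acc = f(x0) + f(x1)
    for i in range(1, m):
        acc += (4 if i % 2 else 2) * f(x0 + i * h)
    return acc * h / 3

def m_00(y, x, b1, b2, r, xk):
    # hypothetical helper: the moment function m_{4k-3} for outcome (0,0)
    ind = 1.0 if (x[0], x[1]) == xk else 0.0
    p00 = bvn_rect((-math.inf, -math.inf), (-x[0] * b1, -x[1] * b2), r)
    return (float(y == (0, 0)) - p00) * ind
```

As a sanity check, with [math]r=0[/math] the orthant probability is [math]\Phi(0)^2=0.25[/math], and with [math]r=0.5[/math] it equals [math]1/4+\arcsin(r)/(2\pi)=1/3[/math].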


In point identified moment equality models it has been common to conduct estimation and inference using a criterion function that aggregates moment violations [1]. [2] adapt this idea to the partially identified case, through a criterion function [math]\crit_\sP:\Theta\to\R_+[/math] such that [math]\crit_\sP(\vartheta)=0[/math] if and only if [math]\vartheta\in\idr{\theta}[/math]. Many criterion functions can be used (see, e.g., [2][3][4][5][6][7][8][9][10]). Some simple and commonly employed ones include

[[math]] \begin{align} \crit_{\sP,\mathrm{sum}}(\vartheta) &= \sum_{j\in\cJ_1}\left[\frac{\E_\sP(m_j(\ew_i;\vartheta))}{\sigma_{\sP,j}(\vartheta)}\right]_+^2 + \sum_{j\in\cJ_2}\left[\frac{\E_\sP(m_j(\ew_i;\vartheta))}{\sigma_{\sP,j}(\vartheta)}\right]^2,\label{eq:criterion_fn_sum}\\ \crit_{\sP,\mathrm{max}}(\vartheta) &= \max\left\{\max_{j\in\cJ_1}\left[\frac{\E_\sP(m_j(\ew_i;\vartheta))}{\sigma_{\sP,j}(\vartheta)}\right]_+,\max_{j\in\cJ_2}\left|\frac{\E_\sP(m_j(\ew_i;\vartheta))}{\sigma_{\sP,j}(\vartheta)}\right|\right\}^2,\label{eq:criterion_fn_max} \end{align} [[/math]]

where [math][x]_+=\max\{x,0\}[/math] and [math]\sigma_{\sP,j}(\vartheta)[/math] is the population standard deviation of [math]m_j(\ew_i;\vartheta)[/math]. In \eqref{eq:criterion_fn_sum}-\eqref{eq:criterion_fn_max} the moment functions are standardized, as doing so is important for statistical power (see, e.g., [8](p. 127)). To simplify notation, I omit the label and simply use [math]\crit_\sP(\vartheta)[/math]. Given the criterion function, one can rewrite \eqref{eq:sharp_id_for_inference} as

[[math]] \begin{align} \label{eq:define:idr} \idr{\theta}=\{\vartheta\in\Theta:\crit_\sP(\vartheta)=0\}.\end{align} [[/math]]
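In code, both criteria are one-liners once the standardized moments [math]\E_\sP(m_j(\ew_i;\vartheta))/\sigma_{\sP,j}(\vartheta)[/math] are in hand (a minimal sketch; the function names are mine):

```python
def q_sum(ineq, eq):
    # sum criterion: squared positive parts of standardized inequality moments
    # plus squared standardized equality moments
    return sum(max(m, 0.0) ** 2 for m in ineq) + sum(m ** 2 for m in eq)

def q_max(ineq, eq):
    # max criterion: the worst standardized violation, squared
    worst = max([max(m, 0.0) for m in ineq] + [abs(m) for m in eq], default=0.0)
    return worst ** 2
```

Both functions return zero exactly when all inequality moments are nonpositive and all equality moments are zero, i.e. when [math]\vartheta\in\idr{\theta}[/math].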


To keep this chapter to a manageable length, I focus my discussion of statistical inference exclusively on consistent estimation and on different notions of coverage that a confidence set may be required to satisfy and that have proven useful in the literature.[Notes 5] The topics of hypothesis testing and confidence set construction in partially identified models are covered in [11], who provide a comprehensive survey devoted entirely to them in the context of moment inequality models. [12](Chapters 4 and 5) provide a thorough discussion of related methods based on the use of random set theory.

Consistent Estimation

When the identified object is a set, it is natural that its estimator is also a set. In order to discuss statistical properties of a set-valued estimator [math]\idrn{\theta}[/math] (to be defined below), and in particular its consistency, one needs to specify how to measure the distance between [math]\idrn{\theta}[/math] and [math]\idr{\theta}[/math]. Several distance measures among sets exist (see, e.g., [13](Appendix D)). A natural generalization of the commonly used Euclidean distance is the Hausdorff distance, see Definition, which for given [math]A,B\subset\R^d[/math] can be written as

[[math]] \begin{align*} \dist_H(A,B) = \inf\Big\{r \gt 0:\; A\subseteq B^r,\; B\subseteq A^r\Big\}=\max\left\{\sup_{a \in A} \dist(a,B), \sup_{b \in B} \dist(b,A) \right\},\end{align*} [[/math]]

with [math]\dist(a,B)\equiv\inf_{b\in B}\Vert a-b\Vert[/math].[Notes 6] In words, the Hausdorff distance between two sets measures the furthest distance from an arbitrary point in one of the sets to its closest neighbor in the other set. It is easy to verify that [math]\dist_H[/math] metrizes the family of non-empty compact sets; in particular, given non-empty compact sets [math]A,B\subset\R^d[/math], [math]\dist_H(A,B) =0[/math] if and only if [math]A=B[/math]. If either [math]A[/math] or [math]B[/math] is empty, [math]\dist_H(A,B) =\infty[/math]. The use of the Hausdorff distance to conceptualize consistency of set valued estimators in econometrics was proposed by [14](Section 2.4) and [2](Section 3.2).[Notes 7]
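For finite sets, the second expression above translates directly into code (a sketch; in practice [math]A[/math] and [math]B[/math] would be grids approximating [math]\idrn{\theta}[/math] and [math]\idr{\theta}[/math]):

```python
import math

def directed(A, B):
    # sup_{a in A} d(a, B), with d(a, B) = inf_{b in B} ||a - b||
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    # two-sided Hausdorff distance between finite point sets in R^d
    return max(directed(A, B), directed(B, A))
```

Note that the two directed distances can differ, which is why both are needed: inner consistency of a set estimator controls only one of them.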

Definition (Hausdorff Consistency)

An estimator [math]\idrn{\theta}[/math] is consistent for [math]\idr{\theta}[/math] if

[[math]] \begin{align*} \dist_H(\idrn{\theta},\idr{\theta}) \stackrel{p}{\rightarrow} 0 \text{ as } n\to \infty. \end{align*} [[/math]]

[15] establishes Hausdorff consistency of a plug-in estimator of the set [math]\{\vartheta\in\Theta:g_\sP(\vartheta)\le 0\}[/math], with [math]g_\sP:\Theta \to \R[/math] a lower semicontinuous function of [math]\vartheta\in\Theta[/math] that can be consistently estimated by a lower semicontinuous function [math]g_n[/math] uniformly over [math]\Theta[/math]. The set estimator is [math]\{\vartheta\in\Theta:g_n(\vartheta)\le 0\}[/math]. The fundamental assumption in [15] is that [math]\{\vartheta\in\Theta:g_\sP(\vartheta)\le 0\}\subseteq\cl(\{\vartheta\in\Theta:g_\sP(\vartheta) \lt 0\})[/math], see [12](Section 5.2) for a discussion. There are important applications where this condition holds. [16] provide results related to [15], as well as important extensions for the construction of confidence sets, and show that these can be applied to carry out statistical inference on the Hansen–Jagannathan sets of admissible stochastic discount factors [17], the Markowitz–Fama mean–variance sets for asset portfolio returns [18], and the set of structural elasticities in [19]'s analysis of demand with optimization frictions. However, these methods are not broadly applicable in the general moment (in)equalities framework of this section, as [15]'s key condition generally fails for the set [math]\idr{\theta}[/math] in \eqref{eq:define:idr}.

===Criterion Function Based Estimators===

[2] extend the standard theory of extremum estimation of point identified parameters to partial identification, and propose to estimate [math]\idr{\theta}[/math] using the collection of values [math]\vartheta\in\Theta[/math] that approximately minimize a sample analog of [math]\crit_\sP[/math]:

[[math]] \begin{align} \idrn{\theta}=\left\{\vartheta\in\Theta:\crit_n(\vartheta)\le \inf_{\tilde\vartheta\in\Theta}\crit_n(\tilde\vartheta)+\tau_n\right\},\label{eq:define:idrn} \end{align} [[/math]]

with [math]\tau_n[/math] a sequence of non-negative random variables such that [math]\tau_n\stackrel{p}{\rightarrow} 0[/math]. In \eqref{eq:define:idrn}, [math]\crit_n(\vartheta)[/math] is a sample analog of [math]\crit_\sP(\vartheta)[/math] that replaces [math]\E_\sP(m_j(\ew_i;\vartheta))[/math] and [math]\sigma_{\sP,j}(\vartheta)[/math] in \eqref{eq:criterion_fn_sum}-\eqref{eq:criterion_fn_max} with properly chosen estimators, e.g.,

[[math]] \begin{align*} \bar m_{n,j}(\vartheta) &\equiv {\frac{1}{n}\sum_{i=1}^n m_j(\ew_i,\vartheta)},\quad j=1,\dots, |\cJ| \\ \hat{\sigma}_{n,j}(\vartheta) &\equiv {\left(\frac{1}{n}\sum_{i=1}^n [m_j(\ew_i,\vartheta)]^2-[\bar m_{n,j}(\vartheta)]^2\right)^{1/2}},\quad j=1,\dots, |\cJ|. \end{align*} [[/math]]
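As a concrete illustration, a sample criterion built from these estimators might look as follows. This is a sketch: the max-of-positive-parts aggregation below is one common choice and may differ from the exact normalization in \eqref{eq:criterion_fn_max}.

```python
import numpy as np

def sample_criterion(m_values):
    """Max-type sample criterion for moment inequalities E_P[m_j(w; theta)] <= 0.

    m_values: (n, J) array whose row i holds m_j(w_i, theta), j = 1..J, at a
    fixed candidate theta.  Uses bar m_{n,j} and sigma_hat_{n,j} exactly as in
    the display above; the aggregation is an illustrative choice.
    """
    m_bar = m_values.mean(axis=0)                                   # \bar m_{n,j}
    sigma_hat = np.sqrt((m_values ** 2).mean(axis=0) - m_bar ** 2)  # \hat\sigma_{n,j}
    return max((m_bar / sigma_hat).max(), 0.0)                      # positive part of max_j
```

Evaluating this function over a grid of candidate [math]\vartheta[/math] values yields the sample analog of the zero-level (or [math]\tau_n[/math]-level) set.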


It can be shown that as long as [math]\tau_n=o_p(1)[/math], under the same assumptions used to prove consistency of extremum estimators of point identified parameters (e.g., with uniform convergence of [math]\crit_n[/math] to [math]\crit_\sP[/math] and continuity of [math]\crit_\sP[/math] on [math]\Theta[/math]),

[[math]] \begin{align} \sup_{\vartheta \in \idrn{\theta}} \inf_{\tilde\vartheta \in \idr{\theta}} \Vert \vartheta-\tilde\vartheta \Vert\stackrel{p}{\rightarrow} 0 \text{ as } n\to \infty.\label{eq:inner_consistent} \end{align} [[/math]]

This yields that asymptotically each point in [math]\idrn{\theta}[/math] is arbitrarily close to a point in [math]\idr{\theta}[/math], or more formally that [math]\sP(\idrn{\theta}\subseteq\idr{\theta}^\epsilon)\to 1[/math] for any [math]\epsilon \gt 0[/math], where [math]\idr{\theta}^\epsilon[/math] denotes the [math]\epsilon[/math]-expansion of [math]\idr{\theta}[/math]. I refer to \eqref{eq:inner_consistent} as ''inner consistency'' henceforth.[Notes 8] [20] provides an early contribution establishing this type of inner consistency for maximum likelihood estimators when the true parameter is not point identified. However, Hausdorff consistency also requires that

[[math]] \begin{align*} \sup_{\vartheta \in \idr{\theta}} \inf_{\tilde\vartheta \in \idrn{\theta}} \Vert \vartheta-\tilde\vartheta \Vert\stackrel{p}{\rightarrow} 0 \text{ as } n\to \infty, \end{align*} [[/math]]

i.e., that each point in [math]\idr{\theta}[/math] is arbitrarily close to a point in [math]\idrn{\theta}[/math], or more formally that [math]\sP(\idr{\theta}\subseteq\idrn{\theta}^\epsilon)\to 1[/math] for any [math]\epsilon \gt 0[/math]. To establish this result for the sharp identification regions in Theorem SIR- (parametric regression with interval covariate) and Theorem SIR- (semiparametric binary model with interval covariate), [2](Propositions 3 and 5) require the rate at which [math]\tau_n\stackrel{p}{\rightarrow} 0[/math] to be slower than the rate at which [math]\crit_n[/math] converges uniformly to [math]\crit_\sP[/math] over [math]\Theta[/math]. What might go wrong in the absence of such a restriction? A simple example helps illustrate the issue. Consider a model with linear inequalities of the form

[[math]] \begin{align*} \theta_1 &\le \E_\sP(\ew_1),\\ -\theta_1 &\le \E_\sP(\ew_2),\\ \theta_2 &\le \E_\sP(\ew_3)+ \E_\sP(\ew_4)\theta_1,\\ -\theta_2 &\le \E_\sP(\ew_5)+ \E_\sP(\ew_6)\theta_1. \end{align*} [[/math]]

Suppose [math]\ew\equiv(\ew_1,\dots,\ew_6)[/math] is distributed multivariate normal, with [math]\E_\sP(\ew)=[6 \;\; 0 \;\; 2 \;\; 0 \;\; {-2} \;\; 0]^\top[/math] and [math]\cov_\sP(\ew)[/math] equal to the identity matrix. Then [math]\idr{\theta}=\{\vartheta=[\vartheta_1 \;\; \vartheta_2]^\top\in\Theta:\vartheta_1\in[0,6] \text{ and } \vartheta_2=2\}[/math]. However, with positive probability in any finite sample [math]\crit_n(\vartheta)=0[/math] for [math]\vartheta[/math] in a random region (e.g., a triangle if [math]\crit_n[/math] is the sample analog of \eqref{eq:criterion_fn_max}) that only includes points that are close to a subset of the points in [math]\idr{\theta}[/math]. Hence, with positive probability the minimizer of [math]\crit_n[/math] cycles between consistent estimators of subsets of [math]\idr{\theta}[/math], but does not estimate the entire set. Enlarging the estimator to include all points that are close to minimizing [math]\crit_n[/math] up to a tolerance that converges to zero sufficiently slowly removes this problem.
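A small simulation of this example makes the point concrete. In the sketch below (my own illustrative choices: the max-type aggregation, the grid, the sample size, and [math]\tau_n[/math] are all arbitrary), the exact zero-level set of the sample criterion is compared with a [math]\tau_n[/math]-relaxed level set on a grid:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
mu = np.array([6.0, 0.0, 2.0, 0.0, -2.0, 0.0])          # E_P(w) from the example
w_bar = rng.standard_normal((n, 6)).mean(axis=0) + mu   # sample means of w_1..w_6

def crit_n(t1, t2):
    """Max-of-positive-parts sample criterion for the four linear inequalities
    (an illustrative aggregation; the text's normalization may differ)."""
    slack = np.array([
        t1 - w_bar[0],
        -t1 - w_bar[1],
        t2 - w_bar[2] - w_bar[3] * t1,
        -t2 - w_bar[4] - w_bar[5] * t1,
    ])
    return max(slack.max(), 0.0)

# Compare the exact zero-level set with a tau_n-relaxed level set on a grid.
grid1 = np.linspace(-1.0, 7.0, 81)   # candidate theta_1 values
grid2 = np.linspace(1.0, 3.0, 21)    # candidate theta_2 values
tau_n = 0.3                          # illustrative slack, shrinking slower than the noise
zero_set = {(a, b) for a in grid1 for b in grid2 if crit_n(a, b) <= 0}
tau_set = {(a, b) for a in grid1 for b in grid2 if crit_n(a, b) <= tau_n}
```

On the grid, the relaxed set reliably covers [math]\{\vartheta_1\in[0,6],\vartheta_2=2\}[/math], while the zero-level set can miss parts of it in any finite sample.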

[3] significantly generalize the consistency results in [2]. They work with a normalized criterion function equal to [math]\crit_n(\vartheta)-\inf_{\tilde\vartheta\in\Theta}\crit_n(\tilde\vartheta)[/math], but to keep notation light I simply refer to it as [math]\crit_n[/math].[Notes 9] Under suitable regularity conditions, they establish consistency of an estimator that can be a smaller set than the one proposed by [2], and derive its convergence rate. Some of the key conditions required by [3](Conditions C1 and C2) to study convergence rates include that [math]\crit_n[/math] is lower semicontinuous in [math]\vartheta[/math] and satisfies various convergence properties, among them [math]\sup_{\vartheta\in\idr{\theta}}\crit_n(\vartheta)=O_p(1/a_n)[/math] for a sequence of normalizing constants [math]a_n\to\infty[/math]; that [math]\tau_n\ge \sup_{\vartheta\in\idr{\theta}}\crit_n(\vartheta)[/math] with probability approaching one; and that [math]\tau_n\to 0[/math]. They also require that there exist positive constants [math](\delta,\kappa,\gamma)[/math] such that for any [math]\epsilon\in(0,1)[/math] there are [math](d_\epsilon,n_\epsilon)[/math] such that

[[math]] \begin{align*} \forall n\ge n_\epsilon, \, \crit_n(\vartheta)\ge\kappa[\min\{\delta,\dist(\vartheta,\idr{\theta})\}]^\gamma \end{align*} [[/math]]

uniformly on [math]\{\vartheta\in\Theta:\dist(\vartheta,\idr{\theta})\ge(d_\epsilon/a_n)^{1/\gamma}\}[/math] with probability at least [math]1-\epsilon[/math]. In words, this assumption, referred to as the ''polynomial minorant'' condition, rules out that [math]\crit_n[/math] can be arbitrarily close to zero outside [math]\idr{\theta}[/math]: it posits that [math]\crit_n[/math] grows at least as fast as a polynomial of degree [math]\gamma[/math] in the distance of [math]\vartheta[/math] from [math]\idr{\theta}[/math]. Under some additional regularity conditions, [3] establish that

[[math]] \begin{align} \dist_H(\idrn{\theta},\idr{\theta})=O_p\left((\max\{1/a_n,\tau_n\})^{1/\gamma}\right).\label{eq:CHT_rate} \end{align} [[/math]]


What is the role played by the polynomial minorant condition for the result in \eqref{eq:CHT_rate}? Under the maintained assumptions [math]\tau_n\ge \sup_{\vartheta\in\idr{\theta}}\crit_n(\vartheta)\ge\kappa[\min\{\delta,\dist(\vartheta,\idr{\theta})\}]^\gamma[/math], and the latter part of the inequality is used to obtain \eqref{eq:CHT_rate}. When could the polynomial minorant condition be violated? In moment (in)equalities models, [3] require [math]\gamma=2[/math].[Notes 10] Consider a simple stylized example with (in)equalities of the form

[[math]] \begin{align*} -\theta_1 &\le \E_\sP(\ew_1),\\ -\theta_2 &\le \E_\sP(\ew_2),\\ \theta_1\theta_2 &= \E_\sP(\ew_3), \end{align*} [[/math]]

with [math]\E_\sP(\ew_1)=\E_\sP(\ew_2)=\E_\sP(\ew_3)=0[/math], and note that the sample means [math](\bar{\ew}_1,\bar{\ew}_2,\bar{\ew}_3)[/math] are [math]\sqrt{n}[/math]-consistent estimators of [math](\E_\sP(\ew_1),\E_\sP(\ew_2),\E_\sP(\ew_3))[/math]. Suppose [math](\ew_1,\ew_2,\ew_3)[/math] are distributed multivariate standard normal. Consider a sequence [math]\vartheta_n=[\vartheta_{1n} \;\; \vartheta_{2n}]^\top=[n^{-1/4} \;\; n^{-1/4}]^\top[/math]. Then [math][\dist(\vartheta_n,\idr{\theta})]^\gamma=O_p(n^{-1/2})[/math]. On the other hand, with positive probability [math]\crit_n(\vartheta_n)=(\bar{\ew}_3-\vartheta_{1n}\vartheta_{2n})^2=O_p\left(n^{-1}\right)[/math], so that for [math]n[/math] large enough [math]\crit_n(\vartheta_n) \lt [\dist(\vartheta_n,\idr{\theta})]^\gamma[/math], violating the assumption. This occurs because the gradient of the moment equality vanishes as [math]\vartheta[/math] approaches zero, rendering the criterion function flat in a neighborhood of [math]\idr{\theta}[/math].
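This failure of the polynomial minorant condition can be checked numerically. In the sketch below (the sum-of-squares criterion is an illustrative choice of my own, not necessarily the text's normalization), [math]\crit_n(\vartheta_n)[/math] eventually falls below [math][\dist(\vartheta_n,\idr{\theta})]^\gamma[/math]:

```python
import numpy as np

rng = np.random.default_rng(1)
GAMMA = 2                                    # degree required in moment (in)equality models

def minorant_check(n):
    """Compare crit_n(theta_n) with dist(theta_n, identified set)^GAMMA at
    theta_n = (n^{-1/4}, n^{-1/4}), for a sum-of-squares sample criterion."""
    w_bar = rng.standard_normal((n, 3)).mean(axis=0)   # sample means; E_P(w) = 0
    t = n ** -0.25
    crit = (max(-t - w_bar[0], 0.0) ** 2               # -theta_1 <= E(w_1)
            + max(-t - w_bar[1], 0.0) ** 2             # -theta_2 <= E(w_2)
            + (t * t - w_bar[2]) ** 2)                 # theta_1 * theta_2 = E(w_3)
    # The identified set is the union of the nonnegative axes, so
    # dist(theta_n, set) = n^{-1/4} and dist^GAMMA = n^{-1/2}.
    return crit, t ** GAMMA

crit, dist_g = minorant_check(10 ** 6)
# crit = O_p(1/n) while dist^GAMMA = n^{-1/2}, so the minorant inequality fails
# with high probability once n is large.
```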

As intuition would suggest, rates of convergence are slower the flatter [math]\crit_n[/math] is outside [math]\idr{\theta}[/math]. [21] show that in moment inequality models with smooth moment conditions, the polynomial minorant assumption with [math]\gamma=2[/math] implies the Abadie constraint qualification (ACQ); see, e.g., [22](Chapter 5) for a definition and discussion of ACQ.

The example just given to discuss failures of the polynomial minorant condition is in fact a known example where ACQ fails at [math]\vartheta=[0 \;\; 0]^\top[/math]. [3](Condition C.3, referred to as degeneracy) also consider the case that [math]\crit_n[/math] vanishes on subsets of [math]\Theta[/math] that converge in Hausdorff distance to [math]\idr{\theta}[/math] at rate [math]a_n^{-1/\gamma}[/math]. While degeneracy might be difficult to verify in practice, [3] show that if it holds, [math]\tau_n[/math] can be set to zero. [23] provides conditions on the moment functions, which are closely related to constraint qualifications (as discussed in [21]) under which it is possible to set [math]\tau_n=0[/math]. [24] studies estimation of [math]\idr{\theta}[/math] when the number of moment inequalities is large relative to sample size (possibly infinite).

He provides a consistency result for criterion-based estimators that use a number of unconditional moment inequalities that grows with sample size. He also considers estimators based on conditional moment inequalities, and derives the fastest possible rate for estimating [math]\idr{\theta}[/math] under smoothness conditions on the conditional moment functions.

He shows that the rates achieved by the procedures in [25][26] are (minimax) optimal, and cannot be improved upon.

'''Key Insight''': [2] extend the notion of extremum estimation from point identified to partially identified models. They do so by putting forward a generalized criterion function whose zero-level set can be used to define [math]\idr{\theta}[/math] in partially identified structural semiparametric models. It is then natural to define the set valued estimator [math]\idrn{\theta}[/math] as the collection of approximate minimizers of the sample analog of this criterion function. [2]'s analysis of statistical inference focuses exclusively on providing consistent estimators. [3] substantially generalize the analysis of consistency of criterion function-based set estimators. They provide a comprehensive study of convergence rates in partially identified models. Their work highlights the challenges a researcher faces in this context, and puts forward possible solutions in the form of assumptions under which specific rates of convergence can be attained.

===Support Function Based Estimators===

[27] introduce to the econometrics literature inference methods for set valued estimators based on random set theory. They study the class of models where [math]\idr{\theta}[/math] is convex and can be written as the Aumann (or selection) expectation of a properly defined random closed set.[Notes 11] They propose to carry out estimation and inference leveraging the representation of convex sets through their support function (given in Definition), as it is done in random set theory; see [13](Chapter 3) and [12](Chapter 4). Because the support function fully characterizes the boundary of [math]\idr{\theta}[/math], it allows for a simple sample analog estimator, and for inference procedures with desirable properties. An example of a framework where the approach of [27] can be applied is that of best linear prediction with interval outcome data in Identification Problem.[Notes 12] Recall that in that case, the researcher observes random variables [math](\yL,\yU,\ex)[/math] and wishes to learn the best linear predictor of [math]\ey|\ex[/math], with [math]\ey[/math] unobserved and [math]\sR(\yL\le\ey\le\yU)=1[/math]. For simplicity let [math]\ex[/math] be a scalar. Given a random sample [math]\{\yLi,\yUi,\ex_i\}_{i=1}^n[/math] from [math]\sP[/math], the researcher can construct a random segment [math]\eG_i[/math] for each [math]i[/math] and a consistent estimator [math]\hat{\Sigma}_n[/math] of the random matrix [math]\Sigma_\sP[/math] in \eqref{eq:G_and_Sigma} as

[[math]] \begin{align*} \eG_i=\left\{ \begin{pmatrix} \ey_i\\ \ey_i\ex_i \end{pmatrix}  :\; \ey_i \in \Sel(\eY_i)\right\}\subset\R^2, \text{ and } \hat\Sigma_n= \begin{pmatrix} 1 & \overline\ex\\ \overline\ex & \overline{\ex^2} \end{pmatrix},\end{align*} [[/math]]

where [math]\eY_i=[\yLi,\yUi][/math] and [math]\overline\ex,\overline{\ex^2}[/math] are the sample means of [math]\ex_i[/math] and [math]\ex^2_i[/math] respectively. Because in this problem [math]\idr{\theta}=\Sigma_\sP^{-1}\E_\sP\eG[/math] (see Theorem SIR- on p.\pageref{SIR:BLP_intervalY}), a natural sample analog estimator replaces [math]\Sigma_\sP[/math] with [math]\hat{\Sigma}_n[/math], and [math]\E_\sP\eG[/math] with a Minkowski average of [math]\eG_i[/math] (see Appendix, p.\pageref{def:mink:sum} for a formal definition), yielding

[[math]] \begin{align} \idrn{\theta}=\hat\Sigma_n^{-1}\frac{1}{n}\sum_{i=1}^n\eG_i.\label{eq:BLP_estimator} \end{align} [[/math]]

The support function of [math]\idrn{\theta}[/math] is the sample analog of that of [math]\idr{\theta}[/math] provided in \eqref{eq:supfun:BLP}:

[[math]] \begin{align*} h_{\idrn{\theta}}(u)=\frac{1}{n}\sum_{i=1}^n[(\yLi\one(f(\ex_i,u) \lt 0)+\yUi\one(f(\ex_i,u)\ge 0))f(\ex_i,u)], \quad u\in\mathbb{S}, \end{align*} [[/math]]

where [math]f(\ex_i,u)=[1 \;\; \ex_i]\hat\Sigma_n^{-1}u[/math]. [27] use the Law of Large Numbers for random sets reported in Theorem to show that [math]\idrn{\theta}[/math] in \eqref{eq:BLP_estimator} is [math]\sqrt{n}[/math]-consistent under standard conditions on the moments of [math](\yLi,\yUi,\ex_i)[/math]. [28] and [29] significantly expand the applicability of [27]'s estimator. [28] show that it can be used in a large class of partially identified linear models, including ones that allow for the availability of instrumental variables. [29] show that it can be used for best linear approximation of any function [math]f(x)[/math] that is known to lie within two identified bounding functions. The lower and upper functions defining the band are allowed to be any functions, including ones carrying an index, and can be estimated parametrically or nonparametrically. The method allows for estimation of the parameters of the best linear approximations to the set identified functions in many of the identification problems described in Section. It can also be used to estimate the sharp identification region for the parameters of a binary choice model with interval or discrete regressors under the assumptions of [30], characterized in \eqref{eq:SIR:mag:mau} in Section ''Semiparametric Binary Choice Models with Interval Valued Covariates''. [31] develop a theory of efficiency for estimators of sets [math]\idr{\theta}[/math] as in \eqref{eq:sharp_id_for_inference} under the additional requirements that the inequalities [math]\E_\sP(m_j(\ew,\vartheta))[/math] are convex in [math]\vartheta\in\Theta[/math] and smooth as functionals of the distribution of the data. Because of the convexity of the moment inequalities, [math]\idr{\theta}[/math] is convex and can be represented through its support function. Using the classic results in [32], [31] show that under suitable regularity conditions, the support function admits [math]\sqrt{n}[/math]-consistent regular estimation.
They also show that a simple plug-in estimator based on the support function attains the semiparametric efficiency bound, and the corresponding estimator of [math]\idr{\theta}[/math] minimizes a wide class of asymptotic loss functions based on the Hausdorff distance. As they establish, this efficiency result applies to the estimators proposed by [27], including that in \eqref{eq:BLP_estimator}, and by [28].
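For the interval-outcome best linear prediction example, the sample support function displayed above is straightforward to evaluate. A minimal NumPy sketch (the function name is mine; scalar [math]\ex[/math] as in the text):

```python
import numpy as np

def support_fn_blp(yL, yU, x, u):
    """Sample support function h_n(u) of the BLP set estimator, |u| = 1:
    h_n(u) = (1/n) sum_i [ (yL_i 1{f<0} + yU_i 1{f>=0}) f(x_i, u) ],
    with f(x_i, u) = [1, x_i] Sigma_n^{-1} u, following the displayed formula."""
    n = len(x)
    Sigma_n = np.array([[1.0, x.mean()], [x.mean(), (x ** 2).mean()]])
    f = np.column_stack([np.ones(n), x]) @ np.linalg.solve(Sigma_n, u)
    y_sel = np.where(f >= 0, yU, yL)   # the selection of y_i that maximizes u' theta
    return float((y_sel * f).mean())
```

In the point-identified case [math]\yL=\yU[/math] the function returns [math]u^\top\hat\theta_{OLS}[/math], and tracing [math]u[/math] over the unit circle recovers the boundary of [math]\idrn{\theta}[/math].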

[33] further enlarges the applicability of the support function approach by establishing its duality with the criterion function approach, for the case that [math]\crit_\sP[/math] is a convex function and [math]\crit_n[/math] is a convex function almost surely. This allows one to use the support function approach also when a representation of [math]\idr{\theta}[/math] as the Aumann expectation of a random closed set is not readily available.

[33] considers [math]\idr{\theta}[/math] and its level set estimator [math]\idrn{\theta}[/math] as defined, respectively, in \eqref{eq:define:idr} and \eqref{eq:define:idrn}, with [math]\Theta[/math] a convex subset of [math]\R^d[/math]. Because [math]\crit_\sP[/math] and [math]\crit_n[/math] are convex functions, [math]\idr{\theta}[/math] and [math]\idrn{\theta}[/math] are convex sets. Under the same assumptions as in [3], including the polynomial minorant and the degeneracy conditions, one can set [math]\tau_n=0[/math] and have [math]\dist_H(\idrn{\theta},\idr{\theta})=O_p(a_n^{-1/\gamma})[/math]. Moreover, due to its convexity, [math]\idr{\theta}[/math] is fully characterized by its support function, which in turn can be consistently estimated (at the same rate as [math]\idr{\theta}[/math]) using sample analogs as [math]h_{\idrn{\theta}}(u)=\max_{a_n\crit_n(\vartheta)\le 0}u^\top\vartheta[/math]. The latter can be computed via convex programming.
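When [math]\crit_n[/math] is convex and its zero-level set is a polyhedron (as with sample moment inequalities that are linear in [math]\vartheta[/math]), the support function program [math]h_{\idrn{\theta}}(u)=\max_{a_n\crit_n(\vartheta)\le 0}u^\top\vartheta[/math] reduces to a linear program. A sketch assuming SciPy's `linprog` is available (function name mine):

```python
import numpy as np
from scipy.optimize import linprog

def support_fn_lp(u, A, b):
    """h(u) = max { u' theta : A theta <= b }, the support function of the
    polyhedral level set {theta : A theta <= b}, via linear programming."""
    res = linprog(c=-np.asarray(u, dtype=float), A_ub=A, b_ub=b,
                  bounds=[(None, None)] * len(u))
    return -res.fun if res.success else -np.inf   # -inf if the level set is empty
```

Evaluating `support_fn_lp` over a grid of directions on the unit sphere gives the sample support function, and hence the estimator's boundary, by convex programming as described above.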

[34] consider consistent estimation of [math]\idr{\theta}[/math] in the context of Bayesian inference. They focus on partially identified models where [math]\idr{\theta}[/math] depends on a “reduced form” parameter [math]\phi[/math] (e.g., a vector of moments of observable random variables). They recognize that while a prior on [math]\phi[/math] can be revised in light of the data, a prior on [math]\theta[/math] cannot, due to the lack of point identification. As such they propose to choose a single prior for the revisable parameters, and a set of priors for the unrevisable ones. The latter is the collection of priors such that the distribution of [math]\theta|\phi[/math] places probability one on [math]\idr{\theta}[/math]. A crucial observation in [34] is that once [math]\phi[/math] is viewed as a random vector, as in the Bayesian paradigm, under mild regularity conditions [math]\idr{\theta}[/math] is a random closed set, and Bayesian inference on it can be carried out using elements of random set theory. In particular, they show that the set of posterior means of [math]\theta|\ew[/math] equals the Aumann expectation of [math]\idr{\theta}[/math] (with the underlying probability measure of [math]\phi|\ew[/math]). They also show that this Aumann expectation converges in Hausdorff distance to the “true” identified set if the latter is convex, or otherwise to its convex hull. They apply their method to analyze impulse responses in set-identified Structural Vector Autoregressions, where standard Bayesian inference is otherwise sensitive to the choice of an unrevisable prior.[Notes 13]

'''Key Insight''': [27] show that elements of random set theory can be employed to obtain inference methods for partially identified models that are easy to implement and have desirable statistical properties. Whereas they apply their findings to a specific class of models based on the Aumann expectation, the ensuing literature demonstrates that random set methods are widely applicable to obtain estimators of sharp identification regions and establish their consistency.

[35] propose an alternative to the notion of consistent estimator. Rather than asking that [math]\idrn{\theta}[/math] satisfies the requirement in Definition, they propose the notion of ''half-median-unbiased'' estimator. This notion is easiest to explain in the case of interval identified scalar parameters. Take, e.g., the bound in Theorem SIR- for the conditional expectation of selectively observed data. Then an estimator of that interval is half-median-unbiased if the estimated upper bound exceeds the true upper bound, and the estimated lower bound falls below the true lower bound, each with probability at least [math]1/2[/math] asymptotically. More generally, one can obtain a half-median-unbiased estimator as

[[math]] \begin{align} \idrn{\theta}=\left\{\vartheta\in\Theta:a_n\crit_n(\vartheta)\le c_{1/2}(\vartheta)\right\},\label{eq:idrn:half:med:unb} \end{align} [[/math]]

where [math]c_{1/2}(\vartheta)[/math] is a critical value chosen so that [math]\idrn{\theta}[/math] asymptotically contains [math]\idr{\theta}[/math] (or any fixed element in [math]\idr{\theta}[/math]; see the discussion in Section ''Coverage of [math]\idr{\theta}[/math] vs. Coverage of [math]\theta[/math]'' below) with probability at least [math]1/2[/math]. As discussed in the next section, [math]c_{1/2}(\vartheta)[/math] can be further chosen so that this probability is uniform over [math]\sP\in\cP[/math].
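In the interval-identified case, half-median unbiasedness can be checked directly by simulation: with symmetric sampling error, the plain sample-analog bounds already satisfy it, and [math]c_{1/2}(\vartheta)[/math] generalizes the idea to criterion-based sets. A stylized check (the DGP and sample sizes are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 10_000
theta_L, theta_U = 0.0, 1.0                       # true population bounds
yL = rng.standard_normal((reps, n)) + theta_L     # samples for the lower bound
yU = rng.standard_normal((reps, n)) + theta_U     # independent samples, upper bound
# Half-median-unbiasedness of the sample-analog interval [bar yL, bar yU]:
p_upper = (yU.mean(axis=1) >= theta_U).mean()     # close to 1/2
p_lower = (yL.mean(axis=1) <= theta_L).mean()     # close to 1/2
```

With symmetric errors each probability is exactly [math]1/2[/math] in this design; the critical value [math]c_{1/2}(\vartheta)[/math] is what delivers the same guarantee in less well-behaved settings.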

The requirement of half-median unbiasedness has the virtue that, by construction, an estimator such as \eqref{eq:idrn:half:med:unb} is a subset of a [math]1-\alpha[/math] confidence set as defined in \eqref{eq:CS} below for any [math]\alpha \lt 1/2[/math], provided [math]c_{1-\alpha}(\vartheta)[/math] is chosen using the same criterion for all [math]\alpha\in(0,1)[/math]. In contrast, a consistent estimator satisfying the requirement in Definition need not be a subset of a confidence set. This is because the sequence [math]\tau_n[/math] in \eqref{eq:define:idrn} may be larger than the critical value used to obtain the confidence set (see equation \eqref{eq:CS} below), unless regularity conditions such as degeneracy or others allow one to set [math]\tau_n[/math] equal to zero. Moreover, the choice of the sequence [math]\tau_n[/math] is not data driven, and hence can be viewed as arbitrary. This raises a concern for the scope of consistent estimation in general settings.

However, reporting a set estimator together with a confidence set is arguably important to shed light on how much of the volume of the confidence set is due to statistical uncertainty and how much is due to a large identified set. One can do so by either using a half-median unbiased estimator as in \eqref{eq:idrn:half:med:unb}, or the set of minimizers of the criterion function in \eqref{eq:define:idrn} with [math]\tau_n=0[/math] (which, as previously discussed, satisfies the inner consistency requirement in \eqref{eq:inner_consistent} under weak conditions, and is Hausdorff consistent in some well behaved cases).

===Confidence Sets Satisfying Various Coverage Notions===

====Coverage of [math]\idr{\theta}[/math] vs. Coverage of [math]\theta[/math]====

I first discuss confidence sets [math]\CS\subset\R^d[/math] defined as level sets of a criterion function. To simplify notation, henceforth I assume [math]a_n=n[/math]:

[[math]] \begin{align} \CS=\left\{\vartheta\in\Theta:n\crit_n(\vartheta)\le c_{1-\alpha}(\vartheta)\right\}.\label{eq:CS} \end{align} [[/math]]

In \eqref{eq:CS}, [math]c_{1-\alpha}(\vartheta)[/math] may be constant or vary in [math]\vartheta\in\Theta[/math]. It is chosen so that [math]\CS[/math] satisfies (asymptotically) a certain coverage property with respect to either [math]\idr{\theta}[/math] or each [math]\vartheta\in\idr{\theta}[/math]. Correspondingly, different appearances of [math]c_{1-\alpha}(\vartheta)[/math] may refer to different critical values associated with different coverage notions. The challenging theoretical aspect of inference in partial identification is the determination of [math]c_{1-\alpha}[/math] and of methods to approximate it.

A first classification of coverage notions pertains to whether the confidence set should cover [math]\idr{\theta}[/math] or each of its elements with a prespecified asymptotic probability. Early on, within the study of interval-identified parameters, [36][37] put forward a confidence interval that expands each of the sample analogs of the extreme points of the population bounds by an amount designed so that the confidence interval asymptotically covers the population bounds with prespecified probability. [3] study the general problem of inference for a set [math]\idr{\theta}[/math] defined as the zero-level set of a criterion function. The coverage notion that they propose is pointwise coverage of the set, whereby [math]c_{1-\alpha}[/math] is chosen so that:

[[math]] \begin{align} \liminf_{n\to\infty}\sP(\idr{\theta}\subseteq\CS)\ge 1-\alpha \text{ for all } \sP\in\cP.\label{eq:CS_coverage:set:pw} \end{align} [[/math]]

[3] provide conditions under which [math]\CS[/math] satisfies \eqref{eq:CS_coverage:set:pw} with [math]c_{1-\alpha}[/math] constant in [math]\vartheta[/math], yielding the so-called ''criterion function approach'' to statistical inference in partial identification. Under the same coverage requirement, [38] and [39] introduce novel bootstrap methods for inference in moment inequality models. [40] propose an inference method for finite games of complete information that exploits the structure of these models. [27] propose a method to test hypotheses and build confidence sets satisfying \eqref{eq:CS_coverage:set:pw} based on random set theory, the so-called ''support function approach'', which yields simple-to-compute confidence sets with asymptotic coverage equal to [math]1-\alpha[/math] when [math]\idr{\theta}[/math] is strictly convex. The reason for the strict convexity requirement is that in its absence, the support function of [math]\idr{\theta}[/math] is not fully differentiable, but only directionally differentiable, complicating inference. Indeed, [41] show that standard bootstrap methods are consistent if and only if full differentiability holds, and they provide modified bootstrap methods that remain valid when only directional differentiability holds. [29] propose a data jittering method that enforces full differentiability at the price of a small conservative distortion. [31] extend the applicability of the support function approach to other moment inequality models and establish efficiency results. [16] show that a Hausdorff distance-based test statistic can be weighted to enforce either exact or first-order equivariance to transformations of parameters. [42] provide empirical likelihood based inference methods for the support function approach.
The test statistics employed in the criterion function approach and in the support function approach are asymptotically equivalent in specific moment inequality models [27][33], but the criterion function approach is more broadly applicable.


The field's interest shifted to a different notion of coverage when [43] pointed out that often there is one “true” data generating [math]\theta[/math], even if it is only partially identified. Hence, they proposed confidence sets that cover each [math]\vartheta\in\idr{\theta}[/math] with a prespecified probability. For pointwise coverage, this leads to choosing [math]c_{1-\alpha}[/math] so that:

[[math]] \begin{align} \liminf_{n\to\infty}\sP(\vartheta\in\CS)\ge 1-\alpha \text{ for all } \sP\in\cP \text{ and } \vartheta\in\idr{\theta}.\label{eq:CS_coverage:point:pw} \end{align} [[/math]]

If [math]\idr{\theta}[/math] is a singleton then \eqref{eq:CS_coverage:set:pw} and \eqref{eq:CS_coverage:point:pw} both coincide with the pointwise coverage requirement employed for point identified parameters. However, as shown in [43](Lemma 1), if [math]\idr{\theta}[/math] contains more than one element, the two notions differ, with confidence sets satisfying \eqref{eq:CS_coverage:point:pw} being weakly smaller than ones satisfying \eqref{eq:CS_coverage:set:pw}. [5] provides confidence sets for general moment (in)equalities models that satisfy \eqref{eq:CS_coverage:point:pw} and are easy to compute. Although confidence sets that take each [math]\vartheta\in\idr{\theta}[/math] as the object of interest (and which satisfy the ''uniform'' coverage requirements described in Section ''Pointwise vs. Uniform Coverage'' below) have received the most attention in the literature on inference in partially identified models, this choice merits some words of caution. First, [44] point out that if confidence sets are to be used for decision making, a policymaker concerned with robust decisions might prefer ones satisfying \eqref{eq:CS_coverage:set:pw} (respectively, \eqref{eq:CS_coverage:set} below once uniformity is taken into account) to ones satisfying \eqref{eq:CS_coverage:point:pw} (respectively, \eqref{eq:CS_coverage:point} below with uniformity). Second, while in many applications a “true” data generating [math]\theta[/math] exists, in others it does not. For example, [45] and [46] query survey respondents (in the American Life Panel and in the Health and Retirement Study, respectively) about their subjective beliefs regarding the probability of future events. A large fraction of these respondents, when given the possibility to do so, report imprecise beliefs in the form of intervals. In this case, there is no “true” point-valued belief: the “truth” is interval-valued.
If one is interested in (say) average beliefs, the sharp identification region is the (Aumann) expectation of the reported intervals, and the appropriate coverage requirement for a confidence set is that in \eqref{eq:CS_coverage:set:pw} (respectively, \eqref{eq:CS_coverage:set} below with uniformity).

====Pointwise vs. Uniform Coverage====

In the context of interval identified parameters, such as, e.g., the mean with missing data in Theorem SIR- with [math]\theta\in\R[/math], [43] pointed out that extra care should be taken in the construction of confidence sets for partially identified parameters, as otherwise they may be asymptotically valid only pointwise (in the distribution of the observed data) over relevant classes of distributions.[Notes 14] For example, consider a confidence interval that expands each of the sample analogs of the extreme points of the population bounds by a one-sided critical value. This confidence interval controls the asymptotic coverage probability pointwise for any DGP at which the width of the population bounds is positive. This is because the sampling variation becomes asymptotically negligible relative to the (fixed) width of the bounds, making the inference problem essentially one-sided. However, for every [math]n[/math] one can find a distribution [math]\sP\in\cP[/math] and a parameter [math]\vartheta\in\idr{\theta}[/math] such that the width of the population bounds (under [math]\sP[/math]) is of order [math]1/\sqrt{n}[/math] or smaller and the coverage probability for [math]\vartheta[/math] is below [math]1-\alpha[/math]. This happens because the proposed confidence interval does not take into account the fact that for some [math]\sP\in\cP[/math] the problem has a two-sided nature. This observation naturally leads to a more stringent requirement of ''uniform'' coverage, whereby \eqref{eq:CS_coverage:set:pw}-\eqref{eq:CS_coverage:point:pw} are replaced, respectively, by

[[math]] \begin{align} \liminf_{n\to\infty}\inf_{\sP\in\cP}\sP(\idr{\theta}\subseteq\CS)&\ge 1-\alpha,\label{eq:CS_coverage:set}\\ \liminf_{n\to\infty}\inf_{\sP\in\cP}\inf_{\vartheta\in\idr{\theta}}\sP(\vartheta\in\CS)&\ge 1-\alpha,\label{eq:CS_coverage:point} \end{align} [[/math]]

and [math]c_{1-\alpha}[/math] is chosen accordingly, to obtain either \eqref{eq:CS_coverage:set} or \eqref{eq:CS_coverage:point}. Sets satisfying \eqref{eq:CS_coverage:set} are referred to as confidence regions for [math]\idr{\theta}[/math] that are uniformly consistent in level (over [math]\sP\in\cP[/math]). [10] propose such confidence regions, study their properties, and provide a step-down procedure to obtain them. [47] propose confidence sets that are contour sets of criterion functions using cutoffs that are computed via Monte Carlo simulations from the quasi-posterior distribution of the criterion and satisfy the coverage requirement in \eqref{eq:CS_coverage:set}. They recommend the use of a Sequential Monte Carlo algorithm that works well even when the quasi-posterior is irregular and multi-modal. They establish exact asymptotic coverage, non-trivial local power, and validity of their procedure in point identified and partially identified regular models, and validity in irregular models (e.g., in models where the reduced form parameters are on the boundary of the parameter space). They also establish efficiency of their procedure in regular models that happen to be point identified.
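The pointwise failure described above is easy to see numerically: expand each estimated bound by a one-sided critical value and compare coverage when the population bounds are wide versus degenerate. The sketch below uses a stylized DGP of my own (perfectly correlated bounds with unit-variance normal errors), not one from the text:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 20_000
z1 = 1.2816                                  # one-sided standard normal critical value (90%)

def coverage(width):
    """MC coverage of theta = upper bound by the one-sided-expanded interval
    [bar yL - z1/sqrt(n), bar yU + z1/sqrt(n)], with population bounds
    [0, width] and perfectly correlated, unit-variance normal sampling error."""
    theta = width                            # the true parameter sits at the upper bound
    e = rng.standard_normal((reps, n)).mean(axis=1)   # sampling error of bar yL
    lo = e - z1 / np.sqrt(n)                 # bar yL - c
    hi = e + width + z1 / np.sqrt(n)         # bar yU + c, with bar yU = bar yL + width
    return np.mean((lo <= theta) & (theta <= hi))

wide = coverage(1.0)     # width >> 1/sqrt(n): coverage close to the nominal 0.90
narrow = coverage(0.0)   # width 0: the problem is two-sided, coverage only about 0.80
```

The same interval that is valid pointwise for any fixed positive width undercovers along the drifting sequence of DGPs, which is exactly the motivation for the uniform requirements in \eqref{eq:CS_coverage:set}-\eqref{eq:CS_coverage:point}.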

Sets satisfying \eqref{eq:CS_coverage:point} are referred to as confidence regions for points in [math]\idr{\theta}[/math] that are uniformly consistent in level (over [math]\sP\in\cP[/math]). Within the framework of [43], [48] shows that one can obtain a confidence interval satisfying \eqref{eq:CS_coverage:point} by pre-testing whether the lower and upper population bounds are sufficiently close to each other. If so, the confidence interval expands each of the sample analogs of the extreme points of the population bounds by a two-sided critical value; otherwise, by a one-sided one. [48] provides important insights clarifying the connection between superefficient (i.e., faster than [math]O_p(1/\sqrt{n})[/math]) estimation of the width of the population bounds when it equals zero, and certain challenges in [43]'s proposed method.[Notes 15] [28] leverage [48]'s results to obtain confidence sets satisfying \eqref{eq:CS_coverage:point} using the support function approach for set identified linear models. Obtaining confidence sets that satisfy the requirement in \eqref{eq:CS_coverage:point} becomes substantially more complex in the context of general moment (in)equalities models. One of the key challenges to uniform inference stems from the fact that the behavior of the limit distribution of the test statistic depends on [math]\sqrt{n}\E_\sP(m_j(\ew_i;\vartheta)),j=1,\dots,|\cJ|[/math], which cannot be consistently estimated. [4][7][8][9][49][50], among others, make significant contributions to circumvent these difficulties in the context of a finite number of unconditional moment (in)equalities. [51][35][52][25][26][53][54], among others, make significant contributions to circumvent these difficulties in the context of a finite number of conditional moment (in)equalities (with continuously distributed conditioning variables).
[55] and [56] study, respectively, the challenging frameworks where the number of moment inequalities grows with sample size and where there is a continuum of conditional moment inequalities. I refer to [11](Section 4) for a thorough discussion of these methods and a comparison of their relative (de)merits (see also [57][58]).
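The interpolation idea behind [43]'s and [48]'s constructions can be sketched numerically. This is a hedged illustration under my own simplifying assumptions (a scalar interval-identified parameter, Gaussian limits, a common standard error for both bound estimates; `im_critical_value` is a name I made up): the critical value solves an equation that moves continuously between the two-sided value when the estimated width is zero and the one-sided value when it is large.

```python
# Hedged sketch (illustrative, not the authors' implementation): choose c
# so that  Phi(c + sqrt(n)*Delta_hat/sigma_hat) - Phi(-c) = 1 - alpha,
# which interpolates between the two-sided critical value (Delta_hat = 0,
# point identification) and the one-sided value (wide bounds).
from scipy.stats import norm
from scipy.optimize import brentq

def im_critical_value(delta_hat, sigma_hat, n, alpha=0.05):
    """Critical value for the interval [muL_hat - c*s/sqrt(n), muU_hat + c*s/sqrt(n)]."""
    ratio = (n ** 0.5) * delta_hat / sigma_hat
    f = lambda c: norm.cdf(c + ratio) - norm.cdf(-c) - (1 - alpha)
    return brentq(f, 1e-6, 10.0)  # root-find on a bracket containing the solution

print(round(im_critical_value(0.0, 1.0, 100), 2))  # ~1.96, the two-sided value
print(round(im_critical_value(1.0, 1.0, 100), 2))  # ~1.64, the one-sided value
```

[48]'s pre-test refinement addresses the fact that plugging a superefficient estimator of the width into this equation can fail uniformly near [math]\Delta=0[/math].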

===Coverage of the Vector [math]\theta[/math] vs. Coverage of a Component of [math]\theta[/math]===

The coverage requirements in \eqref{eq:CS_coverage:set}-\eqref{eq:CS_coverage:point} refer to confidence sets in [math]\R^d[/math] for the entire [math]\theta[/math] or [math]\idr{\theta}[/math]. Often empirical researchers are interested in inference on a specific component or (smooth) function of [math]\theta[/math] (e.g., the returns to education; the effect of market size on the probability of entry; the elasticity of demand for insurance to price, etc.). For simplicity, here I focus on the case of a component of [math]\theta[/math], which I represent as [math]u^\top\theta[/math], with [math]u[/math] a standard basis vector in [math]\R^d[/math]. In this case, the (sharp) identification region of interest is

[[math]] \begin{align*} \idr{u^\top\theta}=\{s\in[-h_\Theta(-u),h_\Theta(u)]:s=u^\top\vartheta\text{ and }\vartheta\in\idr{\theta}\}. \end{align*} [[/math]]

One could report as confidence interval for [math]u^\top\theta[/math] the projection of [math]\CS[/math] in direction [math]\pm u[/math]. The resulting confidence interval is asymptotically valid but typically conservative. The extent of the conservatism increases with the dimension of [math]\theta[/math] and is easily appreciated in the case of a point identified parameter. Consider, for example, a linear regression in [math]\R^{10}[/math], and suppose for simplicity that the limiting covariance matrix of the estimator is the identity matrix. Then a 95% confidence interval for [math]u^\top\theta[/math] is obtained by adding and subtracting [math]1.96[/math] to that component's estimate. In contrast, projection of a 95% confidence ellipsoid for [math]\theta[/math] on each component amounts to adding and subtracting [math]4.28[/math] to that component's estimate. It is therefore desirable to provide confidence intervals [math]\CI[/math] specifically designed to cover [math]u^\top\theta[/math] rather than the entire [math]\theta[/math]. Natural counterparts to \eqref{eq:CS_coverage:set}-\eqref{eq:CS_coverage:point} are

[[math]] \begin{align} \liminf_{n\to\infty}\inf_{\sP\in\cP}\sP(\idr{u^\top\theta} \subseteq \CI)&\ge 1-\alpha,\label{eq:CS_coverage:set:proj}\\ \liminf_{n\to\infty}\inf_{\sP\in\cP}\inf_{\vartheta\in\idr{\theta}}\sP(u^\top\vartheta\in \CI)&\ge 1-\alpha. \label{eq:CS_coverage:point:proj} \end{align} [[/math]]
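The projection numbers quoted above ([math]1.96[/math] versus [math]4.28[/math]) are simple quantile computations: the per-component interval uses a standard normal quantile, while projecting a 95% confidence ellipsoid in [math]\R^{10}[/math] (with identity limiting covariance) uses the square root of a [math]\chi^2_{10}[/math] quantile.

```python
# Recomputing the conservatism numbers from the text: half-width of a
# per-component 95% interval vs. half-width obtained by projecting a 95%
# confidence ellipsoid for theta in R^10 (identity limiting covariance).
from scipy.stats import norm, chi2

z = norm.ppf(0.975)                   # per-component critical value
proj = chi2.ppf(0.95, df=10) ** 0.5   # half-width from projecting the ellipsoid
print(round(z, 2), round(proj, 2))    # 1.96 4.28
```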

As shown in [27] and [33] for the case of pointwise coverage, obtaining asymptotically valid confidence intervals is simple if the identified set is convex and one uses the support function approach. This is because it suffices to base the test statistic on the support function in direction [math]u[/math], and it is often possible to easily characterize the limiting distribution of this test statistic. See [12](Chapters 4 and 5) for details. The task is significantly more complex in general moment inequality models when [math]\idr{\theta}[/math] is non-convex and one wants to satisfy the criterion in \eqref{eq:CS_coverage:set:proj} or that in \eqref{eq:CS_coverage:point:proj}. [4] and [59] propose confidence intervals of the form

[[math]] \begin{align} \CI = \left\{s\in[-h_\Theta(-u),h_\Theta(u)]:\inf_{\vartheta\in\Theta(s)}n\crit_n(\vartheta)\le c_{1-\alpha}(s)\right\},\label{eq:CI:BCS} \end{align} [[/math]]

where [math]\Theta(s)=\{\vartheta\in\Theta:u^\top\vartheta=s\}[/math] and [math]c_{1-\alpha}[/math] is such that \eqref{eq:CS_coverage:point:proj} holds. An important idea in this proposal is that of profiling the test statistic [math]n\crit_n(\vartheta)[/math] by minimizing it over all [math]\vartheta[/math]s such that [math]u^\top\vartheta=s[/math]. One then includes in the confidence interval all values [math]s[/math] for which the profiled test statistic's value is not too large. [4] propose the use of subsampling to obtain the critical value [math]c_{1-\alpha}(s)[/math] and provide high-level conditions ensuring that \eqref{eq:CS_coverage:point:proj} holds. [59] substantially extend and improve the profiling approach by providing a bootstrap-based method to obtain [math]c_{1-\alpha}[/math] so that \eqref{eq:CS_coverage:point:proj} holds. Their method is more powerful than subsampling (for reasonable choices of subsample size). [60] further enlarge the domain of applicability of the profiling approach by proposing a method based on this approach that is asymptotically uniformly valid when the number of moment conditions is large, and can grow with the sample size, possibly at exponential rates. [61] propose a bootstrap-based calibrated projection approach where

[[math]] \begin{align} \CI= [-h_{\eC_n(c_{1-\alpha})}(-u),h_{\eC_n(c_{1-\alpha})}(u)],\label{eq:def:CI} \end{align} [[/math]]

with

[[math]] \begin{align} h_{\eC_n(c_{1-\alpha})}(u)\equiv\sup_{\vartheta\in\Theta}u^\top\vartheta\text{s.t.}\frac{\sqrt{n}\bar{m}_{n,j}(\vartheta)}{\hat{\sigma}_{n,j}(\vartheta)}\leq c_{1-\alpha}(\vartheta),j=1,\dots,|\cJ|\label{eq:KMS:proj} \end{align} [[/math]]

and [math]c_{1-\alpha}[/math] a critical level function calibrated so that \eqref{eq:CS_coverage:point:proj} holds. Compared to the simple projection of [math]\CS[/math] mentioned at the beginning of Section ''Coverage of the Vector [math]\theta[/math] vs. Coverage of a Component of [math]\theta[/math]'', calibrated projection (weakly) reduces the value of [math]c_{1-\alpha}[/math] so that the projection of [math]\theta[/math], rather than [math]\theta[/math] itself, is asymptotically covered with the desired probability uniformly. [47] provide methods to build confidence intervals and confidence sets on projections of [math]\idr{\theta}[/math] as contour sets of criterion functions using cutoffs that are computed via Monte Carlo simulations from the quasi‐posterior distribution of the criterion, and that satisfy the coverage requirement in \eqref{eq:CS_coverage:set:proj}. One of their procedures, designed specifically for scalar projections, delivers a confidence interval as the contour set of a profiled quasi-likelihood ratio with critical value equal to a quantile of the Chi-squared distribution with one degree of freedom.
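A stylized version of the projection program in \eqref{eq:KMS:proj} can be written as a linear program when the sample moments are linear in [math]\vartheta[/math]. In this sketch I assume linear moments [math]\bar{m}_n(\vartheta)=A\vartheta-b[/math] with unit variances and a constant critical level [math]c[/math] (in the calibrated approach [math]c_{1-\alpha}(\vartheta)[/math] is bootstrap-calibrated; the toy identified set and all names here are mine):

```python
# Stylized sketch of the projection program in eq:KMS:proj, assuming linear
# sample moments m_bar(theta) = A @ theta - b with unit variances, so the
# constraint sqrt(n) * m_bar_j(theta) <= c becomes A @ theta <= b + c/sqrt(n).
import numpy as np
from scipy.optimize import linprog

def support_function(u, A, b, c, n):
    """Maximize u'theta over the relaxed sample analog of the identified set."""
    # linprog minimizes, so flip the sign of u to maximize u'theta
    res = linprog(-u, A_ub=A, b_ub=b + c / np.sqrt(n),
                  bounds=[(None, None)] * len(u))
    return -res.fun

# toy identified set: the unit square [0,1]^2 in R^2
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])
u = np.array([1.0, 0.0])
n, c = 100, 1.645
print(f"{support_function(u, A, b, c, n):.4f}")   # upper endpoint, ~1.1645
print(f"{-support_function(-u, A, b, c, n):.4f}") # lower endpoint, ~-0.1645
```

The confidence interval \eqref{eq:def:CI} is then the interval between the two endpoints; calibration replaces the fixed [math]c[/math] with a critical level function chosen so that \eqref{eq:CS_coverage:point:proj} holds.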

===A Brief Note on Bayesian Methods===

The confidence sets discussed in this section are based on the frequentist approach to inference. It is natural to ask whether in partially identified models, as in well-behaved point identified models, one can build Bayesian credible sets that at least asymptotically coincide with frequentist confidence sets. This question was first addressed by [62], with a negative answer for the case in which the coverage in \eqref{eq:CS_coverage:point} is sought. In particular, they showed that the resulting Bayesian credible sets are a subset of [math]\idr{\theta}[/math], and hence too narrow from the frequentist perspective. This discrepancy can be ameliorated when inference is sought for [math]\idr{\theta}[/math] rather than for each [math]\vartheta\in\idr{\theta}[/math]. [63], [64], [34], and [65] propose Bayesian credible regions that are valid for frequentist inference in the sense of \eqref{eq:CS_coverage:set:pw}, where the first two build on the criterion function approach and the second two on the support function approach. All these contributions rely on the model being separable, in the sense that it yields moment inequalities that can be written as the sum of a function of the data only, and a function of the model parameters only (as in, e.g., eq:CT_00-eq:CT_01L). In these models, the function of the data only (the reduced form parameter) is point identified, it is related to the structural parameters [math]\theta[/math] through a known mapping, and under standard regularity conditions it can be [math]\sqrt{n}[/math]-consistently estimated. The resulting estimator has an asymptotically Normal distribution. The various approaches place a prior on the reduced form parameter, and standard tools in Bayesian analysis are used to obtain a posterior. The known mapping from reduced form to structural parameters is then applied to this posterior to obtain a credible set for [math]\idr{\theta}[/math].
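The separable-model recipe just described can be illustrated with a toy example of my own construction (it is not any specific paper's algorithm): for the mean of a [math][0,1][/math] outcome with missing data, the reduced form parameter is [math]\psi=(\E[y\mid \text{observed}],\Pr(\text{observed}))[/math], the known mapping to the identified set is [math][\psi_1\psi_2,\;\psi_1\psi_2+(1-\psi_2)][/math], and a posterior on [math]\psi[/math] (here a normal approximation with assumed standard errors) induces a posterior on the set's endpoints.

```python
# Toy illustration of the reduced-form-posterior recipe (my construction):
# place a (normal-approximation) posterior on psi = (E[y|obs], P(obs)) and
# push each draw through the known mapping to the identified-set endpoints.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
psi_hat = np.array([0.6, 0.8])            # point estimates of the reduced form
se = np.array([0.48, 0.40]) / np.sqrt(n)  # assumed standard errors

draws = psi_hat + se * rng.standard_normal((10_000, 2))
lower = draws[:, 0] * draws[:, 1]          # psi1 * psi2
upper = lower + (1.0 - draws[:, 1])        # + (1 - psi2)

# a simple credible region for the identified set: envelope of the endpoint
# posteriors at the 2.5% / 97.5% levels
cred = (np.quantile(lower, 0.025), np.quantile(upper, 0.975))
print(f"[{cred[0]:.3f}, {cred[1]:.3f}]")  # contains the plug-in bounds [0.48, 0.68]
```

The papers cited above differ in how exactly the endpoint posterior is turned into a credible region with frequentist validity in the sense of \eqref{eq:CS_coverage:set:pw}; the envelope used here is only meant to convey the mechanics of the mapping step.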

==General references==

Molinari, Francesca (2020). "Microeconometrics with Partial Identification". arXiv:2004.11751 [econ.EM].

==Notes==

  1. This assumption is often maintained in the literature. See, e.g., [1] for a treatment of inference with dependent observations. [2] study inference in games of complete information as in Identification Problem, imposing the i.i.d. assumption on the unobserved payoff shifters [math]\{\eps_{i1},\eps_{i2}\}_{i=1}^n[/math]. The authors note that because the selection mechanism picking the equilibrium played in the regions of multiplicity (see Section Static, Simultaneous-Move Finite Games with Multiple Equilibria) is left completely unspecified and may be arbitrarily correlated across markets, the resulting observed variables [math]\{\ew_i\}_{i=1}^n[/math] may not be independent and identically distributed, and they propose an inference method to address this issue.
  2. Examples where the set [math]\cJ[/math] is a compact set (e.g., a unit ball) rather than a finite set include the case of best linear prediction with interval outcome and covariate data, see characterization eq:ThetaI:BLP, and the case of entry games with multiple mixed strategy Nash equilibria, see characterization eq:SIR_sharp_mixed_sup. A more general continuum of inequalities is also possible, as in the case of discrete choice with endogenous explanatory variables, see characterization eq:SIR:discrete:choice:endogenous. I refer to [3] and [4](Supplementary Appendix B) for inference methods in the presence of a continuum of conditional moment (in)equalities.
  3. I refer to [5], [6], [7], [8], [9][10], [11], [12], and [13], for inference methods in the case that the conditioning variables have a continuous distribution.
  4. In these expressions an index of the form [math]jk[/math] not separated by a comma equals the product of [math]j[/math] with [math]k[/math].
  5. Using the well known duality between tests of hypotheses and confidence sets, the discussion could be re-framed in terms of size of the test.
  6. The definition of the Hausdorff distance can be generalized to an arbitrary metric space by replacing the Euclidean metric by the metric specified on that space.
  7. It was previously used in the mathematical literature on random set theory, for example to formalize laws of large numbers and central limit theorems for random sets such as the ones in Theorems and [14][15].
  8. See [16](Theorem 1) for a pedagogically helpful proof for a semiparametric binary model.
  9. Using this normalized criterion function is especially important in light of possible model misspecification, see Section.
  10. [17](equation (4.1) and equation (4.6)) set [math]\gamma=1[/math] because they report the assumption for a criterion function that does not square the moment violations.
  11. By Theorem, the Aumann expectation of a random closed set defined on a nonatomic probability space is convex. In this chapter I am assuming nonatomicity of the probability space. Even if I did not make this assumption, however, when working with a random sample the relevant probability space is the product space with [math]n\to\infty[/math], hence nonatomic [18]. If [math]\idr{\theta}[/math] is not convex, [19]'s analysis applies to its convex hull.
  12. [20](Supplementary Appendix F) establish that if [math]\ex[/math] has finite support, [math]\idr{\theta}[/math] in Theorem SIR- can be written as the collection of [math]\vartheta\in\Theta[/math] that satisfy a finite number of moment inequalities, as posited in this section.
  13. There is a large literature in macro-econometrics, pioneered by [21], [22], and [23], concerned with Bayesian inference with a non-informative prior for non-identified parameters. I refer to [24](Chapter 13) for a thorough review. Frequentist inference for impulse response functions in Structural Vector Autoregression models is carried out, e.g., in [25] and [26].
  14. This discussion draws on many conversations with Jörg Stoye, as well as on notes that he shared with me, for which I thank him.
  15. Indeed, the confidence interval proposed by [27] can be thought of as using a Hodges-type shrinkage estimator (see, e.g., [28]) for the width of the population bounds.

==References==

  1. Hansen, L.P. (1982b): “Large Sample Properties of Generalized Method of Moments Estimators” Econometrica, 50(4), 1029--1054.
  2. 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 Manski, C.F., and E.Tamer (2002): “Inference on Regressions with Interval Data on a Regressor or Outcome” Econometrica, 70(2), 519--546.
  3. 3.00 3.01 3.02 3.03 3.04 3.05 3.06 3.07 3.08 3.09 3.10 Chernozhukov, V., H.Hong, and E.Tamer (2007): “Estimation and Confidence Regions for Parameter Sets in Econometric Models” Econometrica, 75(5), 1243--1284.
  4. 4.0 4.1 4.2 4.3 Romano, J.P., and A.M. Shaikh (2008): “Inference for identifiable parameters in partially identified econometric models” Journal of Statistical Planning and Inference, 138(9), 2786 -- 2807.
  5. 5.0 5.1 Rosen, A.M. (2008): “Confidence sets for partially identified parameters that satisfy a finite number of moment inequalities” Journal of Econometrics, 146(1), 107 -- 117.
  6. Galichon, A., and M.Henry (2009): “A test of non-identifying restrictions and confidence regions for partially identified parameters” Journal of Econometrics, 152(2), 186 -- 196.
  7. 7.0 7.1 Andrews, D. W.K., and P.Guggenberger (2009): “Validity of Subsampling and `Plug-in Asymptotic' Inference for Parameters Defined by Moment Inequalities” Econometric Theory, 25(3), 669--709.
  8. 8.0 8.1 8.2 Andrews, D. W.K., and G.Soares (2010): “Inference for Parameters Defined by Moment Inequalities Using Generalized Moment Selection” Econometrica, 78(1), 119--157.
  9. 9.0 9.1 Canay, I.A. (2010): “EL inference for partially identified models: Large deviations optimality and bootstrap validity” Journal of Econometrics, 156(2), 408 -- 425.
  10. 10.0 10.1 Romano, J.P., and A.M. Shaikh (2010): “Inference for the Identified Set in Partially Identified Econometric Models” Econometrica, 78(1), 169--211.
  11. 11.0 11.1 Canay, I.A., and A.M. Shaikh (2017): “Practical and Theoretical Advances in Inference for Partially Identified Models” in Advances in Economics and Econometrics: Eleventh World Congress, ed. by B.Honoré, A.Pakes, M.Piazzesi, and L.Samuelson, vol.2 of Econometric Society Monographs, p. 271–306. Cambridge University Press.
  12. 12.0 12.1 12.2 12.3 Molchanov, I., and F.Molinari (2018): Random Sets in Econometrics. Econometric Society Monograph Series, Cambridge University Press, Cambridge UK.
  13. 13.0 13.1 Molchanov, I. (2017): Theory of Random Sets. Springer, London, 2 edn.
  14. Hansen, L.P., J.Heaton, and E.G.J. Luttmer (1995): “Econometric Evaluation of Asset Pricing Models” The Review of Financial Studies, 8(2), 237--274.
  15. 15.0 15.1 15.2 15.3 Molchanov, I. (1998): “A limit theorem for solutions of inequalities” Scandinavian Journal of Statistics, 25, 235--242.
  16. 16.0 16.1 Chernozhukov, V., E.Kocatulum, and K.Menzel (2015): “Inference on sets in finance” Quantitative Economics, 6(2), 309--358.
  17. Hansen, L.P., and R.Jagannathan (1991): “Implications of Security Market Data for Models of Dynamic Economies” Journal of Political Economy, 99(2), 225--262.
  18. Markowitz, H. (1952): “Portfolio selection” Journal of Finance, 7, 77--91.
  19. Chetty, R. (2012): “Bounds on elasticities with optimization frictions: a synthesis of micro and macro evidence in labor supply” Econometrica, 80(3), 969--1018.
  20. Redner, R. (1981): “Note on the Consistency of the Maximum Likelihood Estimate for Nonidentifiable Distributions” The Annals of Statistics, 9(1), 225--228.
  21. 21.0 21.1 Kaido, H., F.Molinari, and J.Stoye (2019b): “Constraint Qualifications in Partial Identification” working paper, available at https://arxiv.org/pdf/1908.09103.pdf.
  22. Bazaraa, M.S., H.D. Sherali, and C.Shetty (2006): Nonlinear programming: theory and algorithms. Hoboken, N.J. : Wiley-Interscience, 3rd edn.
  23. Yildiz, N. (2012): “Consistency of plug-in estimators of upper contour and level sets” Econometric Theory, 28(2), 309--327.
  24. Menzel, K. (2014): “Consistent estimation with many moment inequalities” Journal of Econometrics, 182(2), 329 -- 350.
  25. 25.0 25.1 Armstrong, T.B. (2014): “Weighted KS statistics for inference on conditional moment inequalities” Journal of Econometrics, 181(2), 92 -- 116.
  26. 26.0 26.1 Armstrong, T.B. (2015): “Asymptotically exact inference in conditional moment inequality models” Journal of Econometrics, 186(1), 51 -- 65.
  27. 27.0 27.1 27.2 27.3 27.4 27.5 27.6 27.7 27.8 Beresteanu, A., and F.Molinari (2008): “Asymptotic Properties for a Class of Partially Identified Models” Econometrica, 76(4), 763--814.
  28. 28.0 28.1 28.2 28.3 Bontemps, C., T.Magnac, and E.Maurin (2012): “Set identified linear models” Econometrica, 80(3), 1129--1155.
  29. 29.0 29.1 29.2 Chandrasekhar, A., V.Chernozhukov, F.Molinari, and P.Schrimpf (2018): “Best linear approximations to set identified functions: with an application to the gender wage gap” CeMMAP working paper CWP09/19, available at https://www.cemmap.ac.uk/publication/id/13913.
  30. Magnac, T., and E.Maurin (2008): “Partial Identification in Monotone Binary Models: Discrete Regressors and Interval Data” The Review of Economic Studies, 75(3), 835--864.
  31. 31.0 31.1 31.2 Kaido, H., and A.Santos (2014): “Asymptotically efficient estimation of models defined by convex moment inequalities” Econometrica, 82(1), 387--413.
  32. Bickel, P.J., C.A. Klaassen, Y.Ritov, and J.A. Wellner (1993): Efficient and Adaptive Estimation for Semiparametric Models. Springer, New York.
  33. 33.0 33.1 33.2 33.3 Kaido, H. (2016): “A dual approach to inference for partially identified econometric models” Journal of Econometrics, 192(1), 269 -- 290.
  34. 34.0 34.1 34.2 Kitagawa, T., and R.Giacomini (2018): “Robust Bayesian inference for set-identified models” CeMMAP working paper CWP61/18, available at https://www.cemmap.ac.uk/publication/id/13675.
  35. 35.0 35.1 Chernozhukov, V., S.Lee, and A.M. Rosen (2013): “Intersection Bounds: estimation and inference” Econometrica, 81(2), 667--737.
  36. Horowitz, J.L., and C.F. Manski (1998): “Censoring of outcomes and regressors due to survey nonresponse: Identification and estimation using weights and imputations” Journal of Econometrics, 84(1), 37 -- 58.
  37. Horowitz, J.L., and C.F. Manski (2000): “Nonparametric Analysis of Randomized Experiments with Missing Covariate and Outcome Data” Journal of the American Statistical Association, 95(449), 77--84.
  38. Bugni, F.A. (2010): “Bootstrap inference in partially identified models defined by moment inequalities: coverage of the identified set” Econometrica, 78(2), 735--753.
  39. Galichon, A., and M.Henry (2013): “Dilation bootstrap” Journal of Econometrics, 177(1), 109 -- 115.
  40. Henry, M., R.Méango, and M.Queyranne (2015): “Combinatorial approach to inference in partially identified incomplete structural models” Quantitative Economics, 6(2), 499--529.
  41. Fang, Z., and A.Santos (2018): “Inference on Directionally Differentiable Functions” The Review of Economic Studies, 86(1), 377--412.
  42. Adusumilli, K., and T.Otsu (2017): “Empirical Likelihood for Random Sets” Journal of the American Statistical Association, 112(519), 1064--1075.
  43. 43.0 43.1 43.2 43.3 43.4 Imbens, G.W., and C.F. Manski (2004): “Confidence Intervals for Partially Identified Parameters” Econometrica, 72(6), 1845--1857.
  44. Henry, M., and A.Onatski (2012): “Set coverage and robust policy” Economics Letters, 115(2), 256 -- 257.
  45. Manski, C.F., and F.Molinari (2010): “Rounding Probabilistic Expectations in Surveys” Journal of Business and Economic Statistics, 28(2), 219--231.
  46. Giustinelli, P., C.F. Manski, and F.Molinari (2019a): “Precise or Imprecise Probabilities? Evidence from survey response on dementia and long-term care” NBER Working Paper 26125, available at https://www.nber.org/papers/w26125.
  47. 47.0 47.1 Chen, X., T.M. Christensen, and E.Tamer (2018): “MCMC Confidence Sets for Identified Sets” Econometrica, 86(6), 1965--2018.
  48. 48.0 48.1 48.2 Stoye, J. (2009): “More on Confidence Intervals for Partially Identified Parameters” Econometrica, 77(4), 1299--1315.
  49. Andrews, D. W.K., and P.J. Barwick (2012): “Inference for parameters defined by moment inequalities: a recommended moment selection procedure” Econometrica, 80(6), 2805--2826.
  50. Romano, J.P., A.M. Shaikh, and M.Wolf (2014): “A practical two-step method for testing moment inequalities” Econometrica, 82(5), 1979--2002.
  51. Andrews, D. W.K., and X.Shi (2013): “Inference based on conditional moment inequalities” Econometrica, 81(2), 609--666.
  52. Lee, S., K.Song, and Y.-J. Whang (2013): “Testing functional inequalities” Journal of Econometrics, 172(1), 14 -- 32.
  53. Armstrong, T.B., and H.P. Chan (2016): “Multiscale adaptive inference on conditional moment inequalities” Journal of Econometrics, 194(1), 24 -- 43.
  54. Chetverikov, D. (2018): “Adaptive Test of Conditional Moment Inequalities” Econometric Theory, 34(1), 186--227.
  55. Chernozhukov, V., D.Chetverikov, and K.Kato (2018): “Inference on causal and structural parameters using many moment inequalities” Review of Economic Studies, forthcoming, available at https://doi.org/10.1093/restud/rdy065.
  56. Andrews, D. W.K., and X.Shi (2017): “Inference based on many conditional moment inequalities” Journal of Econometrics, 196(2), 275 -- 287.
  57. Bugni, F.A., I.A. Canay, and P.Guggenberger (2012): “Distortions of Asymptotic Confidence Size in Locally Misspecified Moment Inequality Models” Econometrica, 80(4), 1741--1768.
  58. Bugni, F.A. (2016): “Comparison of inferential methods in partially identified models in terms of error in coverage probability” Econometric Theory, 32(1), 187--242.
  59. 59.0 59.1 Bugni, F.A., I.A. Canay, and X.Shi (2017): “Inference for subvectors and other functions of partially identified parameters in moment inequality models” Quantitative Economics, 8(1), 1--38.
  60. Belloni, A., F.A. Bugni, and V.Chernozhukov (2018): “Subvector inference in partially identified models with many moment inequalities” available at https://arxiv.org/abs/1806.11466.
  61. Kaido, H., F.Molinari, and J.Stoye (2019a): “Confidence Intervals for Projections of Partially Identified Parameters” Econometrica, 87(4), 1397--1432.
  62. Moon, H.R., and F.Schorfheide (2012): “Bayesian and frequentist inference in partially identified models” Econometrica, 80(2), 755--782.
  63. Norets, A., and X.Tang (2014): “Semiparametric Inference in Dynamic Binary Choice Models” The Review of Economic Studies, 81(3), 1229--1262.
  64. Kline, B., and E.Tamer (2016): “Bayesian inference in a class of partially identified models” Quantitative Economics, 7(2), 329--366.
  65. Liao, Y., and A.Simoni (2019): “Bayesian inference for partially identified smooth convex models” Journal of Econometrics, 211(2), 338 -- 360.