Revision as of 01:34, 30 May 2024 by Bot

Estimation and Inference

[math] \newcommand{\edis}{\stackrel{d}{=}} \newcommand{\fd}{\stackrel{f.d.}{\rightarrow}} \newcommand{\dom}{\operatorname{dom}} \newcommand{\eig}{\operatorname{eig}} \newcommand{\epi}{\operatorname{epi}} \newcommand{\lev}{\operatorname{lev}} \newcommand{\card}{\operatorname{card}} \newcommand{\comment}{\textcolor{Green}} \newcommand{\B}{\mathbb{B}} \newcommand{\C}{\mathbb{C}} \newcommand{\G}{\mathbb{G}} \newcommand{\M}{\mathbb{M}} \newcommand{\N}{\mathbb{N}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\T}{\mathbb{T}} \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\W}{\mathbb{W}} \newcommand{\bU}{\mathfrak{U}} \newcommand{\bu}{\mathfrak{u}} \newcommand{\bI}{\mathfrak{I}} \newcommand{\cA}{\mathcal{A}} \newcommand{\cB}{\mathcal{B}} \newcommand{\cC}{\mathcal{C}} \newcommand{\cD}{\mathcal{D}} \newcommand{\cE}{\mathcal{E}} \newcommand{\cF}{\mathcal{F}} \newcommand{\cG}{\mathcal{G}} \newcommand{\cg}{\mathcal{g}} \newcommand{\cH}{\mathcal{H}} \newcommand{\cI}{\mathcal{I}} \newcommand{\cJ}{\mathcal{J}} \newcommand{\cK}{\mathcal{K}} \newcommand{\cL}{\mathcal{L}} \newcommand{\cM}{\mathcal{M}} \newcommand{\cN}{\mathcal{N}} \newcommand{\cO}{\mathcal{O}} \newcommand{\cP}{\mathcal{P}} \newcommand{\cQ}{\mathcal{Q}} \newcommand{\cR}{\mathcal{R}} \newcommand{\cS}{\mathcal{S}} \newcommand{\cT}{\mathcal{T}} \newcommand{\cU}{\mathcal{U}} \newcommand{\cu}{\mathcal{u}} \newcommand{\cV}{\mathcal{V}} \newcommand{\cW}{\mathcal{W}} \newcommand{\cX}{\mathcal{X}} \newcommand{\cY}{\mathcal{Y}} \newcommand{\cZ}{\mathcal{Z}} \newcommand{\sF}{\mathsf{F}} \newcommand{\sM}{\mathsf{M}} \newcommand{\sG}{\mathsf{G}} \newcommand{\sT}{\mathsf{T}} \newcommand{\sB}{\mathsf{B}} \newcommand{\sC}{\mathsf{C}} \newcommand{\sP}{\mathsf{P}} \newcommand{\sQ}{\mathsf{Q}} \newcommand{\sq}{\mathsf{q}} \newcommand{\sR}{\mathsf{R}} \newcommand{\sS}{\mathsf{S}} \newcommand{\sd}{\mathsf{d}} \newcommand{\cp}{\mathsf{p}} \newcommand{\cc}{\mathsf{c}} \newcommand{\cf}{\mathsf{f}} 
\newcommand{\eU}{{\boldsymbol{U}}} \newcommand{\eb}{{\boldsymbol{b}}} \newcommand{\ed}{{\boldsymbol{d}}} \newcommand{\eu}{{\boldsymbol{u}}} \newcommand{\ew}{{\boldsymbol{w}}} \newcommand{\ep}{{\boldsymbol{p}}} \newcommand{\eX}{{\boldsymbol{X}}} \newcommand{\ex}{{\boldsymbol{x}}} \newcommand{\eY}{{\boldsymbol{Y}}} \newcommand{\eB}{{\boldsymbol{B}}} \newcommand{\eC}{{\boldsymbol{C}}} \newcommand{\eD}{{\boldsymbol{D}}} \newcommand{\eW}{{\boldsymbol{W}}} \newcommand{\eR}{{\boldsymbol{R}}} \newcommand{\eQ}{{\boldsymbol{Q}}} \newcommand{\eS}{{\boldsymbol{S}}} \newcommand{\eT}{{\boldsymbol{T}}} \newcommand{\eA}{{\boldsymbol{A}}} \newcommand{\eH}{{\boldsymbol{H}}} \newcommand{\ea}{{\boldsymbol{a}}} \newcommand{\ey}{{\boldsymbol{y}}} \newcommand{\eZ}{{\boldsymbol{Z}}} \newcommand{\eG}{{\boldsymbol{G}}} \newcommand{\ez}{{\boldsymbol{z}}} \newcommand{\es}{{\boldsymbol{s}}} \newcommand{\et}{{\boldsymbol{t}}} \newcommand{\ev}{{\boldsymbol{v}}} \newcommand{\ee}{{\boldsymbol{e}}} \newcommand{\eq}{{\boldsymbol{q}}} \newcommand{\bnu}{{\boldsymbol{\nu}}} \newcommand{\barX}{\overline{\eX}} \newcommand{\eps}{\varepsilon} \newcommand{\Eps}{\mathcal{E}} \newcommand{\carrier}{{\mathfrak{X}}} \newcommand{\Ball}{{\mathbb{B}}^{d}} \newcommand{\Sphere}{{\mathbb{S}}^{d-1}} \newcommand{\salg}{\mathfrak{F}} \newcommand{\ssalg}{\mathfrak{B}} \newcommand{\one}{\mathbf{1}} \newcommand{\Prob}[1]{\P\{#1\}} \newcommand{\yL}{\ey_{\mathrm{L}}} \newcommand{\yU}{\ey_{\mathrm{U}}} \newcommand{\yLi}{\ey_{\mathrm{L}i}} \newcommand{\yUi}{\ey_{\mathrm{U}i}} \newcommand{\xL}{\ex_{\mathrm{L}}} \newcommand{\xU}{\ex_{\mathrm{U}}} \newcommand{\vL}{\ev_{\mathrm{L}}} \newcommand{\vU}{\ev_{\mathrm{U}}} \newcommand{\dist}{\mathbf{d}} \newcommand{\rhoH}{\dist_{\mathrm{H}}} \newcommand{\ti}{\to\infty} \newcommand{\comp}[1]{#1^\mathrm{c}} \newcommand{\ThetaI}{\Theta_{\mathrm{I}}} \newcommand{\crit}{q} \newcommand{\CS}{CS_n} \newcommand{\CI}{CI_n} \newcommand{\cv}[1]{\hat{c}_{n,1-\alpha}(#1)} 
\newcommand{\idr}[1]{\mathcal{H}_\sP[#1]} \newcommand{\outr}[1]{\mathcal{O}_\sP[#1]} \newcommand{\idrn}[1]{\hat{\mathcal{H}}_{\sP_n}[#1]} \newcommand{\outrn}[1]{\mathcal{O}_{\sP_n}[#1]} \newcommand{\email}[1]{\texttt{#1}} \newcommand{\possessivecite}[1]{\ltref name="#1"\gt\lt/ref\gt's \citeyear{#1}} \newcommand\xqed[1]{% \leavevmode\unskip\penalty9999 \hbox{}\nobreak\hfill \quad\hbox{#1}} \newcommand\qedex{\xqed{$\triangle$}} \newcommand\independent{\protect\mathpalette{\protect\independenT}{\perp}} \DeclareMathOperator{\Int}{Int} \DeclareMathOperator{\conv}{conv} \DeclareMathOperator{\cov}{Cov} \DeclareMathOperator{\var}{Var} \DeclareMathOperator{\Sel}{Sel} \DeclareMathOperator{\Bel}{Bel} \DeclareMathOperator{\cl}{cl} \DeclareMathOperator{\sgn}{sgn} \DeclareMathOperator{\essinf}{essinf} \DeclareMathOperator{\esssup}{esssup} \newcommand{\mathds}{\mathbb} \renewcommand{\P}{\mathbb{P}} [/math]

\label{sec:inference}

Framework and Scope of the Discussion

The identification analysis carried out in [[guide:Ec36399528#sec:prob:distr |Sections-]] presumes knowledge of the joint distribution [math]\sP[/math] of the observable variables. That is, it presumes that [math]\sP[/math] can be learned with certainty from observation of the entire population. In practice, one observes a sample of size [math]n[/math] drawn from [math]\sP[/math]. For simplicity I assume it to be a random sample (… for a treatment of inference with dependent observations. [1] study inference in games of complete information as in Identification Problem, imposing the i.i.d. assumption on the unobserved payoff shifters [math]\{\eps_{i1},\eps_{i2}\}_{i=1}^n[/math]. The authors note that because the selection mechanism picking the equilibrium played in the regions of multiplicity (see Section Static, Simultaneous-Move Finite Games with Multiple Equilibria) is left completely unspecified and may be arbitrarily correlated across markets, the resulting observed variables [math]\{\ew_i\}_{i=1}^n[/math] may not be independent and identically distributed, and they propose an inference method to address this issue.) Statistical inference on [math]\idr{\theta}[/math] must then be conducted using knowledge of [math]\sP_n[/math], the empirical distribution of the observable outcomes and covariates. Because [math]\idr{\theta}[/math] is not a singleton, this task is particularly delicate. To start, care is required to choose a proper notion of consistency for a set estimator [math]\idrn{\theta}[/math] and to obtain palatable conditions under which such consistency is attained.
Next, the asymptotic behavior of statistics designed to test hypotheses or build confidence sets for [math]\idr{\theta}[/math] or for [math]\vartheta\in\idr{\theta}[/math] might change with [math]\vartheta[/math], creating technical challenges for the construction of confidence sets that are not encountered when [math]\theta[/math] is point identified. Many of the sharp identification regions derived in [[guide:Ec36399528#sec:prob:distr |Sections-]] can be written as collections of vectors [math]\vartheta\in\Theta[/math] that satisfy conditional or unconditional moment (in)equalities. For simplicity, I assume that [math]\Theta[/math] is a compact and convex subset of [math]\R^d[/math], and I use the formalization for the case of a finite number of unconditional moment (in)equalities:

[[math]] \begin{align} \idr{\theta}=\{\vartheta\in\Theta: \E_\sP(m_j(\ew_i;\vartheta))&\le 0~\forall j\in\cJ_1,~ \E_\sP(m_j(\ew_i;\vartheta))=0~\forall j\in\cJ_2\}.\label{eq:sharp_id_for_inference} \end{align} [[/math]]

In \eqref{eq:sharp_id_for_inference}, [math]\ew_i\in\cW\subseteq\R^{d_\cW}[/math] is a random vector collecting all observable variables, with [math]\ew\sim\sP[/math]; [math]m_j:\cW\times\Theta\to\R[/math], [math]j\in\cJ\equiv\cJ_1\cup\cJ_2[/math], are known measurable functions characterizing the model; and [math]\cJ[/math] is a finite set equal to [math]\{1,\dots,|\cJ|\}[/math] (… and [2](Supplementary Appendix B) for inference methods in the presence of a continuum of conditional moment (in)equalities). Instances where [math]\idr{\theta}[/math] is characterized through a finite number of conditional moment (in)equalities and the conditioning variables have finite support can easily be recast as in \eqref{eq:sharp_id_for_inference} (…, [3], [4], [5], [6][7], [8], [9], and [10] for inference methods in the case that the conditioning variables have a continuous distribution). Consider, for example, the two-player entry game model in Identification Problem, where [math]\ew=(\ey_1,\ey_2,\ex_1,\ex_2)[/math]. Using (in)equalities eq:CT_00-eq:CT_01L and assuming that the distribution of [math](\ex_1,\ex_2)[/math] has [math]\bar{k}[/math] points of support, denoted [math](x_{1,k},x_{2,k}),k=1,\dots,\bar{k}[/math], we have [math]|\cJ|=4\bar{k}[/math] and for [math]k=1,\dots,\bar{k}[/math],[Notes 1]

[[math]] \begin{align*} m_{4k-3}(\ew_i;\vartheta)&=[\one((\ey_1,\ey_2)=(0,0))-\Phi((-\infty,-\ex_1b_1),(-\infty,-\ex_2b_2);r)]\one((\ex_1,\ex_2)=(x_{1,k},x_{2,k}))\\ m_{4k-2}(\ew_i;\vartheta)&=[\one((\ey_1,\ey_2)=(1,1))-\Phi([-\ex_1b_1-d_1,\infty),[-\ex_2b_2-d_2,\infty);r)]\one((\ex_1,\ex_2)=(x_{1,k},x_{2,k}))\\ m_{4k-1}(\ew_i;\vartheta)&=[\one((\ey_1,\ey_2)=(0,1))-\Phi((-\infty,-\ex_1b_1-d_1),(-\ex_2b_2,\infty);r)]\one((\ex_1,\ex_2)=(x_{1,k},x_{2,k}))\\ m_{4k}(\ew_i;\vartheta)&=\Big[\Big\{\Phi((-\infty,-\ex_1b_1-d_1),(-\ex_2b_2,\infty);r)\notag\\ &\quad\quad-\Phi((-\ex_1b_1,-\ex_1b_1-d_1),(-\ex_2b_2,-\ex_2b_2-d_2);r)\Big\}-\one((\ey_1,\ey_2)=(0,1))\Big]\one((\ex_1,\ex_2)=(x_{1,k},x_{2,k})). \end{align*} [[/math]]


In point identified moment equality models it has been common to conduct estimation and inference using a criterion function that aggregates moment violations [11]. [12] adapt this idea to the partially identified case, through a criterion function [math]\crit_\sP:\Theta\to\R_+[/math] such that [math]\crit_\sP(\vartheta)=0[/math] if and only if [math]\vartheta\in\idr{\theta}[/math]. Many criterion functions can be used (see, e.g., [12][13][14][15][16][17][18][19][20]). Some simple and commonly employed ones include

[[math]] \begin{align} \crit_{\sP,\mathrm{sum}}(\vartheta) &= \sum_{j\in\cJ_1}\left[\frac{\E_\sP(m_j(\ew_i;\vartheta))}{\sigma_{\sP,j}(\vartheta)}\right]_+^2 + \sum_{j\in\cJ_2}\left[\frac{\E_\sP(m_j(\ew_i;\vartheta))}{\sigma_{\sP,j}(\vartheta)}\right]^2,\label{eq:criterion_fn_sum}\\ \crit_{\sP,\mathrm{max}}(\vartheta) &= \max\left\{\max_{j\in\cJ_1}\left[\frac{\E_\sP(m_j(\ew_i;\vartheta))}{\sigma_{\sP,j}(\vartheta)}\right]_+,\max_{j\in\cJ_2}\left|\frac{\E_\sP(m_j(\ew_i;\vartheta))}{\sigma_{\sP,j}(\vartheta)}\right|\right\}^2,\label{eq:criterion_fn_max} \end{align} [[/math]]

where [math][x]_+=\max\{x,0\}[/math] and [math]\sigma_{\sP,j}(\vartheta)[/math] is the population standard deviation of [math]m_j(\ew_i;\vartheta)[/math]. In \eqref{eq:criterion_fn_sum}-\eqref{eq:criterion_fn_max} the moment functions are standardized, as doing so is important for statistical power (see, e.g., [18](p. 127)). To simplify notation, I omit the label and simply use [math]\crit_\sP(\vartheta)[/math]. Given the criterion function, one can rewrite \eqref{eq:sharp_id_for_inference} as

[[math]] \begin{align} \label{eq:define:idr} \idr{\theta}=\{\vartheta\in\Theta:\crit_\sP(\vartheta)=0\}.\end{align} [[/math]]


To keep this chapter to a manageable length, I focus my discussion of statistical inference exclusively on consistent estimation and on different notions of coverage that a confidence set may be required to satisfy and that have proven useful in the literature.[Notes 2] The topics of hypothesis testing and construction of confidence sets in partially identified models are covered in [21], who provide a comprehensive survey devoted entirely to them in the context of moment inequality models. [22](Chapters 4 and 5) provide a thorough discussion of related methods based on the use of random set theory.

Consistent Estimation

When the identified object is a set, it is natural that its estimator is also a set. In order to discuss statistical properties of a set-valued estimator [math]\idrn{\theta}[/math] (to be defined below), and in particular its consistency, one needs to specify how to measure the distance between [math]\idrn{\theta}[/math] and [math]\idr{\theta}[/math]. Several distance measures among sets exist (see, e.g., [23](Appendix D)). A natural generalization of the commonly used Euclidean distance is the Hausdorff distance, see Definition, which for given [math]A,B\subset\R^d[/math] can be written as

[[math]] \begin{align*} \dist_H(A,B) = \inf\Big\{r \gt 0:\; A\subseteq B^r,\; B\subseteq A^r\Big\}=\max\left\{\sup_{a \in A} \dist(a,B), \sup_{b \in B} \dist(b,A) \right\},\end{align*} [[/math]]

with [math]\dist(a,B)\equiv\inf_{b\in B}\Vert a-b\Vert[/math].[Notes 3] In words, the Hausdorff distance between two sets is the largest distance from a point in one of the sets to its closest neighbor in the other set. It is easy to verify that [math]\dist_H[/math] metrizes the family of non-empty compact sets; in particular, given non-empty compact sets [math]A,B\subset\R^d[/math], [math]\dist_H(A,B) =0[/math] if and only if [math]A=B[/math]. If either [math]A[/math] or [math]B[/math] is empty, [math]\dist_H(A,B) =\infty[/math]. The use of the Hausdorff distance to conceptualize consistency of set valued estimators in econometrics was proposed by [24](Section 2.4) and [12](Section 3.2) (… [25]).
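For finite sets, such as grid approximations of [math]\idr{\theta}[/math] and [math]\idrn{\theta}[/math], the max of the two sup-inf terms can be evaluated directly. The following is a minimal Python sketch under that assumption: sets are lists of coordinate tuples, and the names `point_to_set`, `directed` and `hausdorff` are illustrative, not taken from any library. The usage lines show that dropping the right endpoint of a grid for [math][0,1][/math] leaves the one-sided distance from the smaller set at zero (the inner consistency notion discussed below), while the Hausdorff distance stays positive.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def point_to_set(a: Point, B: Sequence[Point]) -> float:
    """d(a, B) = min over b in B of the Euclidean norm ||a - b|| (B finite, non-empty)."""
    return min(math.dist(a, b) for b in B)


def directed(A: Sequence[Point], B: Sequence[Point]) -> float:
    """sup over a in A of d(a, B): the one-sided distance from A to B."""
    return max(point_to_set(a, B) for a in A)


def hausdorff(A: Sequence[Point], B: Sequence[Point]) -> float:
    """Hausdorff distance between finite sets; +inf if either set is empty, as in the text."""
    if not A or not B:
        return math.inf
    return max(directed(A, B), directed(B, A))


# Usage: a grid for [0, 1] versus the same grid with the right endpoint removed.
full = [(0.0,), (0.5,), (1.0,)]
partial = [(0.0,), (0.5,)]
print(directed(partial, full))   # 0.0: every point of `partial` lies in `full`
print(hausdorff(full, partial))  # 0.5: the endpoint 1.0 is 0.5 away from `partial`
```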

Definition (Hausdorff Consistency)

An estimator [math]\idrn{\theta}[/math] is consistent for [math]\idr{\theta}[/math] if

[[math]] \begin{align*} \dist_H(\idrn{\theta},\idr{\theta}) \stackrel{p}{\rightarrow} 0 ~\text{as } n\to \infty. \end{align*} [[/math]]

[26] establishes Hausdorff consistency of a plug-in estimator of the set [math]\{\vartheta\in\Theta:g_\sP(\vartheta)\le 0\}[/math], with [math]g_\sP:\Theta \to \R[/math] a lower semicontinuous function of [math]\vartheta\in\Theta[/math] that can be consistently estimated, uniformly over [math]\Theta[/math], by a lower semicontinuous function [math]g_n[/math]. The set estimator is [math]\{\vartheta\in\Theta:g_n(\vartheta)\le 0\}[/math]. The fundamental assumption in [26] is that [math]\{\vartheta\in\Theta:g_\sP(\vartheta)\le 0\}\subseteq\cl(\{\vartheta\in\Theta:g_\sP(\vartheta) \lt 0\})[/math]; see [22](Section 5.2) for a discussion. There are important applications where this condition holds. [27] provide results related to [26], as well as important extensions for the construction of confidence sets, and show that these can be applied to carry out statistical inference on the Hansen–Jagannathan sets of admissible stochastic discount factors [28], the Markowitz–Fama mean–variance sets for asset portfolio returns [29], and the set of structural elasticities in [30]'s analysis of demand with optimization frictions. However, these methods are not broadly applicable in the general moment (in)equalities framework of this section, as [26]'s key condition generally fails for the set [math]\idr{\theta}[/math] in \eqref{eq:define:idr} (for instance, taking [math]g_\sP=\crit_\sP[/math], the set where [math]\crit_\sP \lt 0[/math] is empty because [math]\crit_\sP\ge 0[/math]).\medskip

Criterion Function Based Estimators

[12] extend the standard theory of extremum estimation of point identified parameters to partial identification, and propose to estimate [math]\idr{\theta}[/math] using the collection of values [math]\vartheta\in\Theta[/math] that approximately minimize a sample analog of [math]\crit_\sP[/math]:

[[math]] \begin{align} \idrn{\theta}=\left\{\vartheta\in\Theta:\crit_n(\vartheta)\le \inf_{\tilde\vartheta\in\Theta}\crit_n(\tilde\vartheta)+\tau_n\right\},\label{eq:define:idrn} \end{align} [[/math]]

with [math]\tau_n[/math] a sequence of non-negative random variables such that [math]\tau_n\stackrel{p}{\rightarrow} 0[/math]. In \eqref{eq:define:idrn}, [math]\crit_n(\vartheta)[/math] is a sample analog of [math]\crit_\sP(\vartheta)[/math] that replaces [math]\E_\sP(m_j(\ew_i;\vartheta))[/math] and [math]\sigma_{\sP,j}(\vartheta)[/math] in \eqref{eq:criterion_fn_sum}-\eqref{eq:criterion_fn_max} with properly chosen estimators, e.g.,

[[math]] \begin{align*} \bar m_{n,j}(\vartheta) &\equiv {\frac{1}{n}\sum_{i=1}^n m_j(\ew_i;\vartheta)},~j=1,\dots, |\cJ| \\ \hat{\sigma}_{n,j}(\vartheta) &\equiv {\left(\frac{1}{n}\sum_{i=1}^n [m_j(\ew_i;\vartheta)]^2-[\bar m_{n,j}(\vartheta)]^2\right)^{1/2}},~j=1,\dots, |\cJ|. \end{align*} [[/math]]
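The sample criterion functions and the set estimator in \eqref{eq:define:idrn} can be sketched in a few lines. This is a minimal Python illustration under assumptions that are not in the text: [math]\Theta[/math] is replaced by a finite grid, each [math]m_j[/math] is a plain Python function of one observation and one parameter value, every [math]\hat\sigma_{n,j}(\vartheta)[/math] is strictly positive, and the names `sample_moments`, `crit_sum`, `crit_max` and `set_estimator` are hypothetical. The usage lines take the two inequalities [math]\theta\le\E_\sP(\ew_1)[/math] and [math]-\theta\le\E_\sP(\ew_2)[/math] with simulated data satisfying [math]\E_\sP(\ew_1)=1[/math] and [math]\E_\sP(\ew_2)=0[/math], so that [math]\idr{\theta}=[0,1][/math]; the choice [math]\tau_n=\log(n)/n[/math] is purely illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# m_j(w, theta): one observation w and one parameter value theta -> float.
Moment = Callable[[Sequence[float], Sequence[float]], float]


def sample_moments(data, moments: List[Moment], theta) -> Tuple[List[float], List[float]]:
    """Return (mbar_{n,j}(theta), sigmahat_{n,j}(theta)) for j = 1, ..., |J|."""
    n = len(data)
    mbar, sig = [], []
    for m in moments:
        vals = [m(w, theta) for w in data]
        mean = sum(vals) / n
        var = sum(v * v for v in vals) / n - mean ** 2
        mbar.append(mean)
        sig.append(math.sqrt(max(var, 0.0)))  # guard against tiny negative rounding
    return mbar, sig


def crit_sum(mbar, sig, J1, J2) -> float:
    """Sample analog of q_sum: squared positive parts for J1, squares for J2."""
    return (sum(max(mbar[j] / sig[j], 0.0) ** 2 for j in J1)
            + sum((mbar[j] / sig[j]) ** 2 for j in J2))


def crit_max(mbar, sig, J1, J2) -> float:
    """Sample analog of q_max: the largest standardized violation, squared."""
    terms = [max(mbar[j] / sig[j], 0.0) for j in J1] + [abs(mbar[j] / sig[j]) for j in J2]
    return max(terms) ** 2


def set_estimator(data, moments, J1, J2, grid, tau_n, crit=crit_sum):
    """Grid points whose criterion is within tau_n of the minimum over the grid."""
    values = [crit(*sample_moments(data, moments, th), J1, J2) for th in grid]
    q_min = min(values)
    return [th for th, q in zip(grid, values) if q <= q_min + tau_n]


# Usage: theta <= E(w1) and -theta <= E(w2), with E(w1) = 1 and E(w2) = 0.
rng = random.Random(0)
n = 1000
data = [(rng.gauss(1.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
moments = [lambda w, th: th[0] - w[0],   # E(theta - w1) <= 0
           lambda w, th: -th[0] - w[1]]  # E(-theta - w2) <= 0
grid = [(i / 100,) for i in range(-100, 201)]
est_0 = set_estimator(data, moments, [0, 1], [], grid, tau_n=0.0)
est_tau = set_estimator(data, moments, [0, 1], [], grid, tau_n=math.log(n) / n)
print(min(est_tau)[0], max(est_tau)[0])  # endpoints of the estimated interval
```

Because the criterion is non-negative and every [math]\vartheta[/math] with [math]\crit_n(\vartheta)=0[/math] is retained for any [math]\tau_n\ge 0[/math], the estimate with [math]\tau_n \gt 0[/math] always contains the one with [math]\tau_n=0[/math].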


It can be shown that as long as [math]\tau_n=o_p(1)[/math], under the same assumptions used to prove consistency of extremum estimators of point identified parameters (e.g., with uniform convergence of [math]\crit_n[/math] to [math]\crit_\sP[/math] and continuity of [math]\crit_\sP[/math] on [math]\Theta[/math]),

[[math]] \begin{align} \sup_{\vartheta \in \idrn{\theta}} \inf_{\tilde\vartheta \in \idr{\theta}} \Vert \vartheta-\tilde\vartheta \Vert\stackrel{p}{\rightarrow} 0~\text{as } n\to \infty.\label{eq:inner_consistent} \end{align} [[/math]]

This implies that asymptotically each point in [math]\idrn{\theta}[/math] is arbitrarily close to a point in [math]\idr{\theta}[/math], or more formally that [math]\sP(\idrn{\theta}\subseteq\idr{\theta}^\epsilon)\to 1[/math] for every [math]\epsilon \gt 0[/math], where [math]\idr{\theta}^\epsilon[/math] is the [math]\epsilon[/math]-enlargement of [math]\idr{\theta}[/math] (the set of points within distance [math]\epsilon[/math] of it). I refer to \eqref{eq:inner_consistent} as inner consistency henceforth (… (Theorem 1) for a pedagogically helpful proof for a semiparametric binary model). [31] provides an early contribution establishing this type of inner consistency for maximum likelihood estimators when the true parameter is not point identified. However, Hausdorff consistency also requires that

[[math]] \begin{align*} \sup_{\vartheta \in \idr{\theta}} \inf_{\tilde\vartheta \in \idrn{\theta}} \Vert \vartheta-\tilde\vartheta \Vert\stackrel{p}{\rightarrow} 0~\text{as } n\to \infty, \end{align*} [[/math]]

i.e., that each point in [math]\idr{\theta}[/math] is arbitrarily close to a point in [math]\idrn{\theta}[/math], or more formally that [math]\sP(\idr{\theta}\subseteq\idrn{\theta})\to 1[/math]. To establish this result for the sharp identification regions in Theorem SIR- (parametric regression with interval covariate) and Theorem SIR- (semiparametric binary model with interval covariate), [12](Propositions 3 and 5) require the rate at which [math]\tau_n\stackrel{p}{\rightarrow} 0[/math] to be slower than the rate at which [math]\crit_n[/math] converges uniformly to [math]\crit_\sP[/math] over [math]\Theta[/math]. What might go wrong in the absence of such a restriction? A simple example can help understand the issue. Consider a model with linear inequalities of the form

[[math]] \begin{align*} \theta_1 &\le \E_\sP(\ew_1),\\ -\theta_1 &\le \E_\sP(\ew_2),\\ \theta_2 &\le \E_\sP(\ew_3)+ \E_\sP(\ew_4)\theta_1,\\ -\theta_2 &\le \E_\sP(\ew_5)+ \E_\sP(\ew_6)\theta_1. \end{align*} [[/math]]

Suppose [math]\ew\equiv(\ew_1,\dots,\ew_6)[/math] is distributed multivariate normal, with [math]\E_\sP(\ew)=[6~0~2~0~{-2}~0]^\top[/math] and [math]\cov_\sP(\ew)[/math] equal to the identity matrix. Then [math]\idr{\theta}=\{\vartheta=[\vartheta_1~\vartheta_2]^\top\in\Theta:\vartheta_1\in[0,6]~\text{and}~\vartheta_2=2\}[/math]. However, with positive probability in any finite sample [math]\crit_n(\vartheta)=0[/math] for [math]\vartheta[/math] in a random region (e.g., a triangle if [math]\crit_n[/math] is the sample analog of \eqref{eq:criterion_fn_max}) that only includes points that are close to a subset of the points in [math]\idr{\theta}[/math]. Hence, with positive probability the minimizer of [math]\crit_n[/math] cycles between consistent estimators of subsets of [math]\idr{\theta}[/math], but does not estimate the entire set. Enlarging the estimator to include all points that are close to minimizing [math]\crit_n[/math] up to a tolerance that converges to zero sufficiently slowly removes this problem.\medskip [13] significantly generalize the consistency results in [12]. They work with a normalized criterion function equal to [math]\crit_n(\vartheta)-\inf_{\tilde\vartheta\in\Theta}\crit_n(\tilde\vartheta)[/math], but to keep notation light I simply refer to it as [math]\crit_n[/math].[Notes 4] Under suitable regularity conditions, they establish consistency of an estimator that can be a smaller set than the one proposed by [12], and derive its convergence rate. 
Some of the key conditions required by [13](Conditions C1 and C2) to study convergence rates include that [math]\crit_n[/math] is lower semicontinuous in [math]\vartheta[/math] and satisfies various convergence properties, among which [math]\sup_{\vartheta\in\idr{\theta}}\crit_n(\vartheta)=O_p(1/a_n)[/math] for a sequence of normalizing constants [math]a_n\to\infty[/math], that [math]\tau_n\ge \sup_{\vartheta\in\idr{\theta}}\crit_n(\vartheta)[/math] with probability approaching one, and that [math]\tau_n\to 0[/math]. They also require that there exist positive constants [math](\delta,\kappa,\gamma)[/math] such that for any [math]\epsilon\in(0,1)[/math] there are [math](d_\epsilon,n_\epsilon)[/math] such that

[[math]] \begin{align*} \forall n\ge n_\epsilon, \, \crit_n(\vartheta)\ge\kappa[\min\{\delta,\dist(\vartheta,\idr{\theta})\}]^\gamma \end{align*} [[/math]]

uniformly on [math]\{\vartheta\in\Theta:\dist(\vartheta,\idr{\theta})\ge(d_\epsilon/a_n)^{1/\gamma}\}[/math] with probability at least [math]1-\epsilon[/math]. In words, the assumption, referred to as the polynomial minorant condition, rules out that [math]\crit_n[/math] can be arbitrarily close to zero outside [math]\idr{\theta}[/math]. It posits that [math]\crit_n[/math] grows at least as fast as a polynomial of degree [math]\gamma[/math] in the distance of [math]\vartheta[/math] from [math]\idr{\theta}[/math]. Under some additional regularity conditions, [13] establish that

[[math]] \begin{align} \dist_H(\idrn{\theta},\idr{\theta})=O_p\big((\max\{1/a_n,\tau_n\})^{1/\gamma}\big).\label{eq:CHT_rate} \end{align} [[/math]]


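What is the role played by the polynomial minorant condition for the result in \eqref{eq:CHT_rate}? Under the maintained assumptions, [math]\idr{\theta}\subseteq\idrn{\theta}[/math] with probability approaching one because [math]\tau_n\ge \sup_{\vartheta\in\idr{\theta}}\crit_n(\vartheta)[/math], while every [math]\vartheta\in\idrn{\theta}[/math] with [math]\dist(\vartheta,\idr{\theta})\ge(d_\epsilon/a_n)^{1/\gamma}[/math] satisfies, with probability at least [math]1-\epsilon[/math], [math]\tau_n\ge\crit_n(\vartheta)\ge\kappa[\min\{\delta,\dist(\vartheta,\idr{\theta})\}]^\gamma[/math]. The latter part of this chain of inequalities bounds [math]\dist(\vartheta,\idr{\theta})[/math] by a multiple of [math]\tau_n^{1/\gamma}[/math] and is used to obtain \eqref{eq:CHT_rate}. When could the polynomial minorant condition be violated? In moment (in)equalities models, [13] require [math]\gamma=2[/math] (… (equation (4.1) and equation (4.6)) set [math]\gamma=1[/math] because they report the assumption for a criterion function that does not square the moment violations). Consider a simple stylized example with (in)equalities of the form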

[[math]] \begin{align*} -\theta_1 &\le \E_\sP(\ew_1),\\ -\theta_2 &\le \E_\sP(\ew_2),\\ \theta_1\theta_2 &= \E_\sP(\ew_3), \end{align*} [[/math]]

with [math]\E_\sP(\ew_1)=\E_\sP(\ew_2)=\E_\sP(\ew_3)=0[/math], so that [math]\idr{\theta}[/math] is the union of the two non-negative half-axes, and note that the sample means [math](\bar{\ew}_1,\bar{\ew}_2,\bar{\ew}_3)[/math] are [math]\sqrt{n}[/math]-consistent estimators of [math](\E_\sP(\ew_1),\E_\sP(\ew_2),\E_\sP(\ew_3))[/math]. Suppose [math](\ew_1,\ew_2,\ew_3)[/math] are distributed multivariate standard normal. Consider a sequence [math]\vartheta_n=[\vartheta_{1n}~\vartheta_{2n}]^\top=[n^{-1/4}~n^{-1/4}]^\top[/math]. Then, with [math]\gamma=2[/math], [math][\dist(\vartheta_n,\idr{\theta})]^\gamma=n^{-1/2}[/math]. On the other hand, with positive probability [math]\crit_n(\vartheta_n)=(\bar{\ew}_3-\vartheta_{1n}\vartheta_{2n})^2=O_p\left(n^{-1}\right)[/math], so that for [math]n[/math] large enough [math]\crit_n(\vartheta_n) \lt [\dist(\vartheta_n,\idr{\theta})]^\gamma[/math], violating the assumption. This occurs because the gradient of the moment equality vanishes as [math]\vartheta[/math] approaches zero, rendering the criterion function flat in a neighborhood of [math]\idr{\theta}[/math]. As intuition would suggest, rates of convergence are slower the flatter [math]\crit_n[/math] is outside [math]\idr{\theta}[/math]. [32] show that in moment inequality models with smooth moment conditions, the polynomial minorant assumption with [math]\gamma=2[/math] implies the Abadie constraint qualification (ACQ); see, e.g., [33](Chapter 5) for a definition and discussion of ACQ. The example just given to discuss failures of the polynomial minorant condition is in fact a known example where ACQ fails at [math]\vartheta=[0~0]^\top[/math]. [13](Condition C.3, referred to as degeneracy) also consider the case that [math]\crit_n[/math] vanishes on subsets of [math]\Theta[/math] that converge in Hausdorff distance to [math]\idr{\theta}[/math] at rate [math]a_n^{-1/\gamma}[/math]. While degeneracy might be difficult to verify in practice, [13] show that if it holds, [math]\tau_n[/math] can be set to zero.
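The failure of the polynomial minorant condition in this stylized example can be checked numerically. The sketch below is a minimal Python illustration, assuming (as in the text) that the moments are not standardized because their variances equal one. It draws the three sample means directly from their exact [math]N(0,1/n)[/math] distribution instead of simulating full samples, evaluates the sum-type criterion at [math]\vartheta_n=[n^{-1/4}~n^{-1/4}]^\top[/math], and divides by [math][\dist(\vartheta_n,\idr{\theta})]^2=n^{-1/2}[/math]. The function name `minorant_ratio` is hypothetical. The ratio should drift toward zero roughly like [math]n^{-1/2}[/math], which is exactly what the minorant condition rules out.

```python
import math
import random


def minorant_ratio(n: int, rng: random.Random) -> float:
    """q_n(theta_n) / [d(theta_n, Theta_I)]^2 at theta_n = (n^{-1/4}, n^{-1/4})."""
    t = n ** -0.25
    # Sample means of n iid N(0, 1) draws are exactly N(0, 1/n).
    w1, w2, w3 = (rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(3))
    q_n = (max(-t - w1, 0.0) ** 2    # -theta_1 <= E(w1)
           + max(-t - w2, 0.0) ** 2  # -theta_2 <= E(w2)
           + (w3 - t * t) ** 2)      # theta_1 * theta_2 = E(w3)
    # Theta_I is the union of the two non-negative half-axes, so d(theta_n, Theta_I) = t.
    return q_n / t ** 2


rng = random.Random(1)
for n in (10 ** 2, 10 ** 4, 10 ** 6, 10 ** 8):
    print(n, minorant_ratio(n, rng))
```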
[34] provides conditions on the moment functions, closely related to constraint qualifications (as discussed in [32]), under which it is possible to set [math]\tau_n=0[/math]. [35] studies estimation of [math]\idr{\theta}[/math] when the number of moment inequalities is large relative to sample size (possibly infinite). He provides a consistency result for criterion-based estimators that use a number of unconditional moment inequalities that grows with sample size. He also considers estimators based on conditional moment inequalities, and derives the fastest possible rate for estimating [math]\idr{\theta}[/math] under smoothness conditions on the conditional moment functions. He shows that the rates achieved by the procedures in [6][7] are (minimax) optimal and cannot be improved upon. \begin{BI} [12] extend the notion of extremum estimation from point identified to partially identified models. They do so by putting forward a generalized criterion function whose zero-level set can be used to define [math]\idr{\theta}[/math] in partially identified structural semiparametric models. It is then natural to define the set valued estimator [math]\idrn{\theta}[/math] as the collection of approximate minimizers of the sample analog of this criterion function. [12]'s analysis of statistical inference focuses exclusively on providing consistent estimators. [13] substantially generalize the analysis of consistency of criterion function-based set estimators. They provide a comprehensive study of convergence rates in partially identified models. Their work highlights the challenges a researcher faces in this context, and puts forward possible solutions in the form of assumptions under which specific rates of convergence attain. \end{BI}

Support Function Based Estimators

[36] introduce to the econometrics literature inference methods for set valued estimators based on random set theory. They study the class of models where [math]\idr{\theta}[/math] is convex and can be written as the Aumann (or selection) expectation of a properly defined random closed set (… If [math]\idr{\theta}[/math] is not convex, [36]'s analysis applies to its convex hull). They propose to carry out estimation and inference leveraging the representation of convex sets through their support function (given in Definition), as is done in random set theory; see [23](Chapter 3) and [22](Chapter 4). Because the support function fully characterizes the boundary of [math]\idr{\theta}[/math], it allows for a simple sample analog estimator, and for inference procedures with desirable properties. An example of a framework where the approach of [36] can be applied is that of best linear prediction with interval outcome data in Identification Problem (… (Supplementary Appendix F) establish that if [math]\ex[/math] has finite support, [math]\idr{\theta}[/math] in Theorem SIR- can be written as the collection of [math]\vartheta\in\Theta[/math] that satisfy a finite number of moment inequalities, as posited in this section). Recall that in that case, the researcher observes random variables [math](\yL,\yU,\ex)[/math] and wishes to learn the best linear predictor of [math]\ey|\ex[/math], with [math]\ey[/math] unobserved and [math]\sR(\yL\le\ey\le\yU)=1[/math]. For simplicity let [math]\ex[/math] be a scalar. Given a random sample [math]\{\yLi,\yUi,\ex_i\}_{i=1}^n[/math] from [math]\sP[/math], the researcher can construct a random segment [math]\eG_i[/math] for each [math]i[/math] and a consistent estimator [math]\hat{\Sigma}_n[/math] of the matrix [math]\Sigma_\sP[/math] in eq:G_and_Sigma as

[[math]] \begin{align*} \eG_i=\left\{ \begin{pmatrix} \ey_i\\ \ey_i\ex_i \end{pmatrix}  :\; \ey_i \in \Sel(\eY_i)\right\}\subset\R^2, ~~\text{and}~~ \hat\Sigma_n= \begin{pmatrix} 1 & \overline\ex\\ \overline\ex & \overline{\ex^2} \end{pmatrix},\end{align*} [[/math]]

where [math]\eY_i=[\yLi,\yUi][/math] and [math]\overline\ex,\overline{\ex^2}[/math] are the sample means of [math]\ex_i[/math] and [math]\ex^2_i[/math], respectively. Because in this problem [math]\idr{\theta}=\Sigma_\sP^{-1}\E_\sP\eG[/math] (see Theorem SIR-), a natural sample analog estimator replaces [math]\Sigma_\sP[/math] with [math]\hat{\Sigma}_n[/math], and [math]\E_\sP\eG[/math] with a Minkowski average of [math]\eG_i[/math] (see the Appendix for a formal definition), yielding

[[math]] \begin{align} \idrn{\theta}=\hat\Sigma_n^{-1}\frac{1}{n}\sum_{i=1}^n\eG_i.\label{eq:BLP_estimator} \end{align} [[/math]]
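Because the support function of a Minkowski average is the average of the support functions, and [math]h_{AK}(u)=h_K(A^\top u)[/math] for a linear map [math]A[/math], the estimator in \eqref{eq:BLP_estimator} can be evaluated direction by direction. With [math]\hat\Sigma_n^{-1}[/math] symmetric, the support function of [math]\eG_i[/math] at [math]\hat\Sigma_n^{-1}u[/math] is [math]\max_{y\in[\yLi,\yUi]} y\,[1~\ex_i]\hat\Sigma_n^{-1}u[/math], attained at [math]\yLi[/math] or [math]\yUi[/math] according to the sign of [math][1~\ex_i]\hat\Sigma_n^{-1}u[/math]. A minimal Python sketch for scalar [math]\ex[/math] follows; it assumes [math]\ex[/math] is not constant in the sample (so that [math]\hat\Sigma_n[/math] is invertible), and the function name `blp_support_function` is hypothetical. In the usage lines, the estimated interval for the slope is [math][-h(-e_2),h(e_2)][/math] with [math]e_2=[0~1]^\top[/math]; when [math]\yLi=\yUi[/math] for all [math]i[/math], both endpoints collapse to the OLS slope.

```python
def blp_support_function(yL, yU, x, u):
    """Support function of hat{Sigma}_n^{-1} (1/n) sum_i G_i at direction u = (u1, u2)."""
    n = len(x)
    xbar = sum(x) / n
    x2bar = sum(xi * xi for xi in x) / n
    det = x2bar - xbar ** 2  # determinant of hat{Sigma}_n; positive unless x is constant
    # v = hat{Sigma}_n^{-1} u, using the closed-form inverse of [[1, xbar], [xbar, x2bar]].
    v1 = (x2bar * u[0] - xbar * u[1]) / det
    v2 = (-xbar * u[0] + u[1]) / det
    total = 0.0
    for lo, hi, xi in zip(yL, yU, x):
        f = v1 + v2 * xi                    # f(x_i, u) = [1 x_i] hat{Sigma}_n^{-1} u
        total += (lo if f < 0 else hi) * f  # yL_i if f < 0, yU_i if f >= 0
    return total / n


# Usage: y = 1 + 2x observed only up to the interval [y - 1, y + 1].
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0 + 2.0 * xi for xi in x]
yL = [yi - 1.0 for yi in y]
yU = [yi + 1.0 for yi in y]
slope_bounds = (-blp_support_function(yL, yU, x, (0.0, -1.0)),
                blp_support_function(yL, yU, x, (0.0, 1.0)))
print(slope_bounds)  # approximately (1.2, 2.8)
```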

The support function of [math]\idrn{\theta}[/math] is the sample analog of that of [math]\idr{\theta}[/math] provided in eq:supfun:BLP:

[[math]] \begin{align*} h_{\idrn{\theta}}(u)=\frac{1}{n}\sum_{i=1}^n[(\yLi\one(f(\ex_i,u) \lt 0)+\yUi\one(f(\ex_i,u)\ge 0))f(\ex_i,u)],~~u\in\mathbb{S}, \end{align*} [[/math]]

where [math]f(\ex_i,u)=[1~\ex_i]\hat\Sigma_n^{-1}u[/math]. [36] use the Law of Large Numbers for random sets reported in Theorem to show that [math]\idrn{\theta}[/math] in \eqref{eq:BLP_estimator} is [math]\sqrt{n}[/math]-consistent under standard conditions on the moments of [math](\yLi,\yUi,\ex_i)[/math]. [37] and [38] significantly expand the applicability of [36]'s estimator. [37] show that it can be used in a large class of partially identified linear models, including ones that allow for the availability of instrumental variables. [38] show that it can be used for best linear approximation of any function [math]f(x)[/math] that is known to lie within two identified bounding functions. The lower and upper functions defining the band are allowed to be any functions, including ones carrying an index, and can be estimated parametrically or nonparametrically. The method allows for estimation of the parameters of the best linear approximations to the set identified functions in many of the identification problems described in Section. It can also be used to estimate the sharp identification region for the parameters of a binary choice model with interval or discrete regressors under the assumptions of [39], characterized in eq:SIR:mag:mau in Section Semiparametric Binary Choice Models with Interval Valued Covariates. [40] develop a theory of efficiency for estimators of sets [math]\idr{\theta}[/math] as in \eqref{eq:sharp_id_for_inference} under the additional requirements that the moments [math]\E_\sP(m_j(\ew;\vartheta))[/math] defining the inequalities are convex in [math]\vartheta\in\Theta[/math] and smooth as functionals of the distribution of the data. Because of the convexity of the moment inequalities, [math]\idr{\theta}[/math] is convex and can be represented through its support function. Using the classic results in [41], [40] show that under suitable regularity conditions, the support function admits [math]\sqrt{n}[/math]-consistent regular estimation.
They also show that a simple plug-in estimator based on the support function attains the semiparametric efficiency bound, and the corresponding estimator of [math]\idr{\theta}[/math] minimizes a wide class of asymptotic loss functions based on the Hausdorff distance. As they establish, this efficiency result applies to the estimators proposed by [36], including that in \eqref{eq:BLP_estimator}, and by [37]. [42] further enlarges the applicability of the support function approach by establishing its duality with the criterion function approach, for the case that [math]\crit_\sP[/math] is a convex function and [math]\crit_n[/math] is a convex function almost surely. This allows one to use the support function approach also when a representation of [math]\idr{\theta}[/math] as the Aumann expectation of a random closed set is not readily available. [42] considers [math]\idr{\theta}[/math] and its level set estimator [math]\idrn{\theta}[/math] as defined, respectively, in \eqref{eq:define:idr} and \eqref{eq:define:idrn}, with [math]\Theta[/math] a convex subset of [math]\R^d[/math]. Because [math]\crit_\sP[/math] and [math]\crit_n[/math] are convex functions, [math]\idr{\theta}[/math] and [math]\idrn{\theta}[/math] are convex sets. Under the same assumptions as in [13], including the polynomial minorant and the degeneracy conditions, one can set [math]\tau_n=0[/math] and have [math]\dist_H(\idrn{\theta},\idr{\theta})=O_p(a_n^{-1/\gamma})[/math]. Moreover, due to its convexity, [math]\idr{\theta}[/math] is fully characterized by its support function, which in turn can be consistently estimated (at the same rate as [math]\idr{\theta}[/math]) using sample analogs as [math]h_{\idrn{\theta}}(u)=\max_{a_n\crit_n(\vartheta)\le 0}u^\top\vartheta[/math]. The latter can be computed via convex programming.\medskip [43] consider consistent estimation of [math]\idr{\theta}[/math] in the context of Bayesian inference. 
They focus on partially identified models where [math]\idr{\theta}[/math] depends on a "reduced form" parameter [math]\phi[/math] (e.g., a vector of moments of observable random variables). They recognize that while a prior on [math]\phi[/math] can be revised in light of the data, a prior on [math]\theta[/math] cannot, due to the lack of point identification. As such, they propose to choose a single prior for the revisable parameters, and a set of priors for the unrevisable ones. The latter is the collection of priors such that the distribution of [math]\theta|\phi[/math] places probability one on [math]\idr{\theta}[/math]. A crucial observation in [43] is that once [math]\phi[/math] is viewed as a random vector, as in the Bayesian paradigm, under mild regularity conditions [math]\idr{\theta}[/math] is a random closed set, and Bayesian inference on it can be carried out using elements of random set theory. In particular, they show that the set of posterior means of [math]\theta|\ew[/math] equals the Aumann expectation of [math]\idr{\theta}[/math] (with the underlying probability measure of [math]\phi|\ew[/math]). They also show that this Aumann expectation converges in Hausdorff distance to the "true" identified set if the latter is convex, or otherwise to its convex hull. They apply their method to analyze impulse responses in set-identified Structural Vector Autoregressions, where standard Bayesian inference is otherwise sensitive to the choice of an unrevisable prior. See also [44] and [45], concerned with Bayesian inference with a non-informative prior for non-identified parameters. I refer to [46](Chapter 13) for a thorough review. Frequentist inference for impulse response functions in Structural Vector Autoregression models is carried out, e.g., in [47] and [48].
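The link between posterior means and the Aumann expectation is easiest to see when the identified set is an interval, since the Aumann expectation of a random interval [math][L,U][/math] is [math][\E L,\E U][/math]. The Python sketch below approximates it by Monte Carlo for the mean of a binary outcome with missing data. It assumes a hypothetical reduced-form parameter [math]\phi=(p,\mu)[/math], with [math]p[/math] the probability of observing the outcome and [math]\mu[/math] the mean of the observed outcomes, independent uniform (Beta(1,1)) priors on each component, and identified set [math][p\mu,\,p\mu+1-p][/math].

```python
import random
from statistics import fmean


def posterior_mean_set(n_obs: int, n_obs_ones: int, n_miss: int,
                       draws: int = 20000, seed: int = 0):
    """Monte Carlo approximation of the Aumann expectation of the random
    identified interval [p*mu, p*mu + 1 - p] under the posterior of (p, mu)."""
    rng = random.Random(seed)
    lowers, uppers = [], []
    for _ in range(draws):
        # Conjugate Beta posteriors under independent Beta(1, 1) priors (assumed).
        p = rng.betavariate(1 + n_obs, 1 + n_miss)
        mu = rng.betavariate(1 + n_obs_ones, 1 + n_obs - n_obs_ones)
        lowers.append(p * mu)            # all missing outcomes equal to 0
        uppers.append(p * mu + (1 - p))  # all missing outcomes equal to 1
    # The Aumann expectation of a random interval is [E(lower), E(upper)].
    return fmean(lowers), fmean(uppers)


lo, hi = posterior_mean_set(n_obs=80, n_obs_ones=60, n_miss=20)
print(round(lo, 3), round(hi, 3))
```

Every point of the returned interval is the posterior mean of [math]\theta[/math] under some prior in the set described above, and its width is the posterior mean of the missing-data probability.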
\begin{BI} [36] show that elements of random set theory can be employed to obtain inference methods for partially identified models that are easy to implement and have desirable statistical properties. Whereas they apply their findings to a specific class of models based on the Aumann expectation, the ensuing literature demonstrates that random set methods are widely applicable to obtain estimators of sharp identification regions and establish their consistency. \end{BI}

[4] propose an alternative to the notion of consistent estimator. Rather than requiring that [math]\idrn{\theta}[/math] satisfy the requirement in Definition, they propose the notion of a half-median-unbiased estimator. This notion is easiest to explain in the case of interval identified scalar parameters. Take, e.g., the bound in Theorem SIR- for the conditional expectation of selectively observed data. Then an estimator of that interval is half-median-unbiased if the estimated upper bound exceeds the true upper bound, and the estimated lower bound falls below the true lower bound, each with probability at least [math]1/2[/math] asymptotically. More generally, one can obtain a half-median-unbiased estimator as

[[math]] \begin{align} \idrn{\theta}=\left\{\vartheta\in\Theta:a_n\crit_n(\vartheta)\le c_{1/2}(\vartheta)\right\},\label{eq:idrn:half:med:unb} \end{align} [[/math]]

where [math]c_{1/2}(\vartheta)[/math] is a critical value chosen so that [math]\idrn{\theta}[/math] asymptotically contains [math]\idr{\theta}[/math] (or any fixed element in [math]\idr{\theta}[/math]; see the discussion in Section Coverage of [math]\idr{\theta}[/math] vs. Coverage of [math]\theta[/math] below) with probability at least [math]1/2[/math]. As discussed in the next section, [math]c_{1/2}(\vartheta)[/math] can be further chosen so that this probability is uniform over [math]\sP\in\cP[/math]. The requirement of half-median-unbiasedness has the virtue that, by construction, an estimator such as \eqref{eq:idrn:half:med:unb} is a subset of a [math]1-\alpha[/math] confidence set as defined in \eqref{eq:CS} below for any [math]\alpha \lt 1/2[/math], provided [math]c_{1-\alpha}(\vartheta)[/math] is chosen using the same criterion for all [math]\alpha\in(0,1)[/math]. In contrast, a consistent estimator satisfying the requirement in Definition need not be a subset of a confidence set. This is because the sequence [math]\tau_n[/math] in \eqref{eq:define:idrn} may be larger than the critical value used to obtain the confidence set (see equation \eqref{eq:CS} below), unless regularity conditions such as degeneracy or others allow one to set [math]\tau_n[/math] equal to zero. Moreover, the choice of the sequence [math]\tau_n[/math] is not data driven, and hence can be viewed as arbitrary. This raises a concern about the scope of consistent estimation in general settings. However, reporting a set estimator together with a confidence set is arguably important to shed light on how much of the volume of the confidence set is due to statistical uncertainty and how much is due to a large identified set.
One can do so by either using a half-median-unbiased estimator as in \eqref{eq:idrn:half:med:unb}, or the set of minimizers of the criterion function in \eqref{eq:define:idrn} with [math]\tau_n=0[/math] (which, as previously discussed, satisfies the inner consistency requirement in \eqref{eq:inner_consistent} under weak conditions, and is Hausdorff consistent in some well-behaved cases).
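A small simulation can illustrate half-median-unbiasedness in the simplest interval case. The Python sketch below assumes a hypothetical design with outcome [math]y\sim U(0,1)[/math] observed with probability 0.7 independently of [math]y[/math], so that the worst-case bounds on [math]\E(y)[/math] are [math][0.35,0.65][/math]. It records how often the sample-analog lower bound falls weakly below, and the sample-analog upper bound weakly above, their population counterparts. Because each sample-analog bound is asymptotically normal and centered at its target, both frequencies should be close to [math]1/2[/math] in this design; in general, [math]c_{1/2}(\vartheta)[/math] must be calibrated to deliver this property.

```python
import random


def half_median_frequencies(n: int = 200, reps: int = 2000,
                            p_obs: float = 0.7, seed: int = 1):
    """Frequencies with which the sample-analog worst-case bounds on E(y)
    fall weakly outside the population bounds (hypothetical design)."""
    rng = random.Random(seed)
    true_lo = p_obs * 0.5            # E[y * d], with y ~ U(0, 1) independent of d
    true_hi = true_lo + (1 - p_obs)  # E[y * d + (1 - d)]
    below = above = 0
    for _ in range(reps):
        sum_lo = sum_hi = 0.0
        for _ in range(n):
            observed = rng.random() < p_obs
            y = rng.random()
            sum_lo += y if observed else 0.0  # missing outcome set to its lower bound 0
            sum_hi += y if observed else 1.0  # missing outcome set to its upper bound 1
        below += int(sum_lo / n <= true_lo)
        above += int(sum_hi / n >= true_hi)
    return below / reps, above / reps


print(half_median_frequencies())
```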

Confidence Sets Satisfying Various Coverage Notions

Coverage of [math]\idr{\theta}[/math] vs. Coverage of [math]\theta[/math]

To simplify notation, henceforth I assume [math]a_n=n[/math]. I first discuss confidence sets [math]\CS\subset\R^d[/math] defined as level sets of a criterion function:

[[math]] \begin{align} \CS=\left\{\vartheta\in\Theta:n\crit_n(\vartheta)\le c_{1-\alpha}(\vartheta)\right\}.\label{eq:CS} \end{align} [[/math]]
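For concreteness, the Python sketch below builds \eqref{eq:CS} on a grid for a hypothetical interval-identified mean [math]\theta\in[\E(y_L),\E(y_U)][/math], characterized by the moment inequalities [math]\E(y_L)-\theta\le 0[/math] and [math]\theta-\E(y_U)\le 0[/math]. It assumes the commonly used criterion [math]n\crit_n(\vartheta)=\sum_j[\sqrt{n}\bar m_{n,j}(\vartheta)/\hat\sigma_{n,j}]_+^2[/math] and a constant critical value supplied by the user. How to choose that value is precisely the question addressed in the rest of this section, so the numbers used here are illustrative only.

```python
import math
from statistics import fmean, stdev
from typing import List, Sequence


def n_times_criterion(theta: float, mean_lo: float, mean_hi: float,
                      sd_lo: float, sd_hi: float, n: int) -> float:
    """n * Q_n(theta): sum of squared positive parts of the studentized
    sample moments for E(y_L) - theta <= 0 and theta - E(y_U) <= 0."""
    t1 = math.sqrt(n) * (mean_lo - theta) / sd_lo
    t2 = math.sqrt(n) * (theta - mean_hi) / sd_hi
    return max(t1, 0.0) ** 2 + max(t2, 0.0) ** 2


def confidence_set(y_lo: Sequence[float], y_hi: Sequence[float],
                   crit_value: float, grid: Sequence[float]) -> List[float]:
    """Grid points theta with n * Q_n(theta) <= crit_value."""
    n = len(y_lo)
    stats = (fmean(y_lo), fmean(y_hi), stdev(y_lo), stdev(y_hi), n)
    return [th for th in grid if n_times_criterion(th, *stats) <= crit_value]


y_lo = [0.2, 0.4, 0.3, 0.5]   # hypothetical lower outcomes
y_hi = [0.6, 0.8, 0.7, 0.9]   # hypothetical upper outcomes
grid = [i / 100 for i in range(101)]
cs_zero = confidence_set(y_lo, y_hi, 0.0, grid)   # sample-analog bounds
cs_wide = confidence_set(y_lo, y_hi, 3.84, grid)  # illustrative critical value
print(min(cs_zero), max(cs_zero), min(cs_wide), max(cs_wide))
```

With a zero critical value the set reduces to the sample-analog bounds; any positive constant enlarges it, and the coverage notions discussed next determine how large that constant (or function of [math]\vartheta[/math]) must be.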

In \eqref{eq:CS}, [math]c_{1-\alpha}(\vartheta)[/math] may be constant or vary in [math]\vartheta\in\Theta[/math]. It is chosen so that [math]\CS[/math] satisfies (asymptotically) a certain coverage property with respect to either [math]\idr{\theta}[/math] or each [math]\vartheta\in\idr{\theta}[/math]. Correspondingly, different appearances of [math]c_{1-\alpha}(\vartheta)[/math] may refer to different critical values associated with different coverage notions. The main theoretical challenge of inference under partial identification is the determination of [math]c_{1-\alpha}[/math] and of methods to approximate it.

A first classification of coverage notions pertains to whether the confidence set should cover [math]\idr{\theta}[/math] or each of its elements with a prespecified asymptotic probability. Early on, within the study of interval-identified parameters, [49][50] put forward a confidence interval that expands each of the sample analogs of the extreme points of the population bounds by an amount designed so that the confidence interval asymptotically covers the population bounds with prespecified probability. [13] study the general problem of inference for a set [math]\idr{\theta}[/math] defined as the zero-level set of a criterion function. The coverage notion that they propose is pointwise coverage of the set, whereby [math]c_{1-\alpha}[/math] is chosen so that:

[[math]] \begin{align} \liminf_{n\to\infty}\sP(\idr{\theta}\subseteq\CS)\ge 1-\alpha~\text{for all}~\sP\in\cP.\label{eq:CS_coverage:set:pw} \end{align} [[/math]]

[13] provide conditions under which [math]\CS[/math] satisfies \eqref{eq:CS_coverage:set:pw} with [math]c_{1-\alpha}[/math] constant in [math]\vartheta[/math], yielding the so-called criterion function approach to statistical inference in partial identification. Under the same coverage requirement, [51] and [52] introduce novel bootstrap methods for inference in moment inequality models. [53] propose an inference method for finite games of complete information that exploits the structure of these models. [36] propose a method to test hypotheses and build confidence sets satisfying \eqref{eq:CS_coverage:set:pw} based on random set theory, the so-called support function approach, which yields simple-to-compute confidence sets with asymptotic coverage equal to [math]1-\alpha[/math] when [math]\idr{\theta}[/math] is strictly convex. The reason for the strict convexity requirement is that in its absence, the support function of [math]\idr{\theta}[/math] is not fully differentiable, but only directionally differentiable, complicating inference. Indeed, [54] show that standard bootstrap methods are consistent if and only if full differentiability holds, and they provide modified bootstrap methods that remain valid when only directional differentiability holds. [38] propose a data jittering method that enforces full differentiability at the price of a small conservative distortion. [40] extend the applicability of the support function approach to other moment inequality models and establish efficiency results. [27] show that a Hausdorff distance-based test statistic can be weighted to enforce either exact or first-order equivariance to transformations of parameters. [55] provide empirical-likelihood-based inference methods for the support function approach.
The test statistics employed in the criterion function approach and in the support function approach are asymptotically equivalent in specific moment inequality models [36][42], but the criterion function approach is more broadly applicable.

The field's interest shifted to a different notion of coverage when [56] pointed out that often there is one "true" data-generating [math]\theta[/math], even if it is only partially identified. Hence, they proposed confidence sets that cover each [math]\vartheta\in\idr{\theta}[/math] with a prespecified probability. For pointwise coverage, this leads to choosing [math]c_{1-\alpha}[/math] so that:

[[math]] \begin{align} \liminf_{n\to\infty}\sP(\vartheta\in\CS)\ge 1-\alpha~\text{for all}~\sP\in\cP~\text{and}~\vartheta\in\idr{\theta}.\label{eq:CS_coverage:point:pw} \end{align} [[/math]]

If [math]\idr{\theta}[/math] is a singleton, then \eqref{eq:CS_coverage:set:pw} and \eqref{eq:CS_coverage:point:pw} both coincide with the pointwise coverage requirement employed for point identified parameters. However, as shown in [56](Lemma 1), if [math]\idr{\theta}[/math] contains more than one element, the two notions differ, with confidence sets satisfying \eqref{eq:CS_coverage:point:pw} being weakly smaller than ones satisfying \eqref{eq:CS_coverage:set:pw}. [15] provides confidence sets for general moment (in)equalities models that satisfy \eqref{eq:CS_coverage:point:pw} and are easy to compute. Although confidence sets that take each [math]\vartheta\in\idr{\theta}[/math] as the object of interest (and which satisfy the uniform coverage requirements described in Section Pointwise vs. Uniform Coverage below) have received the most attention in the literature on inference in partially identified models, this choice merits some words of caution. First, [57] point out that if confidence sets are to be used for decision making, a policymaker concerned with robust decisions might prefer ones satisfying \eqref{eq:CS_coverage:set:pw} (respectively, \eqref{eq:CS_coverage:set} below once uniformity is taken into account) to ones satisfying \eqref{eq:CS_coverage:point:pw} (respectively, \eqref{eq:CS_coverage:point} below with uniformity). Second, while in many applications a "true" data-generating [math]\theta[/math] exists, in others it does not. For example, [58] and [59] query survey respondents (in the American Life Panel and in the Health and Retirement Study, respectively) about their subjective beliefs on the probability of future events. A large fraction of these respondents, when given the possibility to do so, report imprecise beliefs in the form of intervals. In this case, there is no "true" point-valued belief: the "truth" is interval-valued.
If one is interested in (say) average beliefs, the sharp identification region is the (Aumann) expectation of the reported intervals, and the appropriate coverage requirement for a confidence set is that in \eqref{eq:CS_coverage:set:pw} (respectively, \eqref{eq:CS_coverage:set} below with uniformity).
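When the data consist of reported intervals, as in the survey example above, the sample analog of the Aumann expectation is the Minkowski average of the intervals, that is, the interval whose endpoints are the averages of the reported lower and upper endpoints. The Python sketch below computes it for hypothetical reports (a point-valued report [math]p[/math] is treated as the degenerate interval [math][p,p][/math]), together with the Hausdorff distance between two intervals, which for [math][a_1,a_2][/math] and [math][b_1,b_2][/math] equals [math]\max\{|a_1-b_1|,|a_2-b_2|\}[/math].

```python
from statistics import fmean
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def aumann_mean(reports: Sequence[Interval]) -> Interval:
    """Minkowski (Aumann) sample mean of compact intervals [lo, hi]."""
    return fmean(lo for lo, _ in reports), fmean(hi for _, hi in reports)


def hausdorff_intervals(a: Interval, b: Interval) -> float:
    """Hausdorff distance between two compact intervals."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


# Hypothetical reported beliefs (probability units); (0.5, 0.5) is a point report.
reports = [(0.2, 0.4), (0.5, 0.5), (0.1, 0.6)]
mean_interval = aumann_mean(reports)
print(mean_interval)
print(hausdorff_intervals(mean_interval, (0.0, 1.0)))
```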

Pointwise vs. Uniform Coverage

In the context of interval identified parameters, such as, e.g., the mean with missing data in Theorem SIR- with [math]\theta\in\R[/math], [56] pointed out that extra care should be taken in the construction of confidence sets for partially identified parameters, as otherwise they may be asymptotically valid only pointwise (in the distribution of the observed data) over relevant classes of distributions.[Notes 5] For example, consider a confidence interval that expands each of the sample analogs of the extreme points of the population bounds by a one-sided critical value. This confidence interval controls the asymptotic coverage probability pointwise for any DGP at which the width of the population bounds is positive. This is because the sampling variation becomes asymptotically negligible relative to the (fixed) width of the bounds, making the inference problem essentially one-sided. However, for every [math]n[/math] one can find a distribution [math]\sP\in\cP[/math] and a parameter [math]\vartheta\in\idr{\theta}[/math] such that the width of the population bounds (under [math]\sP[/math]) is small relative to the sampling variation, which is of order [math]n^{-1/2}[/math], and the coverage probability for [math]\vartheta[/math] is below [math]1-\alpha[/math]. This happens because the proposed confidence interval does not take into account the fact that for some [math]\sP\in\cP[/math] the problem has a two-sided nature. This observation naturally leads to the more stringent requirement of uniform coverage, whereby \eqref{eq:CS_coverage:set:pw}-\eqref{eq:CS_coverage:point:pw} are replaced, respectively, by

[[math]] \begin{align} \liminf_{n\to\infty}\inf_{\sP\in\cP}\sP(\idr{\theta}\subseteq\CS)&\ge 1-\alpha,\label{eq:CS_coverage:set}\\ \liminf_{n\to\infty}\inf_{\sP\in\cP}\inf_{\vartheta\in\idr{\theta}}\sP(\vartheta\in\CS)&\ge 1-\alpha,\label{eq:CS_coverage:point} \end{align} [[/math]]

and [math]c_{1-\alpha}[/math] is chosen accordingly, to obtain either \eqref{eq:CS_coverage:set} or \eqref{eq:CS_coverage:point}. Sets satisfying \eqref{eq:CS_coverage:set} are referred to as confidence regions for [math]\idr{\theta}[/math] that are uniformly consistent in level (over [math]\sP\in\cP[/math]). [20] propose such confidence regions, study their properties, and provide a step-down procedure to obtain them. [60] propose confidence sets that are contour sets of criterion functions using cutoffs that are computed via Monte Carlo simulations from the quasi-posterior distribution of the criterion and satisfy the coverage requirement in \eqref{eq:CS_coverage:set}. They recommend the use of a Sequential Monte Carlo algorithm that works well even when the quasi-posterior is irregular and multi-modal. They establish exact asymptotic coverage, non-trivial local power, and validity of their procedure in point identified and partially identified regular models, and validity in irregular models (e.g., in models where the reduced form parameters are on the boundary of the parameter space). They also establish efficiency of their procedure in regular models that happen to be point identified.
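The failure of uniform validity described at the beginning of this subsection can be quantified in a stylized case. The Python sketch below assumes that the estimated lower bound is [math]N(\theta_L,s^2)[/math] and that the estimated upper bound equals it plus the (known) width [math]\Delta[/math], and sets [math]k=\Delta/s[/math], which grows like [math]\sqrt{n}\Delta[/math]. The coverage of [math]\vartheta=\theta_L[/math] by the interval expanded by [math]c\,s[/math] on each side is then [math]\Phi(c)-\Phi(-c-k)[/math]. With the one-sided critical value [math]z_{0.95}\approx 1.645[/math], coverage approaches 0.95 as [math]k[/math] grows but equals 0.90 at [math]k=0[/math]; since [math]k[/math] can be made small for any [math]n[/math] by choosing a distribution with small enough width, the interval is not uniformly valid.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def endpoint_coverage(c: float, k: float) -> float:
    """P(L_hat - c*s <= theta_L <= U_hat + c*s) when L_hat ~ N(theta_L, s^2)
    and U_hat = L_hat + width, with k = width / s (stylized assumption)."""
    return STD_NORMAL.cdf(c) - STD_NORMAL.cdf(-c - k)


z_one_sided = STD_NORMAL.inv_cdf(0.95)  # about 1.645
for k in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(k, round(endpoint_coverage(z_one_sided, k), 3))
```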

Sets satisfying \eqref{eq:CS_coverage:point} are referred to as confidence regions for points in [math]\idr{\theta}[/math] that are uniformly consistent in level (over [math]\sP\in\cP[/math]). Within the framework of [56], [61] shows that one can obtain a confidence interval satisfying \eqref{eq:CS_coverage:point} by pre-testing whether the lower and upper population bounds are sufficiently close to each other. If so, the confidence interval expands each of the sample analogs of the extreme points of the population bounds by a two-sided critical value; otherwise, by a one-sided critical value. [61] provides important insights clarifying the connection between superefficient (i.e., faster than [math]O_p(1/\sqrt{n})[/math]) estimation of the width of the population bounds when it equals zero, and certain challenges in [56]'s proposed method. The pre-test in [61] can be thought of as using a Hodges-type shrinkage estimator (see, e.g., [62]) for the width of the population bounds. [37] leverage [61]'s results to obtain confidence sets satisfying \eqref{eq:CS_coverage:point} using the support function approach for set identified linear models. Obtaining confidence sets that satisfy the requirement in \eqref{eq:CS_coverage:point} becomes substantially more complex in the context of general moment (in)equalities models. One of the key challenges to uniform inference stems from the fact that the behavior of the limit distribution of the test statistic depends on [math]\sqrt{n}\E_\sP(m_j(\ew_i;\vartheta)),~j=1,\dots,|\cJ|[/math], which cannot be consistently estimated. [14][17][18][19][63][64], among others, make significant contributions to circumvent these difficulties in the context of a finite number of unconditional moment (in)equalities.
[3][4][5][6][7][8][10], among others, make significant contributions to circumvent these difficulties in the context of a finite number of conditional moment (in)equalities (with continuously distributed conditioning variables). [9] and [65] study, respectively, the challenging frameworks where the number of moment inequalities grows with sample size and where there is a continuum of conditional moment inequalities. I refer to [21](Section 4) for a thorough discussion of these methods and a comparison of their relative (de)merits (see also [66][67]).
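The interplay between one-sided and two-sided critical values discussed above can be made concrete. In [56]'s framework the critical value is usually written as the solution [math]c[/math] of [math]\Phi(c+\sqrt{n}\hat\Delta/\hat\sigma)-\Phi(-c)=1-\alpha[/math], where [math]\hat\Delta[/math] estimates the width of the bounds and [math]\hat\sigma[/math] is the larger of the two endpoint standard deviations; it moves from the two-sided value [math]z_{1-\alpha/2}[/math] when [math]\hat\Delta=0[/math] to the one-sided value [math]z_{1-\alpha}[/math] as the width grows. The Python sketch below solves this equation by bisection and also implements a pre-test rule following the description of [61] given above. The threshold argument is a hypothetical tuning parameter, not the choice made in [61].

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def im_critical_value(k: float, alpha: float = 0.05, tol: float = 1e-12) -> float:
    """Solve Phi(c + k) - Phi(-c) = 1 - alpha for c by bisection,
    where k = sqrt(n) * width_hat / sigma_hat >= 0."""
    def gap(c: float) -> float:
        return STD_NORMAL.cdf(c + k) - STD_NORMAL.cdf(-c) - (1 - alpha)

    lo, hi = 0.0, 10.0  # gap(lo) < 0 < gap(hi) for conventional alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pretest_critical_value(width_hat: float, threshold: float,
                           alpha: float = 0.05) -> float:
    """Two-sided value if the estimated width is at most a (hypothetical)
    threshold, one-sided value otherwise."""
    if width_hat <= threshold:
        return STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.inv_cdf(1 - alpha)


for k in (0.0, 1.0, 3.0, 10.0):
    print(k, round(im_critical_value(k), 3))
```

The smooth formula and the pre-test rule agree at the two extremes; they differ in how they treat intermediate widths, which is where the superefficiency issues discussed above arise.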

Coverage of the Vector [math]\theta[/math] vs. Coverage of a Component of [math]\theta[/math]

The coverage requirements in \eqref{eq:CS_coverage:set}-\eqref{eq:CS_coverage:point} refer to confidence sets in [math]\R^d[/math] for the entire [math]\theta[/math] or [math]\idr{\theta}[/math]. Often empirical researchers are interested in inference on a specific component or (smooth) function of [math]\theta[/math] (e.g., the returns to education; the effect of market size on the probability of entry; the price elasticity of demand for insurance, etc.). For simplicity, here I focus on the case of a component of [math]\theta[/math], which I represent as [math]u^\top\theta[/math], with [math]u[/math] a standard basis vector in [math]\R^d[/math]. In this case, the (sharp) identification region of interest is

[[math]] \begin{align*} \idr{u^\top\theta}=\{s\in[-h_\Theta(-u),h_\Theta(u)]:s=u^\top\vartheta~\text{and}~\vartheta\in\idr{\theta}\}. \end{align*} [[/math]]

One could report as confidence interval for [math]u^\top\theta[/math] the projection of [math]\CS[/math] in direction [math]\pm u[/math]. The resulting confidence interval is asymptotically valid but typically conservative. The extent of the conservatism increases with the dimension of [math]\theta[/math] and is easily appreciated in the case of a point identified parameter. Consider, for example, a linear regression in [math]\R^{10}[/math], and suppose for simplicity that the limiting covariance matrix of the estimator is the identity matrix. Then a 95\% confidence interval for [math]u^\top\theta[/math] is obtained by adding and subtracting [math]1.96[/math] from that component's estimate. In contrast, projecting a 95\% confidence ellipsoid for [math]\theta[/math] onto each component amounts to adding and subtracting [math]4.28[/math] (the square root of the 0.95 quantile of the chi-squared distribution with 10 degrees of freedom) from that component's estimate. It is therefore desirable to provide confidence intervals [math]\CI[/math] specifically designed to cover [math]u^\top\theta[/math] rather than the entire [math]\theta[/math]. Natural counterparts to \eqref{eq:CS_coverage:set}-\eqref{eq:CS_coverage:point} are

[[math]] \begin{align} \liminf_{n\to\infty}\inf_{\sP\in\cP}\sP(\idr{u^\top\theta} \subseteq \CI)&\ge 1-\alpha,\label{eq:CS_coverage:set:proj}\\ \liminf_{n\to\infty}\inf_{\sP\in\cP}\inf_{\vartheta\in\idr{\theta}}\sP(u^\top\vartheta\in \CI)&\ge 1-\alpha. \label{eq:CS_coverage:point:proj} \end{align} [[/math]]
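The numbers [math]1.96[/math] and [math]4.28[/math] in the regression example above are square roots of chi-squared quantiles with 1 and 10 degrees of freedom, respectively. The Python sketch below reproduces them using only the standard library, assuming, as in the example, an identity limiting covariance matrix. It evaluates the chi-squared distribution function through the series for the regularized lower incomplete gamma function and inverts it by bisection.

```python
import math


def chi2_cdf(x: float, dof: int) -> float:
    """Chi-squared CDF via the series for the regularized lower incomplete
    gamma function P(a, z) with a = dof / 2 and z = x / 2."""
    if x <= 0:
        return 0.0
    a, z = dof / 2, x / 2
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_quantile(p: float, dof: int) -> float:
    """Invert chi2_cdf by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, dof) < p:  # grow the bracket until it contains the quantile
        hi *= 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def projection_half_width(dim: int, level: float = 0.95) -> float:
    """Half-width of the projection of a Wald confidence ellipsoid with
    identity covariance onto one coordinate."""
    return math.sqrt(chi2_quantile(level, dim))


print(round(projection_half_width(1), 2), round(projection_half_width(10), 2))  # 1.96 4.28
```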

As shown in [36] and [42] for the case of pointwise coverage, obtaining asymptotically valid confidence intervals is simple if the identified set is convex and one uses the support function approach. This is because it suffices to base the test statistic on the support function in direction [math]u[/math], and it is often possible to easily characterize the limiting distribution of this test statistic. See [22](Chapters 4 and 5) for details. The task is significantly more complex in general moment inequality models when [math]\idr{\theta}[/math] is non-convex and one wants to satisfy the criterion in \eqref{eq:CS_coverage:set:proj} or that in \eqref{eq:CS_coverage:point:proj}. [14] and [68] propose confidence intervals of the form

[[math]] \begin{align} \CI = \left\{s\in[-h_\Theta(-u),h_\Theta(u)]:\inf_{\vartheta\in\Theta(s)}n\crit_n(\vartheta)\le c_{1-\alpha}(s)\right\},\label{eq:CI:BCS} \end{align} [[/math]]

where [math]\Theta(s)=\{\vartheta\in\Theta:u^\top\vartheta=s\}[/math] and [math]c_{1-\alpha}[/math] is such that \eqref{eq:CS_coverage:point:proj} holds. An important idea in this proposal is that of profiling the test statistic [math]n\crit_n(\vartheta)[/math] by minimizing it over all [math]\vartheta[/math]s such that [math]u^\top\vartheta=s[/math]. One then includes in the confidence interval all values [math]s[/math] for which the profiled test statistic's value is not too large. [14] propose the use of subsampling to obtain the critical value [math]c_{1-\alpha}(s)[/math] and provide high-level conditions ensuring that \eqref{eq:CS_coverage:point:proj} holds. [68] substantially extend and improve the profiling approach by providing a bootstrap-based method to obtain [math]c_{1-\alpha}[/math] so that \eqref{eq:CS_coverage:point:proj} holds. Their method is more powerful than subsampling (for reasonable choices of subsample size). [69] further enlarge the domain of applicability of the profiling approach by proposing a method based on this approach that is asymptotically uniformly valid when the number of moment conditions is large, and can grow with the sample size, possibly at exponential rates. [70] propose a bootstrap-based calibrated projection approach where

[[math]] \begin{align} \CI= [-h_{\eC_n(c_{1-\alpha})}(-u),h_{\eC_n(c_{1-\alpha})}(u)],\label{eq:def:CI} \end{align} [[/math]]

with

[[math]] \begin{align} h_{\eC_n(c_{1-\alpha})}(u)\equiv\sup_{\vartheta\in\Theta}~u^\top\vartheta~\text{s.t.}~\frac{\sqrt{n}\bar{m}_{n,j}(\vartheta)}{\hat{\sigma}_{n,j}(\vartheta)}\leq c_{1-\alpha}(\vartheta),~j=1,\dots,|\cJ|\label{eq:KMS:proj} \end{align} [[/math]]

and [math]c_{1-\alpha}[/math] a critical level function calibrated so that \eqref{eq:CS_coverage:point:proj} holds. Compared to the simple projection of [math]\CS[/math] mentioned at the beginning of Section Coverage of the Vector [math]\theta[/math] vs. Coverage of a Component of [math]\theta[/math], calibrated projection (weakly) reduces the value of [math]c_{1-\alpha}[/math] so that the projection of [math]\theta[/math], rather than [math]\theta[/math] itself, is asymptotically covered with the desired probability uniformly. [60] provide methods to build confidence intervals and confidence sets on projections of [math]\idr{\theta}[/math] as contour sets of criterion functions using cutoffs that are computed via Monte Carlo simulations from the quasi-posterior distribution of the criterion, and that satisfy the coverage requirement in \eqref{eq:CS_coverage:set:proj}. One of their procedures, designed specifically for scalar projections, delivers a confidence interval as the contour set of a profiled quasi-likelihood ratio with critical value equal to a quantile of the chi-squared distribution with one degree of freedom.
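The mechanics of the profiled interval \eqref{eq:CI:BCS} can be sketched on a grid. The Python example below uses a hypothetical two-dimensional model whose population identified set is the triangle [math]\{\vartheta:\vartheta_1\ge 0,\vartheta_2\ge 0,\vartheta_1+\vartheta_2\le 1\}[/math], a criterion equal to [math]n[/math] times the sum of squared positive parts of hypothetically estimated linear moment functions, and [math]u=(1,0)^\top[/math]. It checks that, with a common constant critical value, profiling reproduces the projection of [math]\CS[/math]: a value [math]s[/math] is retained exactly when some [math]\vartheta[/math] with [math]u^\top\vartheta=s[/math] satisfies [math]n\crit_n(\vartheta)\le c[/math]. The gains from profiling and from calibrated projection therefore come from the smaller critical values that these methods justify, which the sketch does not attempt to compute; the critical values used below are illustrative only.

```python
from typing import List, Sequence

N_OBS = 100
B_HAT = (0.02, -0.01, 1.03)  # hypothetical estimates of the population values (0, 0, 1)
GRID = [i / 50 - 0.5 for i in range(101)]  # -0.5, -0.48, ..., 1.5


def n_crit(t1: float, t2: float) -> float:
    """n * Q_n for the sample moments -t1 - b1 <= 0, -t2 - b2 <= 0, t1 + t2 - b3 <= 0."""
    g = (-t1 - B_HAT[0], -t2 - B_HAT[1], t1 + t2 - B_HAT[2])
    return N_OBS * sum(max(gj, 0.0) ** 2 for gj in g)


def profiled_ci(c: float, grid: Sequence[float] = GRID) -> List[float]:
    """Keep s if the criterion, minimized over theta with theta_1 = s, is <= c."""
    return [s for s in grid if min(n_crit(s, t2) for t2 in grid) <= c]


def projected_cs(c: float, grid: Sequence[float] = GRID) -> List[float]:
    """First coordinates of all grid points in the level set {n * Q_n <= c}."""
    return sorted({t1 for t1 in grid for t2 in grid if n_crit(t1, t2) <= c})


wide, narrow = profiled_ci(3.84), profiled_ci(2.71)
print(min(wide), max(wide), min(narrow), max(narrow))
```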

A Brief Note on Bayesian Methods

The confidence sets discussed in this section are based on the frequentist approach to inference. It is natural to ask whether in partially identified models, as in well-behaved point identified models, one can build Bayesian credible sets that at least asymptotically coincide with frequentist confidence sets. This question was first addressed by [71], with a negative answer for the case that the coverage in \eqref{eq:CS_coverage:point} is sought. In particular, they showed that the resulting Bayesian credible sets are a subset of [math]\idr{\theta}[/math], and hence too narrow from the frequentist perspective. This discrepancy can be ameliorated when inference is sought for [math]\idr{\theta}[/math] rather than for each [math]\vartheta\in\idr{\theta}[/math]. [72], [73], [43], and [74] propose Bayesian credible regions that are valid for frequentist inference in the sense of \eqref{eq:CS_coverage:set:pw}, where the first two build on the criterion function approach and the last two on the support function approach. All these contributions rely on the model being separable, in the sense that it yields moment inequalities that can be written as the sum of a function of the data only and a function of the model parameters only (as in, e.g., \eqref{eq:CT_00}-\eqref{eq:CT_01L}). In these models, the function of the data only (the reduced form parameter) is point identified, it is related to the structural parameters [math]\theta[/math] through a known mapping, and under standard regularity conditions it can be [math]\sqrt{n}[/math]-consistently estimated. The resulting estimator has an asymptotically Normal distribution. The various approaches place a prior on the reduced form parameter, and standard tools in Bayesian analysis are used to obtain a posterior. The known mapping from reduced form to structural parameters is then applied to this posterior to obtain a credible set for [math]\idr{\theta}[/math].
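The last step, mapping a posterior for the reduced form parameter into a credible region for [math]\idr{\theta}[/math], can be sketched for the simplest case in which the identified set is an interval whose endpoints are the reduced form parameter [math]\phi=(\phi_L,\phi_U)[/math]. The Python example below assumes independent Normal posteriors for [math]\phi_L[/math] and [math]\phi_U[/math] centered at their estimates, in line with the asymptotic Normality mentioned above, and reports [math][\hat\phi_L-c,\hat\phi_U+c][/math], with [math]c[/math] the smallest value for which at least a fraction [math]1-\alpha[/math] of posterior draws of [math][\phi_L,\phi_U][/math] lie inside it. This symmetric expansion is a simplified construction chosen for illustration, not the specific credible region proposed in any of the papers cited above.

```python
import math
import random
from typing import Tuple


def credible_region(est_lo: float, est_hi: float, se_lo: float, se_hi: float,
                    alpha: float = 0.05, draws: int = 10000,
                    seed: int = 0) -> Tuple[float, float]:
    """Interval [est_lo - c, est_hi + c] containing the posterior draw of the
    identified interval [phi_L, phi_U] with posterior probability >= 1 - alpha."""
    rng = random.Random(seed)
    needed = []
    for _ in range(draws):
        phi_lo = rng.gauss(est_lo, se_lo)  # posterior draw of the lower endpoint
        phi_hi = rng.gauss(est_hi, se_hi)  # posterior draw of the upper endpoint
        # Smallest expansion c for which [phi_lo, phi_hi] lies in the region
        # (draws with phi_lo > phi_hi, an empty set, are treated conservatively).
        needed.append(max(est_lo - phi_lo, phi_hi - est_hi, 0.0))
    needed.sort()
    c = needed[math.ceil((1 - alpha) * draws) - 1]
    return est_lo - c, est_hi + c


print(credible_region(0.3, 0.6, 0.1, 0.1))
```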

General references

Molinari, Francesca (2020). "Microeconometrics with Partial Identification". arXiv:2004.11751 [econ.EM].

Notes

  1. In these expressions an index of the form [math]jk[/math] not separated by a comma equals the product of [math]j[/math] with [math]k[/math].
  2. Using the well known duality between tests of hypotheses and confidence sets, the discussion could be re-framed in terms of size of the test.
  3. The definition of the Hausdorff distance can be generalized to an arbitrary metric space by replacing the Euclidean metric by the metric specified on that space.
  4. Using this normalized criterion function is especially important in light of possible model misspecification, see Section.
  5. This discussion draws on many conversations with Jörg Stoye, as well as on notes that he shared with me, for which I thank him.

References

  1. eps:kai:seo16
  2. ber:mol:mol11
  3. and:shi13
  4. che:lee:ros13
  5. lee:son:wha13
  6. arm14b
  7. arm15
  8. arm:cha16
  9. che:che:kat18
  10. che18
  11. han82
  12. man:tam02
  13. che:hon:tam07
  14. rom:sha08
  15. ros08
  16. gal:hen09
  17. and:gug09b
  18. and:soa10
  19. can10
  20. rom:sha10
  21. can:sha17
  22. mol:mol18
  23. mo1
  24. han:hea:lut95
  25. gin:hah:zin83
  26. mol98
  27. che:koc:men15
  28. han:jag91
  29. mar52
  30. che12b
  31. red81
  32. kai:mol:sto19CQ
  33. baz:she:she06
  34. yil12
  35. men14
  36. ber:mol08
  37. bon:mag:mau12
  38. cha:che:mol:sch18
  39. mag:mau08
  40. kai:san14
  41. bic:kla:rit:wel93
  42. kai16
  43. kit:gia18
  44. can:den02
  45. uhl05
  46. kil:lut17
  47. gra:moo:sch18
  48. gaf:mei:mon18
  49. hor:man98
  50. hor:man00
  51. bug10
  52. gal:hen13
  53. hen:mea:que15
  54. fan:san18
  55. adu:ots16
  56. imb:man04
  57. hen:ona12
  58. man:mol10
  59. giu:man:mol19
  60. che:chr:tam18
  61. sto09
  62. van97
  63. and:bar12
  64. rom:sha:wol14
  65. and:shi17
  66. bug:can:gug12
  67. bug16
  68. bug:can:shi17
  69. bel:bug:che18
  70. kai:mol:sto19
  71. moo:sch12
  72. nor:tan14
  73. kli:tam16
  74. lia:sim19