<div class="d-none"><math>
\newcommand{\edis}{\stackrel{d}{=}}
\newcommand{\fd}{\stackrel{f.d.}{\rightarrow}}
\newcommand{\dom}{\operatorname{dom}}
\newcommand{\eig}{\operatorname{eig}}
\newcommand{\epi}{\operatorname{epi}}
\newcommand{\lev}{\operatorname{lev}}
\newcommand{\card}{\operatorname{card}}
\newcommand{\comment}{\textcolor{Green}}   
\newcommand{\B}{\mathbb{B}}   
\newcommand{\C}{\mathbb{C}}
\newcommand{\G}{\mathbb{G}}
\newcommand{\M}{\mathbb{M}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\T}{\mathbb{T}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\W}{\mathbb{W}}   
\newcommand{\bU}{\mathfrak{U}}   
\newcommand{\bu}{\mathfrak{u}}
\newcommand{\bI}{\mathfrak{I}} 
\newcommand{\cA}{\mathcal{A}} 
\newcommand{\cB}{\mathcal{B}} 
\newcommand{\cC}{\mathcal{C}}
\newcommand{\cD}{\mathcal{D}}
\newcommand{\cE}{\mathcal{E}}
\newcommand{\cF}{\mathcal{F}}
\newcommand{\cG}{\mathcal{G}}
\newcommand{\cg}{\mathcal{g}}
\newcommand{\cH}{\mathcal{H}}
\newcommand{\cI}{\mathcal{I}}
\newcommand{\cJ}{\mathcal{J}}
\newcommand{\cK}{\mathcal{K}} 
\newcommand{\cL}{\mathcal{L}} 
\newcommand{\cM}{\mathcal{M}}
\newcommand{\cN}{\mathcal{N}}
\newcommand{\cO}{\mathcal{O}}
\newcommand{\cP}{\mathcal{P}}
\newcommand{\cQ}{\mathcal{Q}} 
\newcommand{\cR}{\mathcal{R}}
\newcommand{\cS}{\mathcal{S}}
\newcommand{\cT}{\mathcal{T}}
\newcommand{\cU}{\mathcal{U}}
\newcommand{\cu}{\mathcal{u}} 
\newcommand{\cV}{\mathcal{V}}
\newcommand{\cW}{\mathcal{W}} 
\newcommand{\cX}{\mathcal{X}} 
\newcommand{\cY}{\mathcal{Y}} 
\newcommand{\cZ}{\mathcal{Z}} 
\newcommand{\sF}{\mathsf{F}} 
\newcommand{\sM}{\mathsf{M}} 
\newcommand{\sG}{\mathsf{G}} 
\newcommand{\sT}{\mathsf{T}} 
\newcommand{\sB}{\mathsf{B}} 
\newcommand{\sC}{\mathsf{C}} 
\newcommand{\sP}{\mathsf{P}} 
\newcommand{\sQ}{\mathsf{Q}} 
\newcommand{\sq}{\mathsf{q}} 
\newcommand{\sR}{\mathsf{R}} 
\newcommand{\sS}{\mathsf{S}}
\newcommand{\sd}{\mathsf{d}} 
\newcommand{\cp}{\mathsf{p}}
\newcommand{\cc}{\mathsf{c}}
\newcommand{\cf}{\mathsf{f}}
\newcommand{\eU}{{\boldsymbol{U}}}
\newcommand{\eb}{{\boldsymbol{b}}}
\newcommand{\ed}{{\boldsymbol{d}}}
\newcommand{\eu}{{\boldsymbol{u}}}
\newcommand{\ew}{{\boldsymbol{w}}}
\newcommand{\ep}{{\boldsymbol{p}}}
\newcommand{\eX}{{\boldsymbol{X}}}
\newcommand{\ex}{{\boldsymbol{x}}}
\newcommand{\eY}{{\boldsymbol{Y}}}
\newcommand{\eB}{{\boldsymbol{B}}}
\newcommand{\eC}{{\boldsymbol{C}}}
\newcommand{\eD}{{\boldsymbol{D}}}
\newcommand{\eW}{{\boldsymbol{W}}}
\newcommand{\eR}{{\boldsymbol{R}}}
\newcommand{\eQ}{{\boldsymbol{Q}}}
\newcommand{\eS}{{\boldsymbol{S}}}
\newcommand{\eT}{{\boldsymbol{T}}}
\newcommand{\eA}{{\boldsymbol{A}}}
\newcommand{\eH}{{\boldsymbol{H}}}
\newcommand{\ea}{{\boldsymbol{a}}}
\newcommand{\ey}{{\boldsymbol{y}}}
\newcommand{\eZ}{{\boldsymbol{Z}}}
\newcommand{\eG}{{\boldsymbol{G}}}
\newcommand{\ez}{{\boldsymbol{z}}}
\newcommand{\es}{{\boldsymbol{s}}}
\newcommand{\et}{{\boldsymbol{t}}}
\newcommand{\ev}{{\boldsymbol{v}}}
\newcommand{\ee}{{\boldsymbol{e}}}
\newcommand{\eq}{{\boldsymbol{q}}}
\newcommand{\bnu}{{\boldsymbol{\nu}}}
\newcommand{\barX}{\overline{\eX}}   


\newcommand{\eps}{\varepsilon}
\newcommand{\Eps}{\mathcal{E}}
\newcommand{\carrier}{{\mathfrak{X}}}
\newcommand{\Ball}{{\mathbb{B}}^{d}}
\newcommand{\Sphere}{{\mathbb{S}}^{d-1}}
\newcommand{\salg}{\mathfrak{F}}
\newcommand{\ssalg}{\mathfrak{B}}
\newcommand{\one}{\mathbf{1}}
\newcommand{\Prob}[1]{\P\{#1\}}
\newcommand{\yL}{\ey_{\mathrm{L}}}
\newcommand{\yU}{\ey_{\mathrm{U}}}
\newcommand{\yLi}{\ey_{\mathrm{L}i}}
\newcommand{\yUi}{\ey_{\mathrm{U}i}}
\newcommand{\xL}{\ex_{\mathrm{L}}}
\newcommand{\xU}{\ex_{\mathrm{U}}}
\newcommand{\vL}{\ev_{\mathrm{L}}}
\newcommand{\vU}{\ev_{\mathrm{U}}}
\newcommand{\dist}{\mathbf{d}}
\newcommand{\rhoH}{\dist_{\mathrm{H}}}
\newcommand{\ti}{\to\infty}
\newcommand{\comp}[1]{#1^\mathrm{c}}
\newcommand{\ThetaI}{\Theta_{\mathrm{I}}}
\newcommand{\crit}{q}
\newcommand{\CS}{CS_n}
\newcommand{\CI}{CI_n}
\newcommand{\cv}[1]{\hat{c}_{n,1-\alpha}(#1)}
\newcommand{\idr}[1]{\mathcal{H}_\sP[#1]}
\newcommand{\outr}[1]{\mathcal{O}_\sP[#1]}
\newcommand{\idrn}[1]{\hat{\mathcal{H}}_{\sP_n}[#1]}
\newcommand{\outrn}[1]{\mathcal{O}_{\sP_n}[#1]}
\newcommand{\email}[1]{\texttt{#1}}
\newcommand{\possessivecite}[1]{\citeauthor{#1}'s \citeyear{#1}}
\newcommand\xqed[1]{%
  \leavevmode\unskip\penalty9999 \hbox{}\nobreak\hfill
  \quad\hbox{#1}}
\newcommand\qedex{\xqed{$\triangle$}}
\newcommand\independent{\perp\!\!\!\perp}
\DeclareMathOperator{\Int}{Int}
\DeclareMathOperator{\conv}{conv}
\DeclareMathOperator{\cov}{Cov}
\DeclareMathOperator{\var}{Var}
\DeclareMathOperator{\Sel}{Sel}
\DeclareMathOperator{\Bel}{Bel}
\DeclareMathOperator{\cl}{cl}
\DeclareMathOperator{\sgn}{sgn}
\DeclareMathOperator{\essinf}{essinf}
\DeclareMathOperator{\esssup}{esssup}
\newcommand{\mathds}{\mathbb}
\renewcommand{\P}{\mathbb{P}}
</math>
</div>
In this section I focus on the literature concerned with learning features of ''structural econometric models''.
These are models where economic theory is used to postulate relationships among observable outcomes <math>\ey</math>, observable covariates <math>\ex</math>, and unobservable variables <math>\nu</math>.
For example, economic theory may guide assumptions on economic behavior (e.g., utility maximization) and equilibrium that yield a mapping from <math>(\ex,\nu)</math> to <math>\ey</math>.
The researcher is interested in learning features of these relationships (e.g., utility function, distribution of preferences), and to this end may supplement the data and economic theory with functional form assumptions on the mapping of interest and distributional assumptions on the observable and unobservable variables.
The earlier literature on partial identification of features of structural models includes important examples of nonparametric analysis of random utility models and revealed preference extrapolation, e.g. <ref name="blo:mar60"><span style="font-variant-caps:small-caps">Block, H.D.,  <span style="font-variant-caps:normal">and</span> J.Marschak</span>  (1960): “Random Orderings  and Stochastic Theories of Responses” in ''Contributions to Probability  and Statistics: Essays in Honor of Harold Hotelling'', ed. by I.Olkin, pp.  97--132. Stanford University Press.</ref>, <ref name="mar60"><span style="font-variant-caps:small-caps">Marschak, J.</span>  (1960): “Binary Choice Constraints on Random Utility  Indicators” in ''Stanford Symposium on Mathematical Methods in the  Social Sciences'', ed. by K.Arrow. Stanford University Press.</ref>, <ref name="hal73"><span style="font-variant-caps:small-caps">Hall, R.E.</span>  (1973): “On the statistical theory of unobserved  components” MIT Working Paper 117, available at  [https://dspace.mit.edu/bitstream/handle/1721.1/63972/onstatisticalthe00hall.pdf?sequence=1 https://dspace.mit.edu/bitstream/handle/1721.1/63972/onstatisticalthe00hall.pdf?sequence=1].</ref>, <ref name="mcf75"><span style="font-variant-caps:small-caps">McFadden, D.L.</span>  (1975): “Tchebyscheff bounds for the space of agent  characteristics” ''Journal of Mathematical Economics'', 2(2), 225 --  242.</ref>, <ref name="fal78"><span style="font-variant-caps:small-caps">Falmagne, J.</span>  (1978): “A representation theorem for finite random  scale systems” ''Journal of Mathematical Psychology'', 18(1), 52 -- 72.</ref>, <ref name="mcf:ric91"><span style="font-variant-caps:small-caps">McFadden, D.L.,  <span style="font-variant-caps:normal">and</span> M.K. Richter</span>  (1991): “Stochastic  rationality and revealed stochastic preference” in ''Preferences,  Uncertainty and Rationality'', ed. by J.S. Chipman, D.L. 
McFadden,  <span style="font-variant-caps:normal">  and</span> M.K. Richter, pp. 161--186. Westview Press.</ref>, and others.
The earlier literature also addresses semiparametric analysis, where the underlying models are specified up to parameters that are finite dimensional (e.g., preference parameters) and parameters that are infinite dimensional (e.g., distribution functions); important examples include <ref name="mar:and44"><span style="font-variant-caps:small-caps">Marschak, J.,  <span style="font-variant-caps:normal">and</span> W.H. Andrews</span>  (1944): “Random  Simultaneous Equations and the Theory of Production” ''Econometrica'',  12(3/4), 143--205.</ref>, <ref name="mar52"><span style="font-variant-caps:small-caps">Markowitz, H.</span>  (1952): “Portfolio selection” ''Journal of  Finance'', 7, 77--91.</ref>, <ref name="fis66"><span style="font-variant-caps:small-caps">Fisher, F.M.</span>  (1966): ''The Identification Problem in  Econometrics''. McGraw-Hill Book Company.</ref>{{rp|at=Section 2.10}}, <ref name="har:kre79"><span style="font-variant-caps:small-caps">Harrison, J.,  <span style="font-variant-caps:normal">and</span> D.M. Kreps</span>  (1979): “Martingales and  arbitrage in multiperiod securities markets” ''Journal of Economic  Theory'', 20(3), 381 -- 408.</ref>, <ref name="kre81"><span style="font-variant-caps:small-caps">Kreps, D.M.</span>  (1981): “Arbitrage and equilibrium in economies with  infinitely many commodities” ''Journal of Mathematical Economics'',  8(1), 15 -- 35.</ref>, <ref name="lea81"><span style="font-variant-caps:small-caps">Leamer, E.E.</span>  (1981): “Is it a Demand Curve, Or Is It A Supply  Curve? 
Partial Identification through Inequality Constraints” ''The  Review of Economics and Statistics'', 63(3), 319--327.</ref>, <ref name="man88"><span style="font-variant-caps:small-caps">Manski, C.F.</span>  (1988b): “Identification of Binary Response Models”  ''Journal of the American Statistical Association'', 83(403), 729--738.</ref>, <ref name="jov89"><span style="font-variant-caps:small-caps">Jovanovic, B.</span>  (1989): “Observable Implications of Models with  Multiple Equilibria” ''Econometrica'', 57(6), 1431--1437.</ref>, <ref name="phi89"><span style="font-variant-caps:small-caps">Phillips, P. C.B.</span>  (1989): “Partially Identified Econometric  Models” ''Econometric Theory'', 5(2), 181--240.</ref>, <ref name="han:jag91"><span style="font-variant-caps:small-caps">Hansen, L.P.,  <span style="font-variant-caps:normal">and</span> R.Jagannathan</span>  (1991): “Implications of  Security Market Data for Models of Dynamic Economies” ''Journal of  Political Economy'', 99(2), 225--262.</ref>, <ref name="han:hea:lut95"><span style="font-variant-caps:small-caps">Hansen, L.P., J.Heaton,  <span style="font-variant-caps:normal">and</span> E.G.J. Luttmer</span>  (1995):  “Econometric Evaluation of Asset Pricing Models” ''The Review of  Financial Studies'', 8(2), 237--274.</ref>, <ref name="lut96"><span style="font-variant-caps:small-caps">Luttmer, E. G.J.</span>  (1996): “Asset Pricing in Economies with  Frictions” ''Econometrica'', 64(6), 1439--1467.</ref>, and others.
Contrary to the nonparametric bounds results discussed in [[guide:Ec36399528#sec:prob:distr |Section]], and especially in the case of semiparametric models, structural partial identification often yields an identification region that is ''not'' constructive.<ref group="Notes" >Of course, this is not always the case, as exemplified by the bounds in {{ref|name=han:jag91}}.</ref>
Indeed, the boundary of the set is not obtained in closed form as a functional of the distribution of the observable data.
Rather, the identification region can often be characterized as a ''level set'' of a properly specified criterion function.
The recent spark of interest in partial identification of structural microeconometric models was fueled by the work of <ref name="man:tam02"><span style="font-variant-caps:small-caps">Manski, C.F.,  <span style="font-variant-caps:normal">and</span> E.Tamer</span>  (2002): “Inference on  Regressions with Interval Data on a Regressor or Outcome”  ''Econometrica'', 70(2), 519--546.</ref>, <ref name="tam03"><span style="font-variant-caps:small-caps">Tamer, E.</span>  (2003): “Incomplete Simultaneous Discrete Response Model  with Multiple Equilibria” ''The Review of Economic Studies'', 70(1),  147--165.</ref> and <ref name="cil:tam09"><span style="font-variant-caps:small-caps">Ciliberto, F.,  <span style="font-variant-caps:normal">and</span> E.Tamer</span>  (2009): “Market Structure and  Multiple Equilibria in Airline Markets” ''Econometrica'', 77(6),  1791--1828.</ref>, and <ref name="hai:tam03"><span style="font-variant-caps:small-caps">Haile, P.A.,  <span style="font-variant-caps:normal">and</span> E.Tamer</span>  (2003): “Inference with an  Incomplete Model of English Auctions” ''Journal of Political Economy'',  111(1), 1--51.</ref>.
Each of these papers has advanced the literature in fundamental ways, studying conceptually very distinct problems.
<ref name="man:tam02"/> are concerned with partial identification of the decision process yielding binary outcomes in a semiparametric model, when one of the explanatory variables is interval valued.
Hence, the root cause of the identification problem they study is that the ''data is incomplete''.<ref group="Notes" >{{ref|name=man:tam02}} also study partial identification (and estimation) of nonparametric, semiparametric, and parametric conditional expectation functions that are well defined in the absence of a structural model, when one of the conditioning variables is interval valued. I refer to [[guide:Ec36399528#sec:prob:distr |Section]] for a discussion.</ref>
<ref name="tam03"/> and <ref name="cil:tam09"/> are concerned with identification (and estimation) of simultaneous equation models with dummy endogenous variables which are representations of two-player entry games with multiple equilibria.<ref group="Notes" >{{ref|name=cil:tam09}} consider more general multi-player entry games.</ref>
<ref name="hai:tam03"/> are concerned with nonparametric identification and estimation of the distribution of valuations in a model of English auctions under weak assumptions on bidders' behavior.
In both cases, the root cause of the identification problem is that the ''structural model is incomplete''.
This is because the model makes multiple predictions for the observed outcome variables (respectively: the players' actions; and the bidders' bids), but does not specify how one of them is selected to yield the observed data.
''Set-valued predictions'' for the observable outcome (endogenous variables) are a key feature of partially identified structural models.
The goal of this section is to explain how they result in a wide array of theoretical frameworks, and how sharp identification regions can be characterized using a unified approach based on random set theory.
Although the work of <ref name="man:tam02"/>, <ref name="tam03"/> and <ref name="cil:tam09"/>, and <ref name="hai:tam03"/> has spurred many of the developments discussed in this section, for pedagogical reasons I organize the presentation based on application topic rather than chronologically.
The work of <ref name="pak10"><span style="font-variant-caps:small-caps">Pakes, A.</span>  (2010): “Alternative models for moment inequalities”  ''Econometrica'', 78(6), 1783--1822.</ref> and <ref name="pak:por:ho:ish15"><span style="font-variant-caps:small-caps">Pakes, A., J.Porter, K.Ho,  <span style="font-variant-caps:normal">and</span> J.Ishii</span>  (2015): “Moment  Inequalities and Their Application” ''Econometrica'', 83(1), 315--334.</ref> further stimulated a large empirical literature that applies partial identification methods to a wide array of questions of substantive economic importance, to which I return in Section [[#subsec:applications:struct |Further Theoretical Advances and Empirical Applications]].
===<span id="subsec:single:ag:RUM"></span>Discrete Choice in Single Agent Random Utility Models===
Let <math>\cI</math> denote a population of decision makers and <math>\cY=\{c_1,\dots,c_{|\cY|}\}</math> a finite universe of potential alternatives (''feasible set'' henceforth).
Let <math>\bU</math> be a family of real valued functions defined over the elements of <math>\cY</math>.
Let <math>\in^* </math> denote “is chosen from.”
Then observed choice is consistent with a ''random utility model'' if there exists a function <math>\bu_i</math> drawn from <math>\bU</math> according to some probability distribution, such that <math>\P(c \in^* C)=\P(\bu_i(c) \ge \bu_i(b)\ \forall b \in C)</math> for all <math>c\in C</math>, all nonempty sets <math>C \subset \cY</math>, and all <math>i\in\cI</math> <ref name="blo:mar60"/>.
See <ref name="man07a"><span style="font-variant-caps:small-caps">Manski, C.F.</span>  (2007a): ''Identification for Prediction and Decision''.  Harvard University Press.</ref>{{rp|at=Chapter 13}} for a textbook presentation of this class of models, and <ref name="mat07"><span style="font-variant-caps:small-caps">Matzkin, R.L.</span>  (2007): “Chapter 73 -- Nonparametric identification” in  ''Handbook of Econometrics'', ed. by J.J. Heckman,  <span style="font-variant-caps:normal">and</span> E.E.  Leamer, vol.6, chap.73, pp. 5307 -- 5368. Elsevier.</ref> for a review of sufficient conditions for point identification of nonparametric and semiparametric limited dependent variables models.
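The defining condition of a random utility model can be made concrete with a minimal Python sketch. It assumes a hypothetical finite family of utility functions over three alternatives (the names <code>UTILITY_TYPES</code> and <code>choice_probability</code>, and all numerical values, are illustrative and do not come from the literature cited above), and it computes <math>\P(\bu_i(c) \ge \bu_i(b)\ \forall b \in C)</math> exactly.

```python
from fractions import Fraction

# Hypothetical finite family of utility functions: each entry pairs a
# utility function over the alternatives with the probability of drawing it.
UTILITY_TYPES = [
    ({"a": 3, "b": 2, "c": 1}, Fraction(1, 2)),
    ({"a": 1, "b": 3, "c": 2}, Fraction(1, 3)),
    ({"a": 2, "b": 1, "c": 3}, Fraction(1, 6)),
]


def choice_probability(c, choice_set, utility_types=UTILITY_TYPES):
    """Return P(u(c) >= u(b) for all b in choice_set) under the type distribution."""
    if c not in choice_set:
        raise ValueError("c must belong to the choice set")
    total = Fraction(0)
    for u, prob in utility_types:
        # The type contributes its mass when c is (weakly) best in the set.
        if all(u[c] >= u[b] for b in choice_set):
            total += prob
    return total


# Probability that "a" is chosen from {"a", "b"}: types 1 and 3 rank a above b.
print(choice_probability("a", {"a", "b"}))  # 2/3
```

Because the illustrative utilities have no ties, the choice probabilities over any choice set sum to one; with ties, the weak inequality in the definition would count a type for more than one alternative.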
As in the seminal work of <ref name="mcf73"><span style="font-variant-caps:small-caps">McFadden, D.L.</span>  (1974): “Conditional Logit Analysis of Qualitative  Choice Behavior” in ''Frontiers in Econometrics'', ed. by P.Zarembka.  Academic Press.</ref>, assume that the decision makers and alternatives are characterized by observable and unobservable vectors of real valued attributes.
Denote the observable attributes by <math>\ex_i \equiv \{\ex_i^1,(\ex_{ic}^2,c\in\cY)\},i\in\cI</math>.
These include attribute vectors <math>\ex_i^1</math> that are specific to the decision maker, as well as attribute vectors <math>\ex_{ic}^2</math> that include components that are specific to the alternative and components that are indexed by both.
Denote the unobservable attributes (preferences) by <math>\nu_i\equiv(\zeta_i,\{\epsilon_{ic},c\in\cY\}),i\in\cI</math>.
These are idiosyncratic to the decision maker and similarly may include alternative and decision maker specific terms.
Denote by <math>\cX,\cV</math> the supports of <math>\ex,\nu</math>, respectively.
In what follows, I label “standard” a random utility model that maintains some form of exogeneity for <math>\ex_i</math> (e.g., mean or quantile or statistical independence with <math>\nu_i</math>) and presupposes observation of data that include <math>\{(\eC_i,\ey_i,\ex_i):\ey_i \in^* \eC_i\}, i=1,\dots,n</math>, with <math>\eC_i</math> the choice set faced by decision maker <math>i</math> and <math>|\eC_i|\ge 2</math> (e.g., <ref name="man75"><span style="font-variant-caps:small-caps">Manski, C.F.</span>  (1975): “Maximum score estimation of the stochastic  utility model of choice” ''Journal of Econometrics'', 3(3), 205 -- 228.</ref>{{rp|at=Assumption 1}}).
Often it is also assumed that all members of the population face the same choice set, <math>\eC_i=D</math> for all <math>i\in\cI</math> and some known <math>D\subseteq\cY</math>, although this requirement is not critical to identification analysis.
====<span id="subsubsec:man:tam02"></span>Semiparametric Binary Choice Models with Interval Valued Covariates====
<ref name="man:tam02"/> provide inference methods for nonparametric, semiparametric, and parametric conditional expectation functions when one of the conditioning variables is interval valued.
I have discussed their nonparametric and parametric sharp bounds on conditional expectations with interval valued covariates in Identification [[guide:Ec36399528#IP:interval_covariate |Problems]] [[guide:Ec36399528#IP:man:tam02_param |and]], and Theorems [[guide:Ec36399528#SIR:man:tam:nonpar |SIR-]] and [[guide:Ec36399528#SIR:man:tam02_param |SIR-]], respectively.
Here I focus on their analysis of semiparametric binary choice models.
Compared to the generic notation set forth at the beginning of Section [[#subsec:single:ag:RUM |Discrete Choice in Single Agent Random Utility Models]], I let <math>\eC_i=\cY=\{0,1\}</math> for all <math>i\in\cI</math>, and with some abuse of notation I denote the vector of observed covariates <math>(\xL,\xU,\ew)</math>.
{{proofcard|Identification Problem (Semiparametric Binary Regression with Interval Covariate Data)|IP:man:tam02_binary|Let <math>(\ey,\xL,\xU,\ew)\sim\sP</math> be observable random variables in <math>\{0,1\}\times\R\times\R\times\R^d</math>, <math>d < \infty</math>, and let <math>\ex\in\R</math> be an unobservable random variable.
Let <math>\ey=\one(\ew\theta + \delta\ex +\epsilon > 0)</math>.
Assume <math>\delta > 0</math>, and further normalize <math>\delta=1</math> because the threshold-crossing condition is invariant to the scale of the parameters.
Here <math>\epsilon</math> is an unobserved heterogeneity term with continuous distribution conditional on <math>(\ew,\ex,\xL,\xU)</math>, <math>(\ew,\ex,\xL,\xU)</math>-a.s., and <math>\theta\in\Theta\subset\R^d</math> is a parameter vector representing decision makers’ preferences, with compact parameter space <math>\Theta</math>.
Assume that <math>\sR</math>, the joint distribution of <math>(\ey,\ex,\xL,\xU,\ew,\epsilon)</math>, is such that <math>\sR(\xL\le\ex\le\xU)=1</math>; <math>
\sR(\epsilon |\ew,\ex,\xL,\xU)=\sR(\epsilon|\ew,\ex)</math>; and for a specified <math>\alpha \in (0,1)</math>, <math>\sq_{\sR}^\epsilon(\alpha,\ew,\ex)=0</math> and <math>\sR(\epsilon \le 0|\ew,\ex)=\alpha</math>, <math>(\ew,\ex)</math>-a.s.
In the absence of additional information, what can the researcher learn about <math>\theta</math>?
|}}
Compared to Identification [[guide:Ec36399528#IP:interval_covariate |Problem]], here one continues to impose <math>\ex\in[\xL,\xU]</math> a.s.
The sign restriction on <math>\delta</math> replaces the monotonicity restriction (M) in Identification [[guide:Ec36399528#IP:interval_covariate |Problem]], but does not imply it unless the distribution of <math>\epsilon</math> is independent of <math>\ex</math> conditional on <math>\ew</math>.
The quantile independence restriction is inspired by <ref name="man85"><span style="font-variant-caps:small-caps">Manski, C.F.</span>  (1985): “Semiparametric analysis of discrete response:  Asymptotic properties of the maximum score estimator” ''Journal of  Econometrics'', 27(3), 313 -- 333.</ref>.
For given <math>\theta\in\Theta</math>, this model yields set valued predictions because <math>\ey=1</math> can occur whenever <math>\epsilon >  -\ew\theta-\xU</math>, whereas <math>\ey=0</math> can occur whenever <math>\epsilon\le -\ew\theta-\xL</math>, and <math>-\ew\theta-\xU \le -\ew\theta-\xL</math>.
Conversely, observation of <math>\ey=1</math> allows one to conclude that <math>\epsilon\in(-\ew\theta-\xU,+\infty)</math>, whereas observation of <math>\ey=0</math> allows one to conclude that <math>\epsilon\in(-\infty,-\ew\theta-\xL]</math>, and these regions of possible realizations of <math>\epsilon</math> overlap.
In contrast, when <math>\ex</math> is observed the prediction is unique because the value <math>-\ew\theta-\ex</math> partitions the space of realizations of <math>\epsilon</math> in two disjoint sets, one associated with <math>\ey=1</math> and the other with <math>\ey=0</math>.
[[#fig:set_valued_pred:man:tam:binary|Figure]] depicts the model's set-valued predictions for <math>\ey</math> given <math>(\ew,\xL,\xU)</math> as a function of <math>\epsilon</math>, and the model's set valued predictions for <math>\epsilon</math> given <math>(\ew,\xL,\xU)</math> as a function of <math>\ey</math>.<ref group="Notes" >[[#fig:set_valued_pred:man:tam:binary|Figure]] is based on Figure 1 in {{ref|name=man:tam02}}. See {{ref|name=che:ros19}}{{rp|at=Chapter XXX in this Volume}} for an extensive discussion of the duality between the model's set valued predictions for <math>\ey</math> as a function of <math>\epsilon</math> and for <math>\epsilon</math> as a function of <math>\ey</math>, in both cases given the observed covariates.</ref>
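The two set-valued maps just described can be sketched in a few lines of Python. This is only an illustration under the normalization <math>\delta=1</math>; the function names and the scalar argument <code>w_theta</code> (standing in for <math>\ew\theta</math>) are assumptions made for the example.

```python
import math


def predicted_outcomes(eps, w_theta, x_lo, x_hi):
    """Set of y values the model allows for a given eps when x lies in [x_lo, x_hi]."""
    outcomes = set()
    if eps <= -w_theta - x_lo:
        outcomes.add(0)  # y = 0 occurs for x = x_lo
    if eps > -w_theta - x_hi:
        outcomes.add(1)  # y = 1 occurs for x = x_hi
    return outcomes


def admissible_eps(y, w_theta, x_lo, x_hi):
    """Endpoints (lower, upper) of the eps-interval compatible with observing y.

    For y = 1 the interval is open at its lower endpoint;
    for y = 0 it is closed at its upper endpoint.
    """
    if y == 1:
        return (-w_theta - x_hi, math.inf)
    return (-math.inf, -w_theta - x_lo)


# With w*theta = 0 and x in [-1, 1], any eps in (-1, 1] is compatible with both outcomes.
print(predicted_outcomes(0.0, 0.0, -1.0, 1.0))  # {0, 1}
```

When <code>x_lo == x_hi</code> the two conditions are mutually exclusive, which reproduces the unique prediction obtained when <math>\ex</math> is observed.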
Why does this set-valued prediction hinder point identification?
The reason is that the distribution of the observable data relates to the model structure in an ''incomplete'' manner.
The model predicts <math>\sM(\ey=1|\ew,\xL,\xU)=\int \sR(\ey=1|\ew,\ex,\xL,\xU)d\sR(\ex|\ew,\xL,\xU)=\int \sR(\epsilon > -\ew\theta-\ex|\ew,\ex)d\sR(\ex|\ew,\xL,\xU),(\ew,\xL,\xU)</math>-a.s.
Because the distribution <math>\sR(\ex|\ew,\xL,\xU)</math> is left completely unspecified, one can find multiple values for <math>(\theta,\sR(\ex|\ew,\xL,\xU),\sR(\epsilon|\ew,\ex))</math>, satisfying the assumptions in Identification [[#IP:man:tam02_binary |Problem]], such that <math>\sM(\ey=1|\ew,\xL,\xU)=\sP(\ey=1|\ew,\xL,\xU),(\ew,\xL,\xU)</math>-a.s.
Nonetheless, in general, not all values of <math>\theta\in\Theta</math> can be paired with some <math>\sR(\ex|\ew,\xL,\xU)</math> and <math>\sR(\epsilon|\ew,\ex)</math> so that they are compatible with <math>\sP(\ey=1|\ew,\xL,\xU),(\ew,\xL,\xU)</math>-a.s. and with the maintained assumptions.
Hence, <math>\theta</math> can be partially identified using the information in the model and observed data.
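The lack of point identification can be illustrated numerically. The sketch below makes assumptions that are not part of the Identification Problem: <math>\epsilon</math> is standard normal and independent of the covariates (so the quantile restriction holds with <math>\alpha=1/2</math>), <math>\ew</math> is scalar, and the unspecified distribution <math>\sR(\ex|\ew,\xL,\xU)</math> is discrete. All names and values are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def model_prob_y1(theta, w, x_points, x_probs):
    """M(y = 1 | w, xL, xU) for eps ~ N(0, 1) and a discrete distribution of x.

    Uses R(eps > -w*theta - x) = Phi(w*theta + x), averaged over x.
    """
    return sum(p * normal_cdf(w * theta + x) for x, p in zip(x_points, x_probs))


# One covariate cell: w = 1, xL = 0, xU = 1. Two candidate structures that
# differ in theta and in where x sits inside [0, 1].
p_a = model_prob_y1(theta=1.0, w=1.0, x_points=[0.5], x_probs=[1.0])
p_b = model_prob_y1(theta=1.2, w=1.0, x_points=[0.3], x_probs=[1.0])
print(abs(p_a - p_b) < 1e-12)  # True: both structures predict the same P(y = 1 | w, xL, xU)
```

In the same cell, a value <math>\vartheta\le -1</math> gives <math>\ew\vartheta+\ex\le 0</math> for every <math>\ex\in[0,1]</math>, so the model probability cannot exceed <math>1/2</math> and such a value cannot reproduce the probability above.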
<div id="fig:set_valued_pred:man:tam:binary" class="d-flex justify-content-center">
[[File:guide_d9532_fig_set_valued_pred_man_tam_binary.png | 700px | thumb | Predicted value of <math>\ey</math> as a function of <math>\epsilon</math>, and admissible values of <math>\epsilon</math> for each realization of <math>\ey</math>, in Identification [[#IP:man:tam02_binary |Problem]], conditional on <math>(\ew,\xL,\xU)</math>. ]]
</div>
{{proofcard|Theorem (Semiparametric Binary Regression with Interval Covariate Data)|SIR:man:tam02_binary|
Under the Assumptions of Identification [[#IP:man:tam02_binary |Problem]], the sharp identification region for <math>\theta</math> is
<math display="block">
\begin{multline}
\idr{\theta}=\Big\{\vartheta\in \Theta: \sP\Big((\ew,\xL,\xU):\, \{0\le\ew\vartheta+\xL\cap \sP(\ey=1|\ew,\xL,\xU)\le 1-\alpha\}\\
\cup \{\ew\vartheta+\xU\le 0\cap \sP(\ey=1|\ew,\xL,\xU)\ge 1-\alpha\}\Big) = 0 \Big\}.\label{eq:ThetaI_man:tam02_binary}
\end{multline}
</math>|For any <math>\vartheta\in\Theta</math>, define the set of possible values for the unobservable associated with the possible realizations of <math>(\ey,\ew,\xL,\xU)</math>, illustrated in [[#fig:set_valued_pred:man:tam:binary|Figure]], as <ref group="Notes" >In the definition of <math>\Eps_\vartheta(1,\ew,\xL,\xU)</math> I exploit the fact that under the maintained assumptions <math>\P(\epsilon=-\ew\vartheta-\xU|\ew,\ex,\xL,\xU)=0</math> to enforce its closedness.</ref>
<math display="block">
\begin{align}
\Eps_\vartheta(\ey,\ew,\xL,\xU) =\left \{
\begin{array}{lll}
(-\infty,-\ew\vartheta-\xL] & \textrm{if}  &\ey=0,\\
[-\ew\vartheta-\xU,+\infty) & \textrm{if} &\ey=1.
\end{array}
\right.\label{eq:def_Epsilon:man:tam}
\end{align}
</math>
Then <math>\Eps_\vartheta(\ey,\ew,\xL,\xU)</math> is a random closed set as per [[guide:379e0dcd67#def:rcs |Definition]].
To simplify notation, let <math>\Eps_\vartheta(\ey)\equiv\Eps_\vartheta(\ey,\ew,\xL,\xU)</math> suppressing the dependence on <math>(\ew,\xL,\xU)</math>.
Let <math>(\Eps_\vartheta(\ey),\ew,\xL,\xU)=\Eps_\vartheta(\ey)\times(\ew,\xL,\xU)=\{(\mathbf{e},\ew,\xL,\xU):\mathbf{e}\in\Eps_\vartheta(\ey)\}</math>.
If the model is correctly specified, for the data generating value <math>\theta</math>, <math>(\epsilon,\ew,\xL,\xU) \in (\Eps_\theta(\ey),\ew,\xL,\xU)</math> a.s.
By [[guide:379e0dcd67#thr:artstein |Theorem]] and Theorem 2.33 in <ref name="mol:mol18"><span style="font-variant-caps:small-caps">Molchanov, I.,  <span style="font-variant-caps:normal">and</span> F.Molinari</span>  (2018): ''Random Sets in Econometrics''. Econometric  Society Monograph Series, Cambridge University Press, Cambridge UK.</ref>, this occurs if and only if
<math display="block">
\begin{align}
\sR(\epsilon\in C|\ew,\xL,\xU)&\ge \sP(\Eps_\theta(\ey)\subset C|\ew,\xL,\xU),\ (\ew,\xL,\xU)\text{-a.s.}\ \forall C\in\cF,\label{eq:Artstein_on_man:tam}
\end{align}
</math>
where <math>\cF</math> here denotes the collection of closed subsets of <math>\R</math>.
We then have that <math>\vartheta</math> is observationally equivalent to <math>\theta</math> if and only if \eqref{eq:Artstein_on_man:tam} holds for <math>\Eps_\vartheta(\ey)</math> as defined in \eqref{eq:def_Epsilon:man:tam}.
The condition can be rewritten as
<math display="block">
\begin{align*}
\int \sR(\epsilon\in C|\ew,\ex,\xL,\xU)d\sR(\ex|\ew,\xL,\xU)&\ge \sP(\Eps_\vartheta(\ey)\subset C|\ew,\xL,\xU),\ (\ew,\xL,\xU)\text{-a.s.}\ \forall C\in\cF.
\end{align*}
</math>
The assumption that <math>\sR(\epsilon|\ew,\ex,\xL,\xU)=\sR(\epsilon|\ew,\ex)</math> yields that the above system of inequalities reduces to
<math display="block">
\begin{align*}
\int \sR(\epsilon\in C|\ew,\ex)d\sR(\ex|\ew,\xL,\xU)&\ge \sP(\Eps_\vartheta(\ey)\subset C|\ew,\xL,\xU),\ (\ew,\xL,\xU)\text{-a.s.}\ \forall C\in\cF.
\end{align*}
</math>
Next, note that given the possible realizations of <math>\Eps_\vartheta(\ey)</math>, the above inequality is trivially satisfied unless <math>C=(-\infty,t]</math> or <math>C=[t,\infty)</math> for some <math>t\in\R</math>.
Finally, the only restriction on the distribution of <math>\epsilon</math> is the quantile independence condition, hence it suffices to consider <math>t=0</math>.
To see why this is the case, let for example <math>t > 0</math> and fix a realization <math>(w,x_L,x_U)</math> for <math>(\ew,\xL,\xU)</math>.<ref group="Notes" >There are no <math>(\ew,\xL,\xU)</math>-cross restrictions.</ref>
Then for the inequality not to be trivially satisfied it must be that either <math>w\vartheta+x_L\ge -t</math> or <math>w\vartheta+x_U\le -t</math> (both are not possible because <math>w\vartheta+x_L\le w\vartheta+x_U</math>).
If <math>w\vartheta+x_U\le -t</math>, it must be that <math>t\in(0,-w\vartheta-x_U]</math> and <math>-w\vartheta-x_U > 0</math>.
Then a distribution <math>\sR</math> such that <math>\int \sR(\epsilon\in [0,t)|\ew=w,\ex)d\sR(\ex|\ew=w,\xL=x_L,\xU=x_U)=0</math> is always feasible for <math>t\in(0,-w\vartheta-x_U]</math>.
A similar argument holds if <math>w\vartheta+x_L\ge -t</math>; and also if <math>t < 0</math>.
We then have that if the inequalities are satisfied for <math>t=0</math>, they are satisfied also for <math>t\neq 0</math>.
Finally, using the definition of <math>\Eps_\vartheta(\ey)</math>, for <math>t=0</math> we have
<math display="block">
\begin{align}
1-\alpha &\ge \sP(\ey=1|\ew,\xL,\xU)\text{ for all }(\ew,\xL,\xU)\text{ such that } \ew\vartheta+\xU\le 0,\label{eq:key_sharp:man:tam02_1}\\
1-\alpha &  \le \sP(\ey=1|\ew,\xL,\xU)\text{ for all }(\ew,\xL,\xU)\text{ such that } \ew\vartheta+\xL  \ge 0.\label{eq:key_sharp:man:tam02_2}
\end{align}
</math>
Any given <math>\vartheta\in\Theta</math>, <math>\vartheta\neq\theta</math>, violates the above conditions if and only if <math>\sP\big((\ew,\xL,\xU):\, \{0\le\ew\vartheta+\xL\cap \sP(\ey=1|\ew,\xL,\xU)\le 1-\alpha\}\cup \{\ew\vartheta+\xU\le 0\cap \sP(\ey=1|\ew,\xL,\xU)\ge 1-\alpha\}\big)  >  0</math>.}}
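A membership check for the region in the theorem can be sketched in Python when <math>(\ew,\xL,\xU)</math> has finite support. The sketch evaluates the event in the theorem's display literally, cell by cell. It assumes a scalar <math>\ew</math>, treats <math>\sP(\ey=1|\ew,\xL,\xU)</math> as known rather than estimated, and uses hypothetical support points and names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupportPoint:
    w: float      # scalar covariate (d = 1 for illustration)
    x_lo: float   # lower endpoint xL
    x_hi: float   # upper endpoint xU
    p1: float     # P(y = 1 | w, xL, xU), taken as known
    mass: float   # P(w, xL, xU)


def in_sharp_region(vartheta, support, alpha):
    """True if the violation event in the theorem's display has probability zero."""
    for s in support:
        if s.mass <= 0:
            continue  # cells with zero mass cannot create a violation
        viol_lo = (s.w * vartheta + s.x_lo >= 0) and (s.p1 <= 1 - alpha)
        viol_hi = (s.w * vartheta + s.x_hi <= 0) and (s.p1 >= 1 - alpha)
        if viol_lo or viol_hi:
            return False
    return True


# Hypothetical population with four covariate cells and alpha = 1/2.
SUPPORT = [
    SupportPoint(w=1.0, x_lo=0.0, x_hi=1.0, p1=0.80, mass=0.25),
    SupportPoint(w=-1.0, x_lo=0.0, x_hi=0.5, p1=0.30, mass=0.25),
    SupportPoint(w=0.0, x_lo=-1.0, x_hi=1.0, p1=0.55, mass=0.25),
    SupportPoint(w=1.0, x_lo=-2.5, x_hi=-2.0, p1=0.20, mass=0.25),
]

candidates = [-0.5, 0.0, 1.0, 2.4, 2.5, 3.0]
print([v for v in candidates if in_sharp_region(v, SUPPORT, alpha=0.5)])  # [1.0, 2.4]
```

In this illustrative population the second cell rules out <math>\vartheta\le 0</math> and the fourth rules out <math>\vartheta\ge 2.5</math>, while the third cell, whose interval straddles the threshold for every candidate, is uninformative. With data, the conditional probabilities would have to be estimated and sampling uncertainty accounted for, which this sketch ignores.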
'''Key Insight:'''
<i>
The analysis in <ref name="man:tam02"/> systematically studies what can be learned under increasingly strong sets of assumptions.
These include both assumptions that constrain the model from fully nonparametric to semiparametric to parametric, as well as assumptions that constrain the distribution of the observable covariates.
For example, <ref name="man:tam02"/>{{rp|at=Corollary to Proposition 2}} provide sufficient conditions on the joint distribution of <math>(\ew,\xL,\xU)</math> that allow for identification of the sign of components of <math>\theta</math>, as well as for point identification of <math>\theta</math>.<ref group="Notes" >This Corollary is related in spirit to the analysis in {{ref|name=man88}}.</ref>
The careful analysis of the identifying power of increasingly stronger assumptions is the pillar of the partial identification approach to empirical research proposed by Manski, as illustrated in [[guide:Ec36399528#sec:prob:distr |Section]].
The work of <ref name="man:tam02"/> was the first example of this kind in semiparametric structural models.
</i>
Revisiting <ref name="man:tam02"/>'s study of Identification [[#IP:man:tam02_binary |Problem]] nearly 20 years later yields important insights on the differences between point and partial identification analysis.
It is instructive to take as a point of departure the analysis of <ref name="man85"></ref>, which under the additional assumption that <math>(\ey,\ew,\ex)</math>
is observed yields
<math display="block">
\begin{align*}
\ew\theta+\ex > 0 \Leftrightarrow \sP(\ey=1|\ew,\ex) > 1-\alpha.
\end{align*}
</math>
In this case, <math>\theta</math> is identified relative to <math>\vartheta\in\Theta</math> if
<math display="block">
\begin{align}
\sP\left((\ew,\ex):\, \{\ew\theta+\ex\le 0 < \ew\vartheta+\ex\}
\cup \{\ew\vartheta+\ex\le 0 < \ew\theta+\ex\}\right)  >  0.\label{eq:manski85}
\end{align}
</math>
<ref name="man:tam02"/> extend this reasoning to the case that <math>\ex</math> is unobserved, but known to satisfy <math>\ex\in [\xL,\xU]</math> a.s.
The first part of their analysis, collected in their Proposition 2, characterizes the collection of values that cannot be distinguished from <math>\theta</math> on the basis of <math>\sP(\ew,\xL,\xU)</math> alone, through a clear generalization of \eqref{eq:manski85}:
<math display="block">
\begin{align}
\{\vartheta\in \Theta: \sP\left((\ew,\xL,\xU):\, \{\ew\theta+\xU\le 0 < \ew\vartheta+\xL\}
\cup \{\ew\vartheta+\xU\le 0 < \ew\theta+\xL\}\right) = 0\}.\label{eq:region:man:tam02:potential}
\end{align}
</math>
It is worth emphasizing that the characterization in \eqref{eq:region:man:tam02:potential} depends on <math>\theta</math>, and makes no use of the information in <math>\sP(\ey|\ew,\xL,\xU)</math>.
The Corollary to Proposition 2 yields conditions on <math>\sP(\ew,\xL,\xU)</math> under which either the sign of components of <math>\theta</math>, or <math>\theta</math> itself, can be identified, regardless of the distribution of <math>\ey|\ew,\xL,\xU</math>.
<ref name="man:tam02"/>{{rp|at=Lemma 1}} provide a second characterization, which presupposes knowledge of <math>\sP(\ey,\ew,\xL,\xU)</math>, yields a set smaller than the one in \eqref{eq:region:man:tam02:potential}, and coincides with the result in Theorem [[#SIR:man:tam02_binary |SIR-]].
<ref name="man:tam02"/> use the same notation for the two sets, although the sets are conceptually and mathematically distinct.<ref group="Notes" >This was confirmed in personal communication with Chuck Manski and Elie Tamer.</ref>
The result in Theorem [[#SIR:man:tam02_binary |SIR-]] is due to <ref name="man:tam02"/>{{rp|at=Lemma 1}}, but the proof provided here is new, as is the use of random set theory in this application.<ref group="Notes" >The proof closes a gap in the argument in {{ref|name=man:tam02}} connecting their Proposition 2 and Lemma 1, due to the fact that for a given <math>\vartheta</math> the sets <math display = "block">\{(\ew,\xL,\xU):\, \{\ew\theta+\xU\le 0 < \ew\vartheta+\xL\} \cup \{\ew\vartheta+\xU\le 0 < \ew\theta+\xL\}\}</math> and <math display = "block">\begin{split}\{(\ew,\xL,\xU):\, \{0 < \ew\vartheta+\xL\cap \sP(\ey=1|\ew,\xL,\xU)\le 1-\alpha\} \\ \cup \{\ew\vartheta+\xU\le 0\cap \sP(\ey=1|\ew,\xL,\xU) >  1-\alpha\}\}\end{split}</math>
need not coincide, with the former being a subset of the latter due to part (c) of the proof of Proposition 2 in {{ref|name=man:tam02}}.</ref>
'''Key Insight:'''<i><span id="remark:man:tam02:che:ros"/>The preceding discussion allows me to draw a novel connection between the two characterizations in <ref name="man:tam02"/>, and the distinction put forward by <ref name="che:ros17"><span style="font-variant-caps:small-caps">Chesher, A.,  <span style="font-variant-caps:normal">and</span> A.M. Rosen</span>  (2017a): “Generalized instrumental variable models”  ''Econometrica'', 85, 959--989.</ref> and <ref name="che:ros19"><span style="font-variant-caps:small-caps">Chesher, A.,  <span style="font-variant-caps:normal">and</span> A.M. Rosen</span>  (2019): “Generalized instrumental variable models,  methods, and applications” in ''Handbook of Econometrics''. Elsevier.</ref>{{rp|at=Chapter XXX in this Volume, Definition 2}} in partial identification between ''potential observational equivalence'' and ''observational equivalence''.<ref group="Notes" >This distinction echoes the distinction drawn by {{ref|name=man88book}}{{rp|at=Section 1.1.1}} between ''point identification'' and ''uniform point identification''.
{{ref|name=man88book}} considers a scenario where a parameter vector of interest <math>\theta</math> is defined as the solution to an equation of the form <math>\crit_\sP(\theta)=0</math> for some criterion function <math>\crit_\sP:\Theta\mapsto\R_+</math>.
Then <math>\theta</math> is point identified relative to <math>(\sP,\Theta)</math> if it is the unique solution to <math>\crit_\sP(\theta)=0</math>.
It is ''uniformly'' point identified relative to <math>(\cP,\Theta)</math>, with <math>\cP</math> a space of probability distributions to which <math>\sP</math> belongs, if for every <math>\tilde\sP\in\cP</math>, <math>\crit_{\tilde\sP}(\vartheta)=0</math> has a unique solution.</ref>
Applying <ref name="che:ros17"/>'s definition, parameter vectors <math>\theta</math> and <math>\vartheta</math> are ''potentially'' observationally equivalent if there exists ''some'' distribution of <math>\ey|\ew,\xL,\xU</math> for which conditions \eqref{eq:key_sharp:man:tam02_1}-\eqref{eq:key_sharp:man:tam02_2} hold.
Simple algebra confirms that this yields the region in \eqref{eq:region:man:tam02:potential}.
This notion of potential observational equivalence parallels one of the notions used to obtain sufficient conditions for point identification in the semiparametric literature (as in, e.g. <ref name="man85"/>).
Both notions, as explained in <ref name="che:ros19"/>{{rp|at=Section 4.1}}, make no reference to the conditional distribution of outcomes given covariates delivered by the process being studied.
To obtain that parameters <math>\theta</math> and <math>\vartheta</math> ''are'' observationally equivalent one requires instead that conditions \eqref{eq:key_sharp:man:tam02_1}-\eqref{eq:key_sharp:man:tam02_2} hold for the ''observed'' distribution <math>\sP(\ey=1|\ew,\xL,\xU)</math> (as opposed to “for some distribution” as in the case of potential observational equivalence).
This yields the sharp identification region in \eqref{eq:ThetaI_man:tam02_binary}.
</i>
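The notion of potential observational equivalence can be made concrete on a finite support. The Python sketch below checks whether the event in \eqref{eq:region:man:tam02:potential} has probability zero; it is only an illustration, with a hypothetical function name and made-up numbers, and it assumes every listed support point has positive probability. The function takes the data generating <math>\theta</math> as an input and never uses <math>\sP(\ey|\ew,\xL,\xU)</math>, which mirrors the two features of this characterization emphasized in the text.

```python
from typing import Sequence, Tuple

# Hypothetical data layout: support points (w, x_L, x_U), each with positive probability.
Point = Tuple[Tuple[float, ...], float, float]


def potentially_equivalent(theta: Sequence[float],
                           vartheta: Sequence[float],
                           support: Sequence[Point]) -> bool:
    """True if no support point separates theta from vartheta."""
    for w, x_lo, x_hi in support:
        a = sum(w_j * t_j for w_j, t_j in zip(w, theta))
        b = sum(w_j * t_j for w_j, t_j in zip(w, vartheta))
        # Event {w*theta + x_U <= 0 < w*vartheta + x_L}
        if a + x_hi <= 0 < b + x_lo:
            return False
        # Event {w*vartheta + x_U <= 0 < w*theta + x_L}
        if b + x_hi <= 0 < a + x_lo:
            return False
    return True


support = [((-2.0,), 0.0, 1.0), ((0.0,), -0.5, 0.5), ((2.0,), 0.0, 1.0)]
print(potentially_equivalent((1.0,), (0.8,), support))   # True
print(potentially_equivalent((1.0,), (-1.0,), support))  # False
```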
<ref name="man10"><span style="font-variant-caps:small-caps">Manski, C.F.</span>  (2010): “Random Utility Models with Bounded Ambiguity”  in ''Structural Econometrics'', ed. by B.Dutta, pp. 272--284. Oxford  University Press, 1 edn.</ref> studies random ''expected'' utility models, where agents choose the alternative that maximizes their expected utility.
The core difference with standard models is that <ref name="man10"/> does not fully specify the subjective beliefs that agents use to form their expectations, but only a ''set'' of such beliefs.
<ref name="man10"/> shows that the resulting, partially identified, discrete choice model can be formulated similarly to how <ref name="man:tam02"/> treat interval valued covariates, and leverages their results to obtain bounds on preference parameters.<ref group="Notes" >{{ref|name=ber:mol:mol11}}{{rp|at=Supplementary Appendix F}} extend the analysis of {{ref|name=man:tam02}} to multinomial choice models with interval covariates.</ref>
<ref name="mag:mau08"><span style="font-variant-caps:small-caps">Magnac, T.,  <span style="font-variant-caps:normal">and</span> E.Maurin</span>  (2008): “Partial Identification  in Monotone Binary Models: Discrete Regressors and Interval Data” ''The  Review of Economic Studies'', 75(3), 835--864.</ref> consider a different but closely related model to the semiparametric binary response model studied by <ref name="man:tam02"/>.
They assume that an instrumental variable <math>\ez</math> is available, that <math>\epsilon</math> is independent of <math>\ex</math> conditional on <math>(\ew,\ez)</math>, and that <math>Corr(\ez,\epsilon)=0</math>.
They assume that the distribution of <math>\ex</math> is absolutely continuous with support <math>[v_1,v_k]</math>, and that <math>\ex</math> is not a deterministic linear function of <math>(\ew,\ez)</math>.
They consider the case that <math>\ex</math> is unobserved but known to belong to one of the fixed (and known) intervals <math>[v_i,v_{i+1})</math>, <math>i=1,\dots,k-1</math>, with <math>\sR[\ex\in[v_i,v_{i+1})|\ew,\ez] > 0</math> almost surely for all <math>i</math>.
Finally, they assume that <math>(-\ew\theta-\epsilon)\in [v_1,v_k]</math> with probability one.
They do not, however, make quantile independence assumptions.
Their point of departure is the fact that under these conditions, if <math>\ex</math> were observed, one could employ a transformation proposed by <ref name="lew00"><span style="font-variant-caps:small-caps">Lewbel, A.</span>  (2000): “Semiparametric qualitative response model  estimation with unknown heteroscedasticity or instrumental variables”  ''Journal of Econometrics'', 97(1), 145 -- 177.</ref> for the binary outcome <math>\ey</math>, such that <math>\theta</math> can be identified through a simple linear moment condition.
Specifically, let
<math display="block">
\begin{align*}
\tilde{\ey}=\frac{\ey - \one_{\ex > 0}}{f_\ex(\ex|\ew,\ez)},
\end{align*}
</math>
where <math>f_\ex(\cdot|\ew,\ez)</math> is the conditional density function of <math>\ex</math>.
Then, using the assumption that <math>\ez</math> and <math>\epsilon</math> are uncorrelated, one has
<math display="block">
\begin{align}
\E_\sP(\ez \tilde{\ey})-\E_\sP(\ez \ew^\top) \theta = 0.\label{eq:sem-bin}
\end{align}
</math>
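A small simulation can illustrate why the moment condition \eqref{eq:sem-bin} holds when <math>\ex</math> is observed. The Python sketch below rests on assumptions that are not in the text: the binary outcome is generated as <math>\ey=\one_{\ew\theta+\ex+\epsilon > 0}</math>, <math>\ew</math> is scalar and serves as its own instrument, and <math>\ex</math> is uniform on an interval wide enough to contain <math>-\ew\theta-\epsilon</math> with probability one. Under this design, averaging the transformed outcome over <math>\ex</math> gives <math>\ew\theta+\epsilon</math>, so the ratio of sample moments should be close to <math>\theta</math>.

```python
import random


def lewbel_theta_hat(theta: float = 0.5, n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of theta from E(z*y_tilde) = E(z*w)*theta (toy design)."""
    rng = random.Random(seed)
    lo, hi = -2.0, 2.0             # assumed support [v_1, v_k] of x
    density = 1.0 / (hi - lo)      # f_x(x | w, z) under the uniform design
    sum_zy = 0.0
    sum_zw = 0.0
    for _ in range(n):
        w = rng.uniform(-1.0, 1.0)
        z = w                          # exogenous w used as its own instrument
        eps = rng.uniform(-1.0, 1.0)   # mean zero, independent of (w, z, x)
        x = rng.uniform(lo, hi)        # -w*theta - eps lies in [-1.5, 1.5], inside [lo, hi]
        y = 1.0 if w * theta + x + eps > 0 else 0.0
        y_tilde = (y - (1.0 if x > 0 else 0.0)) / density
        sum_zy += z * y_tilde
        sum_zw += z * w
    return sum_zy / sum_zw


print(lewbel_theta_hat())  # close to 0.5, up to simulation error
```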
With interval valued <math>\ex</math>, <ref name="mag:mau08"/> denote by <math>\ex^*</math> the random variable that takes value <math>i\in\{1,\dots,k-1\}</math> if <math>\ex\in[v_i,v_{i+1})</math>, so that the observed data are draws from the joint distribution of <math>(\ey,\ew,\ez,\ex^*)</math>.
They let <math>\delta(\ex^*)=v_{\ex^*+1}-v_{\ex^*}</math> denote the length of the <math>\ex^*</math>-th interval, and define the transformed outcome variable:
<math display="block">
\ey^*=\frac{\delta(\ex^*)}{\sP(\ex^*=i|\ew,\ez)}\ey-v_k.
</math>
The assumptions on <math>\ex</math> yield that, given <math>\ez</math> and <math>\ew</math>, <math>\epsilon</math> does not depend on <math>\ex^*</math>.
Moreover, <math>\sP(\ey=1|\ex^*,\ew,\ez)</math> is non-decreasing in <math>\ex^*</math> and <math>\sF_\epsilon(\cdot|\ez,\ew,\ex,\ex^*)=\sF_\epsilon(\cdot|\ez,\ew)</math>.
<ref name="mag:mau08"/> show that the sharp identification region for <math>\theta</math> is
<math display="block">
\begin{align}
\idr{\theta}=\E_\sP(\ez \ew^\top)^{-1}\E_\sP(\ez \ey^* + \ez \eU),\label{eq:SIR:mag:mau}
\end{align}
</math>
where <math>\E_\sP(\ez \ey^* + \ez \eU)</math> is the Aumann (or selection) expectation of the random interval <math>\ez \ey^* + \ez \eU</math>, see [[guide:379e0dcd67#def:sel-exp |Definition]], with
<math display="block">
\begin{align*}
\eU=\left[-\sum_{i=1}^{k-1}(r_i(\ew,\ez)-r_{i-1}(\ew,\ez))(v_{i+1}-v_i),
\sum_{i=1}^{k-1}(r_{i+1}(\ew,\ez)-r_i(\ew,\ez))(v_{i+1}-v_i) \right].
\end{align*}
</math>
In this expression, <math>r_{\ex^*}(\ew,\ez)\equiv\sP(\ey=1|\ex^*,\ew,\ez)</math> and by convention <math>r_0(\ew,\ez)=0</math> and <math>r_k(\ew,\ez)=1</math>, see <ref name="mag:mau08"/>{{rp|at=Theorem 4}}.
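For fixed values of <math>(\ew,\ez)</math>, the endpoints of <math>\eU</math> are simple weighted sums of the increments of <math>r_i</math>. The Python sketch below computes them; the function name and the numbers are hypothetical, and the conditional probabilities are treated as known, whereas in practice they would be estimated in a first stage.

```python
from typing import List, Sequence, Tuple


def interval_endpoints(r_inner: Sequence[float], v: Sequence[float]) -> Tuple[float, float]:
    """Endpoints of U for one value of (w, z).

    v       : cut points v_1 < ... < v_k (stored with 0-based indexing)
    r_inner : r_1, ..., r_{k-1}, the values of P(y=1 | x*=i, w, z)
    """
    k = len(v)
    if len(r_inner) != k - 1:
        raise ValueError("need exactly k-1 conditional probabilities")
    # Append the conventions r_0 = 0 and r_k = 1, so that r[i] is r_i.
    r: List[float] = [0.0] + list(r_inner) + [1.0]
    # v_{i+1} - v_i is v[i] - v[i-1] in 0-based storage, for i = 1, ..., k-1.
    lower = -sum((r[i] - r[i - 1]) * (v[i] - v[i - 1]) for i in range(1, k))
    upper = sum((r[i + 1] - r[i]) * (v[i] - v[i - 1]) for i in range(1, k))
    return lower, upper


# Made-up example with k = 3 cut points.
lo, hi = interval_endpoints(r_inner=[0.2, 0.7], v=[0.0, 1.0, 3.0])
print(round(lo, 10), round(hi, 10))  # -1.2 1.1
```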
If <math>r_i(\ew,\ez),i=0,\dots,k</math>, were observed, this characterization would be very similar to the one provided by <ref name="ber:mol08"><span style="font-variant-caps:small-caps">Beresteanu, A.,  <span style="font-variant-caps:normal">and</span> F.Molinari</span>  (2008): “Asymptotic  Properties for a Class of Partially Identified Models” ''Econometrica'',  76(4), 763--814.</ref> for Identification [[guide:Ec36399528#IP:param_pred_interval |Problem]], see equation [[guide:Ec36399528#eq:ThetaI_BLP |eq:ThetaI_BLP]].
However, these random functions need to be estimated.
While the first-stage estimation of <math>r_i(\ew,\ez),i=0,\dots,k</math>, does not affect the identification arguments, it does complicate inference, see <ref name="cha:che:mol:sch18"><span style="font-variant-caps:small-caps">Chandrasekhar, A., V.Chernozhukov, F.Molinari,  <span style="font-variant-caps:normal">and</span>  P.Schrimpf</span>  (2018): “Best linear approximations to set identified  functions: with an application to the gender wage gap” CeMMAP working paper  CWP09/19, available at [https://www.cemmap.ac.uk/publication/id/13913 https://www.cemmap.ac.uk/publication/id/13913].</ref> and the discussion in [[guide:6d1a428897#sec:inference |Section]].
====<span id="subsubsec:CRS"></span>Endogenous Explanatory Variables====
Whereas the standard random utility model presumes some form of exogeneity for <math>\ex</math>, in practice often some explanatory variables are endogenous.
This problem has been addressed in the literature to obtain point identification of the model through a combination of several assumptions, including large support conditions, special regressors, control function restrictions, and more (see, e.g., <ref name="mat93"><span style="font-variant-caps:small-caps">Matzkin, R.L.</span>  (1993): “Nonparametric identification and estimation  of polychotomous choice models” ''Journal of Econometrics'', 58(1), 137  -- 168.</ref><ref name="ber:lev:pak95"><span style="font-variant-caps:small-caps">Berry, S.T., J.Levinsohn,  <span style="font-variant-caps:normal">and</span> A.Pakes</span>  (1995):  “Automobile Prices in Market Equilibrium” ''Econometrica'', 63(4),  841--890.</ref><ref name="lew00"/><ref name="pet:tra10"><span style="font-variant-caps:small-caps">Petrin, A.,  <span style="font-variant-caps:normal">and</span> K.Train</span>  (2010): “A Control Function  Approach to Endogeneity in Consumer Choice Models” ''Journal of  Marketing Research'', 47(1), 3--13.</ref>).
<ref name="hon:tam03"><span style="font-variant-caps:small-caps">Hong, H.,  <span style="font-variant-caps:normal">and</span> E.Tamer</span>  (2003b): “Inference in Censored Models with Endogenous  Regressors” ''Econometrica'', 71(3), 905--932.</ref> analyze the distinct but related problem of identification in a censored regression model with endogenous explanatory variables, and provide sufficient conditions for point identification.<ref group="Notes" >The estimator that they propose extends the minimum distance estimator put forward by {{ref|name=man:tam02}}, see Section [[guide:6d1a428897#subsec:consistent |Consistent Estimation]], so that if the conditions required for point identification do not hold, it estimates the parameter's identification region (under regularity conditions).
{{ref|name=hon:tam03letters}} carry out a similar analysis for the binary choice model with endogenous explanatory variables.</ref>
Here I discuss how to carry out identification analysis in the absence of such assumptions when instrumental variables <math>\ez</math> are available, as proposed by <ref name="che:ros:smo13"><span style="font-variant-caps:small-caps">Chesher, A., A.M. Rosen,  <span style="font-variant-caps:normal">and</span> K.Smolinski</span>  (2013): “An  instrumental variable model of multiple discrete choice” ''Quantitative  Economics'', 4(2), 157--196.</ref>.
They consider a more general case than I do here, with a utility function that is not parametrically specified and not restricted to be separable in the unobservables.
Even in that more general case, the identification analysis follows through similar steps as reported here.
{{proofcard|Identification Problem (Discrete Choice with Endogenous Explanatory Variables) |IP:discrete:choice:endogenous|
Let <math>(\ey,\ex,\ez)\sim\sP</math> be observable random variables in <math>\cY\times\cX\times\cZ</math>.
Let all members of the population face the same choice set <math>\cY</math>.
Suppose that each alternative has one unobservable attribute <math>\epsilon_c,c\in\cY</math> and let <math>\nu\equiv(\epsilon_{c_1},\dots,\epsilon_{c_{|\cY|}})</math>.<ref group="Notes" >Compared to the general model put forward in Section [[#subsec:single:ag:RUM |Discrete Choice in Single Agent Random Utility Models]], in this model there are no preference heterogeneity terms <math>\zeta</math> (random coefficients) that vary only across decision makers.</ref>
Let <math>\nu\sim\sQ</math> and assume that <math>\nu\independent\ez</math>.
Suppose <math>\sQ</math> belongs to a nonparametric family of distributions <math>\cT</math>, and that the conditional distribution of <math>\nu|\ex,\ez</math>, denoted <math>\sR(\nu|\ex,\ez)</math>, is absolutely continuous with respect to Lebesgue measure with everywhere positive density on its support, <math>(\ex,\ez)</math>-a.s.
Suppose utility is separable in unobservables and has a functional form known up to a finite dimensional parameter vector <math>\theta\in\Theta\subset\R^m</math>, so that <math>\bu_i(c)=g(\ex_c;\theta)+\epsilon_c</math>, <math>(\ex_c,\epsilon_c)</math>-a.s., for all <math>c\in\cY</math>.
Maintain the normalizations <math>g(\ex_{c_{|\cY|}};\theta)=0</math> for all <math>\theta\in\Theta</math> and all <math>\ex\in\cX</math>, and <math>g(x_c^0;\theta)=\bar{g}</math> for known <math>(x_c^0,\bar{g})</math> for all <math>\theta\in\Theta</math> and <math>c\in\cY</math>.<ref group="Notes" >Of course, under these conditions one can work directly with utility differences. To try and economize on notation, I do not explicitly do so here.</ref>
Given <math>(\ex,\ez,\nu)</math>, suppose <math>\ey</math> is the utility maximizing choice in <math>\cY</math>.
In the absence of additional information, what can the researcher learn about <math>(\theta,\sQ)</math>?|}}
The key challenge to identification here results because the distribution of <math>\nu</math> can vary across different values of <math>\ex</math>, both conditional and unconditional on <math>\ez</math>.
Why does this fact hinder point identification?
For a given <math>\vartheta\in\Theta</math> and for any <math>c\in\cY</math> and <math>x\in\cX</math>, the model yields that <math>c</math> is optimal, and hence chosen, if and only if <math>\nu</math> realizes in the set
<math display="block">
\begin{align}
\cE_\vartheta(c,x)=\{e\in\cV:g(x_c;\vartheta)+e_c\ge g(x_d;\vartheta)+e_d\ \forall d\in\cY\}.\label{eq:che:ros:E}
\end{align}
</math>
[[#fig:discrete:choice:endogenous|Figure]] plots the set <math>\cE_\vartheta(\ey,\ex)</math> in a stylized example with <math>\cY=\{1,2,3\}</math> and <math>\cX=\{x^1,x^2\}</math>, as a function of <math>(\epsilon_1-\epsilon_3,\epsilon_2-\epsilon_3)</math>.<ref group="Notes" >This figure is based on Figures 1-3 in {{ref|name=che:ros:smo13}}.</ref>
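Membership in the set defined in \eqref{eq:che:ros:E} amounts to a finite list of utility comparisons. The Python sketch below is a minimal illustration with hypothetical names and values: alternatives are indexed from zero, and the systematic utilities <math>g(x_d;\vartheta)</math> are supplied directly as numbers, with the last one normalized to zero.

```python
from typing import Sequence


def in_choice_region(c: int, det_utils: Sequence[float], e: Sequence[float]) -> bool:
    """True if e lies in E_vartheta(c, x), i.e. alternative c is (weakly) optimal.

    det_utils[d] stands for g(x_d; vartheta); e[d] is the unobservable of alternative d.
    """
    return all(det_utils[c] + e[c] >= det_utils[d] + e[d]
               for d in range(len(det_utils)))


det_utils = [1.0, 0.5, 0.0]   # made-up values, last alternative normalized to 0
e = [0.0, 1.0, 0.2]           # one realization of the unobservables
print([in_choice_region(c, det_utils, e) for c in range(3)])  # [False, True, False]
```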
Consider the model implied distribution, denoted <math>\sM</math> below, of the optimal choice.
Then, recalling the restriction <math>\ez\independent\nu</math>, we have
<math display="block">
\begin{align}
\sM(c|\ex\in R_x,\ez;\vartheta)&=\int_{x\in R_x}\sR(\cE_\vartheta(c,\ex)|\ex=x,\ez)d\sP(x|z),\forall R_x\subseteq\cX,\ez\text{-a.s.}\label{eq:che:ros:model:distrib}\\
\sQ(F)&=\int_{x\in\cX}\sR(F|\ex=x,\ez)d\sP(x|z),\forall F\subseteq\cV,\ez\text{-a.s.},\label{eq:che:ros:instrument}
\end{align}
</math>
Because the joint distribution of <math>(\ex,\nu)</math> conditional on <math>\ez</math> is left completely unrestricted (other than \eqref{eq:che:ros:instrument}), one can find multiple triplets <math>(\vartheta,\sQ,\sR(\nu|\ex,\ez))</math> satisfying the maintained assumptions and with <math>\sM(c|\ex\in R_x,\ez;\vartheta)=\sP(c|\ex\in R_x,\ez)</math> for all <math>c\in\cY</math> and <math>R_x\subseteq\cX</math>, <math>\ez</math>-a.s.
<div id="fig:discrete:choice:endogenous" class="d-flex justify-content-center">
[[File:guide_d9532_fig_discrete_choice_endogenous.png | 700px | thumb | The set <math>\cE_\vartheta</math> in equation \eqref{eq:che:ros:E} and the corresponding admissible values for <math>(\ey,\ex)</math> as a function of <math>(\epsilon_1-\epsilon_3,\epsilon_2-\epsilon_3)</math> under the simplifying assumption that <math>\cX=\{x^1,x^2\}</math> and <math>\cY=\{1,2,3\}</math>.
The admissible values for <math>(\ey,\ex)</math> are <math>\{(c,x^1)\}</math> in the gray area, and <math>\{(c,x^2)\}</math> in the area with vertical lines.
Because the two areas overlap, the model has set-valued predictions for <math>(\ey,\ex)</math>. ]]
</div>
It is instructive to compare \eqref{eq:che:ros:model:distrib}-\eqref{eq:che:ros:instrument} with <ref name="mcf73"/>'s conditional logit.
Under the standard assumptions, <math>\ex\independent\nu</math> so that no instrumental variables are needed.
This yields <math>\sQ(\nu)=\sR(\nu|\ex)</math> <math>\ex</math>-a.s., and in addition <math>\sQ</math> is typically known, with corresponding simplifications in \eqref{eq:che:ros:model:distrib}.
The resulting system of equalities can be inverted under standard order and rank conditions to yield point identification of <math>\theta</math>.
Further insights can be gained by looking at [[#fig:discrete:choice:endogenous|Figure]].
As the value of <math>\ex</math> changes from <math>x^1</math> to <math>x^2</math>, the region of values where, say, alternative 1 is optimal changes.
When <math>\ex</math> is exogenous, say independent of <math>\nu</math>, this yields a system of equalities relating <math>(\theta,\sQ)</math> to the observed distribution <math>\sP(\ey,\ex)</math> which, as stated above, can be inverted to obtain point identification.
When <math>\ex</math> is endogenous, this reasoning breaks down because the conditional distribution <math>\sR(\nu|\ex,\ez)</math> may change across realizations of <math>\ex</math>.
[[#fig:discrete:choice:endogenous|Figure]] also offers an instructive way to connect Identification [[#IP:discrete:choice:endogenous |Problem]] with the identification problem studied in Section [[#subsubsec:man:tam02 |Semiparametric Binary Choice Models with Interval Valued Covariates]] (as well as with those in Sections [[#subsec:multiple:eq |Static, Simultaneous-Move Finite Games with Multiple Equilibria]]-[[#subsec:auctions |Auction Models with Independent Private Values]] below).
In the latter, the model has set-valued predictions for the ''outcome variable'' given realizations of the covariates and unobserved heterogeneity terms, which overlap across realizations of the unobserved heterogeneity terms.
In the problem studied here, the model has singleton-valued predictions for the outcome variable of interest <math>\ey</math> as a function of the observable explanatory variables <math>\ex</math> and unobservables <math>\nu</math>.
However, for given realization of <math>\nu</math>, the model admits ''sets'' of values for the ''endogenous variables'' <math>(\ey,\ex)</math>, which overlap across realizations of <math>\nu</math>.
Because the model is silent on the joint distribution of <math>(\ex,\nu)</math> (except for requiring that the marginal distribution of <math>\nu</math> does not depend on <math>\ez</math>), partial identification results.
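The set-valued prediction for <math>(\ey,\ex)</math> can be enumerated directly in a toy setting. In the Python sketch below, all names and numbers are hypothetical: the covariate takes two labeled values, each with its own vector of systematic utilities, and alternatives are indexed from zero. For a fixed realization of <math>\nu</math>, the function collects every pair made of a covariate value and the choice that is optimal at that value.

```python
from typing import Dict, Sequence, Set, Tuple


def admissible_pairs(det_utils_by_x: Dict[str, Sequence[float]],
                     nu: Sequence[float]) -> Set[Tuple[int, str]]:
    """All (choice, covariate value) pairs consistent with a given nu."""
    pairs: Set[Tuple[int, str]] = set()
    for x_label, det_utils in det_utils_by_x.items():
        totals = [g + e for g, e in zip(det_utils, nu)]
        best = max(totals)
        # Keep every maximizer; ties have probability zero under the maintained assumptions.
        pairs.update((c, x_label) for c, u in enumerate(totals) if u == best)
    return pairs


det_utils_by_x = {"x1": [1.0, 0.5, 0.0], "x2": [0.0, 0.5, 0.0]}
print(sorted(admissible_pairs(det_utils_by_x, nu=[0.3, 0.0, 0.0])))
# [(0, 'x1'), (1, 'x2')]
```

For this realization of <math>\nu</math> the model admits two pairs, one for each covariate value, and it does not say which of the two is realized because the dependence between <math>\ex</math> and <math>\nu</math> is unrestricted.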
It is possible to couple the maintained assumptions with the observed data to learn features of <math>(\theta,\sQ)</math>.
Because the observed choice <math>\ey</math> is assumed to maximize utility, for the data generating <math>(\theta,\sQ)</math> the model yields
<math display="block">
\begin{align*}
\nu\in\cE_\theta(\ey,\ex)\text{ a.s.}
\end{align*}
</math>
[...], it follows that <math>(\vartheta,\tilde\sQ)</math> is observationally equivalent to <math>(\theta,\sQ)</math> if and only if
<math display="block">
\begin{align*}
\tilde\sQ(F|\ex,\ez)\ge \sP(\cE_\vartheta(\ey,\ex)\subseteq F|\ex,\ez),\forall F\in\cF,(\ex,\ez)\text{-a.s.}
\end{align*}
</math>
As the distribution of <math>\nu</math> is only restricted so that <math>\nu\independent\ez</math>, one can integrate both sides of the inequality with respect to <math>\ex</math>.
The final result follows because <math>\tilde\sQ</math> does not depend on <math>\ez</math>.}}
While Theorem [[#SIR:discrete:choice:endogenous |SIR-]] relies on checking inequality \eqref{eq:SIR:discrete:choice:endogenous} for all <math>F\in\cF</math>, the results in <ref name="che:ros:smo13"/>{{rp|at=Theorem 2}} and <ref name="mol:mol18"/>{{rp|at=Chapter 2}} can be used to obtain a smaller collection of sets over which to verify it.
In particular, if <math>\ex</math> has a discrete distribution, it suffices to use a finite collection of sets.
For example, in the case depicted in [[#fig:discrete:choice:endogenous|Figure]] with <math>\cX=\{x^1,x^2\}</math>, <ref name="che:ros:smo13"/>{{rp|at=Section 3.3 of the 2011 CeMMAP working paper version CWP39/11}} show that <math>\idr{\theta,\sQ}</math> is obtained by checking at most twelve inequalities in \eqref{eq:SIR:discrete:choice:endogenous}.
The left hand side of these inequalities is a linear function of the six values that the distribution <math>\tilde\sQ</math> assigns to the component regions depicted in [[#fig:discrete:choice:endogenous|Figure]] (the one where <math>\cE_\vartheta(1,x^1)\cap\cE_\vartheta(1,x^2)</math> realizes; the one where <math>\cE_\vartheta(1,x^1)\cap\cE_\vartheta(3,x^2)</math> realizes; etc.).
Hence, in this example, <math>(\vartheta,\tilde\sQ)\in\idr{\theta,\sQ}</math> if and only if <math>\tilde\sQ</math> assigns to these six regions a probability mass such that for <math>\vartheta</math> the twelve inequalities characterized by <ref name="che:ros:smo13"/> hold.
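The logic of checking a finite collection of inequalities can be sketched in Python on a stylized discretization. Everything in the sketch is a hypothetical simplification and not the construction in <ref name="che:ros:smo13"/>: the space of unobservables is split into a few atoms, each set <math>\cE_\vartheta(c,x)</math> is a union of atoms, and the inequality is assumed to take the form <math>\tilde\sQ(F)\ge\sP(\cE_\vartheta(\ey,\ex)\subseteq F|\ez)</math> for every value of <math>\ez</math>. The code brute-forces all unions of atoms rather than a reduced collection of sets.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence, Tuple

Pair = Tuple[int, str]  # (choice, covariate label)


def containment_prob(F: FrozenSet[int],
                     cond_probs: Dict[Pair, float],
                     region: Dict[Pair, FrozenSet[int]]) -> float:
    """P(E(y, x) is a subset of F | z) for one value of z."""
    return sum(p for pair, p in cond_probs.items() if region[pair] <= F)


def passes_inequalities(q_mass: Sequence[float],
                        cond_probs_by_z: Dict[str, Dict[Pair, float]],
                        region: Dict[Pair, FrozenSet[int]],
                        tol: float = 1e-12) -> bool:
    """Check Q(F) >= max_z P(E subset of F | z) for every nonempty union of atoms F."""
    atoms = range(len(q_mass))
    for size in range(1, len(q_mass) + 1):
        for combo in combinations(atoms, size):
            F = frozenset(combo)
            q_F = sum(q_mass[a] for a in F)
            worst = max(containment_prob(F, probs, region)
                        for probs in cond_probs_by_z.values())
            if q_F < worst - tol:
                return False
    return True


# Made-up example: three atoms, two choices, two covariate values, two instrument values.
region = {(1, "x1"): frozenset({0, 1}), (2, "x1"): frozenset({2}),
          (1, "x2"): frozenset({0}), (2, "x2"): frozenset({1, 2})}
cond_probs_by_z = {
    "a": {(1, "x1"): 0.5, (2, "x1"): 0.1, (1, "x2"): 0.1, (2, "x2"): 0.3},
    "b": {(1, "x1"): 0.2, (2, "x1"): 0.2, (1, "x2"): 0.3, (2, "x2"): 0.3},
}
print(passes_inequalities([0.3, 0.4, 0.3], cond_probs_by_z, region))    # True
print(passes_inequalities([0.05, 0.65, 0.3], cond_probs_by_z, region))  # False
```

The second candidate fails because it puts mass 0.05 on atom 0, while under <math>\ez=b</math> the set <math>\cE_\vartheta(\ey,\ex)</math> is contained in that atom with probability 0.3.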
'''Key Insight:'''
<i>A conceptual contribution of <ref name="che:ros:smo13"/> is to show that one can frame models with endogenous explanatory variables as ''incomplete'' models.
Incompleteness here results from the fact that the model does not specify how the endogenous variables <math>\ex</math> are determined.
One can then think of these as models with set-valued predictions for the endogenous variables (<math>\ey</math> and <math>\ex</math> in this application), even though the outcome of the model (<math>\ey</math>) is uniquely predicted by the realization of the observed explanatory variables (<math>\ex</math>) and the unobserved heterogeneity terms (<math>\nu</math>).
Random set theory can again be leveraged to characterize sharp identification regions.
</i>
<ref name="che:ros19"/>{{rp|at=Chapter XXX in this Volume}} discuss related generalized instrumental variables models where random set methods are used to obtain characterizations of sharp identification regions in the presence of endogenous explanatory variables.
====<span id="subsubsec:BCMT"></span>Unobserved Heterogeneity in Choice Sets and/or Consideration Sets====
Compared to the general framework set forth at the beginning of Section [[#subsec:single:ag:RUM |Discrete Choice in Single Agent Random Utility Models]], as pointed out in <ref name="man77"><span style="font-variant-caps:small-caps">Manski, C.F.</span>  (1977): “The structure of random utility models”  ''Theory and Decision'', 8(3), 229--254.</ref>, often the researcher observes <math>(\ey_i,\ex_i)</math> but not <math>\eC_i</math>, <math>i=1,\dots,n</math>.
Even when <math>\eC_i</math> is observable, the researcher may be unaware of which of its elements the decision maker actually evaluates before selecting one.
In what follows, to shorten expressions, I refer to both the measurement problem of unobserved choice sets and the (cognitive) problem of limited consideration as “unobserved heterogeneity in choice sets.”
Learning features of preferences using discrete choice data in the presence of unobserved heterogeneity in choice sets is a formidable task.
When a decision maker chooses an alternative, this may be because her choice set equals the feasible set and the chosen alternative is the one yielding the highest utility.
Then observed choice reveals  preferences.
But it can also be that the decision maker has access to/considers only the chosen alternative (e.g., <ref name="blo:mar60"/>{{rp|at=p. 99}}).
Then observed choice is driven entirely by choice set composition, and is silent about preferences.
A plethora of scenarios between these extremes is possible, but the researcher does not know which has generated the observed data.
This fundamental identification problem calls either for restrictions on the random utility model and consideration set formation process, or for collection of richer data that eliminates unobserved heterogeneity in <math>\eC_i</math> or allows for enhanced modeling of it (see, e.g., <ref name="cap16"><span style="font-variant-caps:small-caps">Caplin, A.</span>  (2016): “Measuring and Modeling Attention” ''Annual  Review of Economics'', 8(1), 379--403.</ref>).
A sizable literature spanning behavioral economics, econometrics, experimental economics, marketing, microeconomics, and psychology, has put forward different models to formalize the complex process that leads to the formation of the set of alternatives that the agent considers or can choose from (see, e.g., <ref name="sim59"><span style="font-variant-caps:small-caps">Simon, H.A.</span>  (1959): “Theories of Decision-Making in Economics and  Behavioral Science” ''The American Economic Review'', 49(3), 253--283.</ref><ref name=" how63"><span style="font-variant-caps:small-caps">Howard, J.A.</span>  (1963): ''Consumer behavior: application of  theory''. New York: McGraw-Hill, Includes indexes.</ref><ref name=" tve72"><span style="font-variant-caps:small-caps">Tversky, A.</span>  (1972): “Elimination by aspects: A theory of choice”  ''Psychological review'', 79(4), 281.</ref>{{rp|at=for early contributions}}).
<ref name="man77"/> proposes both a general econometric model where decision makers draw choice sets from an unknown distribution, as well as a specific model of choice set formation, independent from preferences, and studies their implications for the distributional structure of random utility models.<ref group="Notes" >The specific model in {{ref|name=man77}}{{rp|at=Section II-A}} is often used in applications.
It posits that each alternative <math>c\in\cY</math> enters the decision maker’s choice set with probability <math>\phi_c</math>, independently of the other alternatives.
The probability <math>\phi_c</math> may depend on observable individual characteristics, and <math>\phi_c=1</math> for at least one option <math>c\in\cY</math> (the “default” good).</ref>
However, assumptions about the choice set formation process are often rooted in a desire to achieve point identification rather than in information contained in the model or observed data.<ref group="Notes" >These assumptions are akin to assumptions about selection mechanisms in models with multiple equilibria.
The latter are discussed further below in Section [[#subsubsec:tam03:cil:tam09 |An Inference Approach Robust to the Presence of Multiple Equilibria]], along with their criticisms.</ref>
It is then important to ask what can be learned about decision makers' preferences under minimal assumptions on the choice set formation process.
Allowing for unrestricted dependence between choice sets and preferences, while challenging for identification analysis, is especially relevant.
Indeed, decision makers' unobserved attributes may determine both their preferences and which items in the feasible set they pay attention to or are available to them (e.g., through unobserved liquidity constraints, unobserved characteristics such as religious preferences in the context of school choice, or behavioral phenomena such as aversion to extremes, salience, etc.).
Here I use the framework put forward by <ref name="bar:cou:mol:tei18"><span style="font-variant-caps:small-caps">Barseghyan, L., M.Coughlin, F.Molinari,  <span style="font-variant-caps:normal">and</span> J.C.  Teitelbaum</span>  (2019): “Heterogeneous Choice Sets and Preferences” available  at [https://arxiv.org/abs/1907.02337 https://arxiv.org/abs/1907.02337].</ref> to study identification of discrete choice models with unobserved heterogeneity in choice sets and preferences.
{{proofcard|Identification Problem (Discrete Choice with Unobserved Heterogeneity in Choice Sets and Preferences)|IP:BCMT|Let <math>(\ey,\ex)\sim \sP</math> be observable random variables in <math>\cY\times\cX</math>.
Assume that there exists a real valued function <math>g</math>, which for simplicity I posit known up to parameter <math>\delta\in\Delta\subset\R^m</math> and continuous in its second argument, such that <math>\bu_i(c)=g(\ex_{ic},\nu_i;\delta)</math>, <math>(\ex_{ic},\nu_i)</math>-a.s., for all <math>c\in\cY,i\in\cI</math>, where <math>\ex_{ic}</math> denotes the vector of attributes relevant to alternative <math>c</math>, and includes attributes that are alternative invariant and ones that are alternative specific (respectively, <math>\ex_i^1</math> and <math>\ex_{ic}^2</math> in the general notation laid out in Section [[#subsec:single:ag:RUM |Discrete Choice in Single Agent Random Utility Models]]).
Suppose that <math>\ey=\arg\max_{c\in \eC}g(\ex_c,\nu;\delta)</math>, where ties are assumed to occur with probability zero and <math>\eC</math> is an unobservable choice set drawn from the subsets of <math>\cY</math> according to some unknown probability distribution.
Suppose <math>\sR(|\eC|\ge\kappa)=1</math> for some known constant <math>\kappa\ge 2</math>.
Let <math>\sQ</math> denote the distribution of <math>\nu</math>, and assume that it is known up to a finite dimensional parameter <math>\gamma\in\Gamma\subset\R^k</math>.
For simplicity, assume that <math>\nu\independent\ex</math>.<ref group="Notes" >This assumption can be relaxed as discussed in {{ref|name=mat07}}. The procedure proposed here can also be adapted to allow for endogenous explanatory variables as in Section [[#subsubsec:CRS |Endogenous Explanatory Variables]] by combining the results in {{ref|name=bar:cou:mol:tei18}} with those in {{ref|name=che:ros:smo13}}.</ref>
In the absence of additional information, what can the researcher learn about <math>\theta\equiv[\delta;\gamma]</math>?|}}
<div id="fig:set_valued_BCMT" class="d-flex justify-content-center">
[[File:guide_d9532_fig_set_valued_BCMT.png | 700px | thumb | Predicted value of <math>\ey</math> in Identification [[#IP:BCMT |Problem]] as a function of <math>\nu</math> for <math>\kappa=|\cY|-1</math>. In this case, <math>\eC=\cY\setminus\{c\}</math> for some <math>c\in\cY</math>, and the model predicts either the first or the second best alternative in <math>\cY</math>. ]]
</div>
The model just laid out has set valued predictions for the decision maker's optimal choice, because different alternatives might be optimal depending on which choice set the decision maker draws.
[[#fig:set_valued_BCMT|Figure]], which is based on the analysis in <ref name="bar:cou:mol:tei18"/>, illustrates the set valued predictions in a stylized example.
In the figure <math>\nu</math> is assumed to be a scalar; <math>\bar{\nu}_{j,m}</math> denotes the threshold value of <math>\nu</math> above which <math>c_j</math> yields higher utility than <math>c_m</math> and below which <math>c_m</math> yields higher utility than <math>c_j</math> (the threshold's dependence on <math>(\ex;\delta)</math> is suppressed for notational convenience).
Consider the case that <math>\nu\in[\bar{\nu}_{2,3},\bar{\nu}_{1,2}]</math>, so that <math>c_2</math> is the option yielding the highest utility among all options in <math>\cY</math>.
When <math>\kappa=|\cY|-1</math>, the agent may draw a choice set that does not include one of the alternatives in <math>\cY</math>.
If the excluded alternative is not <math>c_2</math> (or if <math>\eC</math> realizes equal to <math>\cY</math>), the model predicts that the decision maker chooses <math>c_2</math>.
If <math>\eC</math> realizes equal to <math>\cY\setminus\{c_2\}</math>, the model predicts that the decision maker chooses the second best: <math>c_1</math> if <math>\nu\in[\bar{\nu}_{1,3},\bar{\nu}_{1,2}]</math>, and <math>c_3</math> if <math>\nu\in[\bar{\nu}_{2,3},\bar{\nu}_{1,3}]</math>.
Conversely, observation of <math>\ey=c_1</math> allows one to conclude that <math>\nu\ge\bar\nu_{1,3}</math>, and <math>\ey=c_2</math> that <math>\nu\ge\bar\nu_{2,4}</math>, with <math>\bar\nu_{2,4}\le\bar\nu_{1,3}</math>, and these regions of possible realizations of <math>\nu</math>  overlap.
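The set valued prediction described above can be made concrete with a minimal sketch. It enumerates every admissible choice set of cardinality at least <math>\kappa</math> and collects the utility maximizer in each. The function name <code>optimal_choice_set</code> and the utility values are hypothetical and chosen only for illustration; they are not taken from the cited paper.

```python
from itertools import combinations

def optimal_choice_set(utilities, kappa):
    """Alternatives that are optimal for at least one choice set of size >= kappa.

    utilities: dict mapping an alternative label to its utility at a given (x, nu; delta)
    kappa: known lower bound on the cardinality of the unobserved choice set
    """
    alternatives = list(utilities)
    predicted = set()
    for size in range(kappa, len(alternatives) + 1):
        for choice_set in combinations(alternatives, size):
            # The model predicts the utility maximizer within the realized choice set.
            predicted.add(max(choice_set, key=utilities.get))
    return predicted

# Hypothetical utilities: c2 is the best alternative and c1 the second best.
u = {"c1": 0.8, "c2": 1.0, "c3": 0.5, "c4": 0.1}
first_or_second_best = optimal_choice_set(u, kappa=len(u) - 1)
```

With <math>\kappa=|\cY|-1</math> the sketch returns the first and second best alternatives, in line with the figure; smaller values of <math>\kappa</math> enlarge the predicted set to the <math>|\cY|-\kappa+1</math> best alternatives.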
Why does this set valued prediction hinder point identification?
The reason is similar to the explanation given for Identification [[#IP:man:tam02_binary |Problem]]:
the distribution of the observable data relates to the model structure in an ''incomplete'' manner, because the distribution of the (unobserved) choice sets is left completely unspecified.
<ref name="bar:cou:mol:tei18"/> show that one can find multiple candidate distributions for <math>\eC</math> and parameter vectors <math>\vartheta</math>, such that together they yield a model implied distribution for <math>\ey|\ex</math> that matches <math>\sP(\ey|\ex)</math>, <math>\ex</math>-a.s.
<ref name="bar:cou:mol:tei18"/> propose to work directly with the set of model implied optimal choices given <math>(\ex,\nu)</math> associated with each possible realization of <math>\eC</math>, which is depicted in [[#fig:set_valued_BCMT|Figure]] for a specific example.
The key idea is that, according to the model, the observed choice maximizes utility among the alternatives in <math>\eC</math>.
Hence, for the data generating value of <math>\theta</math>, the observed choice belongs to the set of model implied optimal choices.
With this, the authors are able to characterize <math>\idr{\theta}</math> through [[guide:379e0dcd67#thr:artstein |Theorem]] as the collection of parameter vectors that satisfy a finite number of conditional moment inequalities.
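A minimal sketch of how such inequalities might be checked at a candidate parameter value, conditional on a single covariate value, is given next. Artstein's characterization requires that, for every subset <math>K</math> of <math>\cY</math>, the probability that the observed choice lies in <math>K</math> is at least the probability that the whole set of model implied optimal choices is contained in <math>K</math>. The utility specification, the attribute values, and the function names below are assumptions made for illustration only; they are not the specification or the procedure used by the authors.

```python
import random
from itertools import combinations

def predicted_set(x, nu, delta, kappa):
    """Set of model implied optimal choices when the choice set has at least kappa elements."""
    # Hypothetical utility g(x_c, nu; delta) = nu * x_c - delta * x_c**2 (an assumption).
    utilities = {c: nu * xc - delta * xc ** 2 for c, xc in x.items()}
    ranked = sorted(utilities, key=utilities.get, reverse=True)
    # At most |Y| - kappa alternatives are missing, so one of the |Y| - kappa + 1 best is chosen.
    return frozenset(ranked[: len(ranked) - kappa + 1])

def artstein_violations(choice_probs, simulated_sets, tol=1e-9):
    """Subsets K for which P(y in K) < Q(predicted set is contained in K) - tol."""
    alternatives = sorted(choice_probs)
    n = len(simulated_sets)
    violations = []
    for size in range(1, len(alternatives)):
        for K in combinations(alternatives, size):
            K = frozenset(K)
            p_obs = sum(choice_probs[c] for c in K)
            q_contained = sum(D <= K for D in simulated_sets) / n
            if p_obs < q_contained - tol:
                violations.append((K, p_obs, q_contained))
    return violations

# Simulated illustration: data are generated by picking one element of each predicted set.
rng = random.Random(0)
x = {"c1": 0.0, "c2": 1.0, "c3": 2.0}          # hypothetical attribute values
draws = [rng.gauss(1.0, 1.0) for _ in range(2000)]
sets_at_truth = [predicted_set(x, nu, delta=0.5, kappa=2) for nu in draws]
choices = [rng.choice(sorted(D)) for D in sets_at_truth]
choice_probs = {c: choices.count(c) / len(choices) for c in x}
violations_at_truth = artstein_violations(choice_probs, sets_at_truth)
```

Because the simulated choices are selections from the predicted sets, no inequality is violated at the data generating parameter value. In an application one would repeat the check over candidate values of <math>\theta</math> and over the support of the covariates, with a proper treatment of sampling variation.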
'''Key Insight:'''<i>
<ref name="bar:cou:mol:tei18"/> show that working directly with the set of model implied optimal choices given <math>(\ex,\nu)</math> allows one to dispense with considering all possible distributions of choice sets that are allowed for in Identification [[#IP:BCMT |Problem]] to complete the model.
Such distributions may depend on <math>\nu</math> even after conditioning on observables and may constitute an infinite dimensional nuisance parameter, which creates great difficulties for the computation of <math>\idr{\theta}</math> and for inference.
</i>
Identification [[#IP:BCMT |Problem]] sets up a structure where preferences include idiosyncratic components <math>\nu</math> that are decision maker specific and can depend on <math>\eC</math>, and where heterogeneity in <math>\eC</math> can be driven either by a measurement problem, or by the decision maker's limited attention to the options available to her.
However, for computational and finite sample inference reasons, it restricts the family of utility functions to be known up to a finite dimensional parameter vector <math>\delta</math>.
A rich literature in decision theory has analyzed a different framework, where the decision maker's choice set is observable to the researcher, but the decision maker does not consider all alternatives in it (for recent contributions see, e.g., <ref name="mas:nak:ozb12"><span style="font-variant-caps:small-caps">Masatlioglu, Y., D.Nakajima,  <span style="font-variant-caps:normal">and</span> E.Y. Ozbay</span>  (2012):  “Revealed Attention” ''American Economic Review'', 102(5), 2183--2205.</ref><ref name="man:mar14"><span style="font-variant-caps:small-caps">Manzini, P.,  <span style="font-variant-caps:normal">and</span> M.Mariotti</span>  (2014): “Stochastic Choice  and Consideration Sets” ''Econometrica'', 82(3), 1153--1176.</ref>).
In this literature, the utility function is left completely unspecified, so that interest focuses on identification of preference orderings of the available options.
Unobserved heterogeneity in preferences is assumed away, so that heterogeneous choice is driven by randomness in consideration sets.
If the consideration set formation process is left unspecified or is subject only to weak restrictions, point identification of the preference orderings is not possible even if preferences are homogeneous and the researcher observes a representative agent facing multiple distinct choice problems with varying choice sets.
<ref name="cat:ma:mas:sul17"><span style="font-variant-caps:small-caps">Cattaneo, M.D., X.Ma, Y.Masatlioglu,  <span style="font-variant-caps:normal">and</span> E.Suleymanov</span>  (2019): “A Random Attention Model” ''Journal of Political Economy'',  forthcoming, available at [https://arxiv.org/abs/1712.03448 https://arxiv.org/abs/1712.03448].</ref> propose a general model for the consideration set formation process where the only restriction is a weak and intuitive monotonicity condition: the probability that any particular consideration set is drawn does not decrease when the number of possible consideration sets decreases.
Within this framework, they provide revealed preference theory and testable implications for observable choice probabilities.
{{proofcard|Identification Problem (Homogeneous Preference Orderings in Random Attention Models)|IP:RAM|Let <math>(\ey,\eC)\sim\sP</math> be a pair of observable random variable and random set in <math>\cY\times\mathfrak{D}</math>, where <math>\mathfrak{D}=\{D:D\subseteq\cY\}\setminus\{\emptyset\}</math>.<ref group="Notes" >Here I omit observable covariates <math>\ex</math> for simplicity.</ref>
Let <math>\mu:\mathfrak{D}\times\mathfrak{D}\to[0,1]</math> denote an ''attention rule'' such that <math>\mu(A|G)\ge 0</math> for all <math>A\subseteq G</math>, <math>\mu(A|G)=0</math> for all <math>A\nsubseteq G</math>, and <math>\sum_{A\subseteq G}\mu(A|G)=1</math>, <math>A,G\in\mathfrak{D}</math>.
Assume that for any <math>b\in G\setminus A</math>,
<math display="block">
\begin{align}
\label{eq:RAM:monotonicity}
\mu(A|G)\le\mu(A|G\setminus\{b\}),
\end{align}
</math>
and that the decision maker has a strict preference ordering <math>\succ</math> on <math>\cY</math>.<ref group="Notes" >Specifically, <math>\succ</math> is an asymmetric, transitive and complete binary relation.</ref>
In the absence of additional information, what can the researcher learn about <math>\succ</math>?|}}
<ref name="cat:ma:mas:sul17"/> posit that an observed distribution of choice <math>\sP(\ey|\eC)</math> has a random attention representation, and hence they name it a ''random attention model'', if there exists a preference ordering <math>\succ</math> over <math>\cY</math> and a monotonic attention rule <math>\mu</math> such that
<math display="block">
\begin{align}
\cp(c|G)\equiv\sP(\ey=c|\eC=G)=\sum_{A\subseteq G}\one(c\text{ is }\succ\text{-best in }A)\mu(A|G),\forall c\in G,\forall G\in\mathfrak{D}.\label{eq:RAM}
\end{align}
</math>
The sharp identification region for the preference ordering, denoted <math>\idr{\succ}</math> henceforth, is given by the collection of preference orderings for which one can find a monotonic attention rule to pair it with, so that \eqref{eq:RAM} holds.
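The representation in \eqref{eq:RAM} can be illustrated with a short sketch that computes the choice probabilities implied by a preference ordering and an attention rule, and that checks the monotonicity condition \eqref{eq:RAM:monotonicity} by brute force. The uniform attention rule used here is a hypothetical example picked for its simplicity; it is not a rule singled out by the authors.

```python
from fractions import Fraction
from itertools import combinations

def nonempty_subsets(G):
    """All nonempty subsets of G, as frozensets."""
    items = sorted(G)
    return [frozenset(s) for r in range(1, len(items) + 1) for s in combinations(items, r)]

def uniform_attention(A, G):
    # Hypothetical attention rule: every nonempty subset of G is equally likely.
    return Fraction(1, 2 ** len(G) - 1) if A and A <= G else Fraction(0)

def is_monotonic(mu, universe):
    """Check mu(A|G) <= mu(A|G minus {b}) for every G, every A within G, and every b in G but not in A."""
    for G in nonempty_subsets(universe):
        for A in nonempty_subsets(G):
            for b in G - A:
                if mu(A, G) > mu(A, G - {b}):
                    return False
    return True

def ram_choice_probs(order, mu, G):
    """Choice probabilities p(c|G): total attention mass on sets A in which c is the best element."""
    rank = {c: i for i, c in enumerate(order)}  # order lists alternatives from best to worst
    probs = {c: Fraction(0) for c in G}
    for A in nonempty_subsets(G):
        probs[min(A, key=rank.get)] += mu(A, G)
    return probs

p_abc = ram_choice_probs(["a", "b", "c"], uniform_attention, frozenset("abc"))
```

In this example the best alternative is chosen with probability 4/7, because it is the best element in four of the seven possible consideration sets.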
Of course, an observed distribution of choice can be represented by multiple preference orderings and attention rules.
The authors, however, show in their Lemma 1 that if for ''some'' <math>G\in\mathfrak{D}</math> with <math>\{b,c\}\subseteq G</math>,
<math display="block">
\begin{align}
\cp(c|G) > \cp(c|G\setminus \{b\}),\label{eq:RAM_violation_reg}
\end{align}
</math>
then <math>c \succ b</math> for any <math>\succ</math> for which one can find a monotonic attention rule <math>\mu</math> such that \eqref{eq:RAM} holds.
Because of preference transitivity, one can also learn <math>a\succ b</math> if in addition to the above condition one has <math>\cp(a|G^\prime) > \cp(a|G^\prime\setminus \{c\})</math> for some <math>c\in G^\prime</math> and <math>G^\prime\in\mathfrak{D}</math>.
The authors further show in their Theorem 1 that the collection of preference relations associated with all possible instances of \eqref{eq:RAM_violation_reg} for all <math>c\in G</math> and <math>G\in\mathfrak{D}</math> yields all information about preferences given the observed choice probabilities.
This yields a system of linear inequalities in <math>\cp(c|G)</math> that fully characterizes <math>\idr{\succ}</math>.
Let <math>\vec{\cp}</math> denote the vector with elements <math>[\cp(c|G):c\in G,G\in\mathfrak{D}]</math> and <math>\Pi_\succ</math> denote a conformable matrix collecting the constraints on <math>\sP(\ey|\eC)</math> embodied in \eqref{eq:RAM_violation_reg} and its generalizations based on transitive closure. Then
<math display="block">
\begin{align}
\idr{\succ}=\{\succ: \Pi_\succ \vec{\cp}\le 0\}.\label{eq:SIR:RAM}
\end{align}
</math>
The authors show that for any given preference ordering <math>\succ</math>, the matrix <math>\Pi_\succ</math> characterizing whether <math>\succ \in \idr{\succ}</math> through the system of linear inequalities in \eqref{eq:SIR:RAM} is unique, and they provide a simple algorithm to compute it.
They also show that mild additional assumptions, such as, for example, that decision makers facing binary choice sets pay attention to both alternatives frequently enough, can substantially increase the informational content of the data (i.e., substantially tighten <math>\idr{\succ}</math>).
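The revealed preference logic described above can be sketched as follows: collect every pair revealed by an instance of \eqref{eq:RAM_violation_reg}, take the transitive closure, and retain the orderings that agree with all revealed pairs. The sketch assumes that population choice probabilities are known and admit a random attention representation. The function names and the numerical choice probabilities are hypothetical, and this is not the authors' algorithm for computing <math>\Pi_\succ</math>.

```python
def revealed_preferences(p):
    """Pairs (c, b), read as 'c is preferred to b', revealed by p(c|G) > p(c|G minus {b}).

    p: dict mapping a frozenset G to a dict of choice probabilities over G
    """
    revealed = set()
    for G, probs in p.items():
        for b in G:
            smaller = G - {b}
            if smaller not in p:
                continue  # this choice problem is not observed
            for c in smaller:
                if probs[c] > p[smaller][c]:
                    revealed.add((c, b))
    # Transitive closure: c over b and b over d imply c over d.
    changed = True
    while changed:
        changed = False
        for (a, b) in list(revealed):
            for (c, d) in list(revealed):
                if b == c and (a, d) not in revealed:
                    revealed.add((a, d))
                    changed = True
    return revealed

def is_compatible(order, revealed):
    """True if the ordering (listed from best to worst) agrees with every revealed pair."""
    rank = {c: i for i, c in enumerate(order)}
    return all(rank[c] < rank[b] for c, b in revealed)

# Hypothetical choice probabilities on four choice problems.
p = {
    frozenset("abc"): {"a": 0.5, "b": 0.3, "c": 0.2},
    frozenset("ab"): {"a": 0.4, "b": 0.6},
    frozenset("ac"): {"a": 0.7, "c": 0.3},
    frozenset("bc"): {"b": 0.2, "c": 0.8},
}
revealed = revealed_preferences(p)
```

In this example the probability of choosing <code>a</code> falls when <code>c</code> is removed, and the probability of choosing <code>b</code> falls when <code>a</code> is removed, so the only compatible ordering ranks <code>b</code> first, <code>a</code> second, and <code>c</code> last.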
'''Key Insight:'''<i>
<ref name="cat:ma:mas:sul17"/> show that learning features of preference orderings in Identification [[#IP:RAM |Problem]] requires the existence in the data of choice problems where the choice probabilities satisfy \eqref{eq:RAM_violation_reg}.
The latter is a violation of the principle of “regularity” <ref name="luc:sup65"><span style="font-variant-caps:small-caps">Luce, R.D.,  <span style="font-variant-caps:normal">and</span> P.Suppes</span>  (1965): “Chapter 19:  Preference, Utility, and Subjective Probability” in ''Handbook of  Mathematical Psychology'', vol.3, pp. 249--410.</ref> according to which the probability of choosing an alternative from any set is at least as large as the probability of choosing it from any of its supersets.
Regularity is a monotonicity property of choice probabilities, and it is implied by a wide array of models of decision making.
The monotonicity of attention rules in \eqref{eq:RAM:monotonicity} can be viewed as regularity of the process that chooses a consideration set from the subsets of the choice set.
<ref name="cat:ma:mas:sul17"/> show that it is implied by various models of limited attention.
While the violation required in \eqref{eq:RAM_violation_reg} is weak in that it needs only to occur for some <math>G</math>, it sheds a different light on the severity of the identification problem described at the beginning of this section.
Regularity of choice probabilities and (partial) identification of preference orderings can co-exist only under restrictions on the consideration set formation process that are stronger than the regularity of attention rules in \eqref{eq:RAM:monotonicity}.
</i>
<ref name="aba:ada18"><span style="font-variant-caps:small-caps">Abaluck, J.,  <span style="font-variant-caps:normal">and</span> A.Adams</span>  (2018): “What Do Consumers  Consider Before They Choose? Identification from Asymmetric Demand  Responses” available at  [https://abiadams.com/wp-content/uploads/2018/06/DiscreteChoiceInattention_master.pdf https://abiadams.com/wp-content/uploads/2018/06/DiscreteChoiceInattention_master.pdf].</ref> and <ref name="bar:mol:thi19"><span style="font-variant-caps:small-caps">Barseghyan, L., F.Molinari,  <span style="font-variant-caps:normal">and</span> M.Thirkettle</span>  (2019):  “Discrete Choice under Risk with Limited Consideration” available at  [https://arxiv.org/abs/1902.06629 https://arxiv.org/abs/1902.06629].</ref> provide different sets of sufficient conditions for point identification of models of limited consideration.
In both cases, the authors posit specific models of consideration set formation and provide sufficient conditions for point identification under exclusion and large support assumptions.
<ref name="aba:ada18"/> assume that unobserved heterogeneity in preferences and in consideration sets are independent.
They exploit violations of Slutsky symmetry that result from inattention, assuming that for each alternative there is an observable characteristic with large support that does not affect the consideration probability of the other options.
<ref name="bar:mol:thi19"/> provide a thorough analysis of the extent of dependency between consideration and preferences under which semi-nonparametric point identification of the distribution of preferences and consideration attains.
They exploit a requirement of standard economic theory --the Spence-Mirrlees single crossing property of utility functions-- coupled with a mild strengthening of the classic conditions for semi-nonparametric identification of discrete choice models with full consideration and identical choice sets (see, e.g., <ref name="mat07"/>), assuming that there is at least one decision maker-specific characteristic with large support that affects utility but not consideration.
====<span id="subsubsec:counterfactual:choice:set"></span>Prediction of Choice Behavior with Counterfactual Choice Sets====
Building on <ref name="mar60"/>, <ref name="man07b"><span style="font-variant-caps:small-caps">Manski, C.F.</span>  (2007b): “Partial Identification of Counterfactual Choice  Probabilities” ''International Economic Review'', 48(4), 1393--1410.</ref> studies a question related to, but distinct from, those in Identification [[#IP:BCMT |Problem]] and Identification [[#IP:RAM |Problem]].
He is concerned with prediction of choice behavior when decision makers face counterfactual choice sets.
<ref name="man07b"/> frames this question as one of predicting treatment response (see Section [[guide:Ec36399528#subsec:programme:eval |Treatment Effects with and without Instrumental Variables]]).
Here the collection of potential treatments is given by <math>\mathfrak{D}</math>, the nonempty subsets of the universe of feasible alternatives <math>\cY</math>, and the response function specifies the alternative chosen by a decision maker when facing choice set <math>G\in\mathfrak{D}</math>.
<ref name="man07b"/> assumes that the researcher observes realized choice sets and chosen alternatives, <math>(\ey,\eC)\sim\sP</math>.<ref group="Notes" >Here I suppress covariates for simplicity.</ref>
Under the standard assumptions laid out at the beginning of Section [[#subsec:single:ag:RUM |Discrete Choice in Single Agent Random Utility Models]], specifically if utility functions are (say) linear in <math>\epsilon_{ic}</math> and the distribution of <math>\epsilon_{ic}</math> is (say) Type I extreme value or multivariate normal, prediction of choice behavior with counterfactual choice sets is immediate (and point identified).
<ref name="man07b"/>, however, leaves utility functions completely unspecified, and in fact works directly with preference orderings, which he labels decision makers’ ''types''.
He places no restriction on the distribution of preference types, except requiring that they are independent of the observed choice sets.
<ref name="man07b"/> shows that under these rather weak assumptions, the distribution of predicted choices from counterfactual choice sets can be partially identified, and characterized as the solution to linear programs.
Specifically, let <math>\ey^*(G)</math> denote the decision maker's optimal choice when facing choice set <math>G\in\mathfrak{D}</math>.
Assume <math>\ey^*(\cdot)\independent\eC</math>, and let <math>y_k</math> denote the choice function for a decision maker of type <math>k</math> --that is, a decision maker with a specific preference ordering labeled <math>k</math>.
One example of such preference ordering might be <math>c_1\succ c_2\succ\dots\succ c_{|\cY|}</math>.
If a decision maker of this type faces, say, choice set <math>G=\{c_2,c_3,c_4\}</math>, then she chooses alternative <math>c_2</math>.
Let <math>K</math> denote the set of logically possible types, and <math>\theta_k</math> the probability that a decision maker in the population is of type <math>k</math>.
Suppose that the researcher posits a behavioral model specifying <math>K</math>, <math>\{y_k,k\in K\}</math>, and restrictions that constrain <math>\theta</math> to lie in some specified set of distributions.
Let <math>\Theta</math> denote the values of <math>\vartheta</math> that satisfy these requirements plus the conditions <math>\vartheta_k\ge 0</math> for all <math>k\in K</math> and <math>\sum_{k\in K}\vartheta_k=1</math>.
Then for any <math>c\in\cY</math> and <math>\vartheta\in\Theta</math>, the model predicts
<math display="block">
\begin{align*}
\sQ(\ey^*(G)=c)=\sum_{k\in K}\one(y_k(G)=c)\vartheta_k.
\end{align*}
</math>
How can one partially identify this probability based on the observed data?
Suppose <math>\eC</math> is observed to take realizations <math>D_1,\dots,D_m</math>.
Then the data reveal
<math display="block">
\begin{align*}
\sP(\ey(D_j)=d_j)=\sum_{k\in K}\one(y_k(D_j)=d_j)\theta_k,\quad\forall d_j\in D_j,\ j=1,\dots,m.
\end{align*}
</math>
This yields that the sharp identification region for <math>\theta</math> is
<math display="block">
\begin{align*}
\idr{\theta}=\left\{\vartheta\in\Theta:\sP(\ey(D_j)=d_j)=\sum_{k\in K}\one(y_k(D_j)=d_j)\vartheta_k,\ \forall d_j\in D_j,\ j=1,\dots,m\right\}.
\end{align*}
</math>
If the behavioral model is correctly specified, <math>\idr{\theta}</math> is non-empty.
In turn, the sharp identification region for each choice probability is
<math display="block">
\begin{align*}
\idr{\sQ(\ey^*(G)=c)}=\left\{\sum_{k\in K}\one(y_k(G)=c)\vartheta_k:\vartheta\in\idr{\theta}\right\},
\end{align*}
</math>
and its extreme points can be obtained by solving linear programs.
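A minimal sketch of this computation for a universe of three alternatives follows. Since the example is small, the two linear programs are solved exactly by enumerating the basic feasible solutions of the constraint system instead of calling a numerical solver; this substitution, the function names, and the observed choice probabilities are assumptions made for illustration. All strict preference orderings are treated as logically possible types, and no restriction beyond those defining a probability distribution is imposed on <math>\vartheta</math>.

```python
from fractions import Fraction
from itertools import combinations, permutations

def choice(order, G):
    """Choice function y_k(G): the first element of the ordering that belongs to G."""
    return next(c for c in order if c in G)

def solve_on_support(columns, rhs):
    """Solve sum_j x_j * columns[j] = rhs exactly; None unless a unique solution exists."""
    m, n = len(rhs), len(columns)
    M = [[columns[j][i] for j in range(n)] + [rhs[i]] for i in range(m)]
    row = 0
    for col in range(n):
        pivot = next((r for r in range(row, m) if M[r][col] != 0), None)
        if pivot is None:
            return None  # linearly dependent columns: not a basic solution
        M[row], M[pivot] = M[pivot], M[row]
        M[row] = [v / M[row][col] for v in M[row]]
        for r in range(m):
            if r != row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[row])]
        row += 1
    if any(M[r][n] != 0 for r in range(row, m)):
        return None  # inconsistent system
    return [M[j][n] for j in range(n)]

def counterfactual_bounds(alternatives, observed, G, c):
    """Smallest and largest value of Q(y*(G) = c) over type distributions matching the data."""
    types = list(permutations(alternatives))
    rows, rhs = [[Fraction(1)] * len(types)], [Fraction(1)]  # probabilities sum to one
    for D, probs in observed.items():
        for d, prob in probs.items():
            rows.append([Fraction(int(choice(k, D) == d)) for k in types])
            rhs.append(Fraction(prob))
    objective = [int(choice(k, G) == c) for k in types]
    values = []
    for size in range(1, len(types) + 1):
        for support in combinations(range(len(types)), size):
            cols = [[r[j] for r in rows] for j in support]
            sol = solve_on_support(cols, rhs)
            if sol is not None and all(v >= 0 for v in sol):
                values.append(sum(objective[j] * v for j, v in zip(support, sol)))
    return (min(values), max(values)) if values else None

# Hypothetical data: choice probabilities observed on the choice sets {a,b} and {b,c}.
observed = {
    frozenset("ab"): {"a": Fraction(6, 10), "b": Fraction(4, 10)},
    frozenset("bc"): {"b": Fraction(7, 10), "c": Fraction(3, 10)},
}
lower, upper = counterfactual_bounds("abc", observed, frozenset("ac"), "a")
```

In this example the probability that <code>a</code> is chosen from the counterfactual choice set containing <code>a</code> and <code>c</code> is bounded between 3/10 and 1. The lower bound reflects that a type preferring <code>a</code> to <code>b</code> and <code>b</code> to <code>c</code> must prefer <code>a</code> to <code>c</code>. For larger universes of alternatives the number of types grows factorially, and a dedicated linear programming solver would be the natural tool.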
<ref name="kit:sto19"><span style="font-variant-caps:small-caps">Kitamura, Y.,  <span style="font-variant-caps:normal">and</span> J.Stoye</span>  (2019): “Nonparametric Counterfactuals in Random Utility  Models” available at [https://arxiv.org/abs/1902.08350 https://arxiv.org/abs/1902.08350].</ref> provide closely related sharp bounds on features of counterfactual choices in the nonparametric random utility model of demand, where observable choices are repeated cross-sections and one allows for unrestricted, unobserved heterogeneity.
Their approach builds on the work of <ref name="kit:sto18"><span style="font-variant-caps:small-caps">Kitamura, Y.,  <span style="font-variant-caps:normal">and</span> J.Stoye</span>  (2018): “Nonparametric Analysis  of Random Utility Models” ''Econometrica'', 86(6), 1883--1909.</ref>, who test whether agents' behavior is consistent with the Axiom of Revealed Stochastic Preference (ARSP) in a random utility model in which the utility function of each consumer over commodity bundles is assumed to satisfy only the basic restriction that “more is better” with no satiation.
Because the testing exercise is to be carried out using repeated cross-sections data, the authors maintain the assumption that multiple populations of consumers who face distinct choice sets have the same distribution of preferences. 
With this structure in place, de facto the task is to test the full implications of rationality without functional form restrictions.
<ref name="kit:sto18"/>’s approach is based on several novel ideas. 
As a first step, they leverage an earlier insight of <ref name="mcf05"><span style="font-variant-caps:small-caps">McFadden, D.L.</span>  (2005): “Revealed Stochastic Preference: A Synthesis”  ''Economic Theory'', 26(2), 245--264.</ref> to discretize the data without loss of information, so that they can define a large but finite set of rational preference types.
As a second step, they show that this implies that rationality can be tested by checking whether observed behavior lies in a cone corresponding to positive linear combinations of preference types. 
While the problem is discrete, its dimension is at first sight  prohibitive. 
Nonetheless, Kitamura and Stoye are able to develop novel computational methods that render the problem tractable. 
They apply their method to the U.K. Household Expenditure Survey, adapting to their framework results on nonparametric instrumental variable analysis by <ref name="imb:new09"><span style="font-variant-caps:small-caps">Imbens, G.W.,  <span style="font-variant-caps:normal">and</span> W.K. Newey</span>  (2009): “Identification and  Estimation of Triangular Simultaneous Equations Models Without Additivity”  ''Econometrica'', 77(5), 1481--1512.</ref> so that they can handle price endogeneity.
<ref name="kam18"><span style="font-variant-caps:small-caps">Kamat, V.</span>  (2018): “Identification with Latent Choice Sets”  available at [https://arxiv.org/abs/1711.02048 https://arxiv.org/abs/1711.02048].</ref> builds on <ref name="man07b"/> to learn program effects when agents are randomly assigned to control or treatment.
The treatment group is provided access to the program, while the control group is not.
However, members of the control group may receive access to the program from outside the experiment, leading to noncompliance with the randomly assigned treatment.
The researcher wants to learn about the average effect of program access on the decision to participate in the program and on the subsequent outcome.
While sufficiently rich data may allow the researcher to learn these effects, <ref name="kam18"/> is concerned with the identification problem that arises when the researcher only observes the treatment assignment status, the program participation decision, and the outcome, but not the receipt of program access for every agent.
<ref name="kam18"/> formalizes this problem as one where the received treatment is selected from a choice set that depends on the assigned treatment and is unobservable to the researcher, and the agents optimally choose whether to participate in the program by maximizing their utility function over their choice set.
Importantly, the utility functions are not subject to parametric restrictions, similarly to <ref name="man07b"/>.
But while <ref name="man07b"/> assumed independence of choice sets and preference types, <ref name="kam18"/> allows them to be arbitrarily dependent on each other, as in <ref name="bar:cou:mol:tei18"/>.
<ref name="kam18"/>’s approach leverages specific assumptions on random assignment of treatments and on compliance (or lack thereof) of participants to obtain nonparametric bounds on the treatment effects of interest that can be characterized using tractable linear programs.
===<span id="subsec:multiple:eq"></ref> and <ref name="cil:tam09"/> substantially enlarge the scope of partial identification analysis of structural models by showing how to apply it to learn features of payoff functions in static, simultaneous-move finite games of complete information with multiple equilibria.
<ref name="ber:tam06"><span style="font-variant-caps:small-caps">Berry, S.T.,  <span style="font-variant-caps:normal">and</span> E.Tamer</span>  (2006): “Identification in  Models of Oligopoly Entry” in ''Advances in Economics and Econometrics:  Theory and Applications, Ninth World Congress'', ed. by R.Blundell, W.K.  Newey,  <span style="font-variant-caps:normal">and</span> T.E. Persson, vol.2 of ''Econometric Society  Monographs'', p. 46–85. Cambridge University Press.</ref> extend the approach and considerations that follow to games of incomplete information.
To start, here I focus on two-player entry games with complete information.<ref group="Notes" >Completeness of information is motivated by the idea that firms in the industry have settled in a long-run equilibrium, and have detailed knowledge of both their own and their rivals' profit functions.</ref>
{{proofcard|Identification Problem (Complete Information Two Player Entry Game)|IP:entry_game|Let <math>(\ey_1,\ey_2,\ex_1,\ex_2)\sim\sP</math> be observable random variables in <math>\{0,1\}\times\{0,1\}\times\R^d\times\R^d</math>, <math>d < \infty</math>.
Suppose that <math>(\ey_1,\ey_2)</math> result from simultaneous move, pure strategy Nash play (PSNE) in a game where the payoffs are <math>\bu_j(\ey_j,\ey_{3-j},\ex_j;\beta_j,\delta_j)\equiv \ey_j(\ex_j\beta_j+\delta_j\ey_{3-j}+\eps_j)</math>, <math>j=1,2</math>, and the strategies are “enter” (<math>\ey_j=1</math>) or “stay out” (<math>\ey_j=0</math>).
Here <math>(\ex_1,\ex_2)</math> are observable payoff shifters, <math>(\eps_1,\eps_2)</math> are payoff shifters observable to the players but not to the econometrician, <math>\delta_1\le 0,\delta_2\le 0</math> are interaction effect parameters, and <math>\beta_1,\beta_2</math> are parameter vectors in <math>B\subset\R^d</math> reflecting the effect of the observable covariates on payoffs.
Each player enters the market if and only if entering yields non-negative payoff, so that <math>\ey_j=\one(\ex_j\beta_j+\delta_j\ey_{3-j}+\eps_j\ge 0)</math>.
For simplicity, assume that <math>\eps\equiv(\eps_1,\eps_2)</math> is independent of <math>\ex\equiv(\ex_1,\ex_2)</math> and has bivariate Normal distribution with mean vector zero, variances equal to one (a normalization required by the threshold crossing nature of the model), and correlation <math>\rho\in [-1,1]</math>.
In the absence of additional information, what can the researcher learn about <math>\theta=[\delta_1\ \delta_2\ \beta_1\ \beta_2\ \rho]</math>?|}}
From the econometric perspective, this is a generalization of a standard discrete choice model to a bivariate simultaneous response model which yields a stochastic representation of equilibria in a two player, two action game.
Generically, for a given value of <math>\theta</math> and realization of the payoff shifters, the model just laid out admits multiple equilibria (existence of PSNE is guaranteed because the interaction parameters are non-positive).
In other words, it yields set valued predictions as depicted in [[#fig:set_valued_pred:tam03|Figure]].<ref group="Notes" >This figure is based on Figure 1 in {{ref|name=tam03}}.</ref>
Why does this set valued prediction hinder point identification?
Intuitively, the challenge can be traced back to the fact that for different values of <math>\theta\in\Theta</math>, one may find different ways to assign the probability mass in <math>[-\ex_1\beta_1,-\ex_1\beta_1-\delta_1)\times [-\ex_2\beta_2,-\ex_2\beta_2-\delta_2)</math> to <math>(0,1)</math> and <math>(1,0)</math>, so as to match the observed distribution <math>\sP(\ey_1,\ey_2|\ex_1,\ex_2)</math>.
More formally, for fixed <math>\vartheta\in\Theta</math> and given <math>(\ex,\eps)</math> and <math>(y_1,y_2)\in\{0,1\}\times\{0,1\}</math>, let
<math display="block">
\begin{align*}
\cE_\vartheta[(1,0),(0,1);\ex]&\equiv[-\ex_1\beta_1,-\ex_1\beta_1-\delta_1)\times [-\ex_2\beta_2,-\ex_2\beta_2-\delta_2),\\
\cE_\vartheta[(y_1,y_2);\ex]&\equiv\{(\eps_1,\eps_2):(y_1,y_2)\text{ is the unique equilibrium}\},
\end{align*}
</math>
so that in [[#fig:set_valued_pred:tam03|Figure]] <math>\cE_\vartheta[(1,0),(0,1);\ex]</math> is the gray region, <math>\cE_\vartheta[(0,1);\ex]</math> is the dotted region, etc.
Let <math>\sR(y_1,y_2|\ex,\eps)</math> be a ''selection mechanism'' that assigns to each possible outcome of the game <math>(y_1,y_2)\in\{0,1\}\times\{0,1\}</math> the probability that it is played conditional on observable and unobservable payoff shifters.
In order to be ''admissible'', <math>\sR(y_1,y_2|\ex,\eps)</math> must be such that <math>\sR(y_1,y_2|\ex,\eps)\ge 0</math> for all <math>(y_1,y_2)\in\{0,1\}\times\{0,1\}</math>, <math>\sum_{(y_1,y_2)\in\{0,1\}\times\{0,1\}}\sR(y_1,y_2|\ex,\eps)=1</math>, and
<math display="block">
\begin{align}
\forall \eps\in\cE_\vartheta[(1,0),(0,1);\ex],\ &\sR(0,0|\ex,\eps)=\sR(1,1|\ex,\eps)=0, \label{eq:games:sel:mec:1}\\
\forall \eps\in\cE_\vartheta[(y_1,y_2);\ex],\ &\sR(\tilde y_1,\tilde y_2|\ex,\eps)=0\ \ \forall(\tilde y_1,\tilde y_2)\in\{0,1\}\times\{0,1\}\text{ s.t. }(\tilde y_1,\tilde y_2)\neq(y_1,y_2).\label{eq:games:sel:mec:2}
\end{align}
</math>
Let <math>\Phi_r</math> denote the probability distribution of a bivariate Normal random variable with zero means, unit variances, and correlation <math>r\in[-1,1]</math>.
Let <math>\sM(y_1,y_2|\ex)</math> denote the model predicted probability that the outcome of the game realizes equal to <math>(y_1,y_2)</math>.
Then the model yields
<math display="block">
\begin{align}
\sM(y_1,y_2|\ex)&=\int\sR(y_1,y_2|\ex,\eps)d\Phi_r\notag\\
&=\int_{(\eps_1,\eps_2)\in\cE_\vartheta[(y_1,y_2);\ex]}d\Phi_r+\int_{(\eps_1,\eps_2)\in\cE_\vartheta[(1,0),(0,1);\ex]}\sR(y_1,y_2|\ex,\eps)d\Phi_r.\label{eq:games_model:pred}
\end{align}
</math>
Because <math>\sR(\cdot|\ex,\eps)</math> is left completely unspecified, other than the basic restrictions listed above that render it an admissible selection mechanism, one can find multiple values for <math>(\vartheta,\sR(\cdot|\ex,\eps))</math> such that <math>\sM(y_1,y_2|\ex)=\sP(y_1,y_2|\ex)</math> for all <math>(y_1,y_2)\in\{0,1\}\times\{0,1\}</math> <math>\ex</math>-a.s.
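To make the role of the selection mechanism concrete, the following Python sketch evaluates \eqref{eq:games_model:pred} in the special case where <math>(0,1)</math> is played with a constant probability <code>s01</code> throughout the region of multiplicity. The constant-probability form, the helper names (<code>rect_prob</code>, <code>model_prediction</code>), the Simpson-rule integration, and the numerical values are illustrative assumptions, not part of the analysis above; <code>xb1</code> and <code>xb2</code> stand for <math>\ex_1\beta_1</math> and <math>\ex_2\beta_2</math>, and <math>\delta_1,\delta_2 < 0</math> and <math>|r| < 1</math> are assumed.

```python
import math

def norm_cdf(z):
    # Standard normal CDF; math.erf handles +/- infinity.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def rect_prob(a1, b1, a2, b2, r, n=2000, cut=8.0):
    """P(a1 < e1 < b1, a2 < e2 < b2) for a standard bivariate normal
    with correlation r, via Simpson's rule over e1 (n must be even)."""
    a1, b1 = max(a1, -cut), min(b1, cut)  # truncate the outer integral
    if a1 >= b1 or a2 >= b2:
        return 0.0
    s = math.sqrt(1.0 - r * r)

    def f(e1):
        # density of e1 times the conditional probability of e2 in (a2, b2)
        return norm_pdf(e1) * (norm_cdf((b2 - r * e1) / s)
                               - norm_cdf((a2 - r * e1) / s))

    h = (b1 - a1) / n
    total = f(a1) + f(b1)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a1 + i * h)
    return total * h / 3.0

def model_prediction(xb1, xb2, d1, d2, r, s01):
    """Model-implied outcome probabilities when (0,1) is selected with
    constant probability s01 in the multiplicity region (an assumption)."""
    inf = math.inf
    p00 = rect_prob(-inf, -xb1, -inf, -xb2, r)
    p11 = rect_prob(-xb1 - d1, inf, -xb2 - d2, inf, r)
    multi = rect_prob(-xb1, -xb1 - d1, -xb2, -xb2 - d2, r)
    p01_any = rect_prob(-inf, -xb1 - d1, -xb2, inf, r)  # (0,1) is an equilibrium
    p10_any = rect_prob(-xb1, inf, -inf, -xb2 - d2, r)  # (1,0) is an equilibrium
    p01 = (p01_any - multi) + s01 * multi
    p10 = (p10_any - multi) + (1.0 - s01) * multi
    return {(0, 0): p00, (0, 1): p01, (1, 0): p10, (1, 1): p11}

for s01 in (0.0, 0.5, 1.0):
    m = model_prediction(0.2, -0.1, -0.8, -0.6, 0.3, s01)
    print(s01, round(m[(0, 1)], 4), round(sum(m.values()), 6))
```

Varying <code>s01</code> moves mass between <math>(0,1)</math> and <math>(1,0)</math> while leaving the probabilities of <math>(0,0)</math> and <math>(1,1)</math> unchanged, which is the source of the identification problem described above.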
<div id="fig:set_valued_pred:tam03" class="d-flex justify-content-center">
[[File:guide_d9532_fig_set_valued_pred_tam03.png | 700px | thumb | PSNE outcomes of the game in Identification [[#IP:entry_game |Problem]] as a function of <math>(\eps_1,\eps_2)</math>. ]]
</div>
Multiplicity of equilibria implies that the mapping from the model's exogenous variables <math>(\ex_1,\ex_2,\eps_1,\eps_2)</math> to outcomes <math>(\ey_1,\ey_2)</math> is a correspondence rather than a function.
This violates the classical “principal assumptions”  or  “coherency conditions” for simultaneous discrete response models discussed extensively in the econometrics literature (e.g., <ref name="hec78"><span style="font-variant-caps:small-caps">Heckman, J.J.</span>  (1978): “Dummy Endogenous Variables in a Simultaneous  Equation System” ''Econometrica'', 46(4), 931--959.</ref><ref name="gou80"><span style="font-variant-caps:small-caps">Gourieroux, C., J.J. Laffont,  <span style="font-variant-caps:normal">and</span> A.Monfort</span>  (1980):  “Coherency Conditions in Simultaneous Linear Equation Models with Endogenous  Switching Regimes” ''Econometrica'', 48, 675--695.</ref><ref name="sch81"><span style="font-variant-caps:small-caps">Schmidt, P.</span>  (1981): “Constraints on the Parameters in Simultaneous  Tobit and Probit Models” in ''Structural Analysis of Discrete Data and  Econometric Applications'', ed. by C.F. Manski,  <span style="font-variant-caps:normal">and</span> D.McFadden,  chap.12, pp. 422--434. MIT Press.</ref><ref name="mad83"><span style="font-variant-caps:small-caps">Maddala, G.S.</span>  (1983): ''Limited-Dependent and Qualitative  Variables in Econometrics''. Cambridge University Press, New York.</ref><ref name="blu:smi94"><span style="font-variant-caps:small-caps">Blundell, R.,  <span style="font-variant-caps:normal">and</span> J.R. Smith</span>  (1994): “Coherency and  Estimation in Simultaneous Models with Censored or Qualitative Dependent  Variables” ''Journal of Econometrics'', 64, 355--373.</ref>).
Such coherency conditions require the existence of a unique reduced form, mapping the model's exogenous variables and parameters to a
unique realization of the endogenous variable; hence, they constrain the model to be recursive or triangular in nature.
As pointed out by <ref name="bjo:vuo84"><span style="font-variant-caps:small-caps">Bjorn, P.A.,  <span style="font-variant-caps:normal">and</span> Q.H. Vuong</span>  (1984): “Simultaneous  Equations Models for Dummy Endogenous Variables: A Game Theoretic Formulation  with an Application to Labor Force Participation” CIT working paper SSWP  537, California Institute of Technology, available at  [http://resolver.caltech.edu/CaltechAUTHORS:20170919-140310752 http://resolver.caltech.edu/CaltechAUTHORS:20170919-140310752].</ref>, however, the coherency conditions shut down exactly the social interaction effect of interest by requiring, e.g., that <math>\delta_1\delta_2=0</math>, so that at least one player's action has no impact on the other player's payoff.
The desire to learn about interaction effects coupled with the difficulties generated by multiplicity of equilibria prompted the earlier literature to provide at least two different ways to achieve point identification.
The first one relies on imposing simplifying assumptions that shift focus to outcome features that are common across equilibria.
For example, <ref name="bre:rei88"><span style="font-variant-caps:small-caps">Bresnahan, T.F.,  <span style="font-variant-caps:normal">and</span> P.C. Reiss</span>  (1988): “Do Entry  Conditions Vary Across Markets?” ''Brookings Papers on Economic  Activity'', pp. 833--871.</ref><ref name="bre:rei90"><span style="font-variant-caps:small-caps">Bresnahan, T.F.,  <span style="font-variant-caps:normal">and</span> P.C. Reiss</span>  (1990): “Entry in Monopoly Markets” ''The Review of  Economic Studies'', 57(4), 531--553.</ref><ref name="bre:rei91"><span style="font-variant-caps:small-caps">Bresnahan, T.F.,  <span style="font-variant-caps:normal">and</span> P.C. Reiss</span>  (1991): “Empirical models of discrete games”  ''Journal of Econometrics'', 48(1), 57--81.</ref> and <ref name="ber92"><span style="font-variant-caps:small-caps">Berry, S.T.</span>  (1992): “Estimation of a Model of Entry in the Airline  Industry” ''Econometrica'', 60(4), 889--917.</ref> study entry games where the number, though not the identities, of entrants is uniquely predicted by the model in equilibrium.
Unfortunately, however, these simplifying assumptions substantially constrain the amount of heterogeneity in players' payoffs that the model allows for.
The second approach relies on explicitly modeling a selection mechanism which specifies the equilibrium played in the regions of multiplicity.
For example, <ref name="bjo:vuo84"/> assume it to be a constant; <ref name="baj:hon:rya10"><span style="font-variant-caps:small-caps">Bajari, P., H.Hong,  <span style="font-variant-caps:normal">and</span> S.P. Ryan</span>  (2010):  “Identification and estimation of a discrete game of complete information”  ''Econometrica'', 78(5), 1529--1568.</ref> assume a more flexible, covariate dependent parametrization; and <ref name="ber92"/> considers two possible selection mechanism specifications, one where the incumbent moves first, and the other where the most profitable player moves first.
Unfortunately, however, the chosen selection mechanism can have non-trivial effects on inference, and the data and theory might be silent on which is more appropriate.
A nice example of this appears in <ref name="ber92"/>{{rp|at=Table VII}}.
<ref name="ber:tam06"/> review and extend a number of results on the identification of entry models extensively used in the empirical
literature.
<ref name="jov89"/> discusses the observable implications of models with multiple equilibria, and within the analysis of a model with homogeneous preferences shows that partial identification is possible (see <ref name="jov89"/>{{rp|at=p. 1435}}).
I refer to <ref name="pau13"><span style="font-variant-caps:small-caps">de Paula, A.</span>  (2013): “Econometric Analysis of  Games with Multiple Equilibria” ''Annual Review of Economics'', 5(1),  107--131.</ref> for a review of the literature on econometric analysis of games with multiple equilibria.
<ref name="cil:tam09"/> show, on the other hand, that it is possible to partially identify entry models that allow for rich heterogeneity in payoffs and for any possible selection mechanism (even ones that are arbitrarily dependent on the unobservable payoff shifters after conditioning on the observed payoff shifters).
In addition, <ref name="tam03"/> provides sufficient conditions for point identification based on exclusion restrictions and large support assumptions.
<ref name="kli:tam12"><span style="font-variant-caps:small-caps">Kline, B.,  <span style="font-variant-caps:normal">and</span> E.Tamer</span>  (2012): “Bounds for best response  functions in binary games” ''Journal of Econometrics'', 166(1), 92--105.</ref> analyze partial identification of nonparametric models of entry in a two-player model, drawing connections with the program evaluation literature.
'''Key Insight:'''<i>
An important conceptual contribution of <ref name="tam03"/> is to clarify the distinction between a model which is ''incoherent'', so that no reduced form exists, and a model which is ''incomplete'', so that multiple reduced forms may exist.
Models with multiple equilibria belong to the latter category.
Whereas the earlier literature in partial identification had been motivated by ''measurement problems'', e.g., missing or interval data, the work of <ref name="tam03"/> and <ref name="cil:tam09"/> is motivated by the fact that economic theory often does not specify how an equilibrium is selected in the regions of the exogenous variables which admit multiple equilibria.
This is a conceptually completely distinct identification problem.
</i>
<ref name="cil:tam09"/> propose to use simple and tractable implications of the model to learn features of the structural parameters of interest.
Specifically, they point out that the probability of observing any outcome of the game cannot be smaller than the model's implied probability that such outcome is the ''unique'' equilibrium of the game, and cannot be larger than the model's implied probability that such outcome is ''one of the possible'' equilibria of the game.
Looking at [[#fig:set_valued_pred:tam03|Figure]] this means, for example, that the observed <math>\sP((\ey_1,\ey_2)=(0,1)|\ex_1,\ex_2)</math> cannot be smaller than the probability that <math>(\eps_1,\eps_2)</math> realizes in the dotted region, and cannot be larger than the probability that it realizes either in the dotted region or in the gray region.
Compared to the model-predicted distribution in \eqref{eq:games_model:pred}, this means that <math>\sP((\ey_1,\ey_2)=(0,1)|\ex_1,\ex_2)</math> cannot be smaller than the expression obtained by setting <math>\sR(0,1|\ex,\eps)=0</math> for all <math>\eps\in\cE_\vartheta[(1,0),(0,1);\ex]</math>, and cannot be larger than the expression obtained by setting <math>\sR(0,1|\ex,\eps)=1</math> on the same region.
Denote by <math>\Phi(A_1,A_2;\rho)</math> the probability that the bivariate normal with mean vector zero, variances equal to one, and correlation <math>\rho</math> assigns to the event <math>\{\eps_1\in A_1,\eps_2\in A_2\}</math>.
Then <ref name="cil:tam09"/> show that any <math>\vartheta=[d_1,d_2,b_1,b_2,r]</math> that is observationally equivalent to the data generating value <math>\theta</math> satisfies, <math>(\ex_1,\ex_2)</math>-a.s.,
<math display="block">
\begin{align}
\sP((\ey_1,\ey_2)=(0,0)|\ex_1,\ex_2)&=\Phi((-\infty,-\ex_1b_1),(-\infty,-\ex_2b_2);r)\label{eq:CT_00}\\
\sP((\ey_1,\ey_2)=(1,1)|\ex_1,\ex_2)&=\Phi([-\ex_1b_1-d_1,\infty),[-\ex_2b_2-d_2,\infty);r)\label{eq:CT_11}\\
\sP((\ey_1,\ey_2)=(0,1)|\ex_1,\ex_2)&\le\Phi((-\infty,-\ex_1b_1-d_1),(-\ex_2b_2,\infty);r)\label{eq:CT_01U}\\
\sP((\ey_1,\ey_2)=(0,1)|\ex_1,\ex_2)&\ge\Big\{\Phi((-\infty,-\ex_1b_1-d_1),(-\ex_2b_2,\infty);r)\notag\\
&\quad\quad-\Phi((-\ex_1b_1,-\ex_1b_1-d_1),(-\ex_2b_2,-\ex_2b_2-d_2);r)\Big\}\label{eq:CT_01L}
\end{align}
</math>
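As a rough illustration of how \eqref{eq:CT_00}-\eqref{eq:CT_01L} can be checked at a single value of the covariates, the Python sketch below computes the four model-implied quantities for a candidate <math>\vartheta</math> and compares them with a vector of outcome probabilities. The function names (<code>ct_bounds</code>, <code>satisfies_ct</code>), the tolerance, the numerical values, and the Simpson-rule integration are assumptions made for this example; <code>xb1</code>, <code>xb2</code> stand for <math>\ex_1b_1</math>, <math>\ex_2b_2</math>, and <math>d_1,d_2 < 0</math>, <math>|r| < 1</math> are assumed. The "observed" probabilities are generated from the model itself, not from data.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def rect_prob(a1, b1, a2, b2, r, n=2000, cut=8.0):
    """Rectangle probability for a standard bivariate normal with
    correlation r, via Simpson's rule over e1 (n must be even)."""
    a1, b1 = max(a1, -cut), min(b1, cut)
    if a1 >= b1 or a2 >= b2:
        return 0.0
    s = math.sqrt(1.0 - r * r)

    def f(e1):
        return norm_pdf(e1) * (norm_cdf((b2 - r * e1) / s)
                               - norm_cdf((a2 - r * e1) / s))

    h = (b1 - a1) / n
    total = f(a1) + f(b1)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a1 + i * h)
    return total * h / 3.0

def ct_bounds(xb1, xb2, d1, d2, r):
    """Return (P00, P11, lower bound on P01, upper bound on P01)."""
    inf = math.inf
    p00 = rect_prob(-inf, -xb1, -inf, -xb2, r)
    p11 = rect_prob(-xb1 - d1, inf, -xb2 - d2, inf, r)
    upper01 = rect_prob(-inf, -xb1 - d1, -xb2, inf, r)
    multi = rect_prob(-xb1, -xb1 - d1, -xb2, -xb2 - d2, r)
    return p00, p11, upper01 - multi, upper01

def satisfies_ct(p_obs, cand, tol=1e-8):
    """Check the two equalities and two inequalities at one covariate value."""
    p00, p11, lo01, up01 = ct_bounds(*cand)
    return (abs(p_obs[(0, 0)] - p00) <= tol
            and abs(p_obs[(1, 1)] - p11) <= tol
            and lo01 - tol <= p_obs[(0, 1)] <= up01 + tol)

# Hypothetical data generating value and a selection that plays (0,1)
# with probability 0.3 in the multiplicity region.
truth = (0.2, -0.1, -0.8, -0.6, 0.3)
p00, p11, lo01, up01 = ct_bounds(*truth)
p_obs = {(0, 0): p00, (1, 1): p11, (0, 1): lo01 + 0.3 * (up01 - lo01)}

print(satisfies_ct(p_obs, truth))
print(satisfies_ct(p_obs, (0.2, -0.1, -0.2, -0.6, 0.3)))
```

In an application the check would have to hold for almost every value of <math>(\ex_1,\ex_2)</math>, and the probabilities on the left-hand side would be estimated from data, so that sampling uncertainty needs to be accounted for.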
While the approach of <ref name="cil:tam09"/> is summarized here for a two player entry game, it extends without difficulty to any finite number of players and actions and to solution concepts other than pure strategy Nash equilibrium.
<ref name="ara:tam08"><span style="font-variant-caps:small-caps">Aradillas-Lopez, A.,  <span style="font-variant-caps:normal">and</span> E.Tamer</span>  (2008): “The  Identification Power of Equilibrium in Simple Games” ''Journal of  Business & Economic Statistics'', 26(3), 261--283.</ref> build on the insights of <ref name="cil:tam09"/> to study the identification power of equilibrium in games.
To do so, they compare the set-valued model predictions and what can be learned about <math>\theta</math> when one assumes only level-<math>k</math> rationality as opposed to Nash play.
In static entry games of complete information, they find that the model's predictions when <math>k\ge 2</math> are similar to those obtained with Nash behavior and allowing for multiple equilibria and mixed strategies.
<ref name="mol:ros08"><span style="font-variant-caps:small-caps">Molinari, F.,  <span style="font-variant-caps:normal">and</span> A.M. Rosen</span>  (2008): “The Identification  Power of Equilibrium in Games: The Supermodular Case (Comment on  Aradillas-Lopez and Tamer, 2008)” ''Journal of Business and Economic  Statistics'', 26(3), 297--302.</ref> extend the analysis of <ref name="ara:tam08"/> to the class of supermodular games.
The collection of parameter vectors satisfying (in)equalities \eqref{eq:CT_00}-\eqref{eq:CT_01L} yields the sharp identification region <math>\idr{\theta}</math> in the case of two player entry games with pure strategy Nash equilibrium as solution concept, as shown by <ref name="ber:mol:mol11"><span style="font-variant-caps:small-caps">Beresteanu, A., I.Molchanov,  <span style="font-variant-caps:normal">and</span> F.Molinari</span>  (2011): “Sharp identification regions in models with  convex moment predictions” ''Econometrica'', 79(6), 1785--1821.</ref>{{rp|at=Supplementary Appendix D, Corollary D.4}}.
When there are more than two players or more than two actions (or with different solution concepts, such as, e.g., mixed strategy Nash equilibrium; correlated equilibrium; or rationality of level <math>k</math> as in <ref name="ara:tam08"/>), the characterization in <ref name="cil:tam09"/> obtained by extending the reasoning just laid out yields an outer region.
<ref name="ber:mol:mol11"/> use elements of random set theory to provide a general and computationally tractable characterization of the identification region that is sharp, regardless of the number of players and actions, or the solution concept adopted.
For the case of PSNE with any finite number of players or actions, <ref name="gal:hen11"><span style="font-variant-caps:small-caps">Galichon, A.,  <span style="font-variant-caps:normal">and</span> M.Henry</span>  (2011): “Set Identification in Models with Multiple  Equilibria” ''The Review of Economic Studies'', 78(4), 1264--1298.</ref> provide a computationally tractable sharp characterization of the identification region using elements of optimal transportation theory.
====<span id="subsubsec:sharp:games"></span>Characterization of Sharpness through Random Set Theory====
<ref name="ber:mol:mol11"/> provide a general approach based on random set theory that delivers sharp identification regions on parameters of structural semiparametric models with set valued predictions.
Here I summarize it for the case of static, simultaneous move finite games of complete information, first with PSNE as solution concept and then with mixed strategy Nash equilibrium.
Then I discuss games of incomplete information.
For a given <math>\vartheta\in\Theta</math>, denote the set of pure strategy Nash equilibria (depicted in [[#fig:set_valued_pred:tam03|Figure]]) as <math>\eY_\vartheta(\ex,\eps)</math>.
It is easy to show that <math>\eY_\vartheta(\ex,\eps)</math> is a random closed set as in [[guide:379e0dcd67#def:rcs |Definition]].
Under the assumption in Identification [[#IP:entry_game |Problem]] that <math>\ey</math> results from simultaneous move, pure strategy Nash play, at the true DGP value of <math>\theta\in\Theta</math>, one has
<math display="block">
\begin{align}
\ey\in\eY_\theta\text{ a.s.}\label{eq:y_in_Y_games}
\end{align}
</math>
Equation \eqref{eq:y_in_Y_games} exhausts the modeling content of Identification [[#IP:entry_game |Problem]].
[[guide:379e0dcd67#thr:artstein |Theorem]] can be leveraged to extract its empirical content from the observed distribution <math>\sP(\ey,\ex)</math>.
For a given <math>\vartheta\in\Theta</math> and <math>K\subset\cY</math>, let <math>\sT_{\eY_{\vartheta}(\ex,\eps)}(K;\Phi_r)</math> denote the probability of the event <math>\{\eY_\vartheta(\ex,\eps)\cap K\neq \emptyset\}</math> implied when <math>\eps\sim\Phi_r</math>, <math>\ex</math>-a.s.
{{proofcard|Theorem (Structural Parameters in Static, Simultaneous Move Finite Games of Complete Information with PSNE)|SIR:entry_game|
Under the assumptions of Identification [[#IP:entry_game |Problem]], the sharp identification region for <math>\theta</math> is
<math display="block">
\begin{align}
\idr{\theta}=\{\vartheta\in\Theta:\sP(\ey\in K|\ex)\le \sT_{\eY_{\vartheta}(\ex,\eps)}(K;\Phi_r)\,\forall K\subset\cY, \, \ex\text{-a.s.}\}.\label{eq:SIR:entry_game}
\end{align}
</math>|To simplify notation, let <math>\eY_\vartheta\equiv \eY_\vartheta(\ex,\eps)</math>.
In order to establish sharpness, it suffices to show that <math>\vartheta\in \idr{\theta}</math> if and only if one can complete the model with an admissible selection mechanism <math>\sR(y_1,y_2|\ex,\eps)</math> such that <math>\sR(y_1,y_2|\ex,\eps)\ge 0</math> for all <math>(y_1,y_2)\in\{0,1\}\times\{0,1\}</math>, <math>\sum_{(y_1,y_2)\in\{0,1\}\times\{0,1\}}\sR(y_1,y_2|\ex,\eps)=1</math>, and satisfying \eqref{eq:games:sel:mec:1}-\eqref{eq:games:sel:mec:2}, so that <math>\sM(y_1,y_2|\ex)=\sP(y_1,y_2|\ex)</math> for all <math>(y_1,y_2)\in\{0,1\}\times\{0,1\}</math> <math>\ex</math>-a.s., with <math>\sM(y_1,y_2|\ex)</math> defined in \eqref{eq:games_model:pred}.
Suppose first that <math>\vartheta</math> is such that a selection mechanism with these properties is available.
Then there exists a selection of <math>\eY_\vartheta</math> which is equal to the prediction selected by the selection mechanism and whose conditional distribution is equal to <math>\sP(\ey|\ex)</math>, <math>\ex</math>-a.s., and therefore <math>\vartheta \in \idr{\theta}</math>.
Next take <math>\vartheta \in \idr{\theta}</math>.
Then by [[guide:379e0dcd67#thr:artstein |Theorem]], <math>\ey</math> and <math>\eY_\vartheta</math> can be realized on the same probability space as random elements <math>\ey'</math> and <math>\eY'_\vartheta</math>, so that <math>\ey'</math> and <math>\eY'_\vartheta</math> have the same distributions, respectively, as <math>\ey</math> and <math>\eY_\vartheta</math>, and <math>\ey' \in \Sel(\eY'_\vartheta)</math>, where <math>\Sel(\eY'_\vartheta)</math> is the set of all measurable selections from <math>\eY'_\vartheta</math>, see [[guide:379e0dcd67#def:selection |Definition]].
One can then complete the model with a selection mechanism that picks <math>\ey'</math> with probability 1, and the result follows.}}
The characterization provided in Theorem [[#SIR:entry_game |SIR-]] for games with multiple PSNE, taken from <ref name="ber:mol:mol11"/>{{rp|at=Supplementary Appendix D}}, is equivalent to the one in <ref name="gal:hen11"/>.
When <math>J=2</math> and <math>\cY=\{0,1\}\times\{0,1\}</math>, the inequalities in \eqref{eq:SIR:entry_game} reduce to \eqref{eq:CT_00}-\eqref{eq:CT_01L}.
With more players and/or more actions, the inequalities in \eqref{eq:SIR:entry_game} are a superset of those in \eqref{eq:CT_00}-\eqref{eq:CT_01L}, with the latter comprised of the ones in \eqref{eq:SIR:entry_game} for <math>K=\{k\}</math> and <math>K=\cY\setminus\{k\}</math>, for all <math>k\in\cY</math>.
Hence, the inequalities in \eqref{eq:SIR:entry_game} are more informative.
Of course, the computational cost incurred to characterize <math>\idr{\theta}</math> may grow with the number of inequalities involved.
I discuss computational challenges in partial identification in [[guide:A85a6b6ff1#sec:computations |Section]].
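The inequalities in \eqref{eq:SIR:entry_game} can be illustrated with a small simulation. The Python sketch below enumerates the PSNE set for each draw of <math>\eps</math>, approximates the capacity functional <math>\sT_{\eY_{\vartheta}(\ex,\eps)}(K;\Phi_r)</math> by a sample frequency, and checks the inequality for every nonempty <math>K\subset\cY</math>. It assumes that player <math>j</math> enters if and only if <math>\ex_j\beta_j+\delta_jy_{-j}+\eps_j\ge 0</math> with <math>\delta_j < 0</math>, fixes one covariate value, and holds the correlation at 0.3 for all candidates; the function names, sample size, seed, and parameter values are hypothetical, and the "observed" distribution is simulated rather than estimated.

```python
import itertools
import math
import random

OUTCOMES = [(0, 0), (0, 1), (1, 0), (1, 1)]

def psne_set(xb1, xb2, d1, d2, e1, e2):
    """Pure strategy Nash equilibria, found by checking best responses."""
    eq = []
    for y1, y2 in OUTCOMES:
        enter1 = xb1 + d1 * y2 + e1 >= 0  # player 1 prefers to enter
        enter2 = xb2 + d2 * y1 + e2 >= 0  # player 2 prefers to enter
        if enter1 == (y1 == 1) and enter2 == (y2 == 1):
            eq.append((y1, y2))
    return frozenset(eq)

def draw_eps(n, r, seed=0):
    """Draws from a bivariate normal with unit variances and correlation r."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        draws.append((z1, r * z1 + math.sqrt(1.0 - r * r) * z2))
    return draws

def capacity(sets, K):
    """Simulated probability that the equilibrium set intersects K."""
    return sum(1 for Y in sets if Y & K) / len(sets)

def satisfies_artstein(p_obs, sets, tol=1e-9):
    """Check P(y in K) <= T(K) for every nonempty subset K of outcomes."""
    for size in range(1, len(OUTCOMES) + 1):
        for K in itertools.combinations(OUTCOMES, size):
            K = frozenset(K)
            if sum(p_obs[y] for y in K) > capacity(sets, K) + tol:
                return False
    return True

eps = draw_eps(20000, 0.3)
truth = (0.2, -0.1, -0.8, -0.6)   # hypothetical (xb1, xb2, d1, d2)
sets_truth = [psne_set(*truth, e1, e2) for e1, e2 in eps]
# One arbitrary selection: the lexicographically smallest equilibrium.
picked = [min(Y) for Y in sets_truth]
p_obs = {y: picked.count(y) / len(picked) for y in OUTCOMES}

wrong = (2.0, -0.1, -0.8, -0.6)
sets_wrong = [psne_set(*wrong, e1, e2) for e1, e2 in eps]
print(satisfies_artstein(p_obs, sets_truth), satisfies_artstein(p_obs, sets_wrong))
```

Because the simulated outcome is a selection of the equilibrium set at the data generating value, every inequality holds there by construction, whereas the second candidate is rejected by at least one of them.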
'''Key Insight: Random set theory and partial identification -- continued'''<i>
In Identification [[#IP:entry_game |Problem]] lack of point identification can be traced back to the set valued predictions delivered by the model, which in turn derive from the model incompleteness defined by <ref name="tam03"/>.
As stated in the Introduction, constructing the (random) set of model predictions delivered by the maintained assumptions is an exercise typically carried out in identification analysis, regardless of whether random set theory is applied.
Indeed, for the problem studied in this section, <ref name="tam03"/>{{rp|at=Figure 1}} put forward the set of admissible outcomes of the game.
<ref name="ber:mol:mol11"/> propose to work directly with this random set to characterize <math>\idr{\theta}</math>.
The fundamental advantage of this approach is that it dispenses with considering the possible selection mechanisms that may complete the model.
Selection mechanisms may depend on the model's unobservables even after conditioning on observables and may constitute an infinite dimensional nuisance parameter, which creates great difficulties for the computation of <math>\idr{\theta}</math> and for inference.
</i>
Next, I discuss the case that the outcome of the game results from simultaneous move, mixed strategy Nash play.<ref group="Notes" >The same reasoning given here applies if instead of mixed strategy Nash the solution concept is correlated equilibrium, by replacing the set of MSNE below with the set of correlated equilibria.</ref>
When mixed strategies are allowed for, the model predicts multiple mixed strategy Nash equilibria (MSNE).
When only pure strategies are allowed for and the model is correctly specified, the observed outcome of the game is one of the predicted PSNE. With mixed strategies, in contrast, the observed outcome is only the result of a random mixing draw from one of the predicted MSNE.
Hence, the identification problem is more complex, and in order to obtain a tractable characterization of <math>\theta</math>'s sharp identification region one needs to use different tools from random set theory.
To keep the treatment simple here I continue to consider the case of two players with two strategies, as in Identification [[#IP:entry_game |Problem]], with mixed strategies allowed for, and refer to <ref name="mol:mol18"/>{{rp|at=Section 3.4}} for the general case.
Fix <math>\vartheta\in\Theta</math>.
Let <math>\sigma_j:\{0,1\}\to [0,1]</math> denote the probability that player <math>j</math> enters the market, with <math>1-\sigma_j</math> the probability that she stays out.
With some abuse of notation, let <math>\bu_j(\sigma_j,\sigma_{-j},\ex_j,\eps_j,\vartheta)</math> denote the expected payoff associated with the mixed strategy profile <math>\sigma=(\sigma_1,\sigma_2)</math>.
For a given realization <math>(x,e)</math> of <math>(\ex,\eps)</math> and a given value of <math>\vartheta\in\Theta</math>, the set of mixed strategy Nash equilibria is
<math display="block">
\begin{multline*}
S_\vartheta(x,e)
=\bigg\{\sigma \in [0,1]^2:\;
\bu_j(\sigma_j,\sigma_{-j},x_j,e_j;\vartheta)
\geq \bu_j(\tilde{\sigma}_j,\sigma_{-j},x_j,e_j;\vartheta)\;\,
\forall \tilde{\sigma}_j\in [0,1]\; j=1,2\bigg\}.
\end{multline*}
</math>
<ref name="ber:mol:mol11"/> show that <math>\eS_\vartheta\equiv S_\vartheta(\ex,\eps)</math> is a random closed set in <math>[0,1]^2</math>.
Its realizations are illustrated in Panel (a) of [[#fig:set_valued_pred:MSNE|Figure]] as a function of <math>(\eps_1,\eps_2)</math>.<ref group="Notes" >This figure is based on Figure 1 in {{ref|name=ber:mol:mol11}}.</ref>
<div id="fig:set_valued_pred:MSNE" class="d-flex justify-content-center">
[[File:guide_d9532_fig_set_valued_pred_MSNE.png | 700px | thumb | MSNE strategies (<math>\eS_\vartheta</math>), set of multinomial distributions over outcomes of the game (<math>\eQ_\vartheta</math>), and its support function (<math>h_{\eQ_\vartheta}</math>), as a function of <math>(\eps_1,\eps_2)</math>, where <math>\sigma_1^*\equiv\frac{-\eps_2-\ex_2\beta_2}{\vartheta_2},\sigma_2^*\equiv\frac{-\eps_1-\ex_1\beta_1}{\vartheta_1}</math>. ]]
</div>
Define the set of possible multinomial distributions over outcomes of the game associated with the selections <math>\sigma</math> of each possible realization of <math>\eS_{\vartheta}</math> as
<math display="block">
\begin{equation}
\label{eq:Q-set}
\eQ_\vartheta=\left\{\eq(\sigma)\equiv \begin{bmatrix}
(1-\sigma_1)(1-\sigma_2)\\
\sigma_1(1-\sigma_2)\\
(1-\sigma_1)\sigma_2\\
\sigma_1\sigma_2
\end{bmatrix}:\, \sigma \in \eS_\vartheta\right\}.
\end{equation}
</math>
As <math>\eQ_\vartheta</math> is the image of a continuous map applied to the random compact set <math>\eS_\vartheta</math>, it is a random compact set.
Its realizations are plotted in Panel (b) of [[#fig:set_valued_pred:MSNE|Figure]] as a function of <math>(\eps_1,\eps_2)</math>.
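For a single realization of <math>\eps</math>, the sets <math>\eS_\vartheta</math> and <math>\eQ_\vartheta</math> and the support function of <math>\eQ_\vartheta</math> can be computed directly. The Python sketch below does so under the assumptions that player <math>j</math>'s expected payoff from entering is <math>\ex_j\beta_j+\delta_j\sigma_{-j}+\eps_j</math>, that <math>\delta_j < 0</math>, and that ties (a probability-zero event) can be ignored; the function names and numerical values are hypothetical. The interior equilibrium uses the mixing probabilities <math>\sigma_1^*</math> and <math>\sigma_2^*</math> given in the figure caption above.

```python
def msne_set(xb1, xb2, d1, d2, e1, e2):
    """Mixed strategy Nash equilibria (entry probabilities) of the 2x2
    entry game: pure equilibria plus the interior one, when it exists."""
    eq = []
    for s1 in (0.0, 1.0):
        for s2 in (0.0, 1.0):
            pay1 = xb1 + d1 * s2 + e1  # player 1's payoff from entering
            pay2 = xb2 + d2 * s1 + e2  # player 2's payoff from entering
            if (pay1 >= 0) == (s1 == 1.0) and (pay2 >= 0) == (s2 == 1.0):
                eq.append((s1, s2))
    # Interior equilibrium: each player makes the opponent indifferent.
    s1_star = (-e2 - xb2) / d2
    s2_star = (-e1 - xb1) / d1
    if 0.0 < s1_star < 1.0 and 0.0 < s2_star < 1.0:
        eq.append((s1_star, s2_star))
    return eq

def q(sigma):
    """Multinomial distribution over (0,0), (1,0), (0,1), (1,1)."""
    s1, s2 = sigma
    return [(1 - s1) * (1 - s2), s1 * (1 - s2), (1 - s1) * s2, s1 * s2]

def support(S, u):
    """Support function of Q = {q(sigma): sigma in S} in direction u."""
    return max(sum(uk * qk for uk, qk in zip(u, q(s))) for s in S)

# A realization of eps inside the region of multiplicity.
S = msne_set(0.2, -0.1, -0.8, -0.6, 0.1, 0.4)
for sigma in S:
    print(sigma, q(sigma))
print(support(S, [0.0, 0.0, 0.0, 1.0]))
```

Inside the region of multiplicity the set contains the two pure equilibria and one strictly mixed equilibrium; outside it the set is a singleton, so the support function reduces to a single inner product.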
The multinomial distribution over outcomes of the game determined by a given <math>\sigma\in\eS_\vartheta</math> is a function of <math>\eps</math>.
To obtain the predicted distribution over outcomes of the game conditional on observed payoff shifters only, one needs to integrate out the unobservable payoff shifters <math>\eps</math>.
Doing so requires care, as it needs to be done for each <math>\eq(\sigma)\in\eQ_\vartheta</math>.
First, observe that all the <math>\eq(\sigma)\in\eQ_\vartheta</math> are contained in the <math>3</math> dimensional unit simplex, and are therefore integrable.
Next, define the conditional selection expectation (see [[guide:379e0dcd67#def:sel-exp |Definition]]) of <math>\eQ_\vartheta</math> as
<math display="block">
\E_{\Phi_r}(\eQ_\vartheta|\ex)=\Big\{\E_{\Phi_r}(\eq(\sigma)|\ex):\;
\sigma \in \Sel(\eS_\vartheta)\Big\},
</math>
where <math>\Sel(\eS_\vartheta)</math> is the set of all measurable selections from <math>\eS_\vartheta</math>, see [[guide:379e0dcd67#def:selection |Definition]].
By construction, <math>\E_{\Phi_r}(\eQ_\vartheta|\ex)</math> is the set of probability distributions over action profiles conditional on <math>\ex</math> which are
consistent with the maintained modeling assumptions, i.e., with ''all'' the model's implications (including the assumption that <math>\eps\sim\Phi_r</math>).
If the model is correctly specified, there exists at least one vector <math>\theta\in\Theta</math> such that the observed conditional distribution <math>\cp(\ex)\equiv[\sP(\ey=y^1|\ex),\dots,\sP(\ey=y^4|\ex)]^\top</math> almost surely belongs to the set <math>\E_{\Phi_\rho}(\eQ_\theta|\ex)</math>. 
Indeed, by the definition of <math>\E_{\Phi_\rho}(\eQ_\theta|\ex)</math>, <math>\cp(\ex)\in \E_{\Phi_\rho}(\eQ_\theta|\ex)</math> almost surely if and only if there exists <math>\eq\in \Sel(\eQ_\theta)</math> such that <math>\E_{\Phi_\rho}(\eq|\ex)=\cp(\ex)</math> almost surely, with <math>\Sel(\eQ_\theta)</math> the set of all measurable selections from <math>\eQ_\theta</math>.
Hence, the collection of parameter vectors <math>\vartheta\in\Theta</math> that are observationally equivalent to the data generating value <math>\theta</math> is given by the ones that satisfy <math>\cp(\ex)\in \E_{\Phi_r}(\eQ_\vartheta|\ex)</math> almost surely.
In turn, observing that by [[guide:379e0dcd67#thr:exp-supp |Theorem]] the set <math>\E_{\Phi_r}(\eQ_\vartheta|\ex)</math> is convex, we have that <math>\cp(\ex)\in \E_{\Phi_r}(\eQ_\vartheta|\ex)</math> if and only if <math>u^\top \cp(\ex)\leq h_{\E_{\Phi_r}(\eQ_\vartheta|\ex)}(u)</math> for all <math>u</math> in the unit ball (see, e.g., <ref name="roc70"><span style="font-variant-caps:small-caps">Rockafellar, R.</span>  (1970): ''Convex Analysis'', Princeton landmarks  in mathematics and physics. Princeton University Press.</ref>{{rp|at=Theorem 13.1}}), where <math>h_{\E_{\Phi_r}(\eQ_\vartheta|\ex)}(u)</math> is the support function of <math>\E_{\Phi_r}(\eQ_\vartheta|\ex)</math>, see [[guide:379e0dcd67#def:sup-fun |Definition]].
{{proofcard|Theorem (Structural Parameters in Static, Simultaneous Move Finite Games of Complete Information with MSNE)|SIR:sharpness_mixed|Under the assumptions in Identification [[#IP:entry_game |Problem]], allowing for mixed strategies and with the observed outcomes of the game resulting from mixed strategy Nash play, the sharp identification region for <math>\theta</math> is
<math display="block">
\begin{align}
  \idr{\theta} &=\bigg\{\vartheta\in \Theta:\; \max_{u\in\mathbb{B}^{|\cY|}}\left( u^\top \cp(\ex)
  -\E_{\Phi_r}[h_{\eQ_\vartheta}(u)|\ex]\right)=0,\, \ex\text{-a.s.}\bigg\} 
  \label{eq:SIR_sharp_mixed_sup}\\
  &=\bigg\{\vartheta \in \Theta:\; \int_{\mathbb{B}^{|\cY|}} (u^\top \cp(\ex)
  -\E_{\Phi_r}[h_{\eQ_\vartheta}(u)|\ex])_+ \mathrm{d}\mu(u)=0,\, \ex\text{-a.s.}\bigg\}
  \label{eq:SIR_sharp_mixed_int},
\end{align}
</math>
where <math>\mu</math> is any probability measure on <math>\mathbb{B}^{|\cY|}</math>, and <math>|\cY|=4</math> in this case.|[[guide:379e0dcd67#thr:exp-supp |Theorem]] (equation [[guide:379e0dcd67#eq:dom_Aumann:cond |eq:dom_Aumann:cond]]) yields \eqref{eq:SIR_sharp_mixed_sup}, because by the arguments given before the theorem, <math>\idr{\theta}=\{\vartheta \in \Theta:\;\cp(\ex)\in \E_{\Phi_r}(\eQ_\vartheta|\ex),\ex\text{-a.s.}\}</math>.
The result in \eqref{eq:SIR_sharp_mixed_int} follows because the integrand in \eqref{eq:SIR_sharp_mixed_int} is continuous in <math>u</math> and both conditions inside the curly brackets are satisfied if and only if <math>u^\top \cp(\ex)-\E_{\Phi_r}[h_{\eQ_\vartheta}(u)|\ex]\leq 0</math> <math>\forall u\in \mathbb{B}^{|\cY|}</math> <math>\ex</math>-a.s.}}
For a fixed <math>u\in\mathbb{B}^4</math>, the possible realizations of <math>h_{\eQ_\vartheta}(u)</math> are plotted in Panel (c) of [[#fig:set_valued_pred:MSNE|Figure]] as a function of <math>(\eps_1,\eps_2)</math>.
The expectation of <math>h_{\eQ_\vartheta}(u)</math> is quite straightforward to compute, whereas calculating the set <math>\E_{\Phi_r}(\eQ_\vartheta|\ex)</math> is computationally prohibitive in many cases.
Hence, the characterization in \eqref{eq:SIR_sharp_mixed_sup} is computationally attractive, because for each <math>\vartheta\in\Theta</math> it requires maximizing an easy-to-compute superlinear, hence concave, function over a convex set, and checking whether the resulting objective value vanishes.
Several efficient algorithms in convex programming are available to solve this problem, see for example the MATLAB software for disciplined convex programming CVX <ref name="gra:boy10"><span style="font-variant-caps:small-caps">Grant, M.,  <span style="font-variant-caps:normal">and</span> S.Boyd</span>  (2010): “CVX: Matlab Software for  Disciplined Convex Programming, Version 1.21” available at  [http://cvxr.com/cvx http://cvxr.com/cvx].</ref>.
Nonetheless, <math>\idr{\theta}</math> itself is not necessarily convex, hence tracing out its boundary is non-trivial.
I return to computational challenges in partial identification in [[guide:A85a6b6ff1#sec:computations |Section]].
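The criterion in \eqref{eq:SIR_sharp_mixed_sup} can be approximated crudely by simulation. The Python sketch below replaces <math>\E_{\Phi_r}[h_{\eQ_\vartheta}(u)|\ex]</math> with a sample average over draws of <math>\eps</math> and replaces the maximization over the unit ball with a maximum over a finite set of directions (the origin, the signed coordinate vectors, and a few random unit vectors). This grid search is a stand-in for a proper convex solver, and the payoff specification, <math>\delta_j < 0</math>, the fixed correlation of 0.3, the function names, seeds, and parameter values are all assumptions made for illustration; the "observed" distribution is simulated from one arbitrary selection.

```python
import math
import random

def msne_set(xb1, xb2, d1, d2, e1, e2):
    """Pure equilibria plus the interior mixed one (assumes d1, d2 < 0)."""
    eq = []
    for s1 in (0.0, 1.0):
        for s2 in (0.0, 1.0):
            pay1 = xb1 + d1 * s2 + e1
            pay2 = xb2 + d2 * s1 + e2
            if (pay1 >= 0) == (s1 == 1.0) and (pay2 >= 0) == (s2 == 1.0):
                eq.append((s1, s2))
    s1_star, s2_star = (-e2 - xb2) / d2, (-e1 - xb1) / d1
    if 0.0 < s1_star < 1.0 and 0.0 < s2_star < 1.0:
        eq.append((s1_star, s2_star))
    return eq

def q(sigma):
    s1, s2 = sigma
    return [(1 - s1) * (1 - s2), s1 * (1 - s2), (1 - s1) * s2, s1 * s2]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def draw_eps(n, r, seed=0):
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        out.append((z1, r * z1 + math.sqrt(1.0 - r * r) * z2))
    return out

def directions(n_random=20, seed=1):
    """Origin, signed coordinate vectors, and random unit vectors in R^4."""
    rng = random.Random(seed)
    dirs = [[0.0] * 4]
    for k in range(4):
        for sign in (1.0, -1.0):
            u = [0.0] * 4
            u[k] = sign
            dirs.append(u)
    for _ in range(n_random):
        v = [rng.gauss(0.0, 1.0) for _ in range(4)]
        norm = math.sqrt(dot(v, v))
        dirs.append([c / norm for c in v])
    return dirs

def criterion(p_obs, cand, eps, dirs):
    """Approximate max over u of u'p - E[h_Q(u)] on a finite direction set."""
    sets = [msne_set(*cand, e1, e2) for e1, e2 in eps]
    worst = -math.inf
    for u in dirs:
        mean_h = sum(max(dot(u, q(s)) for s in S) for S in sets) / len(sets)
        worst = max(worst, dot(u, p_obs) - mean_h)
    return worst

eps = draw_eps(5000, 0.3)
dirs = directions()
truth = (0.2, -0.1, -0.8, -0.6)   # hypothetical (xb1, xb2, d1, d2)
# Arbitrary selection: the last listed equilibrium (mixed when it exists).
selected = [msne_set(*truth, e1, e2)[-1] for e1, e2 in eps]
p_obs = [sum(q(s)[k] for s in selected) / len(selected) for k in range(4)]

crit_truth = criterion(p_obs, truth, eps, dirs)
crit_wrong = criterion(p_obs, (2.0, -0.1, -0.8, -0.6), eps, dirs)
print(crit_truth, crit_wrong)
```

A value that is zero up to rounding error is consistent with the candidate belonging to <math>\idr{\theta}</math> on this direction grid, while a strictly positive value exhibits a violated support function inequality. A finite grid can miss violations, so a small value here is only suggestive.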
'''Key Insight: Random set theory and partial identification -- continued'''<i>
<ref name="ber:mol:mol11"/> provide a general characterization of sharp identification regions for models with ''convex moment predictions''.
These are models that for a given <math>\vartheta\in\Theta</math> and realization of observable variables, predict a set of values for a vector of variables
of interest.
This set is ''not'' necessarily convex, as exemplified by <math>\eY_\vartheta</math> and <math>\eQ_\vartheta</math>, which are finite.
No restriction is placed on the manner in which, in the DGP, a specific model prediction is selected from this set.
When the researcher takes conditional expectations of the resulting elements of this set, the unrestricted process of selection yields a convex set of moments for the model variables (all possible mixtures).
This is the model's convex set of moment predictions.
If this set were almost surely single valued, the researcher would learn (features of) <math>\theta</math> by solving moment equality conditions involving the observed variables and predicted ones.
The approach reviewed in this section is a set-valued method of moments that extends the singleton-valued one commonly used in econometrics.
</i>
I conclude this section discussing the case of static, simultaneous move finite games of incomplete information, using the results in <ref name="ber:mol:mol11"/>{{rp|at=Supplementary Appendix C}}.<ref group="Notes" >See {{ref|name=ber:tam06}}{{rp|at=Section 3}} and {{ref|name=pau13}} for a thorough discussion of the literature on identification problems in games of incomplete information with multiple Bayesian Nash equilibria (BNE).
{{ref|name=ber:tam06}} explain how to extend the approach proposed by {{ref|name=cil:tam09}} to obtain outer regions on <math>\theta</math> when no restrictions are imposed on the equilibrium selection mechanism that chooses among the multiple BNE.</ref>
For clarity, I formalize the maintained assumptions.
{{proofcard|Identification Problem (Structural Parameters in Static, Simultaneous Move Finite Games of Incomplete Information with multiple BNE)|IP:entry_game:incomplete|Impose the same structure on payoffs, entry decision rule, outcome space, parameter space, and observable variables as in Identification [[#IP:entry_game |Problem]].
Assume that the observed outcome of the game results from simultaneous move, pure strategy Bayesian Nash play.
Both players and the researcher observe <math>(\ex_1,\ex_2)</math>.
However, <math>\eps_j</math> is private information to player <math>j=1,2</math> and unobservable to the researcher, with <math>\eps_1\independent\eps_2|(\ex_1,\ex_2)</math>.
Assume that players have correct common prior <math>\sF_\gamma</math> on the distribution of <math>(\eps_1,\eps_2)</math> and the researcher knows this distribution up to <math>\gamma</math>, a finite dimensional parameter vector.
Under these assumptions, multiple Bayesian Nash equilibria (BNE) may result.<ref group="Notes" >Both the independence assumption and the correct common prior assumption are maintained here to simplify exposition.
Both could be relaxed with no conceptual difficulty, though computation of the set of Bayesian Nash equilibria, for example, would become more cumbersome.</ref>
In the absence of additional information, what can the researcher learn about <math>\theta=[\delta_1,\delta_2,\beta_1,\beta_2,\gamma]</math>?
|}}
With incomplete information, players' strategies are decision rules that map the support of <math>(\eps,\ex)</math> into <math>\{0,1\}</math>.
The non-negativity condition on expected payoffs that determines each player's decision to enter the market results in equilibrium mappings (decision rules) that are step functions determined by a threshold: <math>y_j(\eps_j) =\one(\eps_j\geq t_j), j=1,2</math>.
As a result, player <math>j</math>'s belief about player <math>3-j</math>'s probability of entry under the common prior assumption is <math>\int y_{3-j}(\eps_{3-j}) d\sF_\gamma(\eps_{3-j}|\ex) =1-\sF_\gamma(t_{3-j}|\ex)</math>, and therefore player <math>j</math>'s best response cutoff is
<math display="block">
\begin{align*}
t_j^b(t_{3-j},\ex;\theta)=-\ex_j\beta_j-\delta_j(1-\sF_\gamma(t_{3-j}|\ex)).
\end{align*}
</math>
Hence, the set of equilibria can be defined as the set of cutoff rules:
<math display="block">
\begin{equation*}
\eT_{\theta}(\ex)=\left\{(t_1,t_2):t_j=t_j^b(t_{3-j},\ex;\theta),j=1,2\right\}.
\end{equation*}
</math>
The equilibrium thresholds are functions of <math>\ex</math> and <math>\theta</math> only.
The set <math>\eT_{\theta}(\ex)</math> might contain a finite number of equilibria (e.g., if the common prior is the Normal distribution), or a continuum of equilibria.
For ease of notation I suppress its dependence on <math>\ex</math> in what follows.
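To make the fixed-point structure of <math>\eT_\theta(\ex)</math> concrete, the following minimal sketch computes equilibrium cutoff pairs numerically. It assumes, purely for illustration, that the common prior is standard Normal (so that <math>\gamma</math> plays no role) and that the indices <math>\ex_j\beta_j</math> are given numbers. It locates equilibria by scanning a grid for sign changes of the composed best-response map and refining each one by bisection. The function names, the grid settings, and the parameter values are hypothetical and are not taken from the cited literature.

```python
import math


def norm_cdf(z):
    """Standard Normal CDF (illustrative choice of common prior)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def best_response_cutoff(t_other, xb_j, delta_j):
    """t_j^b = -x_j*beta_j - delta_j * (1 - F(t_{3-j}))."""
    return -xb_j - delta_j * (1.0 - norm_cdf(t_other))


def bne_cutoffs(xb1, xb2, delta1, delta2, lo=-10.0, hi=10.0, grid=4001, tol=1e-12):
    """Cutoff pairs (t1, t2) solving both best-response equations on [lo, hi]."""

    def gap(t1):
        # Player 2 best-responds to t1, then player 1 best-responds to that.
        t2 = best_response_cutoff(t1, xb2, delta2)
        return best_response_cutoff(t2, xb1, delta1) - t1

    step = (hi - lo) / (grid - 1)
    roots = []
    a, ga = lo, gap(lo)
    for k in range(1, grid):
        b = lo + k * step
        gb = gap(b)
        if gb == 0.0:
            roots.append(b)  # grid point is itself a fixed point
        elif ga * gb < 0.0:
            left, right, gl = a, b, ga
            while right - left > tol:  # bisection on the bracketing interval
                mid = 0.5 * (left + right)
                gm = gap(mid)
                if gl * gm <= 0.0:
                    right = mid
                else:
                    left, gl = mid, gm
            roots.append(0.5 * (left + right))
        a, ga = b, gb
    return [(t1, best_response_cutoff(t1, xb2, delta2)) for t1 in roots]


# Strong negative interaction effects: several equilibria may coexist.
for t1, t2 in bne_cutoffs(3.0, 3.0, -6.0, -6.0):
    print(round(t1, 4), round(t2, 4))
```

A grid scan can miss equilibria that lie outside the search interval or that are tangencies, so this is a heuristic rather than a complete solver.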
Given the equilibrium decision rules (the selections of the set <math>\eT_\theta</math>), it is possible to determine their associated action profiles.
Because in the simple two-player entry game that I consider actions and outcomes coincide, I denote the set of admissible action profiles by <math>\eY_\theta</math>:
<math display="block">
\begin{align}
\eY_\theta=\left\{
\ey(\et)\equiv
\begin{bmatrix}
\one(\eps_1 < \et_1,\eps_2 < \et_2)\\
\one(\eps_1\ge\et_1,\eps_2 < \et_2)\\
\one(\eps_1 < \et_1,\eps_2\ge\et_2)\\
\one(\eps_1\ge\et_1,\eps_2\ge\et_2)
\end{bmatrix}
:\et\in\Sel(\eT_\theta)
\right\},\label{eq:q_incomplete}
\end{align}
</math>
with <math>\Sel(\eT_\theta)</math> the set of all measurable selections from <math>\eT_\theta</math>, see [[guide:379e0dcd67#def:selection |Definition]].
To obtain the predicted set of multinomial distributions for the outcomes of the game, one needs to integrate out <math>\eps</math> conditional on <math>\ex</math>.
Again this can be done by using the conditional Aumann expectation:
<math display="block">
\begin{equation*}
\E_{\sF_\gamma}(\eY_\theta|\ex)=\{\E_{\sF_\gamma}(\ey(\et)|\ex):\et\in\Sel(\eT_\theta)\}.
\end{equation*}
</math>
This set is closed and convex.
Regardless of whether <math>\eT_\theta</math> contains a finite number of equilibria or a continuum, <math>\eY_\theta</math> can take on only a finite number of
realizations corresponding to each of the vertices of the three dimensional simplex, because the vectors <math>\ey(\et)</math> in \eqref{eq:q_incomplete} collect threshold decision rules.
This implies that <math>\E_{\sF_\gamma}(\eY_\theta|\ex)</math> is a closed convex polytope <math>\ex</math>-a.s., fully characterized by a finite number of supporting hyperplanes.
Hence, it is possible to determine whether <math>\vartheta\in\idr{\theta}</math> using efficient algorithms in linear programming.
{{Proofcard|Theorem (Structural Parameters in Static, Simultaneous Move Finite Games of Incomplete Information with BNE)|SIR:incomplete_info|Under the assumptions in Identification [[#IP:entry_game:incomplete |Problem]], the sharp identification region for <math>\theta</math> is
<math display="block">
\begin{align}
  \idr{\theta} &=\bigg\{\vartheta\in \Theta:\; \max_{u\in\mathbb{B}^{|\cY|}} u^\top \cp(\ex)
  -\E_{\sF_{\tilde\gamma}}[h_{\eY_\vartheta}(u)|\ex]=0,\, \ex\text{-a.s.}\bigg\} \label{eq:SIR:incomplete_info:1} \\
  &=\bigg\{\vartheta\in \Theta:\; u^\top \cp(\ex)
  \le \E_{\sF_{\tilde\gamma}}[h_{\eY_\vartheta}(u)|\ex],\,\forall u\in D, \ex\text{-a.s.}\bigg\},\label{eq:SIR:incomplete_info:2} \\
  &=  \bigg\{\vartheta\in \Theta:\; \sP(\ey\in K|\ex)\le \sT_{\eY_{\vartheta}(\ex,\eps)}(K;\sF_{\tilde\gamma})\,\forall K\subset\cY,\, \ex\text{-a.s.}\bigg\} \label{eq:SIR:incomplete_info:0},
\end{align}
</math>
with <math>D=\{u=[u_1,\dots,u_{|\cY|}]^\top:u_i\in\{0,1\},i=1,...,|\cY|\} </math>, <math>\vartheta=[d_1,d_2,b_1,b_2,\tilde\gamma]</math>, and <math>\sT_{\eY_{\vartheta}(\ex,\eps)}(K;\sF_{\tilde\gamma})</math> the probability that <math>\{\eY_\vartheta(\ex,\eps)\cap K\neq \emptyset\}</math> implied when <math>\eps\sim\sF_{\tilde\gamma}</math>, <math>\ex</math>-a.s.|The result in \eqref{eq:SIR:incomplete_info:1} follows by the same argument as in the proof of Theorem [[#SIR:sharpness_mixed |SIR-]].
Next I show equivalence of the conditions
<math display="block">
\begin{align*}
(i)&u^\top\cp(\ex)\le\E_{\sF_{\tilde\gamma}}[h_{\eY_\vartheta}(u)|\ex]\forall u\in\mathbb{B}^{|\cY|}, \\
(ii)&u^\top\cp(\ex)\le\E_{\sF_{\tilde\gamma}}[h_{\eY_\vartheta}(u)|\ex]\forall u\in D.
\end{align*}
</math>
By the positive homogeneity of the support function, condition <math>(i)</math> is equivalent to <math>u^\top\cp(\ex)\le\E_{\sF_{\tilde\gamma}}[h_{\eY_\vartheta}(u)|\ex]\forall u\in\R^{|\cY|}</math>, which implies condition <math>(ii)</math>.
Next I show that condition <math>(ii)</math> implies condition <math>(i)</math>.
As explained before, the set <math>\eY_\theta</math>, and hence also its convex hull <math>\conv(\eY_\theta)</math>, can take on only a finite number of realizations.
Let <math>Y_1,\dots,Y_m</math> be convex compact sets in the simplex of dimension <math>|\cY|-1</math> equal to the possible realizations of <math>\conv(\eY_\theta)</math>, and let <math>\varpi_1(\ex),\dots,\varpi_m(\ex)</math> denote the probability of each of these realizations conditional on <math>\ex</math>.
Then by Theorem 2.1.34 in <ref name="mo1"><span style="font-variant-caps:small-caps">Molchanov, I.</span>  (2017): ''Theory of Random Sets''. Springer, London,  2 edn.</ref>, <math>\E_{\sF_{\tilde\gamma}}(\eY_\theta|\ex)=\sum_{j=1}^m Y_j\varpi_j(\ex)</math>.
By the properties of the support function (see, e.g., <ref name="sch93"><span style="font-variant-caps:small-caps">Schneider, R.</span>  (1993): ''Convex Bodies: The Brunn-Minkowski  Theory'', Encyclopedia of Mathematics and its Applications. Cambridge  University Press, 1 edn.</ref>{{rp|at=Theorem 1.7.5}}), <math>h_{\E_{\sF_{\tilde\gamma}}(\eY_\theta|\ex)}(u) =\sum_{j=1}^m \varpi_j(\ex)h_{Y_j}(u)</math>.
For each <math>j=1,...,m,</math> the vertices of <math>Y_j</math> are a subset of the vertices of the <math>(|\cY|-1)</math>-dimensional simplex.
Hence the supporting hyperplanes of <math>Y_j,j=1,...,m</math>, are a subset of the supporting hyperplanes of that simplex, which in turn are obtained through its support function evaluated in directions <math>u\in D</math>.
Finally, I show equivalence with the result in \eqref{eq:SIR:incomplete_info:0}.
Because the vertices of <math>Y_j</math> are a subset of the vertices of the <math>(|\cY|-1)</math>-dimensional simplex, each direction <math>u\in D</math> determines a set <math>K_u\subset \cY</math>.
Given the choice of <math>u</math>, the value of <math>u^\top\ey(\et)</math> equals one if <math>\ey(\et)\in K_u</math> and zero otherwise.
Hence, condition \eqref{eq:SIR:incomplete_info:2} reduces to
<math display="block">
\begin{align*}
\sP(\ey\in K_u|\ex) = u^\top \cp(\ex) &\le \E_{\sF_{\tilde\gamma}}[h_{\eY_\vartheta}(u)|\ex] = \E_{\sF_{\tilde\gamma}}\left[\sup_{\ey(\et)\in\eY_\vartheta}u^\top\ey(\et)|\ex\right] \\
&= \E_{\sF_{\tilde\gamma}}[\one(\eY_\vartheta\cap K_u\neq \emptyset)|\ex]=\sT_{\eY_{\vartheta}(\ex,\eps)}(K_u;\sF_{\tilde\gamma}).
\end{align*}
</math>
Observing that the collection <math>D</math> comprises the <math>2^{|\cY|}</math> vectors with entries equal to either 1 or 0, and that these determine all possible subsets <math>K_u</math> of <math>\cY</math>, yields condition \eqref{eq:SIR:incomplete_info:0}.}}
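As a hedged illustration of why the finite collection <math>D</math> makes the conditions of the theorem checkable, the sketch below enumerates all nonempty <math>K\subset\cY</math> and compares <math>\sP(\ey\in K|\ex)</math> with the capacity functional, using the fact shown in the proof that, for <math>u\in D</math>, the support function equals the indicator of <math>\{\eY_\vartheta\cap K_u\neq\emptyset\}</math>. It assumes that, for a fixed <math>\ex</math> and candidate <math>\vartheta</math>, the finitely many realizations of <math>\eY_\vartheta</math> (coded as sets of outcome indices) and their probabilities <math>\varpi_j(\ex)</math> have already been computed. The inputs and the function names are hypothetical.

```python
from itertools import combinations


def capacity(realizations, K):
    """Probability that the random set hits K.

    `realizations` is a list of (frozenset_of_outcome_indices, probability).
    """
    return sum(weight for outcomes, weight in realizations if outcomes & K)


def violated_sets(p, realizations, tol=1e-12):
    """Subsets K of the outcome space with P(y in K | x) > capacity(K)."""
    n_outcomes = len(p)
    bad = []
    for size in range(1, n_outcomes + 1):
        for K in combinations(range(n_outcomes), size):
            K = frozenset(K)
            if sum(p[k] for k in K) > capacity(realizations, K) + tol:
                bad.append(K)
    return bad


# Hypothetical model prediction: outcome 0 w.p. 0.2, either 1 or 2 w.p. 0.5, outcome 3 w.p. 0.3.
realizations = [(frozenset({0}), 0.2), (frozenset({1, 2}), 0.5), (frozenset({3}), 0.3)]
print(violated_sets([0.2, 0.3, 0.2, 0.3], realizations))
print(violated_sets([0.3, 0.2, 0.2, 0.3], realizations))
```

A candidate <math>\vartheta</math> is retained at this value of <math>\ex</math> only if the returned list is empty. The enumeration visits <math>2^{|\cY|}-1</math> sets, which is feasible here because <math>|\cY|=4</math>.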
One can use the same argument as in the proof of Theorem [[#SIR:incomplete_info |SIR-]] to show that the Aumann expectation/support function characterization of the sharp identification region in Theorem [[#SIR:sharpness_mixed |SIR-]] coincides with the characterization based on the capacity functional in Theorem [[#SIR:entry_game |SIR-]], when only pure strategies are allowed for.
This shows that in this class of models, the capacity functional based characterization is a special case of the Aumann expectation/support function based one.
<ref name="ara:tam08"/> also study the identification power of equilibrium in static entry games with incomplete information.
They show that in the presence of multiple equilibria, assuming Bayesian Nash behavior yields more informative regions for the parameter vector <math>\theta</math> than assuming only rational behavior, but at the price of a higher computational cost.
<ref name="pau:tan12"><span style="font-variant-caps:small-caps">de Paula, A.,  <span style="font-variant-caps:normal">and</span> X.Tang</span>  (2012):  “Inference of Signs of Interaction Effects in Simultaneous Games With  Incomplete Information” ''Econometrica'', 80(1), 143--172.</ref> propose a procedure to test for the sign of the interaction effects (which here I have assumed to be non-positive) in discrete simultaneous games with incomplete information and (possibly) multiple equilibria.
As a by-product of this procedure, they also provide a test for the presence of multiple equilibria in the DGP.
The test does not require parametric specifications of players' payoffs, the distributions of their private signals, or the equilibrium selection mechanism.
Rather, the test builds on the commonly invoked assumption that players' private signals are independent conditional on observed states.
<ref name="gri14"><span style="font-variant-caps:small-caps">Grieco, P. L.E.</span>  (2014): “Discrete games with flexible information  structures: an application to local grocery markets” ''The RAND Journal  of Economics'', 45(2), 303--340.</ref> introduces an important class of models with flexible information structure.
Each player is assumed to have a vector of payoff shifters unobservable by the researcher composed of elements that are private information to the player, and elements that are known to all players.
The results of <ref name="ber:mol:mol11"/> reported in this section apply to this set-up as well.
===<span id="subsec:auctions"></span>Auction Models with Independent Private Values===
====<span id="subsubsec:HT"></span>An Inference Approach Robust to Bidding Behavior Assumptions====
<ref name="hai:tam03"/> study what can be learned about the distribution of valuations in an open outcry English auction where symmetric bidders have independent private values for the object being auctioned.
The standard theoretical model <ref name="mil:web82"><span style="font-variant-caps:small-caps">Milgrom, P.R.,  <span style="font-variant-caps:normal">and</span> R.J. Weber</span>  (1982): “A Theory of  Auctions and Competitive Bidding” ''Econometrica'', 50(5), 1089--1122.</ref>, called the “button auction” model, posits that each bidder holds down a button while the object’s price rises continuously and exogenously, releasing it (in the dominant strategy equilibrium) when it reaches her valuation or all her opponents have left.
In this case, the distribution of bidders' valuations can be learned exactly.
<ref name="hai:tam03"/> show that much can be learned about the distribution of valuations, even allowing for the fact that real-life auctions may depart from this stylized framework, as in the following identification problem.<ref group="Notes" >Examples of departures from the standard model include the case where active bidding by a player's opponents may eliminate her incentives to bid close to her valuation or at all; the econometrician does not precisely observe the point at which each bidder drops out; there are discrete bid increments; etc.
</ref>
{{proofcard|Identification Problem (Incomplete Auction Model with Independent Private Values)|IP:auction|
For a given auction with <math>n < \infty</math> participating bidders, let <math>\ev_i\sim\sQ,i=1,\dots,n,</math> be bidder <math>i</math>'s valuation for the object being auctioned and assume that <math>\ev_i\independent \ev_j</math> for all <math>i\neq j</math>.
Assume that the support of <math>\sQ</math> is <math>[\underline{v},\bar{v}]</math> and that each bidder knows her own valuation but not that of her opponents.
Let the auctioneer set a minimum bid increment <math>\delta\in [0,\bar{v})</math>, and for simplicity suppose there is no reserve price.<ref group="Notes" >If there is a reserve price <math>r > \underline{v}</math>, nothing can be learned about <math>\sQ(\ev\in [\underline{v},v])</math> for any <math>v < r</math>.
In that case, one can learn features of the truncated distribution of valuations using the same insights summarized here.</ref>
Suppose the researcher observes  order statistics of the bids, <math>\vec{\eb}_n\equiv(\eb_{1:n},\dots,\eb_{n:n})\sim\sP</math> in <math>\R^n_+</math>, with <math>\eb_{i:n}</math> the <math>i</math>-th lowest of the <math>n</math> bids.
Assume that: (1) Bidders do not bid more than they are willing to pay; (2) Bidders do not allow an opponent to win at a price they are willing to beat.
In the absence of additional information, what can the researcher learn about <math>\sQ</math>?
|}}
<div id="fig:auction" class="d-flex justify-content-center">
[[File:guide_d9532_fig_auction.png | 700px | thumb | A realization of the model predicted ordered bids <math>\eB(\vec{\ev}_n)</math> in \eqref{eq:RCS_auction} for <math>n=3,\vec{\ev}_n=v^0,\delta=0</math>. ]]
</div>
The model in Identification [[#IP:auction |Problem]] delivers set valued predictions because given valuations <math>(\ev_1,\dots,\ev_n)</math>, the two fundamental assumptions about bidders' behavior yield
<math display="block">
\begin{align}
\vec{\eb}_n \in \eB(\vec{\ev}_n)\equiv\left[\left\{\prod_{i=1}^{n-1}[\underline{v},\ev_{i:n}]\right\}\times [\ev_{n-1:n}-\delta,\ev_{n:n}]\right]\cap V_n,\label{eq:RCS_auction}
\end{align}
</math>
where <math>\vec{\ev}_n\equiv(\ev_{1:n},\dots,\ev_{n:n})</math> denotes the vector of order statistics of the valuations, and <math>V_n=\{v\in\R^n:\underline{v}\le v_1\le v_2\le\dots\le v_n\le \bar{v}\}</math>.<ref group="Notes" >Using the same convention as for the bids, <math>\ev_{i:n}</math> denotes the <math>i</math>-th lowest of the <math>n</math> valuations.</ref>
[[#fig:auction|Figure]] provides a stylized depiction of a realization of this set for <math>\vec{\ev}_n=v^0</math> when there are three bidders (<math>n=3</math>), <math>\underline{v}=0</math>, and <math>\delta=0</math>.
In words, <math>\eB(\vec{\ev}_n)</math> collects the model predicted values of ordered bids.
The fact that <math>\eb_{i:n}\le \ev_{i:n}</math> for all <math>i</math> results from assumption (1): since each bidder bids at most an amount equal to her valuation, the <math>i</math>-th lowest bid cannot exceed the <math>i</math>-th lowest valuation <ref name="hai:tam03"/>{{rp|at=Lemma 1}}.<ref group="Notes" >Note that <math>\eb_{i:n}</math> need not be the bid made by the bidder with valuation <math>\ev_{i:n}</math>.</ref>
The fact that <math>\eb_{n:n}\ge \ev_{n-1:n}-\delta</math> follows immediately from assumption (2) <ref name="hai:tam03"/>{{rp|at=Lemma 3}}.
The fact that <math>\vec{\eb}_n</math> has to lie in <math>V_n</math> follows because it is a vector of ''ordered'' bids.
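The three restrictions just described can be sketched as a membership check for the set <math>\eB(\vec{\ev}_n)</math> in \eqref{eq:RCS_auction}. This is only an illustrative sketch: the function name, the default support bounds, and the numerical values are hypothetical.

```python
def in_predicted_set(bids, valuations, delta=0.0, v_low=0.0, v_high=float("inf")):
    """Check whether a vector of ordered bids lies in B(v) for given valuations.

    `bids` must already be sorted from lowest to highest; `valuations` are
    sorted internally to obtain their order statistics.
    """
    b = list(bids)
    v = sorted(valuations)
    n = len(v)
    if len(b) != n or n < 2:
        raise ValueError("need the same number (at least two) of bids and valuations")
    # Membership in V_n: bids are ordered and lie in the support of valuations.
    ordered = all(b[i] <= b[i + 1] for i in range(n - 1)) and v_low <= b[0] and b[-1] <= v_high
    # Assumption (1): the i-th lowest bid does not exceed the i-th lowest valuation.
    no_overbidding = all(b[i] <= v[i] for i in range(n))
    # Assumption (2): the winning bid is at least the second-highest valuation minus delta.
    top_bid_ok = b[-1] >= v[-2] - delta
    return ordered and no_overbidding and top_bid_ok


print(in_predicted_set([0.1, 0.3, 0.6], [0.9, 0.2, 0.5]))
print(in_predicted_set([0.1, 0.3, 0.4], [0.9, 0.2, 0.5]))
```

Many different bid vectors pass this check for the same valuations, which is the sense in which the model's prediction is set valued.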
Why does this set-valued prediction hinder point identification?
The reason is that the distribution of the observable data relates to the model structure in an ''incomplete'' manner.<ref group="Notes" >{{ref|name=hai:tam03}}{{rp|at=Appendix D}} provide the discussion summarized here. Additionally, in their Appendix B, they give a simple example of a two-bidder auction satisfying all assumptions in Identification [[#IP:auction |Problem]], where two different distributions <math>\sQ</math> and <math>\tilde{\sQ}</math> yield the same distribution of ordered bids.</ref>
Define a bidding rule <math>\sB(\eb_{1:n},\dots,\eb_{n:n}|\ev_{1:n},\dots,\ev_{n:n})</math> to be a conditional joint distribution for the order statistics of the bids conditional on the order statistics of the valuations.
Then, for a given realization of the valuations <math>\ev_{1:n}=v_1,\dots,\ev_{n:n}=v_n</math>, the model requires that the support of <math>\sB(\cdot|v_1,\dots,v_n)</math> is in <math>B(\vec{v})</math> as defined in \eqref{eq:RCS_auction} with <math>\ev_{1:n}=v_1,\dots,\ev_{n:n}=v_n</math>, but imposes no other restriction on it.
Hence, the model implied joint distribution of ordered bids is
<math display="block">
\begin{align}
\sM_{1,\dots,n:n}(\cdot;\sB,\sQ)\equiv\int \sB(\cdot|v_1,\dots,v_n)\sQ_{1,\dots,n:n}(dv_1,\dots,dv_n),\label{eq:model:impl_sel_mech_auction}
\end{align}
</math>
where <math>\sQ_{1,\dots,n:n}</math> is the joint distribution of order statistics of the valuations implied by <math>\sQ</math>.
Since the bidding rule <math>\sB</math> is left completely unspecified (other than requiring it to be a valid joint conditional probability distribution with support in <math>\eB</math>), one can find multiple pairs <math>(\sB ,\sQ)</math> satisfying the assumptions of Identification [[#IP:auction |Problem]], such that <math>\sM_{1,\dots,n:n}(\cdot;\sB,\sQ)=\sG_{1,\dots,n:n}(\cdot)</math>, with <math>\sG_{1,\dots,n:n}</math> the observed joint CDF of the order statistics of the bids associated with <math>\sP</math>.
<ref name="hai:tam03"/> propose to use simple and tractable implications of the model to learn features of <math>\sQ</math>.
Recall that with i.i.d. valuations, the distribution of each order statistic uniquely determines <math>\sQ(v)</math>, with <math>\sQ(v)\equiv\sQ(\ev\le v)</math> for any <math>v\ge\underline{v}</math>, through:
<math display="block">
\begin{align}
\sQ(v)=\sq_{\cB}(\sQ_{i:n}(v);i,n-i+1),\label{eq:HT:beta}
\end{align}
</math>
where <math>\sQ_{i:n}</math> is the CDF of <math>\ev_{i:n}</math> and <math>\sq_{\cB}(\cdot;i,n-i+1)</math> is the quantile function of a Beta-distributed random variable with parameters <math>i</math> and <math>n-i+1</math>.
Using this, their Lemmas 1 and 3 yield, respectively,
<math display="block">
\begin{align}
\sQ(v) &\le \min_{n,i}\sq_{\cB}(\sG_{i:n}(v);i,n-i+1),\forall v\in[\underline{v},\bar{v}],\label{eq:HT_upper}\\
\sQ(v) &\ge \max_{n}\sq_{\cB}(\sG_{n:n}(v-\delta);n-1,2),\forall v\in[\underline{v},\bar{v}],\label{eq:HT_lower}
\end{align}
</math>
where, for any <math>v\ge\underline{v}</math>, <math>\sG_{i:n}(v)\equiv\sP(\eb_{i:n}\le v)</math> denotes the observed CDF of <math>\eb_{i:n}</math> for <math>i=1,\dots,n</math>.
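The following sketch illustrates how the Beta-quantile transformation turns CDFs of ordered bids into bounds on <math>\sQ(v)</math> at a single point <math>v</math> and for a single auction size <math>n</math> (so the minimum and maximum over <math>n</math> are omitted). It relies on the standard fact that, for integer parameters, the Beta<math>(i,n-i+1)</math> CDF at <math>q</math> equals the probability that a Binomial<math>(n,q)</math> variable is at least <math>i</math>, and it inverts this CDF by bisection. The lower bound uses the second-highest valuation, that is, Beta parameters <math>n-1</math> and <math>2</math>. The function names and inputs are hypothetical, and the CDF values are treated as known rather than estimated.

```python
from math import comb


def order_stat_cdf(q, i, n):
    """P(v_{i:n} <= v) when Q(v) = q, i.e. the Beta(i, n-i+1) CDF evaluated at q."""
    return sum(comb(n, k) * q**k * (1.0 - q) ** (n - k) for k in range(i, n + 1))


def beta_quantile(p, i, n, tol=1e-12):
    """Quantile function of Beta(i, n-i+1) at p, obtained by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if order_stat_cdf(mid, i, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def haile_tamer_bounds(G_at_v, G_top_at_v_minus_delta, n):
    """Bounds on Q(v) for one n.

    G_at_v[i-1] is G_{i:n}(v); G_top_at_v_minus_delta is G_{n:n}(v - delta).
    """
    upper = min(beta_quantile(G_at_v[i - 1], i, n) for i in range(1, n + 1))
    lower = beta_quantile(G_top_at_v_minus_delta, n - 1, n)
    return lower, upper


# Button-auction benchmark with n = 3 and Q(v) = 0.4: losing bids equal valuations
# and the winning bid equals the second-highest valuation.
n, q = 3, 0.4
G = [order_stat_cdf(q, 1, n), order_stat_cdf(q, 2, n), order_stat_cdf(q, 2, n)]
print(haile_tamer_bounds(G, G[-1], n))
```

In this benchmark the two bounds should agree up to the bisection tolerance, in line with the point identification result for the button auction model.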
'''Key Insight:'''<i>
The model and analysis put forward by <ref name="hai:tam03"/> trade point identification of the distribution of valuations under stringent assumptions on the bidding rule for a ''robust'' inference approach that yields informative bounds under weak and widely credible assumptions on bidding behavior.
Remarkably, “nothing is lost” due to the use of their robust approach: point identification is recovered when the standard assumptions of the button auction model hold.<ref group="Notes" >
The button auction model yields bidding behavior consistent with Identification [[#IP:auction |Problem]].</ref>
This is because in the dominant strategy equilibrium the top losing bidder exits at her valuation, followed immediately by the winning bidder.
Hence, <math>\eb_{n-1:n}=\ev_{n-1:n}=\eb_{n:n}</math> and <math>\delta=0</math>, so that the upper and the lower bound in \eqref{eq:HT_upper}-\eqref{eq:HT_lower} coincide and point identify the distribution of valuations.
</i>
<ref name="hai:tam03"/> also provide sharp bounds on the optimal reserve price, which I do not discuss here.
However, they leave open the question of whether the collection of CDFs satisfying \eqref{eq:HT_upper}-\eqref{eq:HT_lower} yields the sharp identification region for <math>\sQ</math>.
As discussed in Sections [[guide:Ec36399528#subsec:missing_data |Selectively Observed Data]]-[[guide:Ec36399528#subsec:interval_data |Interval Data]], pointwise bounds on the CDF deliver tubes of admissible CDFs that in general yield outer regions on the CDF of interest.
But in this identification problem, the issue of sharpness is even more subtle, and therefore addressed in the following subsection.
Before moving on to that discussion, I note that the work of <ref name="hai:tam03"/> spurred a rich literature applying partial identification analysis to the study of auction models.
<ref name="tan11"><span style="font-variant-caps:small-caps">Tang, X.</span>  (2011): “Bounds on revenue distributions in counterfactual  auctions with reserve prices” ''The RAND Journal of Economics'', 42(1),  175--203.</ref> studies first price sealed bid auctions with equilibrium behavior, where affiliated valuations prevent --in the absence of parametric restrictions on the distribution of the model primitives-- point identification of the model.
He derives bounds on seller revenue under various counterfactual scenarios on reserve prices and auction formats.
<ref name="arm13"><span style="font-variant-caps:small-caps">Armstrong, T.B.</span>  (2013): “Bounds in auctions with unobserved  heterogeneity” ''Quantitative Economics'', 4(3), 377--415.</ref> also studies first price sealed bid auctions with equilibrium behavior, but relaxes the independence assumption on symmetric valuations by requiring it to hold only conditional on unobserved heterogeneity.
He derives bounds on various functionals of the distributions of interest, including the mean bid and mean valuation.
<ref name="ara:gan:qui13"><span style="font-variant-caps:small-caps">Aradillas‐López, A., A.Gandhi,  <span style="font-variant-caps:normal">and</span> D.Quint</span>  (2013):  “Identification and Inference in Ascending Auctions With Correlated Private  Values” ''Econometrica'', 81(2), 489--534.</ref> analyze second price auctions with correlated private values.
In this case, the distribution of valuations is not point identified even under the assumptions of the button auction model <ref name="ath:hai02"><span style="font-variant-caps:small-caps">Athey, S.,  <span style="font-variant-caps:normal">and</span> P.A. Haile</span>  (2002): “Identification of  Standard Auction Models” ''Econometrica'', 70(6), 2107--2140.</ref>{{rp|at=Theorem 4}}.
Nonetheless, <ref name="ara:gan:qui13"/> show that interesting functionals of it (seller profits and bidder surplus) can be bounded, if one assumes that transaction prices are determined by the second highest valuation and imposes some restrictions on the joint distribution of the number of bidders and distribution of the valuations.
<ref name="kom13"><span style="font-variant-caps:small-caps">Komarova, T.</span>  (2013): “Partial identification in asymmetric auctions  in the absence of independence” ''The Econometrics Journal'', 16(1),  S60--S92.</ref> studies a related model of second-price ascending auctions with arbitrary dependence in bidders’ private values.
She provides partial identification results for the joint distribution of values for any subset of bidders under various assumptions about what data the researcher observes.
While in her framework the highest bid is never observed, she considers the case where only the winner's identity and the winning price are observed, and the case where all the identities and all the bids except for the highest bid are known.
She also investigates the informational content of assuming positive dependence in bidders' values.
<ref name="gen:li14"><span style="font-variant-caps:small-caps">Gentry, M.,  <span style="font-variant-caps:normal">and</span> T.Li</span>  (2014): “Identification in auctions  with selective entry” ''Econometrica'', 82(1), 315--344.</ref> are concerned with nonparametric identification of a two-stage entry and bidding game.
Potential bidders are assumed to have private valuations and observe private signals before deciding whether to enter the auction.
The dependence between signals and valuations is only minimally restricted.
Hence, even with some excluded instruments that affect selection into the auction, the model primitives are only partially identified.
The authors derive bounds on these primitives, and provide conditions under which point identification is restored.
<ref name="syr:tam:zia18"><span style="font-variant-caps:small-caps">Syrgkanis, V., E.Tamer,  <span style="font-variant-caps:normal">and</span> J.Ziani</span>  (2018): “Inference  on auctions with weak assumptions on information” available at  [https://arxiv.org/abs/1710.03830 https://arxiv.org/abs/1710.03830].</ref> provide partial identification results in private value and common value auctions under weak restrictions on the information available to the bidders.
Their approach leverages a result in <ref name="ber:mor16"><span style="font-variant-caps:small-caps">Bergemann, D.,  <span style="font-variant-caps:normal">and</span> S.Morris</span>  (2016): “Bayes correlated  equilibrium and the comparison of information structures in games”  ''Theoretical Economics'', 11(2), 487--522.</ref> yielding an equivalence between distributions of valuations that obey the restrictions imposed by a Bayesian Correlated Equilibrium and those that obey the restrictions imposed by Bayesian Nash Equilibrium under some information structure.
Such equivalence is particularly helpful because the set of Bayesian Correlated Equilibria can be characterized through linear programming, so that the sharp identification region provided by <ref name="syr:tam:zia18"/> is given by the collection of parameter vectors <math>\vartheta</math> for which a linear program is feasible.
Related results leveraging the linear structure of correlated equilibria in the context of entry games include <ref name="yan06"><span style="font-variant-caps:small-caps">Yang, Z.</span>  (2006): “Correlated equilibrium and the estimation of  static discrete games with complete information” available at  [https://ideas.repec.org/p/pra/mprapa/79395.html https://ideas.repec.org/p/pra/mprapa/79395.html].</ref>, <ref name="ber:mol:mol11"/>{{rp|at=Supplementary Appendix E.2}}, and <ref name="mag:ron17"><span style="font-variant-caps:small-caps">Magnolfi, L.,  <span style="font-variant-caps:normal">and</span> C.Roncoroni</span>  (2017): “Estimation of  Discrete Games with Weak Assumptions on Information” available at  [http://lorenzomagnolfi.com/estimdiscretegames http://lorenzomagnolfi.com/estimdiscretegames].</ref>.
====<span id="subsubsec:sharp:auction"></span>Characterization of Sharpness through Random Set Theory====
The bounds of <ref name="hai:tam03"/> exploit the information contained in the ''marginal'' CDFs <math>\sG_{i:n}</math> for each <math>i</math> and <math>n</math>.
However, in Identification [[#IP:auction |Problem]] additional information can be extracted from the ''joint'' distribution of ordered bids.
<ref name="che:ros17"/> obtain the sharp identification region <math>\idr{\sQ}</math> using random set methods (Artstein's characterization in [[guide:379e0dcd67#thr:artstein |Theorem]]) applied to a quantile function representation of the order statistics.
Here I provide an equivalent characterization that uses equation \eqref{eq:RCS_auction} directly, and which has not appeared in the literature before.
Let <math>\cT</math> denote the space of probability distributions with support on <math>[\underline{v},\bar{v}]</math>, so that <math>\sQ\in\cT</math>.
For a candidate distribution <math>\tilde{\sQ}\in\cT</math>, let <math>\tilde{\sQ}_{1,\dots,n:n}</math> denote the implied distribution of order statistics of <math>n</math> i.i.d. random variables distributed <math>\tilde{\sQ}</math>.
Let <math>\tilde{\eB}</math> be a random closed set defined as in \eqref{eq:RCS_auction} with respect to order statistics of i.i.d. random variables with distribution <math>\tilde{\sQ}</math>.
For a given set <math>K\in\cK</math>, with <math>\cK</math> the collection of compact subsets of <math>\R^n</math>, let <math>\sT_{\tilde\eB}(K;\tilde{\sQ})</math> denote the probability of the event <math>\{\tilde\eB\cap K\neq \emptyset\}</math> implied by <math>\tilde{\sQ}</math>.
{{Proofcard|Theorem (Distribution of Valuations in Incomplete Auction Model with Independent Private Values)|SIR:auction|
Under the assumptions of Identification [[#IP:auction |Problem]], the sharp identification region for <math>\sQ</math> is
<math display="block">
\begin{align}
\label{eq:SIR:auction}
\idr{\sQ}= \left\{\tilde{\sQ}\in\cT: \sP(\vec{\eb}_n\in K) \le \sT_{\tilde\eB}(K;\tilde{\sQ}) \forall K\in\cK \right\}.
\end{align}
</math>|The sharp identification region for <math>\sQ</math> is given by the collection of probability distributions <math>\tilde{\sQ}\in\cT</math> for which one can find a bidding rule <math>\sB(\cdot|\cdot)</math> with support in <math>\tilde{\eB}</math> a.s. such that <math>\sG_{1,\dots,n:n}(\cdot)=\sM_{1,\dots,n:n}(\cdot;\sB,\tilde{\sQ})</math>.
Here <math>\sM_{1,\dots,n:n}(\cdot;\sB,\tilde{\sQ})</math> is defined as in \eqref{eq:model:impl_sel_mech_auction} with <math>\tilde{\sQ}</math> replacing <math>\sQ</math>.
Take a distribution <math>\tilde{\sQ}</math> satisfying this definition of sharpness.
Then there exists a selection of <math>\tilde{\eB}</math> determined by the bidding rule associated with <math>\tilde{\sQ}</math>, such that its distribution matches that of <math>\vec{\eb}_n</math>.
But then [[guide:379e0dcd67#thr:artstein |Theorem]] implies that the inequalities in \eqref{eq:SIR:auction} hold.
Conversely, take <math>\tilde{\sQ}</math> satisfying the inequalities in \eqref{eq:SIR:auction}.
Then, by [[guide:379e0dcd67#thr:artstein |Theorem]], <math>\vec{\eb}_n</math> and <math>\tilde{\eB}</math> can be realized on the same probability space as random elements <math>\vec{\eb}_n^\prime</math> and <math>\tilde{\eB}^\prime</math>, <math>\vec{\eb}_n\edis \vec{\eb}_n^\prime</math>, <math>\tilde{\eB}\edis\tilde{\eB}^\prime</math>, such that <math>\vec{\eb}_n^\prime \in \tilde{\eB}^\prime</math> a.s.
One can then complete the auction model with a bidding rule that picks <math>\vec{\eb}_n^\prime</math> with probability <math>1</math>, and the result follows.}}
In \eqref{eq:SIR:auction}, <math>\sP(\vec{\eb}_n\in K)</math> is determined by the joint distribution of the ordered bids and hence can be learned from the data.
On the other side, <math>\sT_{\tilde\eB}(K;\tilde{\sQ})</math> is a function of the model and <math>\tilde{\sQ}\in\cT</math>.
Hence, it can be computed using \eqref{eq:RCS_auction}, with <math>\tilde\eB</math> defined with respect to order statistics of i.i.d. random variables with distribution <math>\tilde{\sQ}\in\cT</math>.
To gain insight into the characterization of <math>\idr{\sQ}</math>, consider for example the set <math>K=\{\prod_{i=1}^{n-1}(-\infty,+\infty)\}\times(-\infty,v]</math>.
Plugging it into the inequalities in \eqref{eq:SIR:auction}, one obtains
<math display="block">
\begin{align*}
\sG_{n:n}(v) \le \sQ_{n-1,n}(v),\quad\text{for all } n,
\end{align*}
</math>
which, using \eqref{eq:HT:beta}, yields \eqref{eq:HT_lower}.
Similarly, plugging in the sets <math>K_j=\{\prod_{i=1}^{j-1}(-\infty,+\infty)\}\times[v,\infty)\times\{\prod_{i=j+1}^n(-\infty,+\infty)\}</math>, <math>j=1,\dots,n</math>, yields \eqref{eq:HT_upper}.
So the inequalities proposed by <ref name="hai:tam03"/> are a subset of the inequalities yielding the sharp identification region in Theorem [[#SIR:auction |SIR-]].
More information can be obtained by using additional sets <math>K</math>.
For instance, the set <math>K=[v_1,\infty)\times[v_2,\infty)\times\{\prod_{i=3}^{n}(-\infty,+\infty)\}</math>, <math>v_2\ge v_1</math>, yields <math>\sP(\eb_{1:n}\ge v_1,\eb_{2:n}\ge v_2)\le \sQ_{1,2:n}([v_1,\infty)\times[v_2,\infty))</math>, which further restricts <math>\sQ</math>.
Numerous examples can be given.
Characterization \eqref{eq:SIR:auction} is stated using inequality [[guide:379e0dcd67#eq:domin-t |eq:domin-t]] for the collection of compact subsets of <math>\R^n</math>.
One can instead use the (equivalent) inequality [[guide:379e0dcd67#eq:dom-c |eq:dom-c]], and show that in fact it suffices to check it for a much smaller collection of sets, as shown by <ref name="che:ros17"/> (see also <ref name="mol:mol18"/>{{rp|at=Section 2.2}}).
Nonetheless, this collection remains extremely large.
'''Key Insight: Random set theory and partial identification -- continued'''<i>
As stated in the Introduction, constructing the (random) set of model predictions delivered by the maintained assumptions is an exercise typically carried out in identification analysis, regardless of whether random set theory is applied.
Indeed, for the problem studied in this section, <ref name="hai:tam03"/>{{rp|at=equation D1}} put forward the set of admissible bids in \eqref{eq:RCS_auction}.<ref group="Notes" >Equation D1 in {{ref|name=hai:tam03}} and \eqref{eq:RCS_auction} here differ in that the latter also requires bids to be ordered. This observation was beside the point in the discussion in {{ref|name=hai:tam03}} that led to equation D1.</ref> With this set in hand, the tools of random set theory (in this case, [[guide:379e0dcd67#thr:artstein |Theorem]]) immediately deliver the sharp identification region of interest.
</i>
<ref name="che:ros17auction"><span style="font-variant-caps:small-caps">Chesher, A.,  <span style="font-variant-caps:normal">and</span> A.M. Rosen</span>  (2017b): “Incomplete English auction models with  heterogeneity” CeMMAP working paper CWP27/17, available at  [https://www.cemmap.ac.uk/publication/id/9277 https://www.cemmap.ac.uk/publication/id/9277].</ref> further generalize the analysis in this section by dropping the requirement of independent private values.
This allows them, for example, to consider affiliated private values.
They show that even in this significantly more complex context, the key behavioral restrictions imposed by <ref name="hai:tam03"/> to relate bids to valuations can be coupled with the use of random set theory, to characterize sharp identification regions.
===<span id="subsubsec:networks"></span>Network Formation Models===
Strategic models of network formation generalize the frameworks of single-agent and multiple-agent discrete choice models reviewed in Sections [[#subsec:single:ag:RUM |Discrete Choice in Single Agent Random Utility Models]] and [[#subsec:multiple:eq |Static, Simultaneous-Move Finite Games with Multiple Equilibria]].
They posit that pairs of agents (nodes) form, maintain, or sever connections (links) according to an explicit equilibrium notion and utility structure.
Each individual's utility depends on the links formed by others (the network) and on utility shifters that may be pair-specific.
One may conjecture that the results reported in Sections [[#subsec:single:ag:RUM |Discrete Choice in Single Agent Random Utility Models]]-[[#subsec:multiple:eq |Static, Simultaneous-Move Finite Games with Multiple Equilibria]] apply in this more general context too.
While of course lessons can be carried over, network formation models present challenges that, combined, cannot be overcome without the development of new tools.
These include the issue of equilibrium existence and the possibility of multiple equilibria when they exist, due to the interdependence in agents' choices (this problem was already discussed in Section [[#subsec:multiple:eq |Static, Simultaneous-Move Finite Games with Multiple Equilibria]]).
Another challenge is the degree of correlation between linking decisions, which interacts with how the observable data is generated: one may observe a growing number of independent networks, or a growing number of agents on a single network.
Yet another challenge, which substantially increases the difficulties associated with the previous two, is the combinatoric complexity of network formation problems.
The purpose of this section is exclusively to discuss some recent papers that have made important progress to address these specific challenges and carry out partial identification analysis.
For a thorough treatment of the literature on network formation, I refer to the reviews in <ref name="gra15"><span style="font-variant-caps:small-caps">Graham, B.S.</span> (2015): “Methods of Identification in Social Networks” ''Annual Review of Economics'', 7(1), 465--485.</ref>, <ref name="cha16"><span style="font-variant-caps:small-caps">Chandrasekhar, A.</span> (2016): “Econometrics of Network Formation” in ''Oxford Handbook on the Economics of Networks'', ed. by Y.Bramoulle, A.Galeotti, <span style="font-variant-caps:normal">and</span> B.Rogers, chap.13. Oxford University Press.</ref>, <ref name="pau17"><span style="font-variant-caps:small-caps">de Paula, A.</span> (2017): “Econometrics of Network Models” in ''Advances in Economics and Econometrics: Eleventh World Congress'', ed. by B.Honoré, A.Pakes, M.Piazzesi, <span style="font-variant-caps:normal">and</span> L.Samuelson, vol.1 of ''Econometric Society Monographs'', p. 268–323. Cambridge University Press.</ref>, and <ref name="gra19"><span style="font-variant-caps:small-caps">Graham, B.S.</span> (2019): “The Econometric Analysis of Networks” in ''Handbook of Econometrics''. Elsevier.</ref>{{rp|at=Chapter XXX in this Volume}}.<ref group="Notes" >For a review of the literature on peer group effect analysis, see, e.g., {{ref|name=bro:dur01hoe}}, {{ref|name=blu:bro:dur:ioa11}}, {{ref|name=pau17}}, and {{ref|name=gra19}}.</ref>
Depending on whether the researcher observes data from a single network or multiple independent networks, the underlying population of agents may be represented as a continuum or as a countably infinite set in the first case, or as a finite set in the second case.
Henceforth, I denote generic agents as <math>i</math>, <math>j</math>, <math>k</math>, and <math>m</math>.
I consider static models of undirected network formation with non-transferable utility.<ref group="Notes" >''Undirected'' means that if a link  from node <math>i</math> to node <math>j</math> exists, then the link from <math>j</math> to <math>i</math> exists.
The discussion that follows can be generalized to the case of models with transferable utility.</ref>
The collection of all links among nodes forms the network, denoted <math>\ey</math>.
For any pair <math>(i,j)</math> with <math>i\neq j</math>, <math>\ey_{ij}=1</math> if they are linked, and <math>\ey_{ij}=0</math> otherwise (<math>\ey_{ii}=0</math> for all <math>i</math> by convention).
The notation <math>\ey-\{ij\}</math> denotes the network that results if a link present between nodes <math>i</math> and <math>j</math> is deleted, while <math>\ey+\{ij\}</math> denotes the network that results if a link absent between nodes <math>i</math> and <math>j</math> is added.
Denote agent <math>i</math>'s payoff by <math>\bu_i(\ey,\ex,\epsilon)</math>.
This payoff depends on the network <math>\ey</math> and the payoff shifters <math>(\ex,\epsilon)</math>, with <math>\ex</math> observable both to the agents and to the researcher, <math>\epsilon</math> only to the agents, and <math>(\ex,\epsilon)</math> collecting <math>(\ex_{ij},\epsilon_{ij})</math> for all <math>i</math> and <math>j</math>.<ref group="Notes" >Here I consider a framework where the agents have complete information.</ref>
Following much of the literature, I employ ''pairwise stability'' <ref name="jac:wol96"><span style="font-variant-caps:small-caps">Jackson, M.O., <span style="font-variant-caps:normal">and</span> A.Wolinsky</span> (1996): “A Strategic Model of Social and Economic Networks” ''Journal of Economic Theory'', 71(1), 44 -- 74.</ref> as the equilibrium notion: <math>\ey</math> is a pairwise stable network if all linked agents prefer not to sever their links, and all non-existing links are damaging to at least one agent.
Formally,
<math display="block">
\begin{align*}
\forall(i,j):\ey_{ij}&=1,\quad \bu_i(\ey,\ex,\epsilon)\ge \bu_i(\ey-\{ij\},\ex,\epsilon)\text{ and }\bu_j(\ey,\ex,\epsilon)\ge \bu_j(\ey-\{ij\},\ex,\epsilon),\\
\forall(i,j):\ey_{ij}&=0,\quad \text{if }\bu_i(\ey+\{ij\},\ex,\epsilon) >  \bu_i(\ey,\ex,\epsilon)\text{ then }\bu_j(\ey+\{ij\},\ex,\epsilon) <  \bu_j(\ey,\ex,\epsilon).
\end{align*}
</math>
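The two conditions can be checked mechanically for a candidate network. The following is a minimal sketch, not taken from the cited papers: it assumes that a network is coded as a set of links <code>(i, j)</code> with <code>i < j</code>, and that payoffs are supplied through a hypothetical callable <code>utility(i, links)</code> in which the shifters are already fixed. The link values in <code>w</code> are made up for illustration.

```python
def is_pairwise_stable(links, n, utility):
    """links: frozenset of pairs (i, j) with i < j; utility(i, links) -> float."""
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) in links:
                # Existing link: neither endpoint may strictly prefer to sever it.
                cut = links - {(i, j)}
                if utility(i, links) < utility(i, cut) or utility(j, links) < utility(j, cut):
                    return False
            else:
                # Missing link: if one endpoint strictly gains, the other must strictly lose.
                add = links | {(i, j)}
                gain_i = utility(i, add) - utility(i, links)
                gain_j = utility(j, add) - utility(j, links)
                if (gain_i > 0 and gain_j >= 0) or (gain_j > 0 and gain_i >= 0):
                    return False
    return True

# Hypothetical link values: w[i][j] is what agent i gets from a link with j.
w = [[0.0, 1.0, -1.0],
     [1.0, 0.0, 1.0],
     [2.0, 1.0, 0.0]]

def toy_utility(i, links):
    """Additive payoff with no externalities (an illustrative assumption)."""
    return sum(w[i][b if a == i else a] for (a, b) in links if i in (a, b))

candidate = frozenset({(0, 1), (1, 2)})
print(is_pairwise_stable(candidate, 3, toy_utility))
```

In this toy specification a link is stable exactly when both endpoints value it positively, so the candidate network passes the check while the empty and the complete networks fail it.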
Under this equilibrium notion, if equilibria exist, multiplicity is likely; see, among others, the examples in <ref name="gra15"/>{{rp|at=p. 475}}, <ref name="pau17"/>{{rp|at=p. 301}}, and <ref name="she18"><span style="font-variant-caps:small-caps">Sheng, S.</span> (2018): “A structural econometric analysis of network formation games through subnetworks” ''Econometrica'', accepted for publication.</ref>{{rp|at=example 3.1}}.
The model is therefore ''incomplete'', because it does not specify how an equilibrium is selected in the region of multiplicity.
For the same reasons as discussed in the context of finite games in Section [[#subsec:multiple:eq |Static, Simultaneous-Move Finite Games with Multiple Equilibria]], partial identification results (unless one is willing to impose restrictions on the equilibrium selection mechanism).
However, as I explain below, an immediate application of the identification analysis carried out there presents enormous practical challenges because there are <math>2^{n(n-1)/2}</math> possible network configurations to be checked for stability (and the dimensionality of the space of unobservables is also very large).
In what follows I consider two distinct frameworks that make different assumptions about the utility function and how the data is generated, and discuss what can be learned about the parameters of interest in these cases.
====<span id="subsubsec:networks:1"></span>Data from Multiple Independent Networks====
I first consider the case that the researcher observes data from multiple independent networks.
I follow the set-up put forward by <ref name="she18"/>.
{{proofcard|Identification Problem (Network Formation Model with Multiple Independent Networks)|IP:networks:multiple:indep|Let there be <math>n\in\{2,3,\dots\},n < \infty</math> agents, and let <math>(\ex,\ey)\sim\sP</math> be observable random variables in <math>\times_{j=1}^n\R^d\times\{0,1\}^{n(n-1)/2}</math>, <math>d < \infty</math>.
Suppose that <math>\ey</math> is a pairwise stable network.
For each agent <math>i</math>, let the utility function be known up to a finite-dimensional parameter vector <math>\delta\in\Delta\subset\R^p</math>, and given by
<math display="block">
\begin{multline}
\bu_i(\ey,\ex,\epsilon;\delta)=\sum_{j=1}^n \ey_{ij}(f(\ex_i,\ex_j;\delta_1)+\epsilon_{ij})\\
+\delta_2\frac{\sum_{j=1}^n\sum_{k\neq i,k=1}^n\ey_{ij}\ey_{jk}}{n-2}+\delta_3\frac{\sum_{j=1}^n\sum_{k=j+1}^n\ey_{ij}\ey_{ik}\ey_{jk}}{n-2}\label{eq:utility:network:1}
\end{multline}
</math>
with <math>f(\cdot,\cdot;\cdot)</math> a continuous function of its arguments.<ref group="Notes" >The effects of having friends in common and of friends of friends in \eqref{eq:utility:network:1} are normalized by <math>n-2</math>. This enforces that the marginal utility that <math>i</math> receives from linking with <math>j</math> is affected by <math>j</math> having an additional link with <math>k</math> to a smaller degree as <math>n</math> grows. This does not result in diminishing network effects.</ref>
Suppose that <math>\epsilon_{ij}</math> are independent for all <math>i\neq j</math> and identically distributed with CDF known up to a parameter vector <math>\gamma\in\Gamma\subset\R^m</math>, denoted <math>\sF_\gamma</math>.
Assume that the support of <math>\sF_\gamma</math> is <math>\R</math>, that <math>\sF_\gamma</math> is absolutely continuous with respect to Lebesgue measure, and continuously differentiable with respect to <math>\gamma\in\Gamma</math>.
Let <math>\Theta=\Delta\times\Gamma</math>.
Assume that the researcher observes a random sample of networks and observable payoff shifters drawn from <math>\sP</math>.
In the absence of additional information, what can the researcher learn about <math>\theta\equiv[\delta_1\;\delta_2\;\delta_3\;\gamma]</math>?
|}}
<ref name="she18"/> analyzes this problem.
She establishes equilibrium existence provided that <math>\delta_2\ge 0</math> and <math>\delta_3\ge 0</math> <ref name="she18"/>{{rp|at=Proposition 2.2}}.<ref group="Notes" >With transferable utility, {{ref|name=she18}}{{rp|at=Proposition 2.1}} establishes existence for any <math>\delta_2,\delta_3\in\R</math>.
See {{ref|name=hel13}} for an earlier analysis of existence and uniqueness of pairwise stable networks.</ref>
Given payoff shifters <math>(\ex,\epsilon)</math> and parameters <math>\vartheta\equiv[\tilde\delta_1\;\tilde\delta_2\;\tilde\delta_3\;\tilde\gamma]\in\Theta</math>, let <math>\eY_\vartheta(\ex,\epsilon)</math> denote the collection of pairwise stable networks implied by the model.
It is easy to show that <math>\eY_\vartheta(\ex,\epsilon)</math> is a random closed set as in [[guide:379e0dcd67#def:rcs |Definition]].
The networks in <math>\eY_\vartheta(\ex,\epsilon)</math> are <math>n\times n</math> symmetric adjacency matrices with diagonal elements equal to zero and off diagonal elements in <math>\{0,1\}</math>.
To ease notation, I omit <math>\eY_\vartheta</math>'s dependence on <math>(\ex,\epsilon)</math> in what follows.
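For a very small number of agents, the set of pairwise stable networks can be enumerated by brute force. The sketch below is illustrative only and is not the procedure used in the cited paper: it codes the utility in \eqref{eq:utility:network:1} with a hypothetical matrix <code>f[i][j]</code> standing in for <math>f(\ex_i,\ex_j;\delta_1)</math>, takes one hypothetical draw of the shocks <code>eps</code>, and loops over all <math>2^{n(n-1)/2}</math> candidate networks.

```python
from itertools import combinations, product

def utility(i, y, f, eps, d2, d3):
    """Payoff of agent i in network y; f[i][j] stands in for f(x_i, x_j; delta_1)."""
    n = len(y)
    direct = sum(y[i][j] * (f[i][j] + eps[i][j]) for j in range(n))
    friends_of_friends = sum(y[i][j] * y[j][k]
                             for j in range(n) for k in range(n) if k != i)
    common_friends = sum(y[i][j] * y[i][k] * y[j][k]
                         for j in range(n) for k in range(j + 1, n))
    return direct + (d2 * friends_of_friends + d3 * common_friends) / (n - 2)

def toggled(y, i, j):
    """Copy of y with the undirected link ij switched on or off."""
    z = [row[:] for row in y]
    z[i][j] = z[j][i] = 1 - y[i][j]
    return z

def is_stable(y, u):
    for i, j in combinations(range(len(y)), 2):
        z = toggled(y, i, j)
        gain_i, gain_j = u(i, z) - u(i, y), u(j, z) - u(j, y)
        if y[i][j] == 1:
            if gain_i > 0 or gain_j > 0:      # someone strictly prefers to sever
                return False
        elif (gain_i > 0 and gain_j >= 0) or (gain_j > 0 and gain_i >= 0):
            return False                      # a missing link would be formed
    return True

def stable_set(n, f, eps, d2, d3):
    """All pairwise stable networks, each coded by its upper-triangular link indicators."""
    pairs = list(combinations(range(n), 2))
    found = []
    for bits in product((0, 1), repeat=len(pairs)):
        y = [[0] * n for _ in range(n)]
        for (i, j), b in zip(pairs, bits):
            y[i][j] = y[j][i] = b
        if is_stable(y, lambda i, net: utility(i, net, f, eps, d2, d3)):
            found.append(bits)
    return found

n = 3
f = [[-0.5] * n for _ in range(n)]    # hypothetical values of f(x_i, x_j; delta_1)
eps = [[0.0] * n for _ in range(n)]   # one hypothetical draw of the shocks
print(stable_set(n, f, eps, d2=0.0, d3=2.0))
```

With these made-up values each link is costly on its own but a closed triangle is valuable, so both the empty and the complete network are pairwise stable. This is a simple instance of the multiplicity discussed above.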
Under the assumption that <math>\ey</math> is a pairwise stable network, at the true data generating value of <math>\theta\in\Theta</math>, one has
<math display="block">
\begin{align}
\ey\in\eY_\theta\quad\text{a.s.} \label{eq:y_in_Y_network_multiple}
\end{align}
</math>
Equation \eqref{eq:y_in_Y_network_multiple} exhausts the modeling content of Identification [[#IP:networks:multiple:indep |Problem]].
[[guide:379e0dcd67#thr:artstein |Theorem]] can be leveraged to extract its empirical content from the observed distribution <math>\sP(\ey,\ex)</math>.
Let <math>\cY</math> be the collection of <math>n\times n</math> symmetric matrices with diagonal elements equal to zero and all other entries in <math>\{0,1\}</math>, so that <math>|\cY|=2^{n(n-1)/2}</math>.
For a given set <math>K\subset\cY</math>, let <math>\sT_{\eY_{\vartheta}}(K;\sF_\gamma)</math> denote the probability of the event <math>\{\eY_\vartheta\cap K\neq \emptyset\}</math> implied when <math>\epsilon\sim\sF_\gamma</math>, <math>\ex</math>-a.s.
{{proofcard|Theorem (Structural Parameters in Network Formation Models with Multiple Independent Networks)|SIR:networks:1|
Under the assumptions of Identification [[#IP:networks:multiple:indep |Problem]], the sharp identification region for <math>\theta</math> is
<math display="block">
\begin{align}
\idr{\theta}=\{\vartheta\in\Theta:\sP(\ey\in K|\ex)\le \sT_{\eY_{\vartheta}}(K;\sF_{\tilde\gamma})\,\forall K\subset\cY, \, \ex\text{-a.s.}\}.\label{eq:SIR:networks:1}
\end{align}
</math>|The result follows from arguments similar to those in the proof of [[#SIR:entry_game |Theorem]].}}
The characterization of <math>\idr{\theta}</math> in Theorem [[#SIR:networks:1 |SIR-]] is new to this chapter.<ref group="Notes" >
{{ref|name=gua19}} has previously used Theorem D.1 in {{ref|name=ber:mol:mol11}}, as I do here, to characterize sharp identification regions in unilateral and bilateral directed network formation games.</ref>
While technically it entails a finite number of conditional moment inequalities, in practice their number can be prohibitive as it can be as large as <math>2^{2^{n(n-1)/2}}-2</math>.<ref group="Notes" >This number may be reduced drastically using the notion of ''core determining class'' of sets, see [[guide:379e0dcd67#def:core-det |Definition]] and the discussion on [[guide:379e0dcd67 |Basic Definitions and Facts from Random Set Theory]].
Nonetheless, even with relatively few agents, the number of inequalities in \eqref{eq:SIR:networks:1} may remain overwhelming.</ref>
Even using only a subset of the inequalities in \eqref{eq:SIR:networks:1} to obtain an outer region, for example applying the insights in <ref name="cil:tam09"/>, may not be practical (with <math>n=20</math>, <math>|\cY|\approx 10^{57}</math>).
Moreover, computation of <math>\sT_{\eY_{\vartheta}}(K;\sF_\gamma)</math> may require (depending on the set <math>K</math>) evaluation of rather complex integrals.
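To give a sense of the magnitudes involved, the counts quoted above can be reproduced with exact integer arithmetic. This is a small illustrative sketch; the function names are mine and are not part of any library.

```python
def num_networks(n):
    """Number of undirected networks on n nodes: 2^{n(n-1)/2}."""
    return 2 ** (n * (n - 1) // 2)

def num_inequalities(n):
    """Number of non-empty proper subsets K of the set of networks: 2^{|Y|} - 2."""
    return 2 ** num_networks(n) - 2

for n in (2, 3, 4):
    print(n, num_networks(n), num_inequalities(n))

# For n = 20 only the number of networks is printed; the number of
# inequalities is far too large to be worth materializing.
print(len(str(num_networks(20))), "digits in |Y| when n = 20")
```

Already with <math>n=4</math> there are 64 networks and more than <math>10^{19}</math> candidate sets <math>K</math>, which is why the discussion turns to subnetworks.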
To circumvent these challenges, <ref name="she18"/> proposes to analyze network formation through ''subnetworks''.
A subnetwork is the restriction of a network to a subset of the agents (i.e., a subset of nodes and the links between them).
For given <math>A\subseteq\{1,2,\dots,n\}</math>, let <math>\ey^A=\{\ey_{ij}\}_{i,j\in A, i\neq j}</math> be the submatrix in <math>\ey</math> with rows and columns in <math>A</math>, and let <math>\ey^{-A}</math> be the remaining elements of <math>\ey</math> after <math>\ey^A</math> is deleted.
With some abuse of notation, let <math>(\ey^A,\ey^{-A})</math> denote the composition of <math>\ey^A</math> and <math>\ey^{-A}</math> that returns <math>\ey</math>.
Recall that <math>\eY_\vartheta\equiv\eY_\vartheta(\ex,\epsilon)</math>, and let
<math display="block">
\begin{align*}
\eY_{\vartheta}^A=\{\ey^A\in\{0,1\}^{|A|}:\exists\, \ey^{-A}\in\{0,1\}^{|-A|}\text{ such that }(\ey^A,\ey^{-A})\in\eY_{\vartheta}\}
\end{align*}
</math>
be the collection of subnetworks with rows and columns in <math>A</math> that can be part of a pairwise stable network in <math>\eY_\vartheta</math>.
Let <math>\ex^A</math> denote the subset of <math>\ex</math> collecting <math>\ex_{ij}</math> for <math>i,j\in A</math>.
For a given <math>y^A\in\{0,1\}^{|A|}</math>, let <math>\sC_{\eY_{\vartheta}^A}(y^A;\sF_\gamma)</math> and <math>\sT_{\eY_{\vartheta}^A}(y^A;\sF_\gamma)</math> denote, respectively, the probability of the events <math>\{\eY_\vartheta^A=\{y^A\}\}</math> and <math>\{y^A\in\eY_\vartheta^A\}</math> implied when <math>\epsilon\sim\sF_\gamma</math>, <math>\ex</math>-a.s.
The first event means that only the subnetwork <math>y^A</math> is part of a pairwise stable network, while the second event means that <math>y^A</math> is a possible subnetwork that is part of a pairwise stable network but other subnetworks may be part of it too.
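The distinction between the two events can be made concrete with a small sketch. It assumes, purely for illustration, that the sets of pairwise stable networks have already been computed for a handful of shock draws (the four sets below are invented), projects each set onto the nodes in <math>A</math>, and reports the frequencies of the two events as simulation analogues of <math>\sC_{\eY_{\vartheta}^A}</math> and <math>\sT_{\eY_{\vartheta}^A}</math>.

```python
def restrict(links, A):
    """Subnetwork on the nodes in A: keep links with both endpoints in A."""
    return frozenset((i, j) for (i, j) in links if i in A and j in A)

def project(stable_networks, A):
    """Subnetworks on A that are part of at least one network in the set."""
    return {restrict(y, A) for y in stable_networks}

def frequencies(draws, A, y_A):
    """Share of draws with Y^A = {y_A}, and share of draws with y_A in Y^A."""
    projected = [project(Y, A) for Y in draws]
    only = sum(P == {y_A} for P in projected) / len(draws)
    possible = sum(y_A in P for P in projected) / len(draws)
    return only, possible

empty = frozenset()
complete = frozenset({(0, 1), (0, 2), (1, 2)})
# Hypothetical sets of pairwise stable networks for four shock draws (n = 3).
draws = [
    {empty},
    {empty, complete},
    {complete},
    {frozenset({(0, 1)}), frozenset({(0, 1), (1, 2)})},
]

A = {0, 1}
y_A = frozenset({(0, 1)})
print(frequencies(draws, A, y_A))
```

The last draw has two pairwise stable networks that agree on <math>A</math>, so it counts toward the first event even though the full network is not uniquely predicted. In this toy example the two frequencies are 0.5 and 0.75.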
<ref name="she18"/>{{rp|at=Proposition 4.1}} provides the following outer region for <math>\theta</math> by adapting the insight in <ref name="cil:tam09"/> to subnetworks.
In the theorem I abuse notation compared to [[guide:55e14f6e47#tab:notation |Table]] by introducing a superscript, <math>A</math>, to make explicit the dependence of the outer region on it.
{{Proofcard|Theorem (Subnetworks-based Outer Region on Structural Parameters in Network Formation Models with Multiple Independent Networks)|OR:networks:1|Under the assumptions of Identification [[#IP:networks:multiple:indep |Problem]], for any <math>A\subseteq\{1,2,\dots,n\}</math>, an <math>A</math>-dependent outer region for <math>\theta</math> is
<math display="block">
\begin{align}
\mathcal{O}^A_\sP[\theta]=\{\vartheta\in\Theta:\sC_{\eY_{\vartheta}^A}(y^A;\sF_{\tilde\gamma})\le\sP(\ey^A=y^A|\ex^A)\le \sT_{\eY_{\vartheta}^A}(y^A;\sF_{\tilde\gamma})\,\forall y^A\in\cY^A, \, \ex^A\text{-a.s.}\},\label{eq:OR:networks:1}
\end{align}
</math>
where <math>\cY^A</math> is the collection of <math>|A|\times|A|</math> symmetric matrices with diagonal elements equal to zero and all other elements in <math>\{0,1\}</math> so that <math>|\cY^A|=2^{|A|(|A|-1)/2}</math>.|Let <math>\eu(\tilde\ey|\eY_\vartheta)</math> be a random variable in the unit simplex in <math>\R^{n(n-1)/2}</math> which assigns to each possible pairwise stable network <math>\tilde\ey</math> that may realize given <math>(\ex,\epsilon)</math> and <math>\vartheta\in\Theta</math> the probability that it is selected from <math>\eY_\vartheta</math>.
Given <math>y\in\cY</math>, denote by <math>\sM(y|\ex)</math> the model predicted probability that the network realizes equal to <math>y</math>.
Then the model yields
<math display="block">
\begin{align}
\sM(y|\ex)&=\int\eu(y| Y_\vartheta)d\sF_\gamma=\int_{y\in Y_\vartheta,| Y_\vartheta|=1}d\sF_\gamma+\int_{y\in Y_\vartheta,| Y_\vartheta|\ge 2}\eu( y| Y_\vartheta)d\sF_\gamma.\label{eq:model:distrib:network:1}
\end{align}
</math>
The model implied distribution for subnetwork <math>\tilde\ey^A</math> is obtained by taking the marginal of expression \eqref{eq:model:distrib:network:1} with respect to <math>\tilde\ey^{-A}</math>
<math display="block">
\begin{align}
\sM(y^A|\ex)&=\sum_{y^{-A}}\sM((y^A,y^{-A})|\ex)=
\int_{y^A\in Y_\vartheta^A,| Y_\vartheta^A|=1}d\sF_\gamma+\int_{y^A\in Y_\vartheta^A,| Y_\vartheta^A|\ge 2}\sum_{y^{-A}}\eu((y^A,y^{-A})| Y_\vartheta)d\sF_\gamma.\label{eq:model:distrib:subnetwork:1}
\end{align}
</math>
Replacing <math>\eu</math> in \eqref{eq:model:distrib:subnetwork:1} with zero and one yields the bounds in \eqref{eq:OR:networks:1}.}}
<ref name="she18"/>{{rp|at=Section 4.2}} further assumes that the selection mechanism <math>\eu(\tilde\ey|\eY_\vartheta)</math> is invariant to permutations of the labels of the players.
Under this condition and the maintained assumptions on <math>\epsilon</math>, she shows that the inequalities in \eqref{eq:OR:networks:1} are invariant under permutations of labels, so subnetworks in any two subsets <math>A,A'\subseteq\{1,2,\dots,n\}</math> with <math>|A|=|A'|</math> and <math>\ex^A=\ex^{A'}</math> yield the same inequalities for all <math>y^A=y^{A'}</math>.
It is therefore sufficient to consider subnetwork <math>A</math> and the inequalities in \eqref{eq:OR:networks:1} associated with it.
Leveraging this result, <ref name="she18"/> proposes an outer region obtained by looking at unlabeled subnetworks of size <math>|A|\le\bar{a}</math> and given by
<math display="block">
\begin{align*}
\outr{\theta}=\bigcap_{|A|\le\bar{a}}\mathcal{O}^A_\sP[\theta].
\end{align*}
</math>
As long as the subnetworks are chosen to be small, e.g., <math>|A|\le\bar{a}</math> with <math>\bar{a}\in\{2,3,4\}</math>, the inequalities in \eqref{eq:OR:networks:1} can be computed even if the network is large.
<ref name="she18"/> shows that the inequalities in \eqref{eq:OR:networks:1} remain informative as <math>n</math> grows.
This fact highlights the importance of working with subnetworks.
One could have applied the insight of <ref name="cil:tam09"/> directly to the full network by setting <math>\eu</math> equal to zero and to one in \eqref{eq:model:distrib:network:1}.
The resulting bounds, however, would shrink to zero as <math>n</math> grows and become uninformative for <math>\theta</math>.
The characterization in Theorem [[#OR:networks:1 |OR-]] can be refined to obtain a smaller region, adapting the results in <ref name="ber:mol:mol11"/>{{rp|at=Supplementary Appendix Theorem D.1}} to subnetworks.
The size of this refined region is weakly decreasing in <math>|A|</math>.<ref group="Notes" >The idea of using random set methods on subnetworks to obtain the refined region was put forward in an earlier version of {{ref|name=she18}}. She provided a proof that the refined region's size decreases weakly in <math>|A|</math>.</ref>
However, the refinement does not yield <math>\idr{\theta}</math> because it is applied only to subnetworks.
'''Key Insight:'''<i>
At the beginning of this section I highlighted some key challenges to inference in network formation models.
Identification [[#IP:networks:multiple:indep |Problem]] bypasses the concern about dependence among linking decisions through the independence assumption on <math>\epsilon_{ij}</math> and the presumption that the researcher observes data from multiple independent networks, which allows for identification of <math>\sP(\ey,\ex)</math>.
<ref name="she18"/> takes on the remaining challenges by formally establishing equilibrium existence and allowing for unrestricted selection among multiple equilibria.
In order to overcome the computational complexity of the problem, she puts forward the important idea of inference based on subnetworks.
While of course information is left on the table, the approach remains feasible even with large networks.
</i>
<ref name="miy16"><span style="font-variant-caps:small-caps">Miyauchi, Y.</span>  (2016): “Structural estimation of pairwise stable  networks with nonnegative externality” ''Journal of Econometrics'',  195(2), 224 -- 235.</ref> considers a framework similar to the one laid out in Identification [[#IP:networks:multiple:indep |Problem]].
He assumes non-negative externalities, and shows that in this case the set of pairwise stable equilibria is a complete lattice with a smallest and a largest equilibrium.<ref group="Notes" >This approach exploits supermodularity, and is related to {{ref|name=jia08}} and {{ref|name=ech05}}.</ref>
He then uses moment functions that are monotone in the pairwise stable network (so that they take their extreme values at the smallest and largest equilibria), to obtain moment conditions that restrict <math>\theta</math>.
Examples of the moment functions used include the proportion of pairs with a link, the proportion of links belonging to triangles, and many more (see <ref name="miy16"/>{{rp|at=Table 1}}).
<ref name="gua19"><span style="font-variant-caps:small-caps">Gualdani, C.</span>  (2019): “An Econometric Model of Network Formation with  an Application to Board Interlocks Between Firms” available at  [http://docs.wixstatic.com/ugd/063589_b751c9f9c4e34d51b4da7ed7e007080a.pdf http://docs.wixstatic.com/ugd/063589_b751c9f9c4e34d51b4da7ed7e007080a.pdf].</ref> considers unilateral and bilateral directed network formation games, still under a sampling framework where the researcher observes many independent networks.
The equilibrium notion that she uses is pure strategy Nash.
She assumes that the payoff that player <math>i</math> receives from forming link <math>ij</math> is allowed to depend on the number of additional players forming a link pointing to <math>j</math>, but rules out other spillover effects.
Under this assumption and some regularity conditions, <ref name="gua19"/> shows that the network formation game can be decomposed into local games (i.e., games whose sets of players and strategy profiles are subsets of the network formation game's ones), so that the network formation game is in equilibrium if and only if each local game is in equilibrium.
She then obtains a characterization of <math>\idr{\theta}</math> using elements of random set theory.
====<span id="subsubsec:networks:2"></span>Data From a Single Network====
When the researcher observes data from a single network, extra care has to be taken to restrict the dependence among linking decisions.
This can be done in various ways (see, e.g., <ref name="cha16"/>{{rp|at=for some examples}}).
Here I consider a framework proposed by <ref name="pau:shu:tam18"><span style="font-variant-caps:small-caps">de Paula, A., S.Richards-Shubik, <span style="font-variant-caps:normal">and</span> E.Tamer</span> (2018): “Identifying Preferences in Networks With Bounded Degree” ''Econometrica'', 86(1), 263--288.</ref>.
{{proofcard|Identification Problem (Network Formation Model with a Single Network)|IP:networks:single|Let there be a continuum of agents <math>j\in\cI=[0,\mu]</math>, with <math>\mu > 0</math> their total measure, who choose whom to link to based on a utility function specified below.<ref group="Notes" >This is an approximation to a framework with a large but finite number of agents.
The utility function can be less restrictive than the one considered here (see Assumptions 1 and 2 in {{ref|name=pau:shu:tam18}}).</ref>
Let <math>y:\cI\times\cI\to\{0,1\}</math> be an adjacency mapping with <math>y_{jk}=1</math> if nodes <math>j</math> and <math>k</math> are linked, and <math>y_{jk}=0</math> otherwise.
Assume that only connections up to distance <math>\bar{d}</math> affect utility and that preferences are such that agents never choose to form more than a total of <math>\bar{l}</math> links.<ref group="Notes" >The distance measure used here is the shortest path between two nodes.</ref>
To simplify exposition, let <math>\bar{d}=2</math>.
Let each agent <math>j</math> be endowed with characteristics <math>\ex_j\in\cX</math>, with <math>\cX</math> a finite set in <math>\R^p</math>, that are observable to the researcher.
Additionally, let each agent <math>j</math> be endowed with <math>\bar{l}\times|\cX|</math> preference shocks <math>\epsilon_{j\ell}(x)\in\R,\ell=1,\dots,\bar{l},x\in\cX</math>, that are unobservable to the researcher and correspond to the possible direct connections and their characteristics.<ref group="Notes" >Under this assumption, the preference shocks do not depend on the individual identities of the agents.
Hence, if agents <math>k</math> and <math>m</math> have the same observable characteristics, then <math>j</math> is indifferent between them.</ref>
Suppose that the vector of preference shocks is independent of <math>\ex</math> and has a distribution known up to parameter vector <math>\gamma\in\Gamma\subset\R^m</math>, denoted <math>\sQ_\gamma</math>.
Let <math>\cI(j)=\{k:y_{jk}=1\}</math>.
Assume that agents with characteristics and preference shocks <math>(x,e)</math> value links according to the utility function
<math display="block">
\begin{multline}
\bu_j(y,x,e)=\sum_{k\in\cI(j)}(f(x_j,x_k)+e_{j\ell(k)}(x_k))\\
+\delta_1\left|\bigcup_{k\in\cI(j)}\cI(k)-\cI(j)-\{j\}\right|
+\delta_2\sum_{k\in\cI(j)}\sum_{m\in\cI(j):m > k}y_{km}-\infty\cdot\one(|\cI(j)| > \bar{l})\label{eq:utility:network:2}
\end{multline}
</math>
Assume that the network <math>\ey</math> formed by agents with characteristics and shocks <math>(\ex,\epsilon)</math> is pairwise stable.
Let <math>\Theta\equiv\Upsilon\times\Delta\times\Gamma</math>, with <math>\Upsilon</math> the parameter space for <math>\cf\equiv\{f(x,w):x\in\cX,w\in\cX\}</math>.
In the absence of additional information, what can the researcher learn about <math>\theta\equiv[\cf\;\delta_1\;\delta_2\;\gamma]</math>?
|}}
Identification [[#IP:networks:single |Problem]] enforces dimension reduction through the restrictions on depth and degree (the bounds <math>\bar{d}</math> and <math>\bar{l}</math>), so that it is applicable to frameworks with networks that have limited degree distribution (e.g., a network of close friendships, but not the Facebook network).
It also requires that individual identities are irrelevant.
This substantially reduces the richness of unobserved heterogeneity allowed for and the dimensionality of the space of unobservables.
While the latter feature narrows the domain of applicability of the model, it is very beneficial to obtain a tractable characterization of what can be learned about <math>\theta</math>, and yields equilibria that may include isolated nodes, a feature often encountered in networks data.
<ref name="pau:shu:tam18"/> study Identification [[#IP:networks:single |Problem]] focusing on the payoff-relevant local subnetworks that result from the maintained assumptions.
These are distinct from the subnetworks used by <ref name="she18"/>: whereas <ref name="she18"/> looks at subnetworks formed by arbitrary individuals and whose size is chosen by the researcher on the basis of computational tractability, <ref name="pau:shu:tam18"/> look at subnetworks among individuals that are within a certain distance of each other, as determined by the structure of the preferences.
On the other hand, <ref name="she18"/>'s analysis does not require that agents have a finite number of types, nor does it bound the number of links that they may form.
To characterize the local subnetworks relevant for identification analysis in their framework, <ref name="pau:shu:tam18"/> propose the concepts of ''network type'' and ''preference class''.
A network type <math>t=(a,v)</math> describes the local network up to distance <math>\bar{d}</math> from the reference node.
Here <math>a</math> is a square matrix of size <math>1+\bar{l}\sum_{d=1}^{\bar{d}}(\bar{l}-1)^{d-1}</math> that describes the local subnetwork that is utility relevant for an agent of type <math>t</math>.
It consists of the reference node, its direct potential neighbors (<math>\bar{l}</math> elements), its second order neighbors (<math>\bar{l}(\bar{l}-1)</math> elements), through its <math>\bar{d}</math>-th order neighbors (<math>\bar{l}(\bar{l}-1)^{\bar{d}-1}</math> elements).
The other component of the type, <math>v</math>, is a vector of length equal to the size of <math>a</math> that contains the observable characteristics of the reference node and her alters.
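As a quick check on the dimensions involved, the size of <math>a</math> (and hence the length of <math>v</math>) can be tabulated for small values of the bounds. The following is a minimal sketch of that arithmetic; the function name <code>local_network_size</code> is an illustrative choice and not notation from <ref name="pau:shu:tam18"/>.

```python
def local_network_size(l_bar: int, d_bar: int) -> int:
    """Size of a: 1 + l_bar * sum_{d=1}^{d_bar} (l_bar - 1)^(d - 1)."""
    # Number of potential neighbors at each distance d = 1, ..., d_bar.
    per_distance = [l_bar * (l_bar - 1) ** (d - 1) for d in range(1, d_bar + 1)]
    # Add one for the reference node itself.
    return 1 + sum(per_distance)


# Best-friend network: at most one link per agent, depth two.
print(local_network_size(1, 2))
# At most three links per agent, depth two.
print(local_network_size(3, 2))
```

With <math>\bar{l}=1</math> and <math>\bar{d}=2</math> the size is 2 (the reference node and one potential friend), while with <math>\bar{l}=3</math> and <math>\bar{d}=2</math> it is 10, which illustrates how quickly the local network grows in the degree bound.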
The bounds <math>\bar{d}</math> and <math>\bar{l}</math> enforce dimension reduction by bounding the number of network types.
The partial identification approach of <ref name="pau:shu:tam18"/> depends on this number, rather than on the number of agents.
For example, the number of moment inequalities is determined by the number of network types, not by the number of agents.
As such, the approach yields its highest dividends for dimension reduction in large networks.
Let <math>\cT</math> denote the collection of network types generated from a preference structure <math>\bu</math> and set of characteristics <math>\cX</math>.
For given realization <math>(x,e)</math> of the observable characteristics and preference shocks of a reference agent, and for given <math>\vartheta\in\Theta</math>, define the collection of network types for which no agent wants to drop a link by
<math display="block">
\begin{align*}
H_\vartheta(x,e)=\{(a,v)\in\cT:\ v_1=x~\mathrm{and}~\bu(a,v,e)\ge \bu(a_{-\ell},v,e)~\forall\,\ell=1,\dots,\bar{l}\},
\end{align*}
</math>
where <math>a_{-\ell}</math> is equal to the local adjacency matrix <math>a</math> but with the <math>\ell</math>-th link removed (that is, it sets the <math>(1,\ell+1)</math> and <math>(\ell+1,1)</math> elements of <math>a</math> equal to zero).
Because <math>(\ex,\epsilon)</math> are random vectors, <math>\eH_\vartheta\equiv H_\vartheta(\ex,\epsilon)</math> is a random closed set as per [[guide:379e0dcd67#def:rcs |Definition]].
This random set takes on a finite number of realizations (equal to the possible subsets of <math>\cT</math>), so that its distribution is completely determined by the probability with which it takes on each of these realizations.
A ''preference class'' <math>H\subset\cT</math> is one of the possible realizations of <math>\eH_\vartheta</math> for some <math>\vartheta\in\Theta</math>.
The model implied probability that <math>\eH_\vartheta=H</math> is given by
<math display="block">
\begin{align}
\sM(H|\ex;\vartheta)\equiv\sQ_{\tilde\gamma}(\epsilon:\eH_\vartheta=H|\ex).\label{eq:model:prediction:network:class}
\end{align}
</math>
Observation of data from one network allows the researcher, under suitable restrictions on the sampling process, to learn the distribution of network types in the data (type shares), denoted <math>\sP(t)</math>.<ref group="Notes" >Full observation of the network is not required (and in practice it often does not occur). Sampling uncertainty results from it because in this model there is a continuum of agents.</ref>
For example, in a network of best friends with <math>\bar{l}=1</math> and <math>\bar{d}=2</math>, and <math>\cX=\{x^1,x^2\}</math> (e.g., a simplified framework with only two possible races), agents are either isolated or in a pair.
Network types are pairs consisting of the agent's race and the best friend's race (with the second element equal to zero if the agent is isolated).
Type shares are the fraction of isolated blacks, the fraction of isolated whites, the fraction of blacks with a black best friend, the fraction of whites with a black best friend, and the fraction of whites with a white best friend.
The preference classes for a black agent are <math>H^1(b,e)=\{(b,0)\}</math>, <math>H^2(b,e)=\{(b,0),(b,b)\}</math>, <math>H^3(b,e)=\{(b,0),(b,w)\}</math>, <math>H^4(b,e)=\{(b,0),(b,w),(b,b)\}</math> (and similarly for whites).
In each case, being alone is part of the preference class, as there are no links to sever.
In the second class the agent has a preference for having a black friend, in the third class for a white friend, and in the last class for a friend of either race.
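The four classes can be generated mechanically. The sketch below assumes, as \eqref{eq:utility:network:2} implies when <math>\bar{l}=1</math>, that the utility from a single link to a friend of race <math>x_k</math> is <math>f(x_j,x_k)+e(x_k)</math> and that being isolated yields zero utility (with one link per agent there are no friends of friends and no common friends, so the <math>\delta_1</math> and <math>\delta_2</math> terms vanish). The numerical values of <math>f</math> and of the shocks are made up purely for illustration.

```python
def preference_class(x, f, e, races=("b", "w")):
    """Set of network types (x, friend) in which an agent with race x
    and shocks e would not want to drop a link."""
    # Being isolated, coded as (x, 0), always belongs: there is no link to drop.
    H = {(x, 0)}
    for friend in races:
        # Keeping the link weakly beats dropping it iff the link utility is non-negative.
        if f[(x, friend)] + e[friend] >= 0:
            H.add((x, friend))
    return frozenset(H)


# Hypothetical systematic utilities and two hypothetical shock draws.
f = {("b", "b"): 0.5, ("b", "w"): -0.2}
print(sorted(preference_class("b", f, {"b": 0.0, "w": 0.0}), key=str))
print(sorted(preference_class("b", f, {"b": -1.0, "w": 1.0}), key=str))
```

Under these made-up numbers the first shock draw yields the class <math>H^2</math> and the second yields <math>H^3</math>; shocks that are sufficiently negative (positive) for both races would yield <math>H^1</math> (<math>H^4</math>).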
It is easy to see that the model is ''incomplete'', as for a given realization of <math>\epsilon</math> it makes multiple predictions on the agent's network type.
<ref name="pau:shu:tam18"/> propose to map the distribution of preference classes into the observed distribution of network types in the data through the use of ''allocation parameters'', denoted <math>\alpha_H(t)\in[0,1]</math>.
These are distinct from but play the same role as a selection mechanism, and they represent a candidate distribution for <math>t</math> given <math>\eH_\vartheta=H</math>.
The model, augmented with them, implies a probability that an agent is of network type <math>t</math>:
<math display="block">
\begin{align}
\sM(t;\vartheta,\alpha)=\frac{1}{\mu}\sum_{H\subset\cT}\mu_{v_1(t)}\sM(H|v_1(t);\vartheta)\alpha_H(t),\label{eq:model:prediction:network:2}
\end{align}
</math>
where <math>\mu_{v_1(t)}</math> is the measure of reference agents whose characteristics equal <math>v_1(t)</math>, the reference agent's entry in the second component of the network type <math>t</math> (that is, <math>\ex=v_1(t)</math>), and <math>\alpha\equiv\{\alpha_H(t):t\in \cT, H\subset\cT\}</math>.
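The mapping in \eqref{eq:model:prediction:network:2} is a weighted sum and can be sketched in a few lines. In the sketch below the function name, the data layout, and all numbers are hypothetical; in addition, the normalizing constant <math>\mu</math> is taken to be the total measure of agents, which is an assumption made for the illustration.

```python
def model_type_shares(mu, class_prob, alpha):
    """Model-implied share of each network type t.

    mu[x]            : measure of agents with characteristics x
    class_prob[x][H] : probability of preference class H given x
    alpha[(H, t)]    : allocation parameter (share of class-H agents placed in type t)
    Types are tuples whose first entry is the reference agent's characteristics.
    """
    total = sum(mu.values())  # assumption: the normalizing mu is the total measure
    shares = {}
    for (H, t), a in alpha.items():
        x = t[0]
        shares[t] = shares.get(t, 0.0) + mu[x] * class_prob[x][H] * a / total
    return shares


# Hypothetical one-race version of the best-friend example.
H1 = frozenset({("b", 0)})
H2 = frozenset({("b", 0), ("b", "b")})
mu = {"b": 1.0}
class_prob = {"b": {H1: 0.4, H2: 0.6}}
alpha = {(H1, ("b", 0)): 1.0, (H2, ("b", 0)): 0.25, (H2, ("b", "b")): 0.75}
shares = model_type_shares(mu, class_prob, alpha)
print(shares)
```

Because the allocation parameters sum to one within each class and the class probabilities sum to one, the implied type shares also sum to one.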
<ref name="pau:shu:tam18"/> provide a characterization of an outer region for <math>\theta</math> based on two key implications of pairwise stability that deliver restrictions on <math>\alpha</math>.
They also show that under some additional assumptions, this characterization yields <math>\idr{\theta}</math> <ref name="pau:shu:tam18"/>{{rp|at=Appendix B}}.
Here I focus on their more general result.
The first implication that they use is that existing links should not be dropped:
<math display="block">
\begin{align}
t\notin H\Rightarrow\alpha_H(t)=0.\label{eq:networks:2:PS1}
\end{align}
</math>
The condition in \eqref{eq:networks:2:PS1} is embodied in <math>\bar\alpha\equiv\{\alpha_H(t):t\in H, H\subset\cT\}</math>.
The second implication is that it should not be possible to establish mutually beneficial links among nodes that are far from each other.
Let <math>t^\prime</math> and <math>s^\prime</math> denote the network types that are generated if one adds a link in networks of types <math>t</math> and <math>s</math> between two nodes that are at distance at least <math>2\bar{d}</math> from each other and that each have fewer than <math>\bar{l}</math> links.
Then the requirement is
<math display="block">
\begin{align}
\left(\sum_{H\subset\cT}\mu_{v_1(t)}\sM(H|v_1(t);\vartheta)\alpha_H(t)\one(t^\prime\in H)\right)\left(\sum_{H\subset\cT}\mu_{v_1(s)}\sM(H|v_1(s);\vartheta)\alpha_H(s)\one(s^\prime\in H)\right)=0\label{eq:networks:2:PS2}
\end{align}
</math>
In words, if a positive measure of agents of type <math>t</math> prefer <math>t^\prime</math> (i.e., <math>\alpha_H(t) > 0</math> for some <math>H</math> such that <math>t^\prime\in H</math>), there must be zero measure of type <math>s</math> individuals who prefer <math>s^\prime</math>, because otherwise the network is unstable.
<ref name="pau:shu:tam18"/> show that the conditions in \eqref{eq:networks:2:PS2} can be embodied in a square matrix <math>q</math> of size equal to the length of <math>\bar{\alpha}</math>.
The entries of <math>q</math> are constructed as follows.
Let <math>H</math> and <math>\tilde{H}</math> be two preference classes with <math>t\in H</math> and <math>s\in\tilde{H}</math>.
With some abuse of notation, let <math>q_{\alpha_H(t),\alpha_{\tilde{H}}(s)}</math> denote the element of <math>q</math> corresponding to the index of the entry in <math>\bar\alpha</math> equal to <math>\alpha_H(t)</math> for the row, and to <math>\alpha_{\tilde{H}}(s)</math> for the column.
Then set <math>q_{\alpha_H(t),\alpha_{\tilde{H}}(s)}(\vartheta)=\one(t^\prime\in H)\one(s^\prime\in\tilde{H})</math>.
It follows that this element yields the term <math>\big(\alpha_H(t)\one(t^\prime\in H)\big)\big(\alpha_{\tilde{H}}(s)\one(s^\prime\in \tilde{H})\big)</math> in the quadratic form <math>\bar{\alpha}^\top q \bar{\alpha}</math>.
As long as <math>\mu_{v_1(\cdot)}</math> and <math>\sM(\cdot|\ex;\vartheta)</math> in \eqref{eq:model:prediction:network:class} are strictly positive, this term is equal to zero if and only if condition \eqref{eq:networks:2:PS2} holds for types <math>t</math> and <math>s</math>.<ref group="Notes" >The possibility that <math>\mu_{v_1(\cdot)}</math> or <math>\sM(\cdot|\ex;\vartheta)</math> is equal to zero can be accommodated by setting <math>q_{\alpha_H(t),\alpha_{\tilde{H}}(s)}(\vartheta)=(\mu_{v_1(t)}\sM(H|v_1(t);\vartheta)\one(t^\prime\in H))(\mu_{v_1(s)}\sM(\tilde{H}|v_1(s);\vartheta)\one(s^\prime\in\tilde{H}))</math>. However, in that case <math>q</math> depends on <math>\vartheta</math> and the cost of computing it increases.</ref>
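The construction of <math>q</math> can be illustrated in the best-friend example. The sketch below is not the implementation of <ref name="pau:shu:tam18"/>; the helper <code>add_link</code>, the ordering of <math>\bar\alpha</math>, and the numerical values are hypothetical, and only a single preference class is included to keep the matrix small.

```python
def build_q(index, add_link):
    """Build q for the ordering `index` of bar-alpha, a list of (H, t) pairs with t in H.

    add_link(t, s) returns the pair of types (t', s') created by linking a type-t
    agent to a distant type-s agent, or None if such a link is not feasible.
    """
    n = len(index)
    q = [[0] * n for _ in range(n)]
    for i, (H, t) in enumerate(index):
        for j, (H_tilde, s) in enumerate(index):
            new_types = add_link(t, s)
            if new_types is None:
                continue
            t_new, s_new = new_types
            # Entry equals one when both sides would keep the new link.
            q[i][j] = int(t_new in H and s_new in H_tilde)
    return q


def quadratic_form(alpha_bar, q):
    """Evaluate alpha_bar' q alpha_bar."""
    n = len(alpha_bar)
    return sum(alpha_bar[i] * q[i][j] * alpha_bar[j] for i in range(n) for j in range(n))


def add_link(t, s):
    # Best-friend example: only two isolated agents can form a new link.
    if t[1] == 0 and s[1] == 0:
        return (t[0], s[0]), (s[0], t[0])
    return None


H2 = frozenset({("b", 0), ("b", "b")})  # black agent who accepts a black friend
index = [(H2, ("b", 0)), (H2, ("b", "b"))]
q = build_q(index, add_link)
print(q)
print(quadratic_form([0.0, 1.0], q))    # no class-H2 agent is left isolated
print(quadratic_form([0.25, 0.75], q))  # some class-H2 agents are isolated
```

In this toy case the quadratic form is zero only when no agent who would accept a black friend is allocated to the isolated type; a positive allocation to that type leaves pairs of agents who would both gain from linking, so the quadratic form is strictly positive.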
With this background, Theorem [[#OR:networks:2 |OR-]] below provides an outer region for <math>\theta</math>.
The proof of this result follows from the arguments laid out above (see <ref name="pau:shu:tam18"/>{{rp|at=Theorems 1 and 2, for the full details}}).
{{Proofcard|Theorem (Outer Region on Parameters of a Network Formation Model with a Single Network)|OR:networks:2|Under the assumptions of Identification [[#IP:networks:single |Problem]],
<math display="block">
\begin{align}
\outr{\theta}=\left\{\vartheta\in\Theta:
\left(\begin{array}{rl}
\min_{\bar{\alpha}} \bar{\alpha}^\top q \bar{\alpha} &  \\
\mathrm{s.t.} & \sM(t;\vartheta,\bar{\alpha})=\sP(t)\ \forall t\in\cT \\
& \sum_{t\in H}\alpha_H(t)=1\ \forall H\subset \cT \\
& \alpha_H(t)\ge 0\ \forall t\in H,\ \forall H\subset \cT 
\end{array}  \right){{=}}0
\right\}.\label{eq:OR:networks:2}
\end{align}
</math>|}}
The set in \eqref{eq:OR:networks:2} does not equal <math>\idr{\theta}</math> in all models allowed for in Identification [[#IP:networks:single |Problem]] because condition \eqref{eq:networks:2:PS2} does not embody all implications of pairwise stability on non-existing links.
While the optimization problem in \eqref{eq:OR:networks:2} is quadratic, it is not necessarily convex because <math>q</math> may not be positive definite.
Nonetheless, the simulations reported by <ref name="pau:shu:tam18"/> suggest that <math>\outr{\theta}</math> can be computed rapidly, at least for the examples they considered.
'''Key Insight:'''<i>
At the beginning of this section I highlighted some key challenges to inference in network formation models.
When data is observed from a single network, as in Identification [[#IP:networks:single |Problem]], <ref name="pau:shu:tam18"/>'s proposal to base inference on local networks achieves two main benefits.
First, it delivers consistently estimable features of the game, namely the probability that an agent belongs to one of a finite collection of network types.
Second, it achieves dimension reduction, so that computation of outer regions on <math>\theta</math> remains feasible even with large networks and allowing for unrestricted selection among multiple equilibria.
</i>
<span id="subsec:applications:struct"></span><ref name="pak10"/> and <ref name="pak:por:ho:ish15"/> propose to embed revealed preference-based inequalities into structural models of both demand and supply in markets where firms face discrete choices of product configuration or of location.
Revealed preference arguments are a trademark of the literature on discrete choice analysis.
<ref name="pak10"/> and <ref name="pak:por:ho:ish15"/> use these arguments to leverage a subset of the model's implications to obtain easy-to-compute moment inequalities.
For example, in the context of entry games such as the ones discussed in Section [[#subsec:multiple:eq |Static, Simultaneous-Move Finite Games with Multiple Equilibria]], they propose to base inference on the implication that a player enters the market if and only if (s)he expects to make non-negative profits.
This condition can be exploited even when players have heterogeneous (unobserved to the researcher) information sets, and it implies that the expected profits for entrants should be non-negative.
Nonetheless, the condition does not suffice to obtain moment inequalities that include only observed payoff shifters and preference parameters.
This is because the expected value of unobserved payoff shifters for entrants is not equal to zero, as the group of entrants is selected.
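The selection problem can be seen in a small simulation. The sketch below uses a deliberately simplified, hypothetical entry rule (a single firm with full information that enters if and only if <code>beta * x + nu</code> is non-negative), which is not the model of <ref name="pak10"/> or <ref name="pak:por:ho:ish15"/>; the parameter values and distributions are made up for illustration.

```python
import random

random.seed(0)

beta, n = 1.0, 100_000
nu_all, nu_entrants = [], []
for _ in range(n):
    x = random.uniform(-1.0, 1.0)   # observed payoff shifter
    nu = random.gauss(0.0, 1.0)     # unobserved payoff shifter, mean zero
    nu_all.append(nu)
    if beta * x + nu >= 0:          # enter iff profits are non-negative
        nu_entrants.append(nu)

mean_all = sum(nu_all) / len(nu_all)
mean_entrants = sum(nu_entrants) / len(nu_entrants)
print(f"mean of nu, all potential entrants: {mean_all:.3f}")
print(f"mean of nu, entrants only:          {mean_entrants:.3f}")
```

Although the unobserved shifter has mean zero in the population of potential entrants, its mean among entrants is clearly positive, so averaging realized profits over entrants does not eliminate it.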
The authors require the availability of valid (monotone) instrumental variables to solve this problem (see Section [[guide:Ec36399528#subsec:programme:eval |Treatment Effects with and without Instrumental Variables]] for uses of instrumental variables and monotone instrumental variables in partial identification analysis of treatment effects).
Interesting features of their approach include that the researcher does not need to solve for the set of equilibria, nor to require that the distribution of unobservable payoff shifters is known up to a finite-dimensional parameter vector.
Moreover, the same basic ideas can be applied to single agent models (with or without heterogeneous information sets).
A shortcoming of the method is that the set of parameter vectors satisfying the moment inequalities may be wider than the sharp identification region under the maintained assumptions.
The breadth of applications of the approach proposed by <ref name="pak10"/> and <ref name="pak:por:ho:ish15"/> is vast.<ref group="Notes" >Statistical inference in these papers is often carried out using the methods proposed by {{ref|name=che:hon:tam07}}, {{ref|name=ber:mol08}}, and {{ref|name=and:soa10}}. Model specification tests, if carried out, are based on the method proposed by {{ref|name=bug:can:shi15}}. See Sections [[guide:6d1a428897#subsec:CS |Confidence Sets Satisfying Various Coverage Notions]] [[guide:7b0105e1fc#sec:misspec |and]], respectively, for a discussion of confidence sets and specification tests.</ref>
For example, <ref name="ho09"><span style="font-variant-caps:small-caps">Ho, K.</span>  (2009): “Insurer-Provider Networks in the Medical Care  Market” ''The American Economic Review'', 99(1), 393--430.</ref> uses it to model the formation of the hospital networks offered by US health insurers, and <ref name="ho:ho:mor12"><span style="font-variant-caps:small-caps">Ho, K., J.Ho,  <span style="font-variant-caps:normal">and</span> J.H. Mortimer</span>  (2012): “The Use of  Full-Line Forcing Contracts in the Video Rental Industry” ''The  American Economic Review'', 102(2), 686--719.</ref> and <ref name="lee13"><span style="font-variant-caps:small-caps">Lee, R.S.</span>  (2013): “Vertical Integration and Exclusivity in Platform  and Two-Sided Markets” ''The American Economic Review'', 103(7),  2960--3000.</ref> use it to obtain bounds on firm fixed costs as an input to modeling product choices in the movie industry and in the US video game industry, respectively.
<ref name="hol11"><span style="font-variant-caps:small-caps">Holmes, T.J.</span>  (2011): “The diffusion of Wal-mart and economies of  density” ''Econometrica'', 79(1), 253--302.</ref> estimates the effects of Wal-Mart's strategy of creating a high density network of stores.
While the close proximity of stores implies cannibalization in sales, Wal-Mart is willing to bear it to achieve density economies, which in turn yield savings in distribution costs.
His results suggest that Wal-Mart substantially benefits from high store density.
<ref name="ell:hou:tim13"><span style="font-variant-caps:small-caps">Ellickson, P.B., S.Houghton,  <span style="font-variant-caps:normal">and</span> C.Timmins</span>  (2013):  “Estimating network economies in retail chains: a revealed preference  approach” ''The RAND Journal of Economics'', 44(2), 169--193.</ref> measure the effects of chain economies, business stealing, and heterogeneous firms' comparative advantages in the discount retail industry.
<ref name="kaw:wat13"><span style="font-variant-caps:small-caps">Kawai, K.,  <span style="font-variant-caps:normal">and</span> Y.Watanabe</span>  (2013): “Inferring Strategic  Voting” ''American Economic Review'', 103(2), 624--62.</ref> estimate a model of strategic voting and quantify the impact it has on election outcomes.
As in other models analyzed in this section, the one they study yields multiple predicted outcomes, so that partial identification methods are required to carry out the empirical analysis if one does not assume a specific selection mechanism to resolve the multiplicity.
They estimate their model on Japanese general-election data, and uncover a sizable fraction of strategic voters.
They also estimate that only a small fraction of voters are misaligned (voting for a candidate other than their most preferred one).
<ref name="eiz14"><span style="font-variant-caps:small-caps">Eizenberg, A.</span>  (2014): “Upstream Innovation and Product Variety in  the U.S. Home PC Market” ''The Review of Economic Studies'', 81(3),  1003--1045.</ref> studies whether the rapid removal from the market for personal computers of existing central processing units upon creation of new ones through innovation reduces surplus.
He finds that a limited group of price-insensitive consumers enjoys the largest share of the welfare gains from innovation.
A policy that kept older technologies on the shelf would allow for the benefits from innovation to reach price-sensitive consumers thanks to improved access to mobile computing, but total welfare would not increase because consumer welfare gains would be largely offset by producer losses.
<ref name="ho:pak14"><span style="font-variant-caps:small-caps">Ho, K.,  <span style="font-variant-caps:normal">and</span> A.Pakes</span>  (2014): “Hospital Choices, Hospital  Prices, and Financial Incentives to Physicians” ''The American Economic  Review'', 104(12), 3841--3884.</ref> analyze hospital referrals for labor and birth episodes in California in 2003, for patients enrolled with six health insurers that use, to a different extent, incentives to referring physicians groups to reduce hospital costs (capitation contracts).
The aim is to learn whether enrollees with high-capitation insurers tend to be referred to lower-priced hospitals (ceteris paribus) compared to other patients with same-severity conditions, and whether quality of care was affected.
Their model allows for an insurer-specific preference function that is additively separable in the hospital price paid by the insurer (which is allowed to be measured with error), the distance traveled, and plan and severity-specific hospital fixed effects.
Importantly, unobserved heterogeneity entering the preference function is not assumed to be drawn from a distribution known up to a finite-dimensional parameter vector.
The results of the empirical analysis indicate that the price paid by insurers to hospitals has an impact on referrals, with higher elasticity to price for insurers whose physician groups are more highly capitated.
<ref name="dic:mor18"><span style="font-variant-caps:small-caps">Dickstein, M.J.,  <span style="font-variant-caps:normal">and</span> E.Morales</span>  (2018): “What do  Exporters Know?” ''The Quarterly Journal of Economics'', 133(4),  1753--1801.</ref> study how the information that potential exporters have to predict the profits they will earn when serving a foreign market influences their decisions to export.
They propose a model where the researcher specifies and observes a subset of the variables that agents use to form their expectations, but may not observe other variables that affect firms' expectations heterogeneously (across firms and markets, and over time).
Because only a subset of the variables entering the firms' information set is observed, partial identification results.
They show that, under rational expectations, they can test whether potential exporters know and use specific variables to predict their export profits.
They also use their model's estimates to quantify the value of information.
<ref name="wol18"><span style="font-variant-caps:small-caps">Wollmann, T.G.</span>  (2018): “Trucks without Bailouts: Equilibrium  Product Characteristics for Commercial Vehicles” ''American Economic  Review'', 108(6), 1364--1406.</ref> studies the implications of the \$85 billion automotive industry bailout in 2009 on the commercial vehicle segment.
He finds that had Chrysler and GM been liquidated (or acquired by a major competitor) rather than bailed out, the surviving firms would have experienced a rise in profits high enough to induce them to introduce new products.
A different use of revealed preference arguments appears in the contributions of <ref name="blu:bro:cra08"><span style="font-variant-caps:small-caps">Blundell, R., M.Browning,  <span style="font-variant-caps:normal">and</span> I.Crawford</span>  (2008): “Best  Nonparametric Bounds on Demand Responses” ''Econometrica'', 76(6),  1227--1262.</ref>, <ref name="blu:kri:mat14"><span style="font-variant-caps:small-caps">Blundell, R., D.Kristensen,  <span style="font-variant-caps:normal">and</span> R.Matzkin</span>  (2014):  “Bounding quantile demand functions using revealed preference  inequalities” ''Journal of Econometrics'', 179(2), 112 -- 127.</ref>, <ref name="hod:sto14"><span style="font-variant-caps:small-caps">Hoderlein, S.,  <span style="font-variant-caps:normal">and</span> J.Stoye</span>  (2014): “Revealed Preferences  in a Heterogeneous Population” ''Review of Economics and Statistics'',  96(2), 197--213.</ref><ref name="hod:sto15"><span style="font-variant-caps:small-caps">Hoderlein, S.,  <span style="font-variant-caps:normal">and</span> J.Stoye</span>  (2015): “Testing stochastic rationality and predicting  stochastic demand: the case of two goods” ''Economic Theory Bulletin'',  3(2), 313–328.</ref>, <ref name="man14"><span style="font-variant-caps:small-caps">Manski, C.F.</span>  (2014): “Identification of income–leisure preferences  and evaluation of income tax policy” ''Quantitative Economics'', 5(1),  145--174.</ref>, <ref name="bar:mol:tei16"><span style="font-variant-caps:small-caps">Barseghyan, L., F.Molinari,  <span style="font-variant-caps:normal">and</span> J.C. Teitelbaum</span>  (2016):  “Inference under stability of risk preferences” ''Quantitative  Economics'', 7(2), 367--409.</ref>, <ref name="hau:new16"><span style="font-variant-caps:small-caps">Hausman, J.A.,  <span style="font-variant-caps:normal">and</span> W.K. 
Newey</span>  (2016): “Individual  Heterogeneity and Average Welfare” ''Econometrica'', 84(3), 1225--1248.</ref>, <ref name="ada19"><span style="font-variant-caps:small-caps">Adams, A.</span>  (2019): “Mutually Consistent Revealed Preference Demand  Predictions” ''American Economic Journal: Microeconomics'', forthcoming.</ref>, and many others.
For example, <ref name="man14"/> proposes a method to partially identify income-leisure preferences and to evaluate the associated effects of tax policies.
He starts from basic revealed-preference analysis performed under the assumption that individuals prefer more income and leisure, and no other restriction.
The analysis shows that observing an individual's time allocation under a status quo tax policy yields bounds on his allocation that may or may not be informative, depending on how the person allocates his time under the status quo policy and on the tax schedules.
He then explores what more can be learned if one additionally imposes restrictions on the distribution of income-leisure preferences, using the method put forward by <ref name="man07b"/>.
One assumption restricts groups of individuals facing different choice sets to have the same distribution of preferences.
The other assumption restricts this distribution to a parametric family.
<ref name="kli:tar16"><span style="font-variant-caps:small-caps">Kline, P.,  <span style="font-variant-caps:normal">and</span> M.Tartari</span>  (2016): “Bounding the Labor  Supply Responses to a Randomized Welfare Experiment: A Revealed Preference  Approach” ''American Economic Review'', 106(4), 972--1014.</ref> build on and expand <ref name="man14"/>'s framework to evaluate the effect of Connecticut's Jobs First welfare reform experiment on women's labor supply and welfare participation decisions.
<ref name="bar:mol:tei16"/> propose a method to learn features of households' risk preferences in a random utility model that nests expected utility theory plus a range of non-expected utility models.<ref group="Notes" >Their model is based on the one  put forward by {{ref|name=bar:mol:odo:tei13}}. See {{ref|name=bar:mol:odo:tei18}} for a review of these and other non-expected utility models in the context of estimation of risk preferences.</ref>
They allow for unobserved heterogeneity in preferences (that may enter the utility function non-separably) and leave its distribution completely unspecified.
The authors use revealed preference arguments to infer, for each household, a set of values for its unobserved heterogeneity terms that are consistent with the household's choices in the three lines of insurance coverage.
As their core restriction, they assume that each household's preferences are ''stable'' across contexts: the household's utility function is the same when facing distinct but closely related choice problems.
This allows them to use the inferred set valued data to partially identify features of the distribution of preferences, and to classify households into preference types.
They apply their proposed method to analyze data on households' deductible choices across three lines of insurance coverage (home all perils, auto collision, and auto comprehensive).<ref group="Notes" >Auto collision coverage pays for damage to the insured vehicle caused by a collision with another vehicle or object, without regard to fault.
Auto comprehensive coverage pays for damage to the insured vehicle from all other causes, without regard to fault.
Home all perils (or simply home) coverage pays for damage to the insured home from all causes, except those that are specifically excluded (e.g., flood, earthquake, or war).</ref>
Their results show that between 70 and 80 percent of the households make choices that can be rationalized by a model with linear utility and monotone, quadratic, or even linear probability distortions.
These probability distortions substantially overweight small probabilities.
By contrast, fewer than 40 percent can be rationalized by a model with concave utility but no probability distortions.
<ref name="hau:new16"/> propose a method to carry out demand analysis while allowing for general forms of unobserved heterogeneity.
Preferences and linear budget sets are assumed to be statistically independent (conditional on covariates and control functions).
<ref name="hau:new16"/> show that for continuous demand, average surplus is generally not identified from the distribution of demand for a given price and income, and therefore propose a partial identification approach.
They use bounds on income effects to derive bounds on average surplus.
They apply the bounds to gasoline demand, using data from the 2001 U.S. National Household Transportation Survey.
Another strand of empirical applications pertains to the analysis of discrete games.
<ref name="cil:tam09"/> use the method they develop, described in Section [[#subsubsec:tam03:cil:tam09 |An Inference Approach Robust to the Presence of Multiple Equilibria]], to study market structure in the US airline industry and the role that firm heterogeneity plays in shaping it.
Their findings suggest that the competitive effects of each carrier increase in that carrier's airport presence, but also that the competitive effects of large carriers (American, Delta, United) are different from those of low cost ones (Southwest).
They also evaluate the effect of a counterfactual policy repealing the Wright Amendment, and find that doing so would see an increase in the number of markets served out of Dallas Love.
<ref name="gri14"/> proposes a model of static entry that extends the one in Section [[#subsec:multiple:eq |Static, Simultaneous-Move Finite Games with Multiple Equilibria]] by allowing individuals to have flexible information structures, where players' payoffs depend on both a common-knowledge unobservable payoff shifter and a private-information one.
His characterization of <math>\idr{\theta}</math> is based on using an unrestricted selection mechanism, as in <ref name="ber:tam06"/> and <ref name="cil:tam09"/>.
He applies the model to study the impact of supercenters such as Wal-Mart, that sell both food and groceries, on the profitability of rural grocery stores.
He finds that entry by a supercenter outside, but within 20 miles, of a local monopolist's market has a smaller impact on firm profits than entry by a local grocer.
Entry by supercenters has a small negative effect on the number of grocery stores in surrounding markets as well as on their profits.
The results suggest that location and format-based differentiation partially insulate rural stores from competition with supercenters.
A larger class of information structures is considered in the analysis of static discrete games carried out by <ref name="mag:ron17"/>.
They allow for all information structures consistent with the players knowing their own payoffs and the distribution of opponents' payoffs.
As solution concept they adopt the Bayes Correlated Equilibrium recently developed by <ref name="ber:mor16"/>.
Multiple equilibria are possible with this solution concept as well.
The authors leave completely unspecified the selection mechanism picking the equilibrium played in the regions of multiplicity, so that partial identification results.
<ref name="mag:ron17"/> use the random sets approach to characterize <math>\idr{\theta}</math>.
They apply the method to estimate a model of entry in the Italian supermarket industry and quantify the effect of large malls on local grocery stores.
<ref name="nor:tan14"><span style="font-variant-caps:small-caps">Norets, A.,  <span style="font-variant-caps:normal">and</span> X.Tang</span>  (2014): “Semiparametric Inference  in Dynamic Binary Choice Models” ''The Review of Economic Studies'',  81(3), 1229--1262.</ref> provide partial identification results (and Bayesian inference methods) for semiparametric dynamic binary choice models without imposing distributional assumptions on the unobserved state variables.
They carry out an empirical application using <ref name="rus87"><span style="font-variant-caps:small-caps">Rust, J.</span>  (1987): “Optimal Replacement of GMC Bus Engines: An  Empirical Model of Harold Zurcher” ''Econometrica'', 55(5), 999--1033.</ref>'s model of bus engine replacement.
Their results suggest that parametric assumptions about the distribution of the unobserved states can have a considerable effect on the estimates of per-period payoffs, but not a noticeable one on the counterfactual conditional choice probabilities.
<ref name="ber:com19"><span style="font-variant-caps:small-caps">Berry, S.T.,  <span style="font-variant-caps:normal">and</span> G.Compiani</span>  (2019): “An Instrumental  Variable Approach to Dynamic Models” available at  [https://drive.google.com/file/d/1pl1PW1w8eh3gnrTMKUBuS6T6TIKtvf9c/view https://drive.google.com/file/d/1pl1PW1w8eh3gnrTMKUBuS6T6TIKtvf9c/view].</ref> use the random sets approach to partially identify and estimate dynamic discrete choice models with serially correlated unobservables, under instrumental variables restrictions.
They extend two-step dynamic estimation methods to characterize a set of structural parameters that are consistent with the dynamic model, the instrumental variables restrictions, and the data.<ref group="Notes" >Statistical inference on <math>\theta</math> is carried out using {{ref|name=che:che:kat18}}'s method.</ref>
<ref name="gua19"/> uses the random sets approach and a network formation model to learn about Italian firms' incentives for having their executive directors sit on the boards of their competitors.
<ref name="bar:cou:mol:tei18"/> use the method described in Section [[#subsubsec:BCMT |Unobserved Heterogeneity in Choice Sets and/or Consideration Sets]] to partially identify the distribution of risk preferences using data on deductible choices in auto collision insurance.<ref group="Notes" >Statistical inference on projections of <math>\theta</math> is carried out using {{ref|name=kai:mol:sto19}}'s method.</ref>
They posit an expected utility theory model and allow for unobserved heterogeneity in households' risk aversion and choice sets, with unrestricted dependence between them.
Motivation for why unobserved heterogeneity in choice sets might be an important factor in this empirical framework comes from the earlier analysis of <ref name="bar:mol:tei16"/> and from novel findings that are part of the contribution of <ref name="bar:cou:mol:tei18"/>.
They show that commonly used models that make strong assumptions about choice sets (e.g., the mixed logit model with each individual's choice set assumed equal to the feasible set, and various models of choice set formation) can be rejected in their data.
With regard to risk aversion, their key finding is that their estimated lower bounds are significantly smaller than the point estimates obtained in the related literature.
This suggests that the data can be explained by expected utility theory with lower and more homogeneous levels of risk aversion than had been uncovered before.
This provides new evidence on the importance of developing models that differ in their specification of ''which'' alternatives agents evaluate (rather than or in addition to models focusing on ''how'' they evaluate them), and of data collection efforts that seek to directly measure agents' heterogeneous choice sets <ref name="cap16"></ref>.
<ref name="iar:shi:shu18"><span style="font-variant-caps:small-caps">Iaryczower, M., X.Shi,  <span style="font-variant-caps:normal">and</span> M.Shum</span>  (2018): “Can Words Get  in the Way? The Effect of Deliberation in Collective Decision Making”  ''Journal of Political Economy'', 126(2), 688--734.</ref> study the effect of pre-vote deliberation on the decisions of US appellate courts.
The question of interest is whether deliberation increases or reduces the probability of an incorrect decision.
They use a model where communication equilibrium is the solution concept, and only observed heterogeneity in payoffs is allowed for.
In the model, multiple equilibria are again possible, and the authors leave the selection mechanism completely unspecified.
They characterize <math>\idr{\theta}</math> through an optimization problem, and structurally estimate the model on US Courts of Appeal data.
<ref name="iar:shi:shu18"/> compare the probability of making incorrect decisions under the pre-vote deliberation mechanism, to that in a counterfactual environment where no deliberation occurs.
The results suggest that there is a range of parameters in <math>\idr{\theta}</math>, in which judges have ex-ante disagreement or imprecise prior information, for which deliberation is beneficial.
Otherwise deliberation leads to lower effectiveness for the court.
<ref name="dha:gai:mau18"><span style="font-variant-caps:small-caps">D'Haultfoeuille, X., C.Gaillac,  <span style="font-variant-caps:normal">and</span> A.Maurel</span>  (2018):  “Rationalizing Rational Expectations? Tests and Deviations” NBER working  paper 25274, available at [https://www.nber.org/papers/w25274 https://www.nber.org/papers/w25274].</ref> propose a test for the hypothesis of rational expectations for the case that one observes only the marginal distributions of realizations and subjective beliefs, but not their joint distribution (e.g., when subjective beliefs are observed in one dataset, and realizations in a different one, and the two cannot be matched).
They establish that the hypothesis of rational expectations can be expressed as testing that a continuum of moment inequalities is satisfied, and they leverage the results in <ref name="and:shi17"><span style="font-variant-caps:small-caps">Andrews, D. W.K.,  <span style="font-variant-caps:normal">and</span> X.Shi</span>  (2017): “Inference based on many conditional moment  inequalities” ''Journal of Econometrics'', 196(2), 275 -- 287.</ref> to provide a simple-to-compute test for this hypothesis.
They apply their method to test for and quantify deviations from rational expectations about future earnings, and examine the consequences of such departures in the context of a life-cycle model of consumption.
<ref name="teb:tor:yan19"><span style="font-variant-caps:small-caps">Tebaldi, P., A.Torgovitsky,  <span style="font-variant-caps:normal">and</span> H.Yang</span>  (2019):  “Nonparametric Estimates of Demand in the California Health Insurance  Exchange” NBER Working Paper No. 25827, available at  [https://www.nber.org/papers/w25827 https://www.nber.org/papers/w25827].</ref> estimate the demand for health insurance under the Affordable Care Act using data from California.
Methodologically, they use a discrete choice model that allows for endogeneity in insurance premiums (which enter as explanatory variables in the model) and dispenses with parametric assumptions about the unobserved components of utility leveraging the availability of instrumental variables, similarly to the framework presented in Section [[#subsubsec:CRS |Endogenous Explanatory Variables]].
The authors provide a characterization of sharp bounds on the effects of changing premium subsidies on coverage choices, consumer surplus, and government spending, as solutions to linear programming problems, rendering their method computationally attractive.
Another important strand of theoretical literature is concerned with partial identification of panel data models.
<ref name="hon:tam06"><span style="font-variant-caps:small-caps">Honoré, B.E.,  <span style="font-variant-caps:normal">and</span> E.Tamer</span>  (2006): “Bounds on Parameters  in Panel Dynamic Discrete Choice Models” ''Econometrica'', 74(3),  611--629.</ref> consider a dynamic random effects probit model, and use partial identification analysis to obtain bounds on the model parameters that circumvent the initial conditions problem.
<ref name="ros12"><span style="font-variant-caps:small-caps">Rosen, A.M.</span>  (2012): “Set identification via quantile restrictions in  short panels” ''Journal of Econometrics'', 166(1), 127 -- 137.</ref> considers a fixed effect panel data model where he imposes a conditional quantile restriction on time varying unobserved heterogeneity.
Differencing out inequalities resulting from the conditional quantile restriction delivers inequalities that depend only on observable variables and parameters to be estimated, but not on the fixed effects, so that they can be used for estimation.
<ref name="che:fer:hah:new13"><span style="font-variant-caps:small-caps">Chernozhukov, V., I.Fernández-Val, J.Hahn,  <span style="font-variant-caps:normal">and</span> W.Newey</span>  (2013): “Average and quantile effects in nonseparable panel models”  ''Econometrica'', 81(2), 535--580.</ref> obtain bounds on average and quantile treatment effects in nonparametric and semiparametric nonseparable panel data models.
<ref name="kha:pon:tam16"><span style="font-variant-caps:small-caps">Khan, S., M.Ponomareva,  <span style="font-variant-caps:normal">and</span> E.Tamer</span>  (2016):  “Identification of panel data models with endogenous censoring”  ''Journal of Econometrics'', 194(1), 57 -- 75.</ref> provide partial identification results in linear panel data models with censored outcomes, allowing for unrestricted dependence between censoring and the observable and unobservable variables.
Their results are derived for two classes of models, one where the unobserved heterogeneity terms satisfy a stationarity restriction, and one where they are nonstationary but satisfy a conditional independence restriction.
<ref name="tor19"><span style="font-variant-caps:small-caps">Torgovitsky, A.</span>  (2019a): “Nonparametric Inference on State  Dependence in Unemployment” ''Econometrica'', forthcoming.</ref> provides a method to partially identify state dependence in panel data models where individual unobserved heterogeneity need not be time invariant.
<ref name="pak:por16"><span style="font-variant-caps:small-caps">Pakes, A.,  <span style="font-variant-caps:normal">and</span> J.Porter</span>  (2016): “Moment Inequalities for  Multinomial Choice with Fixed Effects” Working Paper 21893, National Bureau  of Economic Research.</ref> study semiparametric multinomial choice panel models with fixed effects where the random utility function is assumed additively separable in unobserved heterogeneity, fixed effects, and a linear covariate index.
The key semiparametric assumption is a group stationarity condition on the disturbances which places no restrictions on either the joint distribution of the disturbances across choices or the correlation of disturbances across time. <ref name="pak:por16"/> propose a within-group comparison that delivers a collection of conditional moment inequalities that they use to provide point and partial identification results.
<ref name="ari19"><span style="font-variant-caps:small-caps">Aristodemou, E.</span>  (2019): “Semiparametric Identification in Panel Data  Discrete Response Models” available at  [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3420016 https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3420016].</ref> proposes a related method, where partial identification relies on the observation of individuals whose outcome changes in two consecutive time periods, and leverages shape restrictions to reduce the number of between-alternative comparisons needed to determine the optimal choice.
==General references==
{{cite arXiv|last1=Molinari|first1=Francesca|year=2020|title=Microeconometrics with Partial Identification|eprint=2004.11751|class=econ.EM}}
==Notes==
{{Reflist|group=Notes}}
==References==
{{reflist}}

Latest revision as of 03:23, 31 May 2024

[math] \newcommand{\edis}{\stackrel{d}{=}} \newcommand{\fd}{\stackrel{f.d.}{\rightarrow}} \newcommand{\dom}{\operatorname{dom}} \newcommand{\eig}{\operatorname{eig}} \newcommand{\epi}{\operatorname{epi}} \newcommand{\lev}{\operatorname{lev}} \newcommand{\card}{\operatorname{card}} \newcommand{\comment}{\textcolor{Green}} \newcommand{\B}{\mathbb{B}} \newcommand{\C}{\mathbb{C}} \newcommand{\G}{\mathbb{G}} \newcommand{\M}{\mathbb{M}} \newcommand{\N}{\mathbb{N}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\T}{\mathbb{T}} \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\W}{\mathbb{W}} \newcommand{\bU}{\mathfrak{U}} \newcommand{\bu}{\mathfrak{u}} \newcommand{\bI}{\mathfrak{I}} \newcommand{\cA}{\mathcal{A}} \newcommand{\cB}{\mathcal{B}} \newcommand{\cC}{\mathcal{C}} \newcommand{\cD}{\mathcal{D}} \newcommand{\cE}{\mathcal{E}} \newcommand{\cF}{\mathcal{F}} \newcommand{\cG}{\mathcal{G}} \newcommand{\cg}{\mathcal{g}} \newcommand{\cH}{\mathcal{H}} \newcommand{\cI}{\mathcal{I}} \newcommand{\cJ}{\mathcal{J}} \newcommand{\cK}{\mathcal{K}} \newcommand{\cL}{\mathcal{L}} \newcommand{\cM}{\mathcal{M}} \newcommand{\cN}{\mathcal{N}} \newcommand{\cO}{\mathcal{O}} \newcommand{\cP}{\mathcal{P}} \newcommand{\cQ}{\mathcal{Q}} \newcommand{\cR}{\mathcal{R}} \newcommand{\cS}{\mathcal{S}} \newcommand{\cT}{\mathcal{T}} \newcommand{\cU}{\mathcal{U}} \newcommand{\cu}{\mathcal{u}} \newcommand{\cV}{\mathcal{V}} \newcommand{\cW}{\mathcal{W}} \newcommand{\cX}{\mathcal{X}} \newcommand{\cY}{\mathcal{Y}} \newcommand{\cZ}{\mathcal{Z}} \newcommand{\sF}{\mathsf{F}} \newcommand{\sM}{\mathsf{M}} \newcommand{\sG}{\mathsf{G}} \newcommand{\sT}{\mathsf{T}} \newcommand{\sB}{\mathsf{B}} \newcommand{\sC}{\mathsf{C}} \newcommand{\sP}{\mathsf{P}} \newcommand{\sQ}{\mathsf{Q}} \newcommand{\sq}{\mathsf{q}} \newcommand{\sR}{\mathsf{R}} \newcommand{\sS}{\mathsf{S}} \newcommand{\sd}{\mathsf{d}} \newcommand{\cp}{\mathsf{p}} \newcommand{\cc}{\mathsf{c}} \newcommand{\cf}{\mathsf{f}} 
\newcommand{\eU}{{\boldsymbol{U}}} \newcommand{\eb}{{\boldsymbol{b}}} \newcommand{\ed}{{\boldsymbol{d}}} \newcommand{\eu}{{\boldsymbol{u}}} \newcommand{\ew}{{\boldsymbol{w}}} \newcommand{\ep}{{\boldsymbol{p}}} \newcommand{\eX}{{\boldsymbol{X}}} \newcommand{\ex}{{\boldsymbol{x}}} \newcommand{\eY}{{\boldsymbol{Y}}} \newcommand{\eB}{{\boldsymbol{B}}} \newcommand{\eC}{{\boldsymbol{C}}} \newcommand{\eD}{{\boldsymbol{D}}} \newcommand{\eW}{{\boldsymbol{W}}} \newcommand{\eR}{{\boldsymbol{R}}} \newcommand{\eQ}{{\boldsymbol{Q}}} \newcommand{\eS}{{\boldsymbol{S}}} \newcommand{\eT}{{\boldsymbol{T}}} \newcommand{\eA}{{\boldsymbol{A}}} \newcommand{\eH}{{\boldsymbol{H}}} \newcommand{\ea}{{\boldsymbol{a}}} \newcommand{\ey}{{\boldsymbol{y}}} \newcommand{\eZ}{{\boldsymbol{Z}}} \newcommand{\eG}{{\boldsymbol{G}}} \newcommand{\ez}{{\boldsymbol{z}}} \newcommand{\es}{{\boldsymbol{s}}} \newcommand{\et}{{\boldsymbol{t}}} \newcommand{\ev}{{\boldsymbol{v}}} \newcommand{\ee}{{\boldsymbol{e}}} \newcommand{\eq}{{\boldsymbol{q}}} \newcommand{\bnu}{{\boldsymbol{\nu}}} \newcommand{\barX}{\overline{\eX}} \newcommand{\eps}{\varepsilon} \newcommand{\Eps}{\mathcal{E}} \newcommand{\carrier}{{\mathfrak{X}}} \newcommand{\Ball}{{\mathbb{B}}^{d}} \newcommand{\Sphere}{{\mathbb{S}}^{d-1}} \newcommand{\salg}{\mathfrak{F}} \newcommand{\ssalg}{\mathfrak{B}} \newcommand{\one}{\mathbf{1}} \newcommand{\Prob}[1]{\P\{#1\}} \newcommand{\yL}{\ey_{\mathrm{L}}} \newcommand{\yU}{\ey_{\mathrm{U}}} \newcommand{\yLi}{\ey_{\mathrm{L}i}} \newcommand{\yUi}{\ey_{\mathrm{U}i}} \newcommand{\xL}{\ex_{\mathrm{L}}} \newcommand{\xU}{\ex_{\mathrm{U}}} \newcommand{\vL}{\ev_{\mathrm{L}}} \newcommand{\vU}{\ev_{\mathrm{U}}} \newcommand{\dist}{\mathbf{d}} \newcommand{\rhoH}{\dist_{\mathrm{H}}} \newcommand{\ti}{\to\infty} \newcommand{\comp}[1]{#1^\mathrm{c}} \newcommand{\ThetaI}{\Theta_{\mathrm{I}}} \newcommand{\crit}{q} \newcommand{\CS}{CS_n} \newcommand{\CI}{CI_n} \newcommand{\cv}[1]{\hat{c}_{n,1-\alpha}(#1)} 
\newcommand{\idr}[1]{\mathcal{H}_\sP[#1]} \newcommand{\outr}[1]{\mathcal{O}_\sP[#1]} \newcommand{\idrn}[1]{\hat{\mathcal{H}}_{\sP_n}[#1]} \newcommand{\outrn}[1]{\mathcal{O}_{\sP_n}[#1]} \newcommand{\email}[1]{\texttt{#1}} \newcommand{\possessivecite}[1]{\citeauthor{#1}'s \citeyear{#1}} \newcommand\xqed[1]{% \leavevmode\unskip\penalty9999 \hbox{}\nobreak\hfill \quad\hbox{#1}} \newcommand\qedex{\xqed{$\triangle$}} \newcommand\independent{\perp\!\!\!\perp} \DeclareMathOperator{\Int}{Int} \DeclareMathOperator{\conv}{conv} \DeclareMathOperator{\cov}{Cov} \DeclareMathOperator{\var}{Var} \DeclareMathOperator{\Sel}{Sel} \DeclareMathOperator{\Bel}{Bel} \DeclareMathOperator{\cl}{cl} \DeclareMathOperator{\sgn}{sgn} \DeclareMathOperator{\essinf}{essinf} \DeclareMathOperator{\esssup}{esssup} \newcommand{\mathds}{\mathbb} \renewcommand{\P}{\mathbb{P}} [/math]

In this section I focus on the literature concerned with learning features of structural econometric models. These are models where economic theory is used to postulate relationships among observable outcomes [math]\ey[/math], observable covariates [math]\ex[/math], and unobservable variables [math]\nu[/math]. For example, economic theory may guide assumptions on economic behavior (e.g., utility maximization) and equilibrium that yield a mapping from [math](\ex,\nu)[/math] to [math]\ey[/math]. The researcher is interested in learning features of these relationships (e.g., utility function, distribution of preferences), and to this end may supplement the data and economic theory with functional form assumptions on the mapping of interest and distributional assumptions on the observable and unobservable variables. The earlier literature on partial identification of features of structural models includes important examples of nonparametric analysis of random utility models and revealed preference extrapolation, e.g. [1], [2], [3], [4], [5], [6], and others.

The earlier literature also addresses semiparametric analysis, where the underlying models are specified up to parameters that are finite dimensional (e.g., preference parameters) and parameters that are infinite dimensional (e.g., distribution functions); important examples include [7], [8], [9](Section 2.10), [10], [11], [12], [13], [14], [15], [16], [17], [18], and others. Contrary to the nonparametric bounds results discussed in Section, and especially in the case of semiparametric models, structural partial identification often yields an identification region that is not constructive.[Notes 1] Indeed, the boundary of the set is not obtained in closed form as a functional of the distribution of the observable data. Rather, the identification region can often be characterized as a level set of a properly specified criterion function. The recent spark of interest in partial identification of structural microeconometric models was fueled by the work of [19], [20] and [21], and [22]. Each of these papers has advanced the literature in fundamental ways, studying conceptually very distinct problems. [19] are concerned with partial identification of the decision process yielding binary outcomes in a semiparametric model, when one of the explanatory variables is interval valued.

Hence, the root cause of the identification problem they study is that the data is incomplete.[Notes 2]

[20] and [21] are concerned with identification (and estimation) of simultaneous equation models with dummy endogenous variables which are representations of two-player entry games with multiple equilibria.[Notes 3] [22] are concerned with nonparametric identification and estimation of the distribution of valuations in a model of English auctions under weak assumptions on bidders' behavior. In both cases, the root cause of the identification problem is that the structural model is incomplete. This is because the model makes multiple predictions for the observed outcome variables (respectively: the players' actions; and the bidders' bids), but does not specify how one of them is selected to yield the observed data.

Set-valued predictions for the observable outcome (endogenous variables) are a key feature of partially identified structural models. The goal of this section is to explain how they result in a wide array of theoretical frameworks, and how sharp identification regions can be characterized using a unified approach based on random set theory. Although the work of [19], [20] and [21], and [22] has spurred many of the developments discussed in this section, for pedagogical reasons I organize the presentation based on application topic rather than chronologically. The work of [23] and [24] further stimulated a large empirical literature that applies partial identification methods to a wide array of questions of substantive economic importance, to which I return in Section Further Theoretical Advances and Empirical Applications.

Discrete Choice in Single Agent Random Utility Models

Let [math]\cI[/math] denote a population of decision makers and [math]\cY=\{c_1,\dots,c_{|\cY|}\}[/math] a finite universe of potential alternatives (feasible set henceforth). Let [math]\bU[/math] be a family of real valued functions defined over the elements of [math]\cY[/math]. Let [math]\in^* [/math] denote “is chosen from.” Then observed choice is consistent with a random utility model if there exists a function [math]\bu_i[/math] drawn from [math]\bU[/math] according to some probability distribution, such that [math]\P(c \in^* C)=\P(\bu_i(c) \ge \bu_i(b)\forall b \in C)[/math] for all [math]c\in C[/math], all nonempty sets [math]C \subset \cY[/math], and all [math]i\in\cI[/math] [1]. See [25](Chapter 13) for a textbook presentation of this class of models, and [26] for a review of sufficient conditions for point identification of nonparametric and semiparametric limited dependent variables models. As in the seminal work of [27], assume that the decision makers and alternatives are characterized by observable and unobservable vectors of real valued attributes. Denote the observable attributes by [math]\ex_i \equiv \{\ex_i^1,(\ex_{ic}^2,c\in\cY)\},i\in\cI[/math]. These include attribute vectors [math]\ex_i^1[/math] that are specific to the decision maker, as well as attribute vectors [math]\ex_{ic}^2[/math] that include components that are specific to the alternative and components that are indexed by both. Denote the unobservable attributes (preferences) by [math]\nu_i\equiv(\zeta_i,\{\epsilon_{ic},c\in\cY\}),i\in\cI[/math]. These are idiosyncratic to the decision maker and similarly may include alternative and decision maker specific terms. Denote by [math]\cX,\cV[/math] the supports of [math]\ex,\nu[/math], respectively.
In what follows, I label “standard” a random utility model that maintains some form of exogeneity for [math]\ex_i[/math] (e.g., mean or quantile or statistical independence with [math]\nu_i[/math]) and presupposes observation of data that include [math]\{(\eC_i,\ey_i,\ex_i):\ey_i \in^* \eC_i\}, i=1,\dots,n[/math], with [math]\eC_i[/math] the choice set faced by decision maker [math]i[/math] and [math]|\eC_i|\ge 2[/math] (e.g., [28](Assumption 1)). Often it is also assumed that all members of the population face the same choice set, [math]\eC_i=D[/math] for all [math]i\in\cI[/math] and some known [math]D\subseteq\cY[/math], although this requirement is not critical to identification analysis.

Semiparametric Binary Choice Models with Interval Valued Covariates

[19] provide inference methods for nonparametric, semiparametric, and parametric conditional expectation functions when one of the conditioning variables is interval valued. I have discussed their nonparametric and parametric sharp bounds on conditional expectations with interval valued covariates in Identification Problems and, and Theorems SIR- and SIR-, respectively. Here I focus on their analysis of semiparametric binary choice models. Compared to the generic notation set forth at the beginning of Section Discrete Choice in Single Agent Random Utility Models, I let [math]\eC_i=\cY=\{0,1\}[/math] for all [math]i\in\cI[/math], and with some abuse of notation I denote the vector of observed covariates [math](\xL,\xU,\ew)[/math].

Identification Problem (Semiparametric Binary Regression with Interval Covariate Data)

Let [math](\ey,\xL,\xU,\ew)\sim\sP[/math] be observable random variables in [math]\{0,1\}\times\R\times\R\times\R^d[/math], [math]d \lt \infty[/math], and let [math]\ex\in\R[/math] be an unobservable random variable. Let [math]\ey=\one(\ew\theta + \delta\ex +\epsilon \gt 0)[/math]. Assume [math]\delta \gt 0[/math], and further normalize [math]\delta=1[/math] because the threshold-crossing condition is invariant to the scale of the parameters. Here [math]\epsilon[/math] is an unobserved heterogeneity term with continuous distribution conditional on [math](\ew,\ex,\xL,\xU)[/math], [math](\ew,\ex,\xL,\xU)[/math]-a.s., and [math]\theta\in\Theta\subset\R^d[/math] is a parameter vector representing decision makers’ preferences, with compact parameter space [math]\Theta[/math]. Assume that [math]\sR[/math], the joint distribution of [math](\ey,\ex,\xL,\xU,\ew,\epsilon)[/math], is such that [math]\sR(\xL\le\ex\le\xU)=1[/math]; [math] \sR(\epsilon |\ew,\ex,\xL,\xU)=\sR(\epsilon|\ew,\ex)[/math]; and for a specified [math]\alpha \in (0,1)[/math], [math]\sq_{\sR}^\epsilon(\alpha,\ew,\ex)=0[/math] and [math]\sR(\epsilon \le 0|\ew,\ex)=\alpha[/math], [math](\ew,\ex)[/math]-a.s. In the absence of additional information, what can the researcher learn about [math]\theta[/math]?
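
As a purely illustrative sketch (not part of the analysis in [19]), the following Python snippet draws data from one hypothetical design that satisfies the assumptions above with a scalar covariate ([math]d=1[/math]). The uniform distributions, the bracket widths, the values of [math]\theta[/math] and [math]\alpha[/math], and the names `Draw` and `simulate` are all assumptions made only for the example. The disturbance is scaled by a function of [math]\ew[/math] to show that heteroskedasticity is allowed as long as the [math]\alpha[/math]-quantile of [math]\epsilon[/math] given [math](\ew,\ex)[/math] is zero.

```python
import random
from typing import List, NamedTuple


class Draw(NamedTuple):
    y: int        # observed binary outcome
    x_lo: float   # observed lower bracket xL
    x_hi: float   # observed upper bracket xU
    w: float      # observed covariate (d = 1 here, an assumption of the sketch)
    x: float      # latent covariate, not available to the researcher
    eps: float    # latent disturbance, not available to the researcher


def simulate(n: int, theta: float, alpha: float, seed: int = 0) -> List[Draw]:
    """Draw n observations from one hypothetical design satisfying the assumptions."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        w = rng.uniform(-2.0, 2.0)
        x = rng.uniform(-1.0, 1.0)
        x_lo = x - rng.uniform(0.0, 0.5)   # the bracket always contains x
        x_hi = x + rng.uniform(0.0, 0.5)
        # eps is uniform on [-alpha*s, (1-alpha)*s) with s depending on w only,
        # so P(eps <= 0 | w, x) = alpha and eps does not depend on the bracket.
        s = 1.0 + abs(w)
        eps = s * (rng.random() - alpha)
        y = 1 if w * theta + x + eps > 0 else 0   # delta is normalized to 1
        draws.append(Draw(y, x_lo, x_hi, w, x, eps))
    return draws


sample = simulate(n=20000, theta=1.0, alpha=0.3)
share_nonpositive = sum(d.eps <= 0 for d in sample) / len(sample)
print(share_nonpositive)  # should be close to alpha
```

Only the fields `y`, `x_lo`, `x_hi` and `w` would be available to the researcher; `x` and `eps` are kept in the sketch solely to check the maintained assumptions.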


Compared to Identification Problem, here one continues to impose [math]\ex\in[\xL,\xU][/math] a.s. The sign restriction on [math]\delta[/math] replaces the monotonicity restriction (M) in Identification Problem, but does not imply it unless the distribution of [math]\epsilon[/math] is independent of [math]\ex[/math] conditional on [math]\ew[/math]. The quantile independence restriction is inspired by [29]. For given [math]\theta\in\Theta[/math], this model yields set valued predictions because [math]\ey=1[/math] can occur whenever [math]\epsilon \gt -\ew\theta-\xU[/math], whereas [math]\ey=0[/math] can occur whenever [math]\epsilon\le -\ew\theta-\xL[/math], and [math]-\ew\theta-\xU \le -\ew\theta-\xL[/math]. Conversely, observation of [math]\ey=1[/math] allows one to conclude that [math]\epsilon\in(-\ew\theta-\xU,+\infty)[/math], whereas observation of [math]\ey=0[/math] allows one to conclude that [math]\epsilon\in(-\infty,-\ew\theta-\xL][/math], and these regions of possible realizations of [math]\epsilon[/math] overlap. In contrast, when [math]\ex[/math] is observed the prediction is unique because the value [math]-\ew\theta-\ex[/math] partitions the space of realizations of [math]\epsilon[/math] in two disjoint sets, one associated with [math]\ey=1[/math] and the other with [math]\ey=0[/math]. Figure depicts the model's set-valued predictions for [math]\ey[/math] given [math](\ew,\xL,\xU)[/math] as a function of [math]\epsilon[/math], and the model's set valued predictions for [math]\epsilon[/math] given [math](\ew,\xL,\xU)[/math] as a function of [math]\ey[/math].[Notes 4] Why does this set-valued prediction hinder point identification? The reason is that the distribution of the observable data relates to the model structure in an incomplete manner. The model predicts [math]\sM(\ey=1|\ew,\xL,\xU)=\int \sR(\ey=1|\ew,\ex,\xL,\xU)d\sR(\ex|\ew,\xL,\xU)=\int \sR(\epsilon \gt -\ew\theta-\ex|\ew,\ex)d\sR(\ex|\ew,\xL,\xU),(\ew,\xL,\xU)[/math]-a.s. 
Because the distribution [math]\sR(\ex|\ew,\xL,\xU)[/math] is left completely unspecified, one can find multiple values for [math](\theta,\sR(\ex|\ew,\xL,\xU),\sR(\epsilon|\ew,\ex))[/math], satisfying the assumptions in Identification Problem, such that [math]\sM(\ey=1|\ew,\xL,\xU)=\sP(\ey=1|\ew,\xL,\xU),(\ew,\xL,\xU)[/math]-a.s. Nonetheless, in general, not all values of [math]\theta\in\Theta[/math] can be paired with some [math]\sR(\ex|\ew,\xL,\xU)[/math] and [math]\sR(\epsilon|\ew,\ex)[/math] so that they are compatible with [math]\sP(\ey=1|\ew,\xL,\xU),(\ew,\xL,\xU)[/math]-a.s. and with the maintained assumptions. Hence, [math]\theta[/math] can be partially identified using the information in the model and observed data.
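
The set-valued predictions described above can be made concrete with a short Python sketch. It assumes a scalar [math]\ew[/math] and hypothetical function names (`predicted_outcomes`, `admissible_eps`); the numerical values are chosen only for illustration.

```python
import math
from typing import Set, Tuple


def predicted_outcomes(eps: float, w: float, x_lo: float, x_hi: float,
                       theta: float) -> Set[int]:
    """Values of y the model allows for a given eps when only [x_lo, x_hi] is known."""
    outcomes = set()
    if eps > -(w * theta + x_hi):    # y = 1 is possible, e.g. when x = x_hi
        outcomes.add(1)
    if eps <= -(w * theta + x_lo):   # y = 0 is possible, e.g. when x = x_lo
        outcomes.add(0)
    return outcomes


def admissible_eps(y: int, w: float, x_lo: float, x_hi: float,
                   theta: float) -> Tuple[float, float]:
    """Endpoints of the interval of eps values compatible with the observed y.

    For y = 1 the interval is (lower, +inf); for y = 0 it is (-inf, upper].
    """
    if y == 1:
        return (-(w * theta + x_hi), math.inf)
    return (-math.inf, -(w * theta + x_lo))


# Interval-valued covariate: both outcomes are possible for intermediate eps.
print(predicted_outcomes(0.0, 1.0, -1.0, 1.0, 0.5))
# Degenerate bracket (x observed): the prediction is unique.
print(predicted_outcomes(0.0, 1.0, 0.0, 0.0, 0.5))
```

With a nondegenerate bracket the two admissible intervals for [math]\epsilon[/math] overlap, and the overlap is exactly the set of values of [math]\epsilon[/math] for which `predicted_outcomes` returns both 0 and 1.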

Predicted value of [math]\ey[/math] as a function of [math]\epsilon[/math], and admissible values of [math]\epsilon[/math] for each realization of [math]\ey[/math], in Identification Problem, conditional on [math](\ew,\xL,\xU)[/math].
Theorem (Semiparametric Binary Regression with Interval Covariate Data)


Under the Assumptions of Identification Problem, the sharp identification region for [math]\theta[/math] is

[[math]] \begin{multline} \idr{\theta}=\Big\{\vartheta\in \Theta: \sP\Big((\ew,\xL,\xU):\, \{0\le\ew\vartheta+\xL\cap \sP(\ey=1|\ew,\xL,\xU)\le 1-\alpha\}\\ \cup \{\ew\vartheta+\xU\le 0\cap \sP(\ey=1|\ew,\xL,\xU)\ge 1-\alpha\}\Big) = 0 \Big\}.\label{eq:ThetaI_man:tam02_binary} \end{multline} [[/math]]

Show Proof

For any [math]\vartheta\in\Theta[/math], define the set of possible values for the unobservable associated with the possible realizations of [math](\ey,\ew,\xL,\xU)[/math], illustrated in Figure, as [Notes 5]

[[math]] \begin{align} \Eps_\vartheta(\ey,\ew,\xL,\xU) =\left \{ \begin{array}{lll} (-\infty,-\ew\vartheta-\xL] & \textrm{if} &\ey=0,\\ {}[-\ew\vartheta-\xU,+\infty) & \textrm{if} &\ey=1. \end{array} \right.\label{eq:def_Epsilon:man:tam} \end{align} [[/math]]

Then [math]\Eps_\vartheta(\ey,\ew,\xL,\xU)[/math] is a random closed set as per Definition. To simplify notation, let [math]\Eps_\vartheta(\ey)\equiv\Eps_\vartheta(\ey,\ew,\xL,\xU)[/math] suppressing the dependence on [math](\ew,\xL,\xU)[/math]. Let [math](\Eps_\vartheta(\ey),\ew,\xL,\xU)=\Eps_\vartheta(\ey)\times(\ew,\xL,\xU)=\{(\mathbf{e},\ew,\xL,\xU):\mathbf{e}\in\Eps_\vartheta(\ey)\}[/math]. If the model is correctly specified, for the data generating value [math]\theta[/math], [math](\epsilon,\ew,\xL,\xU) \in (\Eps_\theta(\ey),\ew,\xL,\xU)[/math] a.s. By Theorem and Theorem 2.33 in [30], this occurs if and only if

[[math]] \begin{align} \sR(\epsilon\in C|\ew,\xL,\xU)&\ge \sP(\Eps_\theta(\ey)\subset C|\ew,\xL,\xU),\quad(\ew,\xL,\xU)\text{-a.s.}\ \forall C\in\cF,\label{eq:Artstein_on_man:tam} \end{align} [[/math]]
where [math]\cF[/math] here denotes the collection of closed subsets of [math]\R[/math]. We then have that [math]\vartheta[/math] is observationally equivalent to [math]\theta[/math] if and only if \eqref{eq:Artstein_on_man:tam} holds for [math]\Eps_\vartheta(\ey)[/math] as defined in \eqref{eq:def_Epsilon:man:tam}. The condition can be rewritten as

[[math]] \begin{align*} \int \sR(\epsilon\in C|\ew,\ex,\xL,\xU)d\sR(\ex|\ew,\xL,\xU)&\ge \sP(\Eps_\vartheta(\ey)\subset C|\ew,\xL,\xU),\quad(\ew,\xL,\xU)\text{-a.s.}\ \forall C\in\cF. \end{align*} [[/math]]
The assumption that [math]\sR(\epsilon|\ew,\ex,\xL,\xU)=\sR(\epsilon|\ew,\ex)[/math] yields that the above system of inequalities reduces to

[[math]] \begin{align*} \int \sR(\epsilon\in C|\ew,\ex)d\sR(\ex|\ew,\xL,\xU)&\ge \sP(\Eps_\vartheta(\ey)\subset C|\ew,\xL,\xU),\quad(\ew,\xL,\xU)\text{-a.s.}\ \forall C\in\cF. \end{align*} [[/math]]
Next, note that given the possible realizations of [math]\Eps_\vartheta(\ey)[/math], the above inequality is trivially satisfied unless [math]C=(-\infty,t][/math] or [math]C=[t,\infty)[/math] for some [math]t\in\R[/math]. Finally, the only restriction on the distribution of [math]\epsilon[/math] is the quantile independence condition, hence it suffices to consider [math]t=0[/math]. To see why this is the case, let for example [math]t \gt 0[/math] and fix a realization [math](w,x_L,x_U)[/math] for [math](\ew,\xL,\xU)[/math].[Notes 6] Then for the inequality not to be trivially satisfied it must be that either [math]w\vartheta+x_L\ge -t[/math] or [math]w\vartheta+x_U\le -t[/math] (both are not possible because [math]w\vartheta+x_L\le w\vartheta+x_U[/math]). If [math]w\vartheta+x_U\le -t[/math], it must be that [math]t\in(0,-w\vartheta-x_U][/math] and [math]-w\vartheta-x_U \gt 0[/math]. Then a distribution [math]\sR[/math] such that [math]\int \sR(\epsilon\in [0,t)|\ew=w,\ex)d\sR(\ex|\ew=w,\xL=x_L,\xU=x_U)=0[/math] is always feasible for [math]t\in(0,-w\vartheta-x_U][/math]. A similar argument holds if [math]w\vartheta+x_L\ge -t[/math]; and also if [math]t \lt 0[/math]. We then have that if the inequalities are satisfied for [math]t=0[/math], they are satisfied also for [math]t\neq 0[/math]. Finally, using the definition of [math]\Eps_\vartheta(\ey)[/math], for [math]t=0[/math] we have

[[math]] \begin{align} 1-\alpha &\ge \sP(\ey=1|\ew,\xL,\xU)\text{ for all }(\ew,\xL,\xU)\text{ such that } \ew\vartheta+\xU\le 0,\label{eq:key_sharp:man:tam02_1}\\ 1-\alpha & \le \sP(\ey=1|\ew,\xL,\xU)\text{ for all }(\ew,\xL,\xU)\text{ such that } \ew\vartheta+\xL \ge 0.\label{eq:key_sharp:man:tam02_2} \end{align} [[/math]]

Any given [math]\vartheta\in\Theta[/math], [math]\vartheta\neq\theta[/math], violates the above conditions if and only if [math]\sP\big((\ew,\xL,\xU):\, \{0\le\ew\vartheta+\xL\cap \sP(\ey=1|\ew,\xL,\xU)\le 1-\alpha\}\cup \{\ew\vartheta+\xU\le 0\cap \sP(\ey=1|\ew,\xL,\xU)\ge 1-\alpha\}\big) \gt 0[/math].
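The two displayed inequalities can be checked mechanically when the covariates have finite support. The following is a minimal sketch under assumptions that are not in the original text: each row of the hypothetical arrays `w`, `x_lo`, `x_hi` is a support point with positive probability, `p_y1` holds the corresponding values of [math]\sP(\ey=1|\ew,\xL,\xU)[/math], and a violation is recorded only when one of the two displayed inequalities fails strictly.

```python
import numpy as np

def violates_sharp_conditions(vartheta, w, x_lo, x_hi, p_y1, alpha):
    """Return True if the candidate vartheta fails one of the two displayed
    inequalities at some support point (hypothetical finite-support setup)."""
    w = np.asarray(w, dtype=float)
    x_lo = np.asarray(x_lo, dtype=float)
    x_hi = np.asarray(x_hi, dtype=float)
    p_y1 = np.asarray(p_y1, dtype=float)
    index_lo = w @ np.asarray(vartheta, dtype=float) + x_lo  # w*vartheta + x_L
    index_hi = w @ np.asarray(vartheta, dtype=float) + x_hi  # w*vartheta + x_U
    # First inequality: 1 - alpha >= P(y=1|.) whenever w*vartheta + x_U <= 0.
    fails_first = (index_hi <= 0) & (p_y1 > 1 - alpha)
    # Second inequality: 1 - alpha <= P(y=1|.) whenever w*vartheta + x_L >= 0.
    fails_second = (index_lo >= 0) & (p_y1 < 1 - alpha)
    return bool(np.any(fails_first | fails_second))
```

Scanning a grid of candidate values and keeping those for which the function returns `False` gives a rough picture of the identification region in this discretized setting.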


Key Insight: The analysis in [19] systematically studies what can be learned under increasingly strong sets of assumptions. These include both assumptions that constrain the model from fully nonparametric to semiparametric to parametric, as well as assumptions that constrain the distribution of the observable covariates. For example, [19](Corollary to Proposition 2) provide sufficient conditions on the joint distribution of [math](\ew,\xL,\xU)[/math] that allow for identification of the sign of components of [math]\theta[/math], as well as for point identification of [math]\theta[/math].[Notes 7] The careful analysis of the identifying power of increasingly stronger assumptions is the pillar of the partial identification approach to empirical research proposed by Manski, as illustrated in Section. The work of [19] was the first example of this kind in semiparametric structural models.

Revisiting [19]'s study of Identification Problem nearly 20 years later yields important insights on the differences between point and partial identification analysis. It is instructive to take as a point of departure the analysis of [29], which under the additional assumption that [math](\ey,\ew,\ex)[/math] is observed yields

[[math]] \begin{align*} \ew\theta+\ex \gt 0 \Leftrightarrow \sP(\ey=1|\ew,\ex) \gt 1-\alpha. \end{align*} [[/math]]

In this case, [math]\theta[/math] is identified relative to [math]\vartheta\in\Theta[/math] if

[[math]] \begin{align} \sP\left((\ew,\ex):\, \{\ew\theta+\ex\le 0 \lt \ew\vartheta+\ex\} \cup \{\ew\vartheta+\ex\le 0 \lt \ew\theta+\ex\}\right) \gt 0.\label{eq:manski85} \end{align} [[/math]]

[19] extend this reasoning to the case that [math]\ex[/math] is unobserved, but known to satisfy [math]\ex\in [\xL,\xU][/math] a.s. The first part of their analysis, collected in their Proposition 2, characterizes the collection of values that cannot be distinguished from [math]\theta[/math] on the basis of [math]\sP(\ew,\xL,\xU)[/math] alone, through a clear generalization of \eqref{eq:manski85}:

[[math]] \begin{align} \{\vartheta\in \Theta: \sP\left((\ew,\xL,\xU):\, \{\ew\theta+\xU\le 0 \lt \ew\vartheta+\xL\} \cup \{\ew\vartheta+\xU\le 0 \lt \ew\theta+\xL\}\right) = 0\}.\label{eq:region:man:tam02:potential} \end{align} [[/math]]

It is worth emphasizing that the characterization in \eqref{eq:region:man:tam02:potential} depends on [math]\theta[/math], and makes no use of the information in [math]\sP(\ey|\ew,\xL,\xU)[/math]. The Corollary to Proposition 2 yields conditions on [math]\sP(\ew,\xL,\xU)[/math] under which either the sign of components of [math]\theta[/math], or [math]\theta[/math] itself, can be identified, regardless of the distribution of [math]\ey|\ew,\xL,\xU[/math]. [19](Lemma 1) provide a second characterization, which presupposes knowledge of [math]\sP(\ey,\ew,\xL,\xU)[/math], yields a set smaller than the one in \eqref{eq:region:man:tam02:potential}, and coincides with the result in Theorem SIR-. [19] use the same notation for the two sets, although the sets are conceptually and mathematically distinct.[Notes 8] The result in Theorem SIR- is due to [19](Lemma 1), but the proof provided here is new, as is the use of random set theory in this application.[Notes 9]
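To make the dependence of \eqref{eq:region:man:tam02:potential} on [math]\theta[/math] concrete, the sketch below evaluates the defining event on a finite support. It is purely illustrative: the data generating [math]\theta[/math] is unknown in practice, and the arrays `w`, `x_lo`, `x_hi` are hypothetical support points that I assume each carry positive probability.

```python
import numpy as np

def potentially_equivalent(theta, vartheta, w, x_lo, x_hi):
    """Return True if vartheta cannot be distinguished from theta using the
    distribution of (w, x_L, x_U) alone, on a finite support (assumption)."""
    w = np.asarray(w, dtype=float)
    x_lo = np.asarray(x_lo, dtype=float)
    x_hi = np.asarray(x_hi, dtype=float)
    t_lo = w @ np.asarray(theta, dtype=float) + x_lo      # w*theta + x_L
    t_hi = w @ np.asarray(theta, dtype=float) + x_hi      # w*theta + x_U
    v_lo = w @ np.asarray(vartheta, dtype=float) + x_lo   # w*vartheta + x_L
    v_hi = w @ np.asarray(vartheta, dtype=float) + x_hi   # w*vartheta + x_U
    # Event {w*theta + x_U <= 0 < w*vartheta + x_L}
    event_a = (t_hi <= 0) & (v_lo > 0)
    # Event {w*vartheta + x_U <= 0 < w*theta + x_L}
    event_b = (v_hi <= 0) & (t_lo > 0)
    # The candidate belongs to the region iff the union has probability zero.
    return not bool(np.any(event_a | event_b))
```

Note that the function never uses the distribution of the outcome, in line with the point made above.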

Key Insight: The preceding discussion allows me to draw a novel connection between the two characterizations in [19], and the distinction put forward by [31] and [32](Chapter XXX in this Volume, Definition 2) in partial identification between potential observational equivalence and observational equivalence.[Notes 10] Applying [31]'s definition, parameter vectors [math]\theta[/math] and [math]\vartheta[/math] are potentially observationally equivalent if there exists some distribution of [math]\ey|\ew,\xL,\xU[/math] for which conditions \eqref{eq:key_sharp:man:tam02_1}-\eqref{eq:key_sharp:man:tam02_2} hold. Simple algebra confirms that this yields the region in \eqref{eq:region:man:tam02:potential}. This notion of potential observational equivalence parallels one of the notions used to obtain sufficient conditions for point identification in the semiparametric literature (as in, e.g., [29]). Both notions, as explained in [32](Section 4.1), make no reference to the conditional distribution of outcomes given covariates delivered by the process being studied. To conclude that parameters [math]\theta[/math] and [math]\vartheta[/math] are observationally equivalent, one requires instead that conditions \eqref{eq:key_sharp:man:tam02_1}-\eqref{eq:key_sharp:man:tam02_2} hold for the observed distribution [math]\sP(\ey=1|\ew,\xL,\xU)[/math] (as opposed to “for some distribution” as in the case of potential observational equivalence). This yields the sharp identification region in \eqref{eq:ThetaI_man:tam02_binary}.

[33] studies random expected utility models, where agents choose the alternative that maximizes their expected utility. The core difference with standard models is that [33] does not fully specify the subjective beliefs that agents use to form their expectations, but only a set of such beliefs. [33] shows that the resulting, partially identified, discrete choice model can be formulated similarly to how [19] treat interval valued covariates, and leverages their results to obtain bounds on preference parameters.[Notes 11]

[34] consider a different but closely related model to the semiparametric binary response model studied by [19]. They assume that an instrumental variable [math]\ez[/math] is available, that [math]\epsilon[/math] is independent of [math]\ex[/math] conditional on [math](\ew,\ez)[/math], and that [math]Corr(\ez,\epsilon)=0[/math]. They assume that the distribution of [math]\ex[/math] is absolutely continuous with support [math][v_1,v_k][/math], and that [math]\ex[/math] is not a deterministic linear function of [math](\ew,\ez)[/math]. They consider the case that [math]\ex[/math] is unobserved but known to belong to one of the fixed (and known) intervals [math][v_i,v_{i+1})[/math], [math]i=1,\dots,k-1[/math], with [math]\sR[\ex\in[v_i,v_{i+1})|\ew,\ez] \gt 0[/math] almost surely for all [math]i[/math]. Finally, they assume that [math](-\ew\theta-\epsilon)\in [v_1,v_k][/math] with probability one. They do not, however, make quantile independence assumptions. Their point of departure is the fact that under these conditions, if [math]\ex[/math] were observed, one could employ a transformation proposed by [35] for the binary outcome [math]\ey[/math], such that [math]\theta[/math] can be identified through a simple linear moment condition. Specifically, let

[[math]] \begin{align*} \tilde{\ey}=\frac{\ey - \one_{\ex \gt 0}}{f_\ex(\ex|\ew,\ez)}, \end{align*} [[/math]]

where [math]f_\ex(\cdot|\ew,\ez)[/math] is the conditional density function of [math]\ex[/math]. Then, using the assumption that [math]\ez[/math] and [math]\epsilon[/math] are uncorrelated, one has

[[math]] \begin{align} \E_\sP(\ez \tilde{\ey})-\E_\sP(\ez \ew^\top) \theta = 0.\label{eq:sem-bin} \end{align} [[/math]]
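When [math]\E_\sP(\ez \ew^\top)[/math] is invertible, the linear moment condition \eqref{eq:sem-bin} can be solved for [math]\theta[/math]. A minimal sample-analog sketch follows, assuming (beyond what is stated above) that the instrument and the regressors have the same dimension and that the transformed outcome has already been computed; the function name and arguments are hypothetical.

```python
import numpy as np

def solve_linear_moment(z, w, y_tilde):
    """Solve mean(z * y_tilde) - mean(z w') theta = 0 for theta.

    z, w: arrays of shape (n, m); y_tilde: array of shape (n,).
    Assumes the sample analog of E(z w') is invertible.
    """
    z = np.asarray(z, dtype=float)
    w = np.asarray(w, dtype=float)
    y_tilde = np.asarray(y_tilde, dtype=float)
    n = z.shape[0]
    zw = z.T @ w / n          # sample analog of E(z w')
    zy = z.T @ y_tilde / n    # sample analog of E(z y_tilde)
    return np.linalg.solve(zw, zy)
```

In applications the conditional density entering the transformed outcome must itself be estimated, which this sketch does not address.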


With interval valued [math]\ex[/math], [34] denote by [math]\ex^*[/math] the random variable that takes value [math]i\in\{1,\dots,k-1\}[/math] if [math]\ex\in[v_i,v_{i+1})[/math], so that the observed data are draws from the joint distribution of [math](\ey,\ew,\ez,\ex^*)[/math]. They let [math]\delta(\ex^*)=v_{\ex^*+1}-v_{\ex^*}[/math] denote the length of the [math]\ex^*[/math]-th interval, and define the transformed outcome variable:

[[math]] \ey^*=\frac{\delta(\ex^*)}{\sP(\ex^*=i|\ew,\ez)}\ey-v_k. [[/math]]

The assumptions on [math]\ex[/math] yield that, given [math]\ez[/math] and [math]\ew[/math], [math]\epsilon[/math] does not depend on [math]\ex^*[/math]. Moreover, [math]\sP(\ey=1|\ex^*,\ew,\ez)[/math] is non-decreasing in [math]\ex^*[/math] and [math]\sF_\epsilon(\cdot|\ez,\ew,\ex,\ex^*)=\sF_\epsilon(\cdot|\ez,\ew)[/math]. [34] show that the sharp identification region for [math]\theta[/math] is

[[math]] \begin{align} \idr{\theta}=\E_\sP(\ez \ew^\top)^{-1}\E_\sP(\ez \ey^* + \ez \eU),\label{eq:SIR:mag:mau} \end{align} [[/math]]

where [math]\E_\sP(\ez \ey^* + \ez \eU)[/math] is the Aumann (or selection) expectation of the random interval [math]\ez \ey^* + \ez \eU[/math], see Definition, with

[[math]] \begin{align*} \eU=\left[-\sum_{i=1}^{k-1}(r_i(\ew,\ez)-r_{i-1}(\ew,\ez))(v_{i+1}-v_i), \sum_{i=1}^{k-1}(r_{i+1}(\ew,\ez)-r_i(\ew,\ez))(v_{i+1}-v_i) \right]. \end{align*} [[/math]]

In this expression, [math]r_{\ex^*}(\ew,\ez)\equiv\sP(\ey=1|\ex^*,\ew,\ez)[/math] and by convention [math]r_0(\ew,\ez)=0[/math] and [math]r_k(\ew,\ez)=1[/math], see [34](Theorem 4). If [math]r_i(\ew,\ez),i=0,\dots,k[/math], were observed, this characterization would be very similar to the one provided by [36] for Identification Problem, see equation \eqref{eq:ThetaI_BLP}. However, these random functions need to be estimated. While the first-stage estimation of [math]r_i(\ew,\ez),i=0,\dots,k[/math], does not affect the identification arguments, it does complicate inference, see [37] and the discussion in Section.
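For a fixed realization of [math](\ew,\ez)[/math], the endpoints of the random interval [math]\eU[/math] are simple weighted sums. The sketch below computes them, treating the conditional choice probabilities as known, which is an assumption since in practice they are estimated; the function name and argument layout are hypothetical.

```python
def interval_endpoints(r_inner, v):
    """Endpoints of U at one value of (w, z).

    r_inner: [r_1, ..., r_{k-1}], the values P(y=1 | x*=i, w, z).
    v:       [v_1, ..., v_k], the known interval thresholds.
    Uses the conventions r_0 = 0 and r_k = 1.
    """
    k = len(v)
    if len(r_inner) != k - 1:
        raise ValueError("need exactly k-1 conditional probabilities")
    r = [0.0] + list(r_inner) + [1.0]  # r[i] holds r_i for i = 0, ..., k
    # v[i] - v[i-1] is the length v_{i+1} - v_i of the i-th interval.
    lower = -sum((r[i] - r[i - 1]) * (v[i] - v[i - 1]) for i in range(1, k))
    upper = sum((r[i + 1] - r[i]) * (v[i] - v[i - 1]) for i in range(1, k))
    return lower, upper
```

With [math]k=3[/math], thresholds [math](0,1,3)[/math] and [math](r_1,r_2)=(0.3,0.6)[/math], the lower endpoint is [math]-(0.3\cdot 1+0.3\cdot 2)=-0.9[/math] and the upper endpoint is [math]0.3\cdot 1+0.4\cdot 2=1.1[/math].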

Endogenous Explanatory Variables

Whereas the standard random utility model presumes some form of exogeneity for [math]\ex[/math], in practice often some explanatory variables are endogenous. This problem has been addressed in the literature to obtain point identification of the model through a combination of several assumptions, including large support conditions, special regressors, control function restrictions, and more (see, e.g., [38][39][35][40]). [41] analyze the distinct but related problem of identification in a censored regression model with endogenous explanatory variables, and provide sufficient conditions for point identification.[Notes 12] Here I discuss how to carry out identification analysis in the absence of such assumptions when instrumental variables [math]\ez[/math] are available, as proposed by [42]. They consider a more general case than I do here, with a utility function that is not parametrically specified and not restricted to be separable in the unobservables. Even in that more general case, the identification analysis follows through similar steps as reported here.

Identification Problem (Discrete Choice with Endogenous Explanatory Variables)


Let [math](\ey,\ex,\ez)\sim\sP[/math] be observable random variables in [math]\cY\times\cX\times\cZ[/math]. Let all members of the population face the same choice set [math]\cY[/math]. Suppose that each alternative has one unobservable attribute [math]\epsilon_c,c\in\cY[/math] and let [math]\nu\equiv(\epsilon_{c_1},\dots,\epsilon_{c_{|\cY|}})[/math].[Notes 13] Let [math]\nu\sim\sQ[/math] and assume that [math]\nu\independent\ez[/math]. Suppose [math]\sQ[/math] belongs to a nonparametric family of distributions [math]\cT[/math], and that the conditional distribution of [math]\nu|\ex,\ez[/math], denoted [math]\sR(\nu|\ex,\ez)[/math], is absolutely continuous with respect to Lebesgue measure with everywhere positive density on its support, [math](\ex,\ez)[/math]-a.s. Suppose utility is separable in unobservables and has a functional form known up to a finite dimensional parameter vector [math]\theta\in\Theta\subset\R^m[/math], so that [math]\bu_i(c)=g(\ex_c;\theta)+\epsilon_c[/math], [math](\ex_c,\epsilon_c)[/math]-a.s., for all [math]c\in\cY[/math]. Maintain the normalizations [math]g(\ex_{c_{|\cY|}};\theta)=0[/math] for all [math]\theta\in\Theta[/math] and all [math]\ex\in\cX[/math], and [math]g(x_c^0;\theta)=\bar{g}[/math] for known [math](x_c^0,\bar{g})[/math] for all [math]\theta\in\Theta[/math] and [math]c\in\cY[/math].[Notes 14] Given [math](\ex,\ez,\nu)[/math], suppose [math]\ey[/math] is the utility maximizing choice in [math]\cY[/math]. In the absence of additional information, what can the researcher learn about [math](\theta,\sQ)[/math]?

The key challenge to identification here results because the distribution of [math]\nu[/math] can vary across different values of [math]\ex[/math], both conditional and unconditional on [math]\ez[/math]. Why does this fact hinder point identification? For a given [math]\vartheta\in\Theta[/math] and for any [math]c\in\cY[/math] and [math]x\in\cX[/math], the model yields that [math]c[/math] is optimal, and hence chosen, if and only if [math]\nu[/math] realizes in the set

[[math]] \begin{align} \cE_\vartheta(c,x)=\{e\in\cV:g(x_c;\vartheta)+e_c\ge g(x_d;\vartheta)+e_d\forall d\in\cY\}.\label{eq:che:ros:E} \end{align} [[/math]]
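Membership in the set in \eqref{eq:che:ros:E} amounts to checking that alternative [math]c[/math] attains the maximal utility. The sketch below does this for one realization of the unobservables; the list `systematic` is a hypothetical container for the values [math]g(x_d;\vartheta)[/math], [math]d\in\cY[/math], with alternatives indexed from zero.

```python
def in_optimal_set(c, systematic, e):
    """Return True if e lies in E_vartheta(c, x), i.e. alternative c is optimal.

    systematic: list of g(x_d; vartheta), one entry per alternative d.
    e:          list of unobservable attributes e_d, in the same order.
    """
    total = [g_d + e_d for g_d, e_d in zip(systematic, e)]
    # c is optimal iff its total utility weakly exceeds that of every d.
    return all(total[c] >= u for u in total)
```

Because the inequalities are weak, points at which two alternatives tie belong to more than one such set; under the maintained absolute continuity assumption these ties have probability zero.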

Figure plots the set [math]\cE_\vartheta(\ey,\ex)[/math] in a stylized example with [math]\cY=\{1,2,3\}[/math] and [math]\cX=\{x^1,x^2\}[/math], as a function of [math](\epsilon_1-\epsilon_3,\epsilon_2-\epsilon_3)[/math].[Notes 15] Consider the model implied distribution, denoted [math]\sM[/math] below, of the optimal choice. Then, recalling the restriction [math]\ez\independent\nu[/math], we have

[[math]] \begin{align} \sM(c|\ex\in R_x,\ez;\vartheta)&=\int_{x\in R_x}\sR(\cE_\vartheta(c,\ex)|\ex=x,\ez)d\sP(x|z),\forall R_x\subseteq\cX,\ez\text{-a.s.}\label{eq:che:ros:model:distrib}\\ \sQ(F)&=\int_{x\in\cX}\sR(F|\ex=x,\ez)d\sP(x|z),\forall F\subseteq\cV,\ez\text{-a.s.},\label{eq:che:ros:instrument} \end{align} [[/math]]

Because the joint distribution of [math](\ex,\nu)[/math] conditional on [math]\ez[/math] is left completely unrestricted (other than \eqref{eq:che:ros:instrument}), one can find multiple triplets [math](\vartheta,\sQ,\sR(\nu|\ex,\ez))[/math] satisfying the maintained assumptions and with [math]\sM(c|\ex\in R_x,\ez;\vartheta)=\sP(c|\ex\in R_x,\ez)[/math] for all [math]c\in\cY[/math] and [math]R_x\subseteq\cX[/math], [math]\ez[/math]-a.s.

The set [math]\cE_\vartheta[/math] in equation \eqref{eq:che:ros:E} and the corresponding admissible values for [math](\ey,\ex)[/math] as a function of [math](\epsilon_1-\epsilon_3,\epsilon_2-\epsilon_3)[/math] under the simplifying assumption that [math]\cX=\{x^1,x^2\}[/math] and [math]\cY=\{1,2,3\}[/math]. The admissible values for [math](\ey,\ex)[/math] are [math]\{(c,x^1)\}[/math] in the gray area, and [math]\{(c,x^2)\}[/math] in the area with vertical lines. Because the two areas overlap, the model has set-valued predictions for [math](\ey,\ex)[/math].

It is instructive to compare \eqref{eq:che:ros:model:distrib}-\eqref{eq:che:ros:instrument} with [27] conditional logit. Under the standard assumptions, [math]\ex\independent\nu[/math] so that no instrumental variables are needed. This yields [math]\sQ(\nu)=\sR(\nu|\ex)[/math] [math]\ex[/math]-a.s., and in addition [math]\sQ[/math] is typically known, with corresponding simplifications in \eqref{eq:che:ros:model:distrib}. The resulting system of equalities can be inverted under standard order and rank conditions to yield point identification of [math]\theta[/math]. Further insights can be gained by looking at Figure. As the value of [math]\ex[/math] changes from [math]x^1[/math] to [math]x^2[/math], the region of values where, say, alternative 1 is optimal changes. When [math]\ex[/math] is exogenous, say independent of [math]\nu[/math], this yields a system of equalities relating [math](\theta,\sQ)[/math] to the observed distribution [math]\sP(\ey,\ex)[/math] which, as stated above, can be inverted to obtain point identification. When [math]\ex[/math] is endogenous, this reasoning breaks down because the conditional distribution [math]\sR(\nu|\ex,\ez)[/math] may change across realizations of [math]\ex[/math]. Figure also offers an instructive way to connect Identification Problem with the identification problem studied in Section Semiparametric Binary Choice Models with Interval Valued Covariates (as well as with those in Sections Static, Simultaneous-Move Finite Games with Multiple Equilibria-Auction Models with Independent Private Values below). In the latter, the model has set-valued predictions for the outcome variable given realizations of the covariates and unobserved heterogeneity terms, which overlap across realizations of the unobserved heterogeneity terms. 
In the problem studied here, the model has singleton-valued predictions for the outcome variable of interest [math]\ey[/math] as a function of the observable explanatory variables [math]\ex[/math] and unobservables [math]\nu[/math]. However, for given realization of [math]\nu[/math], the model admits sets of values for the endogenous variables [math](\ey,\ex)[/math], which overlap across realizations of [math]\nu[/math]. Because the model is silent on the joint distribution of [math](\ex,\nu)[/math] (except for requiring that the marginal distribution of [math]\nu[/math] does not depend on [math]\ez[/math]), partial identification results. It is possible to couple the maintained assumptions with the observed data to learn features of [math](\theta,\sQ)[/math]. Because the observed choice [math]\ey[/math] is assumed to maximize utility, for the data generating [math](\theta,\sQ)[/math] the model yields

[[math]] \begin{align*} \nu\in\cE_\theta(\ey,\ex),\quad(\ey,\ex)\text{-a.s.} \end{align*} [[/math]]

It follows that [math](\vartheta,\tilde\sQ)[/math] is observationally equivalent to [math](\theta,\sQ)[/math] if and only if

[[math]] \begin{align*} \tilde\sQ(F|\ex,\ez)\ge \sP(\cE_\vartheta(\ey,\ex)\subseteq F|\ex,\ez),\forall F\in\cF,(\ex,\ez)\text{-a.s.} \end{align*} [[/math]]

As the distribution of [math]\nu[/math] is only restricted so that [math]\nu\independent\ez[/math], one can integrate both sides of the inequality with respect to [math]\ex[/math]. The final result follows because [math]\tilde\sQ[/math] does not depend on [math]\ez[/math].

While Theorem SIR- relies on checking inequality \eqref{eq:SIR:discrete:choice:endogenous} for all [math]F\in\cF[/math], the results in [42](Theorem 2) and [30](Chapter 2) can be used to obtain a smaller collection of sets over which to verify it. In particular, if [math]\ex[/math] has a discrete distribution, it suffices to use a finite collection of sets. For example, in the case depicted in Figure with [math]\cX=\{x^1,x^2\}[/math], [42](Section 3.3 of the 2011 CeMMAP working paper version CWP39/11) show that [math]\idr{\theta,\sQ}[/math] is obtained by checking at most twelve inequalities in \eqref{eq:SIR:discrete:choice:endogenous}. The left hand side of these inequalities is a linear function of six values that the distribution [math]\tilde\sQ[/math] assigns to each of the component regions depicted in Figure (the one where [math]\cE_\vartheta(1,x^1)\cap\cE_\vartheta(1,x^2)[/math] realizes; the one where [math]\cE_\vartheta(1,x^1)\cap\cE_\vartheta(3,x^2)[/math] realizes; etc.) Hence, in this example, [math](\vartheta,\tilde\sQ)\in\idr{\theta,\sQ}[/math] if and only if [math]\tilde\sQ[/math] assigns to these six regions a probability mass such that for [math]\vartheta[/math] the twelve inequalities characterized by [42] hold.
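When the space of unobservables is split into finitely many component regions, inequalities of the form [math]\tilde\sQ(F)\ge\sP(\cE_\vartheta(\ey,\ex)\subseteq F|\ez)[/math] can be verified by brute force over unions of regions. The following is a minimal sketch under that discretization, which is my assumption for illustration; it enumerates all unions rather than the reduced collection derived in [42], and all names are hypothetical.

```python
from itertools import combinations

def all_unions(cells):
    """Yield every nonempty union of component regions as a frozenset."""
    for size in range(1, len(cells) + 1):
        for combo in combinations(cells, size):
            yield frozenset(combo)

def satisfies_containment_inequalities(q_mass, pred_sets, p_yx_given_z, tol=1e-12):
    """Check Q(F) >= P(E(y, x) subset of F | z) for all unions F and all z.

    q_mass:       dict region -> candidate probability mass.
    pred_sets:    dict (y, x) -> frozenset of regions making up E(y, x).
    p_yx_given_z: list (one entry per z) of dicts (y, x) -> P(y, x | z).
    """
    cells = sorted(q_mass)
    for union in all_unions(cells):
        q_union = sum(q_mass[cell] for cell in union)
        for p_yx in p_yx_given_z:
            # Probability that the model's set of admissible unobservables
            # is contained in the union F.
            contained = sum(p for yx, p in p_yx.items() if pred_sets[yx] <= union)
            if q_union < contained - tol:
                return False
    return True
```

The number of unions grows exponentially in the number of regions, which is why the smaller collections of sets discussed above matter in practice.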

Key Insight: A conceptual contribution of [42] is to show that one can frame models with endogenous explanatory variables as incomplete models. Incompleteness here results from the fact that the model does not specify how the endogenous variables [math]\ex[/math] are determined. One can then think of these as models with set-valued predictions for the endogenous variables ([math]\ey[/math] and [math]\ex[/math] in this application), even though the outcome of the model ([math]\ey[/math]) is uniquely predicted by the realization of the observed explanatory variables ([math]\ex[/math]) and the unobserved heterogeneity terms ([math]\nu[/math]). Random set theory can again be leveraged to characterize sharp identification regions. [32](Chapter XXX in this Volume) discuss related generalized instrumental variables models where random set methods are used to obtain characterizations of sharp identification regions in the presence of endogenous explanatory variables.

Unobserved Heterogeneity in Choice Sets and/or Consideration Sets

Compared to the general framework set forth at the beginning of Section Discrete Choice in Single Agent Random Utility Models, as pointed out in [43], often the researcher observes [math](\ey_i,\ex_i)[/math] but not [math]\eC_i[/math], [math]i=1,\dots,n[/math]. Even when [math]\eC_i[/math] is observable, the researcher may be unaware of which of its elements the decision maker actually evaluates before selecting one. In what follows, to shorten expressions, I refer to both the measurement problem of unobserved choice sets and the (cognitive) problem of limited consideration as “unobserved heterogeneity in choice sets.”

Learning features of preferences using discrete choice data in the presence of unobserved heterogeneity in choice sets is a formidable task. When a decision maker chooses an alternative, this may be because her choice set equals the feasible set and the chosen alternative is the one yielding the highest utility. Then observed choice reveals preferences. But it can also be that the decision maker has access to/considers only the chosen alternative (e.g., [1](p. 99)). Then observed choice is driven entirely by choice set composition, and is silent about preferences. A plethora of scenarios between these extremes is possible, but the researcher does not know which has generated the observed data. This fundamental identification problem calls either for restrictions on the random utility model and consideration set formation process, or for collection of richer data that eliminates unobserved heterogeneity in [math]\eC_i[/math] or allows for enhanced modeling of it (see, e.g., [44]).

A sizable literature spanning behavioral economics, econometrics, experimental economics, marketing, microeconomics, and psychology has put forward different models to formalize the complex process that leads to the formation of the set of alternatives that the agent considers or can choose from (see, e.g., [45][46][47] for early contributions). [43] proposes both a general econometric model where decision makers draw choice sets from an unknown distribution, as well as a specific model of choice set formation, independent from preferences, and studies their implications for the distributional structure of random utility models.[Notes 16]

However, assumptions about the choice set formation process are often rooted in a desire to achieve point identification rather than in information contained in the model or observed data.[Notes 17] It is then important to ask what can be learned about decision makers' preferences under minimal assumptions on the choice set formation process. Allowing for unrestricted dependence between choice sets and preferences, while challenging for identification analysis, is especially relevant. Indeed, decision makers' unobserved attributes may determine both their preferences and which items in the feasible set they pay attention to or are available to them (e.g., through unobserved liquidity constraints, unobserved characteristics such as religious preferences in the context of school choice, or behavioral phenomena such as aversion to extremes, salience, etc.). Here I use the framework put forward by [48] to study identification of discrete choice models with unobserved heterogeneity in choice sets and preferences.

Identification Problem (Discrete Choice with Unobserved Heterogeneity in Choice Sets and Preferences)


Let [math](\ey,\ex)\sim \sP[/math] be observable random variables in [math]\cY\times\cX[/math]. Assume that there exists a real valued function [math]g[/math], which for simplicity I posit known up to parameter [math]\delta\in\Delta\subset\R^m[/math] and continuous in its second argument, such that [math]\bu_i(c)=g(\ex_{ic},\nu_i;\delta)[/math], [math](\ex_{ic},\nu_i)[/math]-a.s., for all [math]c\in\cY,i\in\cI[/math], where [math]\ex_{ic}[/math] denotes the vector of attributes relevant to alternative [math]c[/math], and includes attributes that are alternative invariant and ones that are alternative specific (respectively, [math]\ex_i^1[/math] and [math]\ex_{ic}^2[/math] in the general notation laid out in Section Discrete Choice in Single Agent Random Utility Models). Suppose that [math]\ey=\arg\max_{c\in \eC}g(\ex_c,\nu;\delta)[/math], where ties are assumed to occur with probability zero and [math]\eC[/math] is an unobservable choice set drawn from the subsets of [math]\cY[/math] according to some unknown probability distribution. Suppose [math]\sR(|\eC|\ge\kappa)=1[/math] for some known constant [math]\kappa\ge 2[/math]. Let [math]\sQ[/math] denote the distribution of [math]\nu[/math], and assume that it is known up to a finite dimensional parameter [math]\gamma\in\Gamma\subset\R^k[/math]. For simplicity, assume that [math]\nu\independent\ex[/math].[Notes 18] In the absence of additional information, what can the researcher learn about [math]\theta\equiv[\delta;\gamma][/math]?

Predicted value of [math]\ey[/math] in Identification Problem as a function of [math]\nu[/math] for [math]\kappa=|\cY|-1[/math]. In this case, [math]\eC=\cY\setminus\{c\}[/math] for some [math]c\in\cY[/math], and the model predicts either the first or the second best alternative in [math]\cY[/math].

The model just laid out has set valued predictions for the decision maker's optimal choice, because different alternatives might be optimal depending on which choice set the decision maker draws. Figure, which is based on the analysis in [48], illustrates the set valued predictions in a stylized example. In the figure [math]\nu[/math] is assumed to be a scalar; [math]\bar{\nu}_{j,m}[/math] denotes the threshold value of [math]\nu[/math] above which [math]c_j[/math] yields higher utility than [math]c_m[/math] and below which [math]c_m[/math] yields higher utility than [math]c_j[/math] (the threshold's dependence on [math](\ex;\delta)[/math] is suppressed for notational convenience). Consider the case that [math]\nu\in[\bar{\nu}_{2,3},\bar{\nu}_{1,2}][/math], so that [math]c_2[/math] is the option yielding the highest utility among all options in [math]\cY[/math]. When [math]\kappa=|\cY|-1[/math], the agent may draw a choice set that does not include one of the alternatives in [math]\cY[/math]. If the excluded alternative is not [math]c_2[/math] (or if [math]\eC[/math] realizes equal to [math]\cY[/math]), the model predicts that the decision maker chooses [math]c_2[/math]. If [math]\eC[/math] realizes equal to [math]\cY\setminus\{c_2\}[/math], the model predicts that the decision maker chooses the second best: [math]c_1[/math] if [math]\nu\in[\bar{\nu}_{1,3},\bar{\nu}_{1,2}][/math], and [math]c_3[/math] if [math]\nu\in[\bar{\nu}_{2,3},\bar{\nu}_{1,3}][/math]. Conversely, observation of [math]\ey=c_1[/math] allows one to conclude that [math]\nu\ge\bar\nu_{1,3}[/math], and [math]\ey=c_2[/math] that [math]\nu\ge\bar\nu_{2,4}[/math], with [math]\bar\nu_{2,4}\le\bar\nu_{1,3}[/math], and these regions of possible realizations of [math]\nu[/math] overlap.

Why does this set valued prediction hinder point identification? The reason is similar to the explanation given for Identification Problem: the distribution of the observable data relates to the model structure in an incomplete manner, because the distribution of the (unobserved) choice sets is left completely unspecified. [48] show that one can find multiple candidate distributions for [math]\eC[/math] and parameter vectors [math]\vartheta[/math], such that together they yield a model implied distribution for [math]\ey|\ex[/math] that matches [math]\sP(\ey|\ex)[/math], [math]\ex[/math]-a.s. [48] propose to work directly with the set of model implied optimal choices given [math](\ex,\nu)[/math] associated with each possible realization of [math]\eC[/math], which is depicted in Figure for a specific example. The key idea is that, according to the model, the observed choice maximizes utility among the alternatives in [math]\eC[/math]. Hence, for the data generating value of [math]\theta[/math], it belongs to the set of model implied optimal choices. With this, the authors are able to characterize [math]\idr{\theta}[/math] through Theorem as the collection of parameter vectors that satisfy a finite number of conditional moment inequalities.
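The set of model implied optimal choices has a simple form for a given realization of utilities. An alternative can be the best element of some choice set of size at least [math]\kappa[/math] if and only if at least [math]\kappa-1[/math] alternatives yield lower utility, that is, if it ranks among the top [math]|\cY|-\kappa+1[/math] alternatives. The sketch below encodes this reasoning; it assumes no ties, and the function name is hypothetical.

```python
def possible_choices(utilities, kappa):
    """Indices of alternatives that are optimal in some choice set of size >= kappa.

    utilities: list of utility values, one per alternative (no ties assumed).
    kappa:     known lower bound on the size of the unobserved choice set.
    """
    n = len(utilities)
    if not 1 <= kappa <= n:
        raise ValueError("kappa must lie between 1 and the number of alternatives")
    n_candidates = n - kappa + 1  # only the top n - kappa + 1 can be chosen
    ranked = sorted(range(n), key=lambda c: utilities[c], reverse=True)
    return set(ranked[:n_candidates])
```

For [math]\kappa=|\cY|-1[/math] the function returns the first and second best alternatives, matching the case described above, and for [math]\kappa=|\cY|[/math] it returns the single utility maximizer.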

Key Insight: [48] show that working directly with the set of model implied optimal choices given [math](\ex,\nu)[/math] allows one to dispense with considering all possible distributions of choice sets that are allowed for in Identification Problem to complete the model. Such distributions may depend on [math]\nu[/math] even after conditioning on observables and may constitute an infinite dimensional nuisance parameter, which creates great difficulties for the computation of [math]\idr{\theta}[/math] and for inference.

Identification Problem sets up a structure where preferences include idiosyncratic components [math]\nu[/math] that are decision maker specific and can depend on [math]\eC[/math], and where heterogeneity in [math]\eC[/math] can be driven either by a measurement problem, or by the decision maker's limited attention to the options available to her. However, for computational and finite sample inference reasons, it restricts the family of utility functions to be known up to a finite dimensional parameter vector [math]\delta[/math].

A rich literature in decision theory has analyzed a different framework, where the decision maker's choice set is observable to the researcher, but the decision maker does not consider all alternatives in it (for recent contributions see, e.g., [49][50]). In this literature, the utility function is left completely unspecified, so that interest focuses on identification of preference orderings of the available options. Unobserved heterogeneity in preferences is assumed away, so that heterogeneous choice is driven by randomness in consideration sets. If the consideration set formation process is left unspecified or is subject only to weak restrictions, point identification of the preference orderings is not possible even if preferences are homogeneous and the researcher observes a representative agent facing multiple distinct choice problems with varying choice sets.

[51] propose a general model for the consideration set formation process where the only restriction is a weak and intuitive monotonicity condition: the probability that any particular consideration set is drawn does not decrease when the number of possible consideration sets decreases. Within this framework, they provide revealed preference theory and testable implications for observable choice probabilities.

Identification Problem (Homogeneous Preference Orderings in Random Attention Models)


Let [math](\ey,\eC)\sim\sP[/math] be a pair of observable random variable and random set in [math]\cY\times\mathfrak{D}[/math], where [math]\mathfrak{D}=\{D:D\subseteq\cY\}\setminus\emptyset[/math].[Notes 19] Let [math]\mu:\mathfrak{D}\times\mathfrak{D}\to[0,1][/math] denote an attention rule such that [math]\mu(A|G)\ge 0[/math] for all [math]A\subseteq G[/math], [math]\mu(A|G)=0[/math] for all [math]A\nsubseteq G[/math], and [math]\sum_{A\subseteq G}\mu(A|G)=1[/math], [math]A,G\in\mathfrak{D}[/math]. Assume that for any [math]b\in G\setminus A[/math],

[[math]] \begin{align} \label{eq:RAM:monotonicity} \mu(A|G)\le\mu(A|G\setminus\{b\}), \end{align} [[/math]]

and that the decision maker has a strict preference ordering [math]\succ[/math] on [math]\cY[/math].[Notes 20] In the absence of additional information, what can the researcher learn about [math]\succ[/math]?

[51] posit that an observed distribution of choice [math]\sP(\ey|\eC)[/math] has a random attention representation, and hence they name it a random attention model, if there exists a preference ordering [math]\succ[/math] over [math]\cY[/math] and a monotonic attention rule [math]\mu[/math] such that

[[math]] \begin{align} \cp(c|G)\equiv\sP(\ey=c|\eC=G)=\sum_{A\subseteq G}\one(c\text{ is }\succ\text{-best in }A)\mu(A|G),\forall c\in G,\forall G\in\mathfrak{D}.\label{eq:RAM} \end{align} [[/math]]

The sharp identification region for the preference ordering, denoted [math]\idr{\succ}[/math] henceforth, is the collection of preference orderings that can each be paired with a monotonic attention rule so that \eqref{eq:RAM} holds. Of course, an observed distribution of choice can be represented by multiple preference orderings and attention rules. The authors, however, show in their Lemma 1 that if for some [math]G\in\mathfrak{D}[/math] with [math]\{b,c\}\subseteq G[/math],

[[math]] \begin{align} \cp(c|G) \gt \cp(c|G\setminus \{b\}),\label{eq:RAM_violation_reg} \end{align} [[/math]]

then [math]c \succ b[/math] for any [math]\succ[/math] for which one can find a monotonic attention rule [math]\mu[/math] such that \eqref{eq:RAM} holds. Because of preference transitivity, one can also learn [math]a\succ b[/math] if, in addition to the above condition, one has [math]\cp(a|G^\prime) \gt \cp(a|G^\prime\setminus \{c\})[/math] for some [math]G^\prime\in\mathfrak{D}[/math] with [math]\{a,c\}\subseteq G^\prime[/math]. The authors further show in their Theorem 1 that the collection of preference relations associated with all possible instances of \eqref{eq:RAM_violation_reg} for all [math]c\in G[/math] and [math]G\in\mathfrak{D}[/math] yields all information about preferences given the observed choice probabilities. This yields a system of linear inequalities in [math]\cp(c|G)[/math] that fully characterizes [math]\idr{\succ}[/math]. Let [math]\vec{\cp}[/math] denote the vector with elements [math][\cp(c|G):c\in G,G\in\mathfrak{D}][/math] and [math]\Pi_\succ[/math] denote a conformable matrix collecting the constraints on [math]\sP(\ey|\eC)[/math] embodied in \eqref{eq:RAM_violation_reg} and its generalizations based on transitive closure. Then

[[math]] \begin{align} \idr{\succ}=\{\succ: \Pi_\succ \vec{\cp}\le 0\}.\label{eq:SIR:RAM} \end{align} [[/math]]

The authors show that for any given preference ordering [math]\succ[/math], the matrix [math]\Pi_\succ[/math] characterizing whether [math]\succ \in \idr{\succ}[/math] through the system of linear inequalities in \eqref{eq:SIR:RAM} is unique, and they provide a simple algorithm to compute it. They also show that mild additional assumptions, such as, for example, that decision makers facing binary choice sets pay attention to both alternatives frequently enough, can substantially increase the informational content of the data (i.e., substantially tighten [math]\idr{\succ}[/math]).
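The revealed preference logic described above can be illustrated with a minimal sketch. It scans a table of choice probabilities for instances of \eqref{eq:RAM_violation_reg} and then takes the transitive closure. The function name `revealed_preferences`, the dictionary layout, and the numerical choice probabilities are all assumptions made for illustration; this is not the algorithm of [51] for computing [math]\Pi_\succ[/math].

```python
from itertools import product


def revealed_preferences(pi):
    """Collect pairs (c, b), read as "c is preferred to b", implied by regularity violations.

    pi maps each observed choice set G (a frozenset) to a dict {alternative: probability}.
    """
    revealed = set()
    for G, probs in pi.items():
        for b in G:
            smaller = pi.get(G - {b})  # choice probabilities after removing b
            if smaller is None:
                continue  # the smaller choice problem is not observed
            for c in G - {b}:
                if probs[c] > smaller[c]:  # pi(c|G) > pi(c|G \ {b})
                    revealed.add((c, b))
    # Transitive closure: a > c and c > b imply a > b.
    changed = True
    while changed:
        changed = False
        for (a, c), (c2, b) in product(list(revealed), repeat=2):
            if c == c2 and (a, b) not in revealed:
                revealed.add((a, b))
                changed = True
    return revealed


# Hypothetical choice probabilities over {a, b, c}, made up for illustration.
pi = {
    frozenset("abc"): {"a": 0.5, "b": 0.3, "c": 0.2},
    frozenset("ab"): {"a": 0.4, "b": 0.6},
    frozenset("ac"): {"a": 0.9, "c": 0.1},
    frozenset("bc"): {"b": 0.8, "c": 0.2},
}
print(sorted(revealed_preferences(pi)))
```

In this made-up table, removing [math]c[/math] lowers the probability of choosing [math]a[/math] and removing [math]b[/math] lowers the probability of choosing [math]c[/math], so the sketch reports [math]a\succ c[/math], [math]c\succ b[/math], and, by transitivity, [math]a\succ b[/math]. A pair of the form [math](a,a)[/math] in the output would signal that the revealed relations are cyclic, so that no strict ordering is compatible with them.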

Key Insight: [51] show that learning features of preference orderings in Identification Problem requires the existence in the data of choice problems where the choice probabilities satisfy \eqref{eq:RAM_violation_reg}. The latter is a violation of the principle of “regularity” [52] according to which the probability of choosing an alternative from any set is at least as large as the probability of choosing it from any of its supersets. Regularity is a monotonicity property of choice probabilities, and it is implied by a wide array of models of decision making. The monotonicity of attention rules in \eqref{eq:RAM:monotonicity} can be viewed as regularity of the process that chooses a consideration set from the subsets of the choice set. [51] show that it is implied by various models of limited attention. While the violation required in \eqref{eq:RAM_violation_reg} is weak in that it needs only to occur for some [math]G[/math], it sheds a different light on the severity of the identification problem described at the beginning of this section. Regularity of choice probabilities and (partial) identification of preference orderings can co-exist only under restrictions on the consideration set formation process that are stronger than the regularity of attention rules in \eqref{eq:RAM:monotonicity}.

[53] and [54] provide different sets of sufficient conditions for point identification of models of limited consideration. In both cases, the authors posit specific models of consideration set formation and provide sufficient conditions for point identification under exclusion and large support assumptions. [53] assume that unobserved heterogeneity in preferences and in consideration sets are independent. They exploit violations of Slutsky symmetry that result from inattention, assuming that for each alternative there is an observable characteristic with large support that does not affect the consideration probability of the other options.

[54] provide a thorough analysis of the extent of dependency between consideration and preferences under which semi-nonparametric point identification of the distribution of preferences and consideration attains. They exploit a requirement of standard economic theory --the Spence-Mirrlees single crossing property of utility functions-- coupled with a mild strengthening of the classic conditions for semi-nonparametric identification of discrete choice models with full consideration and identical choice sets (see, e.g., [26]), assuming that there is at least one decision maker-specific characteristic with large support that affects utility but not consideration.

Prediction of Choice Behavior with Counterfactual Choice Sets

Building on [2], [55] studies a question related to, but distinct from, those in the identification problems discussed above. He is concerned with prediction of choice behavior when decision makers face counterfactual choice sets. [55] frames this question as one of predicting treatment response (see Section Treatment Effects with and without Instrumental Variables). Here the collection of potential treatments is given by [math]\mathfrak{D}[/math], the nonempty subsets of the universe of feasible alternatives [math]\cY[/math], and the response function specifies the alternative chosen by a decision maker when facing choice set [math]G\in\mathfrak{D}[/math]. [55] assumes that the researcher observes realized choice sets and chosen alternatives, [math](\ey,\eC)\sim\sP[/math].[Notes 21] Under the standard assumptions laid out at the beginning of Section Discrete Choice in Single Agent Random Utility Models, specifically if utility functions are (say) linear in [math]\epsilon_{ic}[/math] and the distribution of [math]\epsilon_{ic}[/math] is (say) Type I extreme value or multivariate normal, prediction of choice behavior with counterfactual choice sets is immediate (and point identified). [55], however, leaves utility functions completely unspecified, and in fact works directly with preference orderings, which he labels decision makers' types. He places no restriction on the distribution of preference types, except requiring that they are independent of the observed choice sets. [55] shows that under these rather weak assumptions, the distribution of predicted choices from counterfactual choice sets can be partially identified, and characterized as the solution to linear programs.

Specifically, let [math]\ey^*(G)[/math] denote the decision maker's optimal choice when facing choice set [math]G\in\mathfrak{D}[/math]. Assume [math]\ey^*(\cdot)\independent\eC[/math], and let [math]y_k[/math] denote the choice function for a decision maker of type [math]k[/math] --that is, a decision maker with a specific preference ordering labeled [math]k[/math]. One example of such a preference ordering might be [math]c_1\succ c_2\succ\dots\succ c_{|\cY|}[/math]. If a decision maker of this type faces, say, choice set [math]G=\{c_2,c_3,c_4\}[/math], then she chooses alternative [math]c_2[/math]. Let [math]K[/math] denote the set of logically possible types, and [math]\theta_k[/math] the probability that a decision maker in the population is of type [math]k[/math]. Suppose that the researcher posits a behavioral model specifying [math]K[/math], [math]\{y_k,k\in K\}[/math], and restrictions that constrain [math]\theta[/math] to lie in some specified set of distributions. Let [math]\Theta[/math] denote the values of [math]\vartheta[/math] that satisfy these requirements plus the conditions [math]\vartheta_k\ge 0[/math] for all [math]k\in K[/math] and [math]\sum_{k\in K}\vartheta_k=1[/math]. Then for any [math]c\in\cY[/math] and [math]\vartheta\in\Theta[/math], the model predicts

[[math]] \begin{align*} \sQ(\ey^*(G)=c)=\sum_{k\in K}\one(y_k(G)=c)\vartheta_k. \end{align*} [[/math]]

How can one partially identify this probability based on the observed data? Suppose [math]\eC[/math] is observed to take realizations [math]D_1,\dots,D_m[/math]. Then the data reveal

[[math]] \begin{align*} \sP(\ey(D_j)=d_j)=\sum_{k\in K}\one(y_k(D_j)=d_j)\theta_k\quad\forall d_j\in D_j,\ j=1,\dots,m.\end{align*} [[/math]]

This yields that the sharp identification region for [math]\theta[/math] is

[[math]] \begin{align*} \idr{\theta}=\Big\{\vartheta\in\Theta:\sP(\ey(D_j)=d_j)=\sum_{k\in K}\one(y_k(D_j)=d_j)\vartheta_k\quad\forall d_j\in D_j,\ j=1,\dots,m\Big\}. \end{align*} [[/math]]

If the behavioral model is correctly specified, [math]\idr{\theta}[/math] is non-empty. In turn, the sharp identification region for each choice probability is

[[math]] \begin{align*} \idr{\sQ(\ey^*(G)=c)}=\left\{\sum_{k\in K}\one(y_k(G)=c)\vartheta_k:\vartheta\in\idr{\theta}\right\}, \end{align*} [[/math]]

and its extreme points can be obtained by solving linear programs.
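As a minimal sketch of these linear programs, consider a universe of three alternatives in which every strict preference ordering is a possible type and no further restriction is placed on [math]\vartheta[/math]. The observed choice sets, the choice probabilities, and the helper names below are assumptions made up for illustration, and `scipy.optimize.linprog` is used only as a convenient off-the-shelf solver; none of this is taken from [55].

```python
from itertools import permutations

import numpy as np
from scipy.optimize import linprog

ALTERNATIVES = ("a", "b", "c")
TYPES = list(permutations(ALTERNATIVES))  # each tuple ranks alternatives from best to worst


def choice(ordering, choice_set):
    """y_k(G): the best alternative in choice_set under the given ordering."""
    return next(alt for alt in ordering if alt in choice_set)


def indicator_row(choice_set, alt):
    """Row of indicators 1(y_k(G) = alt), one entry per type k."""
    return np.array([float(choice(t, choice_set) == alt) for t in TYPES])


def counterfactual_bounds(observed, G, c):
    """Bounds on Q(y*(G) = c) given observed choice probabilities.

    observed maps a choice set (tuple) to a dict {alternative: probability},
    with probabilities assumed to sum to one within each choice set.
    Returns (lower, upper), or None if no type distribution fits the data.
    """
    rows, rhs = [np.ones(len(TYPES))], [1.0]  # type probabilities sum to one
    for D, probs in observed.items():
        # The last alternative's equation is implied by adding up, so skip it.
        for d in list(probs)[:-1]:
            rows.append(indicator_row(D, d))
            rhs.append(probs[d])
    A_eq, b_eq = np.vstack(rows), np.array(rhs)
    objective = indicator_row(G, c)
    low = linprog(objective, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    high = linprog(-objective, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    if not (low.success and high.success):
        return None
    return low.fun, -high.fun


# Hypothetical data: two observed binary choice sets.
observed = {("a", "b"): {"a": 0.6, "b": 0.4}, ("b", "c"): {"b": 0.7, "c": 0.3}}
lower, upper = counterfactual_bounds(observed, ("a", "c"), "a")
print(round(lower, 6), round(upper, 6))
```

In this example the lower bound can be checked by hand: at least a fraction [math]0.6+0.7-1=0.3[/math] of types rank [math]a[/math] above [math]b[/math] and [math]b[/math] above [math]c[/math], and by transitivity all of them choose [math]a[/math] from [math]\{a,c\}[/math], while nothing in the data prevents every type from doing so. An infeasible program would indicate that the posited behavioral model cannot rationalize the observed choice probabilities.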

[56] provide closely related sharp bounds on features of counterfactual choices in the nonparametric random utility model of demand, where observable choices are repeated cross-sections and one allows for unrestricted, unobserved heterogeneity. Their approach builds on the work of [57], who test whether agents' behavior is consistent with the Axiom of Revealed Stochastic Preference (ARSP) in a random utility model in which the utility function of each consumer over commodity bundles is assumed to satisfy only the basic restriction that “more is better” with no satiation. Because the testing exercise is to be carried out using repeated cross-section data, the authors maintain the assumption that multiple populations of consumers who face distinct choice sets have the same distribution of preferences. With this structure in place, de facto the task is to test the full implications of rationality without functional form restrictions. [57]’s approach is based on several novel ideas. As a first step, they leverage an earlier insight of [58] to discretize the data without loss of information, so that they can define a large but finite set of rational preference types. As a second step, they show that this implies that rationality can be tested by checking whether observed behavior lies in a cone corresponding to positive linear combinations of preference types. While the problem is discrete, its dimension is at first sight prohibitive. Nonetheless, Kitamura and Stoye are able to develop novel computational methods that render the problem tractable. They apply their method to the U.K. Household Expenditure Survey, adapting to their framework results on nonparametric instrumental variable analysis by [59] so that they can handle price endogeneity.

[60] builds on [55] to learn program effects when agents are randomly assigned to control or treatment. The treatment group is provided access to the program, while the control group is not. However, members of the control group may receive access to the program from outside the experiment, leading to noncompliance with the randomly assigned treatment. The researcher wants to learn about the average effect of program access on the decision to participate in the program and on the subsequent outcome. While sufficiently rich data may allow the researcher to learn these effects, [60] is concerned with the identification problem that arises when the researcher only observes the treatment assignment status, the program participation decision, and the outcome, but not the receipt of program access for every agent. [60] formalizes this problem as one where the received treatment is selected from a choice set that depends on the assigned treatment and is unobservable to the researcher, and the agents optimally choose whether to participate in the program by maximizing their utility function over their choice set. Importantly, the utility functions are not subject to parametric restrictions, similarly to [55]. But while [55] assumed independence of choice sets and preference types, [60] allows them to be arbitrarily dependent on each other, as in [48]. The approach in [60] leverages specific assumptions on random assignment of treatments and on compliance (or lack thereof) of participants to obtain nonparametric bounds on the treatment effects of interest that can be characterized using tractable linear programs.

[20] and [21] substantially enlarge the scope of partial identification analysis of structural models by showing how to apply it to learn features of payoff functions in static, simultaneous-move finite games of complete information with multiple equilibria. [61] extend the approach and considerations that follow to games of incomplete information. To start, here I focus on two-player entry games with complete information.[Notes 22]

Identification Problem (Complete Information Two Player Entry Game)

Let [math](\ey_1,\ey_2,\ex_1,\ex_2)\sim\sP[/math] be observable random variables in [math]\{0,1\}\times\{0,1\}\times\R^d\times\R^d[/math], [math]d \lt \infty[/math]. Suppose that [math](\ey_1,\ey_2)[/math] result from simultaneous move, pure strategy Nash play (PSNE) in a game where the payoffs are [math]\bu_j(\ey_j,\ey_{3-j},\ex_j;\beta_j,\delta_j)\equiv \ey_j(\ex_j\beta_j+\delta_j\ey_{3-j}+\eps_j)[/math], [math]j=1,2[/math] and the strategies are “enter” ([math]\ey_j=1[/math]) or “stay out” ([math]\ey_j=0[/math]). Here [math](\ex_1,\ex_2)[/math] are observable payoff shifters, [math](\eps_1,\eps_2)[/math] are payoff shifters observable to the players but not to the econometrician, [math]\delta_1\le 0,\delta_2\le 0[/math] are interaction effect parameters, and [math]\beta_1,\beta_2[/math] are parameter vectors in [math]B\subset\R^d[/math] reflecting the effect of the observable covariates on payoffs. Each player enters the market if and only if entering yields non-negative payoff, so that [math]\ey_j=\one(\ex_j\beta_j+\delta_j\ey_{3-j}+\eps_j\ge 0)[/math]. For simplicity, assume that [math]\eps\equiv(\eps_1,\eps_2)[/math] is independent of [math]\ex\equiv(\ex_1,\ex_2)[/math] and has bivariate Normal distribution with mean vector zero, variances equal to one (a normalization required by the threshold crossing nature of the model), and correlation [math]\rho\in [-1,1][/math]. In the absence of additional information, what can the researcher learn about [math]\theta=[\delta_1,\delta_2,\beta_1,\beta_2,\rho][/math]?
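To make the entry rule concrete, the following minimal sketch enumerates the pure strategy Nash equilibria of the game for one realization of the payoff shifters. The function name `psne_set` and the scalar arguments `xb1`, `xb2` (standing for the index values [math]\ex_j\beta_j[/math]) are assumptions introduced for illustration.

```python
from itertools import product


def psne_set(xb1, xb2, d1, d2, e1, e2):
    """Return the set of pure strategy Nash equilibria (y1, y2) of the entry game.

    xb_j is the index x_j * beta_j, d_j <= 0 the interaction effect, e_j the payoff shock.
    """

    def enters(xb, d, e, y_other):
        # A player enters iff entering yields a non-negative payoff.
        return int(xb + d * y_other + e >= 0)

    return {
        (y1, y2)
        for y1, y2 in product((0, 1), repeat=2)
        if y1 == enters(xb1, d1, e1, y2) and y2 == enters(xb2, d2, e2, y1)
    }


# With x_j * beta_j = 0 and delta_j = -1, shocks in [0, 1) x [0, 1) give two equilibria.
print(psne_set(0.0, 0.0, -1.0, -1.0, 0.5, 0.5))
print(psne_set(0.0, 0.0, -1.0, -1.0, 2.0, 2.0))
```

For the first call the model predicts that exactly one firm enters but does not say which one; for the second call both firms enter and the prediction is unique.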

From the econometric perspective, this is a generalization of a standard discrete choice model to a bivariate simultaneous response model which yields a stochastic representation of equilibria in a two player, two action game. Generically, for a given value of [math]\theta[/math] and realization of the payoff shifters, the model just laid out admits multiple equilibria (existence of PSNE is guaranteed because the interaction parameters are non-positive). In other words, it yields set valued predictions as depicted in Figure.[Notes 23] Why does this set valued prediction hinder point identification? Intuitively, the challenge can be traced back to the fact that for different values of [math]\theta\in\Theta[/math], one may find different ways to assign the probability mass in [math][-\ex_1\beta_1,-\ex_1\beta_1-\delta_1)\times [-\ex_2\beta_2,-\ex_2\beta_2-\delta_2)[/math] to [math](0,1)[/math] and [math](1,0)[/math], so as to match the observed distribution [math]\sP(\ey_1,\ey_2|\ex_1,\ex_2)[/math]. More formally, for fixed [math]\vartheta\in\Theta[/math] and given [math](\ex,\eps)[/math] and [math](y_1,y_2)\in\{0,1\}\times\{0,1\}[/math], let

[[math]] \begin{align*} \cE_\vartheta[(1,0),(0,1);\ex]&\equiv[-\ex_1\beta_1,-\ex_1\beta_1-\delta_1)\times [-\ex_2\beta_2,-\ex_2\beta_2-\delta_2),\\ \cE_\vartheta[(y_1,y_2);\ex]&\equiv\{(\eps_1,\eps_2):(y_1,y_2)\text{ is the unique equilibrium}\}, \end{align*} [[/math]]

so that in Figure [math]\cE_\vartheta[(1,0),(0,1);\ex][/math] is the gray region, [math]\cE_\vartheta[(0,1);\ex][/math] is the dotted region, etc. Let [math]\sR(y_1,y_2|\ex,\eps)[/math] be a selection mechanism that assigns to each possible outcome of the game [math](y_1,y_2)\in\{0,1\}\times\{0,1\}[/math] the probability that it is played conditional on observable and unobservable payoff shifters. In order to be admissible, [math]\sR(y_1,y_2|\ex,\eps)[/math] must be such that [math]\sR(y_1,y_2|\ex,\eps)\ge 0[/math] for all [math](y_1,y_2)\in\{0,1\}\times\{0,1\}[/math], [math]\sum_{(y_1,y_2)\in\{0,1\}\times\{0,1\}}\sR(y_1,y_2|\ex,\eps)=1[/math], and

[[math]] \begin{align} \forall \eps\in\cE_\vartheta[(1,0),(0,1);\ex],&\quad\sR(0,0|\ex,\eps)=\sR(1,1|\ex,\eps)=0 \label{eq:games:sel:mec:1}\\ \forall \eps\in\cE_\vartheta[(y_1,y_2);\ex],&\quad\sR(\tilde y_1,\tilde y_2|\ex,\eps)=0 \quad\forall(\tilde y_1,\tilde y_2)\in\{0,1\}\times\{0,1\}\text{ s.t. }(\tilde y_1,\tilde y_2)\neq(y_1,y_2).\label{eq:games:sel:mec:2} \end{align} [[/math]]

Let [math]\Phi_r[/math] denote the probability distribution of a bivariate Normal random variable with zero means, unit variances, and correlation [math]r\in[-1,1][/math]. Let [math]\sM(y_1,y_2|\ex)[/math] denote the model predicted probability that the outcome of the game realizes equal to [math](y_1,y_2)[/math]. Then the model yields

[[math]] \begin{align} \sM(y_1,y_2|\ex)&=\int\sR(y_1,y_2|\ex,\eps)d\Phi_r\notag\\ &=\int_{(\eps_1,\eps_2)\in\cE_\vartheta[(y_1,y_2);\ex]}d\Phi_r+\int_{(\eps_1,\eps_2)\in\cE_\vartheta[(1,0),(0,1);\ex]}\sR(y_1,y_2|\ex,\eps)d\Phi_r.\label{eq:games_model:pred} \end{align} [[/math]]

Because [math]\sR(\cdot|\ex,\eps)[/math] is left completely unspecified, other than the basic restrictions listed above that render it an admissible selection mechanism, one can find multiple values for [math](\vartheta,\sR(\cdot|\ex,\eps))[/math] such that [math]\sM(y_1,y_2|\ex)=\sP(y_1,y_2|\ex)[/math] for all [math](y_1,y_2)\in\{0,1\}\times\{0,1\}[/math] [math]\ex[/math]-a.s.

PSNE outcomes of the game in Identification Problem as a function of [math](\eps_1,\eps_2)[/math].

Multiplicity of equilibria implies that the mapping from the model's exogenous variables [math](\ex_1,\ex_2,\eps_1,\eps_2)[/math] to outcomes [math](\ey_1,\ey_2)[/math] is a correspondence rather than a function. This violates the classical “principal assumptions” or “coherency conditions” for simultaneous discrete response models discussed extensively in the econometrics literature (e.g., [62][63][64][65][66]). Such coherency conditions require the existence of a unique reduced form, mapping the model's exogenous variables and parameters to a unique realization of the endogenous variable; hence, they constrain the model to be recursive or triangular in nature. As pointed out by [67], however, the coherency conditions shut down exactly the social interaction effect of interest by requiring, e.g., that [math]\delta_1\delta_2=0[/math], so that at least one player's action has no impact on the other player's payoff.

The desire to learn about interaction effects coupled with the difficulties generated by multiplicity of equilibria prompted the earlier literature to provide at least two different ways to achieve point identification. The first one relies on imposing simplifying assumptions that shift focus to outcome features that are common across equilibria. For example, [68][69][70] and [71] study entry games where the number, though not the identities, of entrants is uniquely predicted by the model in equilibrium. Unfortunately, however, these simplifying assumptions substantially constrain the amount of heterogeneity in players' payoffs that the model allows for. The second approach relies on explicitly modeling a selection mechanism which specifies the equilibrium played in the regions of multiplicity. For example, [67] assume it to be a constant; [72] assume a more flexible, covariate dependent parametrization; and [71] considers two possible selection mechanism specifications, one where the incumbent moves first, and the other where the most profitable player moves first. Unfortunately, however, the chosen selection mechanism can have non-trivial effects on inference, and the data and theory might be silent on which is more appropriate. A nice example of this appears in [71](Table VII). [61] review and extend a number of results on the identification of entry models extensively used in the empirical literature. [14] discusses the observable implications of models with multiple equilibria, and within the analysis of a model with homogeneous preferences shows that partial identification is possible (see [14](p. 1435)). I refer to [73] for a review of the literature on econometric analysis of games with multiple equilibria.

[21] show, on the other hand, that it is possible to partially identify entry models that allow for rich heterogeneity in payoffs and for any possible selection mechanism (even ones that are arbitrarily dependent on the unobservable payoff shifters after conditioning on the observed payoff shifters). In addition, [20] provides sufficient conditions for point identification based on exclusion restrictions and large support assumptions. [74] analyze partial identification of nonparametric models of entry in a two-player model, drawing connections with the program evaluation literature.

Key Insight: An important conceptual contribution of [20] is to clarify the distinction between a model which is incoherent, so that no reduced form exists, and a model which is incomplete, so that multiple reduced forms may exist. Models with multiple equilibria belong to the latter category. Whereas the earlier literature in partial identification had been motivated by measurement problems, e.g., missing or interval data, the work of [20] and [21] is motivated by the fact that economic theory often does not specify how an equilibrium is selected in the regions of the exogenous variables which admit multiple equilibria. This is a conceptually completely distinct identification problem.

[21] propose to use simple and tractable implications of the model to learn features of the structural parameters of interest. Specifically, they point out that the probability of observing any outcome of the game cannot be smaller than the model's implied probability that such outcome is the unique equilibrium of the game, and cannot be larger than the model's implied probability that such outcome is one of the possible equilibria of the game. Looking at Figure this means, for example, that the observed [math]\sP((\ey_1,\ey_2)=(0,1)|\ex_1,\ex_2)[/math] cannot be smaller than the probability that [math](\eps_1,\eps_2)[/math] realizes in the dotted region, and cannot be larger than the probability that it realizes either in the dotted region or in the gray region. Compared to the model predicted distribution in \eqref{eq:games_model:pred}, this means that [math]\sP((\ey_1,\ey_2)=(0,1)|\ex_1,\ex_2)[/math] cannot be smaller than the expression obtained setting, for [math]\eps\in\cE_\vartheta[(1,0),(0,1);\ex][/math], [math]\sR(0,1|\ex,\eps)=0[/math], and cannot be larger than that obtained with [math]\sR(0,1|\ex,\eps)=1[/math]. Denote by [math]\Phi(A_1,A_2;\rho)[/math] the probability that the bivariate normal with mean vector zero, variances equal to one, and correlation [math]\rho[/math] assigns to the event [math]\{\eps_1\in A_1,\eps_2\in A_2\}[/math]. Then [21] show that any [math]\vartheta=[d_1,d_2,b_1,b_2,r][/math] that is observationally equivalent to the data generating value [math]\theta[/math] satisfies, [math](\ex_1,\ex_2)[/math]-a.s.,

[[math]] \begin{align} \sP((\ey_1,\ey_2)=(0,0)|\ex_1,\ex_2)&=\Phi((-\infty,-\ex_1b_1),(-\infty,-\ex_2b_2);r)\label{eq:CT_00}\\ \sP((\ey_1,\ey_2)=(1,1)|\ex_1,\ex_2)&=\Phi([-\ex_1b_1-d_1,\infty),[-\ex_2b_2-d_2,\infty);r)\label{eq:CT_11}\\ \sP((\ey_1,\ey_2)=(0,1)|\ex_1,\ex_2)&\le\Phi((-\infty,-\ex_1b_1-d_1),(-\ex_2b_2,\infty);r)\label{eq:CT_01U}\\ \sP((\ey_1,\ey_2)=(0,1)|\ex_1,\ex_2)&\ge\Big\{\Phi((-\infty,-\ex_1b_1-d_1),(-\ex_2b_2,\infty);r)\notag\\ &\quad\quad-\Phi((-\ex_1b_1,-\ex_1b_1-d_1),(-\ex_2b_2,-\ex_2b_2-d_2);r)\Big\}\label{eq:CT_01L} \end{align} [[/math]]
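The right-hand sides of \eqref{eq:CT_00}-\eqref{eq:CT_01L} are rectangle probabilities of a bivariate normal and can be evaluated numerically for any candidate [math]\vartheta[/math] at a fixed value of the covariates. The sketch below is only one possible way to do so: it uses a simple trapezoid rule with the standard library, assumes [math]|r| \lt 1[/math], and introduces the hypothetical helpers `rect_prob` and `entry_game_restrictions`, with `xb1` and `xb2` standing for [math]\ex_1b_1[/math] and [math]\ex_2b_2[/math].

```python
import math


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def rect_prob(a1, b1, a2, b2, r, n=4000, cut=8.0):
    """P(a1 < e1 < b1, a2 < e2 < b2) for a standard bivariate normal, |r| < 1.

    Integrates the density of e1 against the conditional law
    e2 | e1 ~ N(r * e1, 1 - r^2) with a trapezoid rule; tails beyond
    +/- cut are ignored.
    """
    lo, hi = max(a1, -cut), min(b1, cut)
    if hi <= lo or b2 <= a2:
        return 0.0
    s = math.sqrt(1.0 - r * r)

    def f(e1):
        return norm_pdf(e1) * (norm_cdf((b2 - r * e1) / s) - norm_cdf((a2 - r * e1) / s))

    h = (hi - lo) / n
    return h * (0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n)))


def entry_game_restrictions(xb1, xb2, d1, d2, r):
    """Model-implied probabilities for a candidate (d1, d2, b1, b2, r), with xb_j = x_j * b_j."""
    inf = math.inf
    p00 = rect_prob(-inf, -xb1, -inf, -xb2, r)              # (0,0) is the unique equilibrium
    p11 = rect_prob(-xb1 - d1, inf, -xb2 - d2, inf, r)      # (1,1) is the unique equilibrium
    upper01 = rect_prob(-inf, -xb1 - d1, -xb2, inf, r)      # (0,1) is an equilibrium
    multiple = rect_prob(-xb1, -xb1 - d1, -xb2, -xb2 - d2, r)  # region of multiplicity
    return {"p00": p00, "p11": p11, "lower01": upper01 - multiple, "upper01": upper01}


print(entry_game_restrictions(0.0, 0.0, -1.0, -1.0, 0.0))
```

A candidate [math]\vartheta[/math] would be retained, at this value of the covariates, if the observed probabilities of [math](0,0)[/math] and [math](1,1)[/math] match `p00` and `p11` and the observed probability of [math](0,1)[/math] lies between `lower01` and `upper01`. In practice the check has to hold for almost every covariate value, and sampling error has to be accounted for.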

While the approach of [21] is summarized here for a two player entry game, it extends without difficulty to any finite number of players and actions and to solution concepts other than pure strategy Nash equilibrium.

[75] build on the insights of [21] to study the identification power of equilibrium in games. To do so, they compare the set-valued model predictions and what can be learned about [math]\theta[/math] when one assumes only level-[math]k[/math] rationality as opposed to Nash play. In static entry games of complete information, they find that the model's predictions when [math]k\ge 2[/math] are similar to those obtained with Nash behavior and allowing for multiple equilibria and mixed strategies. [76] extend the analysis of [75] to the class of supermodular games.

The collection of parameter vectors satisfying (in)equalities \eqref{eq:CT_00}-\eqref{eq:CT_01L} yields the sharp identification region [math]\idr{\theta}[/math] in the case of two player entry games with pure strategy Nash equilibrium as solution concept, as shown by [77](Supplementary Appendix D, Corollary D.4). When there are more than two players or more than two actions (or with different solution concepts, such as, e.g., mixed strategy Nash equilibrium; correlated equilibrium; or rationality of level [math]k[/math] as in [75]), the characterization in [21] obtained by extending the reasoning just laid out yields an outer region. [77] use elements of random set theory to provide a general and computationally tractable characterization of the identification region that is sharp, regardless of the number of players and actions, or the solution concept adopted. For the case of PSNE with any finite number of players or actions, [78] provide a computationally tractable sharp characterization of the identification region using elements of optimal transportation theory.

Characterization of Sharpness through Random Set Theory

[77] provide a general approach based on random set theory that delivers sharp identification regions on parameters of structural semiparametric models with set valued predictions. Here I summarize it for the case of static, simultaneous move finite games of complete information, first with PSNE as solution concept and then with mixed strategy Nash equilibrium. Then I discuss games of incomplete information.

For a given [math]\vartheta\in\Theta[/math], denote the set of pure strategy Nash equilibria (depicted in Figure) as [math]\eY_\vartheta(\ex,\eps)[/math]. It is easy to show that [math]\eY_\vartheta(\ex,\eps)[/math] is a random closed set as in Definition. Under the assumption in Identification Problem that [math]\ey[/math] results from simultaneous move, pure strategy Nash play, at the true DGP value of [math]\theta\in\Theta[/math], one has

[[math]] \begin{align} \ey\in\eY_\theta\text{ a.s.}\label{eq:y_in_Y_games} \end{align} [[/math]]

Equation \eqref{eq:y_in_Y_games} exhausts the modeling content of Identification Problem. Theorem can be leveraged to extract its empirical content from the observed distribution [math]\sP(\ey,\ex)[/math]. For a given [math]\vartheta\in\Theta[/math] and [math]K\subset\cY[/math], let [math]\sT_{\eY_{\vartheta}(\ex,\eps)}(K;\Phi_r)[/math] denote the probability of the event [math]\{\eY_\vartheta(\ex,\eps)\cap K\neq \emptyset\}[/math] implied when [math]\eps\sim\Phi_r[/math], [math]\ex[/math]-a.s.

Theorem (Structural Parameters in Static, Simultaneous Move Finite Games of Complete Information with PSNE)


Under the assumptions of Identification Problem, the sharp identification region for [math]\theta[/math] is

[[math]] \begin{align} \idr{\theta}=\{\vartheta\in\Theta:\sP(\ey\in K|\ex)\le \sT_{\eY_{\vartheta}(\ex,\eps)}(K;\Phi_r)\,\forall K\subset\cY, \, \ex\text{-a.s.}\}.\label{eq:SIR:entry_game} \end{align} [[/math]]

Show Proof

To simplify notation, let [math]\eY_\vartheta\equiv \eY_\vartheta(\ex,\eps)[/math]. In order to establish sharpness, it suffices to show that [math]\vartheta\in \idr{\theta}[/math] if and only if one can complete the model with an admissible selection mechanism [math]\sR(y_1,y_2|\ex,\eps)[/math] such that [math]\sR(y_1,y_2|\ex,\eps)\ge 0[/math] for all [math](y_1,y_2)\in\{0,1\}\times\{0,1\}[/math], [math]\sum_{(y_1,y_2)\in\{0,1\}\times\{0,1\}}\sR(y_1,y_2|\ex,\eps)=1[/math], and satisfying \eqref{eq:games:sel:mec:1}-\eqref{eq:games:sel:mec:2}, so that [math]\sM(y_1,y_2|\ex)=\sP(y_1,y_2|\ex)[/math] for all [math](y_1,y_2)\in\{0,1\}\times\{0,1\}[/math] [math]\ex[/math]-a.s., with [math]\sM(y_1,y_2|\ex)[/math] defined in \eqref{eq:games_model:pred}. Suppose first that [math]\vartheta[/math] is such that a selection mechanism with these properties is available. Then there exists a selection of [math]\eY_\vartheta[/math] which is equal to the prediction selected by the selection mechanism and whose conditional distribution is equal to [math]\sP(\ey|\ex)[/math], [math]\ex[/math]-a.s., and therefore [math]\vartheta \in \idr{\theta}[/math]. Next take [math]\vartheta \in \idr{\theta}[/math]. Then by Theorem, [math]\ey[/math] and [math]\eY_\vartheta[/math] can be realized on the same probability space as random elements [math]\ey'[/math] and [math]\eY'_\vartheta[/math], so that [math]\ey'[/math] and [math]\eY'_\vartheta[/math] have the same distributions, respectively, as [math]\ey[/math] and [math]\eY_\vartheta[/math], and [math]\ey' \in \Sel(\eY'_\vartheta)[/math], where [math]\Sel(\eY'_\vartheta)[/math] is the set of all measurable selections from [math]\eY'_\vartheta[/math], see Definition. One can then complete the model with a selection mechanism that picks [math]\ey'[/math] with probability 1, and the result follows.
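The inequalities in \eqref{eq:SIR:entry_game} can be illustrated by simulation for the two player entry game at a fixed value of the covariates. The sketch below approximates the capacity functional [math]\sT_{\eY_{\vartheta}(\ex,\eps)}(K;\Phi_r)[/math] by the share of simulated draws of [math]\eps[/math] for which the set of PSNE intersects [math]K[/math], and reports the sets [math]K[/math] for which the inequality fails. All function names, the number of draws, the tolerance, and the parameter values are assumptions made for illustration; this is not the computational procedure of [77].

```python
import math
import random
from itertools import combinations, product

OUTCOMES = list(product((0, 1), repeat=2))
# All nonempty subsets K of the outcome space {0,1} x {0,1}.
SUBSETS = [frozenset(K) for m in range(1, 5) for K in combinations(OUTCOMES, m)]


def psne_set(xb1, xb2, d1, d2, e1, e2):
    """Set of pure strategy Nash equilibria for one draw of (e1, e2)."""

    def enters(xb, d, e, y_other):
        return int(xb + d * y_other + e >= 0)

    return frozenset(
        (y1, y2)
        for y1, y2 in OUTCOMES
        if y1 == enters(xb1, d1, e1, y2) and y2 == enters(xb2, d2, e2, y1)
    )


def draw_equilibrium_sets(theta, n_draws, seed):
    """Simulate the random set of equilibria for theta = (xb1, xb2, d1, d2, r), |r| < 1."""
    xb1, xb2, d1, d2, r = theta
    rng = random.Random(seed)
    sets = []
    for _ in range(n_draws):
        e1, z = rng.gauss(0, 1), rng.gauss(0, 1)
        e2 = r * e1 + math.sqrt(1 - r * r) * z  # corr(e1, e2) = r
        sets.append(psne_set(xb1, xb2, d1, d2, e1, e2))
    return sets


def capacity(sets, K):
    """Simulated probability that the equilibrium set hits K."""
    return sum(1 for Y in sets if Y & K) / len(sets)


def violated_sets(p_obs, theta, n_draws=20000, seed=0, tol=1e-9):
    """Subsets K with P(y in K) > T(K) + tol; an empty list means no violation found."""
    sets = draw_equilibrium_sets(theta, n_draws, seed)
    return [K for K in SUBSETS if sum(p_obs[y] for y in K) > capacity(sets, K) + tol]


# Hypothetical data: outcomes generated at theta0 with a selection rule that
# plays (0, 1) whenever both (0, 1) and (1, 0) are equilibria.
theta0 = (0.0, 0.0, -1.0, -1.0, 0.0)
sets0 = draw_equilibrium_sets(theta0, 20000, 0)
p_obs = {y: sum(1 for Y in sets0 if min(Y) == y) / len(sets0) for y in OUTCOMES}
print(violated_sets(p_obs, theta0))
print(len(violated_sets(p_obs, (-3.0, -3.0, -1.0, -1.0, 0.0))))
```

Because the simulated outcome is a selection of the simulated equilibrium set draw by draw, no inequality is violated at the data generating value, whereas a candidate value that makes joint entry far less likely than observed violates at least the inequality for [math]K=\{(1,1)\}[/math]. With real data the tolerance would have to reflect both sampling and simulation error, which is a matter for inference rather than identification.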

The characterization provided in the theorem above for games with multiple PSNE, taken from [77](Supplementary Appendix D), is equivalent to the one in [78]. When [math]J=2[/math] and [math]\cY=\{0,1\}\times\{0,1\}[/math], the inequalities in \eqref{eq:SIR:entry_game} reduce to \eqref{eq:CT_00}-\eqref{eq:CT_01L}. With more players and/or more actions, the inequalities in \eqref{eq:SIR:entry_game} are a superset of those in \eqref{eq:CT_00}-\eqref{eq:CT_01L}, with the latter comprised of the ones in \eqref{eq:SIR:entry_game} for [math]K=\{k\}[/math] and [math]K=\cY\setminus\{k\}[/math], for all [math]k\in\cY[/math]. Hence, the inequalities in \eqref{eq:SIR:entry_game} are more informative. Of course, the computational cost incurred to characterize [math]\idr{\theta}[/math] may grow with the number of inequalities involved. I discuss computational challenges in partial identification in Section.

Key Insight:(Random set theory and partial identification -- continued) In Identification Problem lack of point identification can be traced back to the set valued predictions delivered by the model, which in turn derive from the model incompleteness defined by [20]. As stated in the Introduction, constructing the (random) set of model predictions delivered by the maintained assumptions is an exercise typically carried out in identification analysis, regardless of whether random set theory is applied. Indeed, for the problem studied in this section, [20](Figure 1) put forward the set of admissible outcomes of the game. [77] propose to work directly with this random set to characterize [math]\idr{\theta}[/math]. The fundamental advantage of this approach is that it dispenses with considering the possible selection mechanisms that may complete the model. Selection mechanisms may depend on the model's unobservables even after conditioning on observables and may constitute an infinite dimensional nuisance parameter, which creates great difficulties for the computation of [math]\idr{\theta}[/math] and for inference.

Next, I discuss the case in which the outcome of the game results from simultaneous move, mixed strategy Nash play.[Notes 24] When mixed strategies are allowed for, the model predicts multiple mixed strategy Nash equilibria (MSNE). When only pure strategies are allowed for and the model is correctly specified, the observed outcome of the game is one of the predicted PSNE. With mixed strategies, instead, the observed outcome is only the result of a random mixing draw from one of the predicted MSNE. Hence, the identification problem is more complex, and in order to obtain a tractable characterization of [math]\theta[/math]'s sharp identification region one needs to use different tools from random set theory.

To keep the treatment simple here I continue to consider the case of two players with two strategies, as in Identification Problem, with mixed strategies allowed for, and refer to [30](Section 3.4) for the general case. Fix [math]\vartheta\in\Theta[/math]. Let [math]\sigma_j:\{0,1\}\to [0,1][/math] denote the probability that player [math]j[/math] enters the market, with [math]1-\sigma_j[/math] the probability that she stays out. With some abuse of notation, let [math]\bu_j(\sigma_j,\sigma_{-j},\ex_j,\eps_j,\vartheta)[/math] denote the expected payoff associated with the mixed strategy profile [math]\sigma=(\sigma_1,\sigma_2)[/math]. For a given realization [math](x,e)[/math] of [math](\ex,\eps)[/math] and a given value of [math]\vartheta\in\Theta[/math], the set of mixed strategy Nash equilibria is

[[math]] \begin{multline*} S_\vartheta(x,e) =\bigg\{\sigma \in [0,1]^2:\; \bu_j(\sigma_j,\sigma_{-j},x_j,e_j;\vartheta) \geq \bu_j(\tilde{\sigma}_j,\sigma_{-j},x_j,e_j;\vartheta)\;\, \forall \tilde{\sigma}_j\in [0,1],\; j=1,2\bigg\}. \end{multline*} [[/math]]

[77] show that [math]\eS_\vartheta\equiv S_\vartheta(\ex,\eps)[/math] is a random closed set in [math][0,1]^2[/math]. Its realizations are illustrated in Panel (a) of Figure as a function of [math](\eps_1,\eps_2)[/math].[Notes 25]

MSNE strategies ([math]\eS_\vartheta[/math]), set of multinomial distributions over outcomes of the game ([math]\eQ_\vartheta[/math]), and its support function ([math]h_{\eQ_\vartheta}[/math]), as a function of [math](\eps_1,\eps_2)[/math], where [math]\sigma_1^*\equiv\frac{-\eps_2-\ex_2\beta_2}{\vartheta_2},\sigma_2^*\equiv\frac{-\eps_1-\ex_1\beta_1}{\vartheta_1}[/math].

Define the set of possible multinomial distributions over outcomes of the game associated with the selections [math]\sigma[/math] of each possible realization of [math]\eS_{\vartheta}[/math] as

[[math]] \begin{equation} \label{eq:Q-set} \eQ_\vartheta=\left\{\eq(\sigma)\equiv \begin{bmatrix} (1-\sigma_1)(1-\sigma_2)\\ \sigma_1(1-\sigma_2)\\ (1-\sigma_1)\sigma_2\\ \sigma_1\sigma_2 \end{bmatrix}:\, \sigma \in \eS_\vartheta\right\}. \end{equation} [[/math]]

As [math]\eQ_\vartheta[/math] is the image of a continuous map applied to the random compact set [math]\eS_\vartheta[/math], it is a random compact set. Its realizations are plotted in Panel (b) of Figure as a function of [math](\eps_1,\eps_2)[/math].
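As a minimal sketch of how one realization of [math]\eS_\vartheta[/math] and [math]\eQ_\vartheta[/math] can be enumerated, the Python code below assumes entry-game payoffs in which staying out pays zero and entering pays [math]a_j+d_j\sigma_{-j}[/math], with [math]a_j=x_jb_j+e_j[/math] and [math]d_j \lt 0[/math]. The function names `msne` and `outcome_distribution` are hypothetical, and non-generic ties (which can produce a continuum of equilibria) are not treated.

```python
def msne(a1, a2, d1, d2, tol=1e-12):
    """Equilibria (sigma_1, sigma_2) of the 2x2 entry game for one realization.

    a_j is player j's payoff from entering alone and d_j < 0 the loss from
    facing a competitor (assumed payoff form); staying out pays zero.
    """
    def gain(a, d, sigma_other):
        # Expected payoff from entering, given the opponent's entry probability.
        return a + d * sigma_other

    def is_best_response(sigma_own, g):
        if sigma_own == 1.0:
            return g >= -tol
        if sigma_own == 0.0:
            return g <= tol
        return abs(g) <= tol  # mixing requires indifference

    candidates = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    # Entry probabilities that leave the opponent indifferent.
    s1, s2 = -a2 / d2, -a1 / d1
    if 0.0 < s1 < 1.0 and 0.0 < s2 < 1.0:
        candidates.append((s1, s2))
    return [(x, y) for x, y in candidates
            if is_best_response(x, gain(a1, d1, y))
            and is_best_response(y, gain(a2, d2, x))]


def outcome_distribution(sigma):
    """q(sigma): probabilities of outcomes (0,0), (1,0), (0,1), (1,1)."""
    s1, s2 = sigma
    return [(1 - s1) * (1 - s2), s1 * (1 - s2), (1 - s1) * s2, s1 * s2]


# One realization in the multiplicity region: two pure and one mixed equilibrium.
equilibria = msne(0.5, 0.5, -1.0, -1.0)
realized_Q = [outcome_distribution(s) for s in equilibria]
```

The list `realized_Q` is one realization of the random set [math]\eQ_\vartheta[/math]; repeating the computation over draws of the unobservables traces out its distribution.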

The multinomial distribution over outcomes of the game determined by a given [math]\sigma\in\eS_\vartheta[/math] is a function of [math]\eps[/math]. To obtain the predicted distribution over outcomes of the game conditional on observed payoff shifters only, one needs to integrate out the unobservable payoff shifters [math]\eps[/math]. Doing so requires care, as it needs to be done for each [math]\eq(\sigma)\in\eQ_\vartheta[/math]. First, observe that all the [math]\eq(\sigma)\in\eQ_\vartheta[/math] are contained in the [math]3[/math] dimensional unit simplex, and are therefore integrable. Next, define the conditional selection expectation (see Definition) of [math]\eQ_\vartheta[/math] as

[[math]] \E_{\Phi_r}(\eQ_\vartheta|\ex)=\Big\{\E_{\Phi_r}(\eq(\sigma)|\ex):\; \sigma \in \Sel(\eS_\vartheta)\Big\}, [[/math]]

where [math]\Sel(\eS_\vartheta)[/math] is the set of all measurable selections from [math]\eS_\vartheta[/math], see Definition. By construction, [math]\E_{\Phi_r}(\eQ_\vartheta|\ex)[/math] is the set of probability distributions over action profiles conditional on [math]\ex[/math] which are consistent with the maintained modeling assumptions, i.e., with all the model's implications (including the assumption that [math]\eps\sim\Phi_r[/math]). If the model is correctly specified, there exists at least one vector [math]\theta\in\Theta[/math] such that the observed conditional distribution [math]\cp(\ex)\equiv[\sP(\ey=y^1|\ex),\dots,\sP(\ey=y^4|\ex)]^\top[/math] almost surely belongs to the set [math]\E_{\Phi_\rho}(\eQ_\theta|\ex)[/math]. Indeed, by the definition of [math]\E_{\Phi_\rho}(\eQ_\theta|\ex)[/math], [math]\cp(\ex)\in \E_{\Phi_\rho}(\eQ_\theta|\ex)[/math] almost surely if and only if there exists [math]\eq\in \Sel(\eQ_\theta)[/math] such that [math]\E_{\Phi_\rho}(\eq|\ex)=\cp(\ex)[/math] almost surely, with [math]\Sel(\eQ_\theta)[/math] the set of all measurable selections from [math]\eQ_\theta[/math]. Hence, the collection of parameter vectors [math]\vartheta\in\Theta[/math] that are observationally equivalent to the data generating value [math]\theta[/math] is given by the ones that satisfy [math]\cp(\ex)\in \E_{\Phi_r}(\eQ_\vartheta|\ex)[/math] almost surely. In turn, observing that by Theorem the set [math]\E_{\Phi_r}(\eQ_\vartheta|\ex)[/math] is convex, we have that [math]\cp(\ex)\in \E_{\Phi_r}(\eQ_\vartheta|\ex)[/math] if and only if [math]u^\top \cp(\ex)\leq h_{\E_{\Phi_r}(\eQ_\vartheta|\ex)}(u)[/math] for all [math]u[/math] in the unit ball (see, e.g., [79](Theorem 13.1)), where [math]h_{\E_{\Phi_r}(\eQ_\vartheta|\ex)}(u)[/math] is the support function of [math]\E_{\Phi_r}(\eQ_\vartheta|\ex)[/math], see Definition.

Theorem (Structural Parameters in Static, Simultaneous Move Finite Games of Complete Information with MSNE)

Under the assumptions in Identification Problem, allowing for mixed strategies and with the observed outcomes of the game resulting from mixed strategy Nash play, the sharp identification region for [math]\theta[/math] is

[[math]] \begin{align} \idr{\theta} &=\bigg\{\vartheta\in \Theta:\; \max_{u\in\mathbb{B}^{|\cY|}}\left( u^\top \cp(\ex) -\E_{\Phi_r}[h_{\eQ_\vartheta}(u)|\ex]\right)=0,\, \ex\text{-a.s.}\bigg\} \label{eq:SIR_sharp_mixed_sup}\\ &=\bigg\{\vartheta \in \Theta:\; \int_{\mathbb{B}^{|\cY|}} (u^\top \cp(\ex) -\E_{\Phi_r}[h_{\eQ_\vartheta}(u)|\ex])_+ \mathrm{d}\mu(u)=0,\, \ex\text{-a.s.}\bigg\} \label{eq:SIR_sharp_mixed_int}, \end{align} [[/math]]

where [math]\mu[/math] is any probability measure on [math]\mathbb{B}^{|\cY|}[/math], and [math]|\cY|=4[/math] in this case.

Show Proof

Theorem (equation eq:dom_Aumann:cond) yields \eqref{eq:SIR_sharp_mixed_sup}, because by the arguments given before the theorem, [math]\idr{\theta}=\{\vartheta \in \Theta:\;\cp(\ex)\in \E_{\Phi_r}(\eQ_\vartheta|\ex),\ex\text{-a.s.}\}[/math]. The result in \eqref{eq:SIR_sharp_mixed_int} follows because the integrand in \eqref{eq:SIR_sharp_mixed_int} is continuous in [math]u[/math] and both conditions inside the curly brackets are satisfied if and only if [math]u^\top \cp(\ex)-\E_{\Phi_r}[h_{\eQ_\vartheta}(u)|\ex]\leq 0[/math] [math]\forall u\in \mathbb{B}^{|\cY|}[/math] [math]\ex[/math]-a.s.

For a fixed [math]u\in\mathbb{B}^4[/math], the possible realizations of [math]h_{\eQ_\vartheta}(u)[/math] are plotted in Panel (c) of Figure as a function of [math](\eps_1,\eps_2)[/math]. The expectation of [math]h_{\eQ_\vartheta}(u)[/math] is quite straightforward to compute, whereas calculating the set [math]\E_{\Phi_r}(\eQ_\vartheta|\ex)[/math] is computationally prohibitive in many cases. Hence, the characterization in \eqref{eq:SIR_sharp_mixed_sup} is computationally attractive, because for each [math]\vartheta\in\Theta[/math] it requires maximizing an easy-to-compute superlinear, hence concave, function over a convex set, and checking whether the resulting objective value vanishes. Several efficient algorithms in convex programming are available to solve this problem, see for example the MATLAB software for disciplined convex programming CVX [80]. Nonetheless, [math]\idr{\theta}[/math] itself is not necessarily convex, hence tracing out its boundary is non-trivial. I return to computational challenges in partial identification in Section.
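The check behind \eqref{eq:SIR_sharp_mixed_sup} can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure of [77]: the expectation is replaced by an average over simulated realizations of [math]\eQ_\vartheta[/math], each supplied as a finite list of points, and the maximization over the unit ball is replaced by a maximum over a user-supplied finite list of directions, which yields necessary conditions only. All function names are hypothetical.

```python
def support(points, u):
    """Support function of a finite set: the largest value of u'q over its points."""
    return max(sum(ui * qi for ui, qi in zip(u, q)) for q in points)


def expected_support(realized_sets, u):
    """Sample average of h_Q(u) over simulated realizations of the random set Q."""
    return sum(support(points, u) for points in realized_sets) / len(realized_sets)


def max_violation(p, realized_sets, directions):
    """Largest value of u'p - E[h_Q(u)] over the supplied directions."""
    return max(
        sum(ui * pi for ui, pi in zip(u, p)) - expected_support(realized_sets, u)
        for u in directions
    )


# Two simulated realizations of Q (hypothetical numbers): one with three
# equilibria, one with a unique equilibrium in which both firms enter.
realized_sets = [
    [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.25, 0.25, 0.25, 0.25]],
    [[0.0, 0.0, 0.0, 1.0]],
]
# Signed coordinate directions; a finer grid on the unit ball tightens the check.
directions = [[s * (i == j) for j in range(4)] for i in range(4) for s in (1.0, -1.0)]
```

A candidate [math]\vartheta[/math] is retained when `max_violation` is non-positive up to simulation error. Because the objective is concave in [math]u[/math], the finite list of directions can be replaced by a convex solver when an exact maximum is needed.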

Key Insight:(Random set theory and partial identification -- continued) [77] provide a general characterization of sharp identification regions for models with convex moment predictions. These are models that for a given [math]\vartheta\in\Theta[/math] and realization of observable variables, predict a set of values for a vector of variables of interest. This set is not necessarily convex, as exemplified by [math]\eY_\vartheta[/math] and [math]\eQ_\vartheta[/math], which are finite. No restriction is placed on the manner in which, in the DGP, a specific model prediction is selected from this set. When the researcher takes conditional expectations of the resulting elements of this set, the unrestricted process of selection yields a convex set of moments for the model variables (all possible mixtures). This is the model's convex set of moment predictions. If this set were almost surely single valued, the researcher would learn (features of) [math]\theta[/math] by solving moment equality conditions involving the observed variables and predicted ones. The approach reviewed in this section is a set-valued method of moments that extends the singleton-valued one commonly used in econometrics.

I conclude this section discussing the case of static, simultaneous move finite games of incomplete information, using the results in [77](Supplementary Appendix C).[Notes 26] For clarity, I formalize the maintained assumptions.

Identification Problem (Structural Parameters in Static, Simultaneous Move Finite Games of Incomplete Information with multiple BNE)

Impose the same structure on payoffs, entry decision rule, outcome space, parameter space, and observable variables as in Identification Problem. Assume that the observed outcome of the game results from simultaneous move, pure strategy Bayesian Nash play. Both players and the researcher observe [math](\ex_1,\ex_2)[/math]. However, [math]\eps_j[/math] is private information to player [math]j=1,2[/math] and unobservable to the researcher, with [math]\eps_1\independent\eps_2|(\ex_1,\ex_2)[/math]. Assume that players have correct common prior [math]\sF_\gamma[/math] on the distribution of [math](\eps_1,\eps_2)[/math] and the researcher knows this distribution up to [math]\gamma[/math], a finite dimensional parameter vector. Under these assumptions, multiple Bayesian Nash equilibria (BNE) may result.[Notes 27] In the absence of additional information, what can the researcher learn about [math]\theta=[\delta_1\delta_2\beta_1\beta_2\gamma][/math]?


With incomplete information, players' strategies are decision rules that map the support of [math](\eps,\ex)[/math] into [math]\{0,1\}[/math]. The non-negativity condition on expected payoffs that determines each player's decision to enter the market results in equilibrium mappings (decision rules) that are step functions determined by a threshold: [math]y_j(\eps_j) =\one(\eps_j\geq t_j), j=1,2[/math]. As a result, under the common prior assumption, player [math]j[/math]'s belief about player [math]3-j[/math]'s probability of entry is [math]\int y_{3-j}(\eps_{3-j}) d\sF_\gamma(\eps_{3-j}|\ex) =1-\sF_\gamma(t_{3-j}|\ex)[/math], and therefore player [math]j[/math]'s best response cutoff is

[[math]] \begin{align*} t_j^b(t_{3-j},\ex;\theta)=-\ex_j\beta_j-\delta_j(1-\sF_\gamma(t_{3-j}|\ex)). \end{align*} [[/math]]

Hence, the set of equilibria can be defined as the set of cutoff rules:

[[math]] \begin{equation*} \eT_{\theta}(\ex)=\left\{(t_1,t_2):\,t_j=t_j^b(t_{3-j},\ex;\theta),\,j=1,2\right\}. \end{equation*} [[/math]]
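The fixed point defining [math]\eT_\theta(\ex)[/math] can be located numerically. The sketch below is illustrative only: it assumes that the common prior is standard Normal and does not depend on [math]\ex[/math], writes [math]a_j[/math] for [math]\ex_j\beta_j[/math] and [math]d_j[/math] for [math]\delta_j[/math], and searches for equilibria on a bounded grid. The function name `bne_thresholds` and the bracket [math][-10,10][/math] are hypothetical choices.

```python
from statistics import NormalDist


def bne_thresholds(a1, a2, d1, d2, lo=-10.0, hi=10.0, n_grid=2001):
    """All pairs (t1, t2) with t_j = -a_j - d_j * (1 - F(t_{3-j})) found on [lo, hi]."""
    F = NormalDist().cdf  # assumed prior: standard normal private signals

    def br1(t2):
        return -a1 - d1 * (1.0 - F(t2))

    def br2(t1):
        return -a2 - d2 * (1.0 - F(t1))

    def gap(t1):
        # Zero exactly when t1 is a fixed point of the composed best responses.
        return t1 - br1(br2(t1))

    grid = [lo + (hi - lo) * k / (n_grid - 1) for k in range(n_grid)]
    roots = []
    for left, right in zip(grid, grid[1:]):
        g_left, g_right = gap(left), gap(right)
        if g_left == 0.0:
            roots.append(left)
        elif g_left * g_right < 0.0:
            for _ in range(100):  # bisection on the bracketing interval
                mid = 0.5 * (left + right)
                if gap(left) * gap(mid) <= 0.0:
                    right = mid
                else:
                    left = mid
            roots.append(0.5 * (left + right))
    if gap(grid[-1]) == 0.0:
        roots.append(grid[-1])
    return [(t1, br2(t1)) for t1 in roots]
```

With weak interaction effects, for instance `bne_thresholds(0.0, 0.0, -1.0, -1.0)`, the composed best response is a contraction and a single equilibrium is returned. Stronger interaction effects, as in `bne_thresholds(3.0, 3.0, -6.0, -6.0)`, produce several threshold pairs.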

The equilibrium thresholds are functions of [math]\ex[/math] and [math]\theta[/math] only. The set [math]\eT_{\theta}(\ex)[/math] might contain a finite number of equilibria (e.g., if the common prior is the Normal distribution), or a continuum of equilibria. For ease of notation I suppress its dependence on [math]\ex[/math] in what follows. Given the equilibrium decision rules (the selections of the set [math]\eT_\theta[/math]), it is possible to determine their associated action profiles. Because in the simple two-player entry game that I consider actions and outcomes coincide, I denote the set of admissible action profiles by [math]\eY_\theta[/math]:

[[math]] \begin{align} \eY_\theta=\left\{ \ey(\et)\equiv \begin{bmatrix} \one(\eps_1 \lt \et_1,\eps_2 \lt \et_2)\\ \one(\eps_1\ge\et_1,\eps_2 \lt \et_2)\\ \one(\eps_1 \lt \et_1,\eps_2\ge\et_2)\\ \one(\eps_1\ge\et_1,\eps_2\ge\et_2) \end{bmatrix} :\et\in\Sel(\eT_\theta) \right\},\label{eq:q_incomplete} \end{align} [[/math]]

with [math]\Sel(\eT_\theta)[/math] the set of all measurable selections from [math]\eT_\theta[/math], see Definition. To obtain the predicted set of multinomial distributions for the outcomes of the game, one needs to integrate out [math]\eps[/math] conditional on [math]\ex[/math]. Again this can be done by using the conditional Aumann expectation:

[[math]] \begin{equation*} \E_{\sF_\gamma}(\eY_\theta|\ex)=\{\E_{\sF_\gamma}(\ey(\et)|\ex):\et\in\Sel(\eT_\theta)\}. \end{equation*} [[/math]]

This set is closed and convex. Regardless of whether [math]\eT_\theta[/math] contains a finite number of equilibria or a continuum, [math]\eY_\theta[/math] can take on only a finite number of realizations corresponding to each of the vertices of the three dimensional simplex, because the vectors [math]\ey(\et)[/math] in \eqref{eq:q_incomplete} collect threshold decision rules. This implies that [math]\E_{\sF_\gamma}(\eY_\theta|\ex)[/math] is a closed convex polytope [math]\ex[/math]-a.s., fully characterized by a finite number of supporting hyperplanes. Hence, it is possible to determine whether [math]\vartheta\in\idr{\theta}[/math] using efficient algorithms in linear programming.

Theorem (Structural Parameters in Static, Simultaneous Move Finite Games of Incomplete Information with BNE)

Under the assumptions in Identification Problem, the sharp identification region for [math]\theta[/math] is

[[math]] \begin{align} \idr{\theta} &=\bigg\{\vartheta\in \Theta:\; \max_{u\in\mathbb{B}^{|\cY|}} \left(u^\top \cp(\ex) -\E_{\sF_{\tilde\gamma}}[h_{\eY_\vartheta}(u)|\ex]\right)=0,\, \ex\text{-a.s.}\bigg\} \label{eq:SIR:incomplete_info:1} \\ &=\bigg\{\vartheta\in \Theta:\; u^\top \cp(\ex) \le \E_{\sF_{\tilde\gamma}}[h_{\eY_\vartheta}(u)|\ex],\,\forall u\in D,\, \ex\text{-a.s.}\bigg\}\label{eq:SIR:incomplete_info:2} \\ &= \bigg\{\vartheta\in \Theta:\; \sP(\ey\in K|\ex)\le \sT_{\eY_{\vartheta}(\ex,\eps)}(K;\sF_{\tilde\gamma})\,\forall K\subset\cY,\, \ex\text{-a.s.}\bigg\} \label{eq:SIR:incomplete_info:0}, \end{align} [[/math]]
with [math]D=\{u=[u_1,\dots,u_{|\cY|}]^\top:u_i\in\{0,1\},i=1,...,|\cY|\} [/math], [math]\vartheta=[d_1,d_2,b_1,b_2,\tilde\gamma][/math], and [math]\sT_{\eY_{\vartheta}(\ex,\eps)}(K;\sF_{\tilde\gamma})[/math] the probability that [math]\{\eY_\vartheta(\ex,\eps)\cap K\neq \emptyset\}[/math] implied when [math]\eps\sim\sF_{\tilde\gamma}[/math], [math]\ex[/math]-a.s.

Show Proof

The result in \eqref{eq:SIR:incomplete_info:1} follows by the same argument as in the proof of Theorem SIR-. Next I show equivalence of the conditions

[[math]] \begin{align*} (i)&\quad u^\top\cp(\ex)\le\E_{\sF_{\tilde\gamma}}[h_{\eY_\vartheta}(u)|\ex]\;\forall u\in\mathbb{B}^{|\cY|}, \\ (ii)&\quad u^\top\cp(\ex)\le\E_{\sF_{\tilde\gamma}}[h_{\eY_\vartheta}(u)|\ex]\;\forall u\in D. \end{align*} [[/math]]
By the positive homogeneity of the support function, condition [math](i)[/math] is equivalent to [math]u^\top\cp(\ex)\le\E_{\sF_{\tilde\gamma}}[h_{\eY_\vartheta}(u)|\ex]\;\forall u\in\R^{|\cY|}[/math], which implies condition [math](ii)[/math]. Next I show that condition [math](ii)[/math] implies condition [math](i)[/math]. As explained before, the set [math]\eY_\theta[/math], and hence also its convex hull [math]\conv(\eY_\theta)[/math], can take on only a finite number of realizations. Let [math]Y_1,\dots,Y_m[/math] be convex compact sets in the simplex of dimension [math]|\cY|-1[/math] equal to the possible realizations of [math]\conv(\eY_\theta)[/math], and let [math]\varpi_1(\ex),\dots,\varpi_m(\ex)[/math] denote the probability of each of these realizations conditional on [math]\ex[/math]. Then by Theorem 2.1.34 in [81], [math]\E_{\sF_{\tilde\gamma}}(\eY_\theta|\ex)=\sum_{j=1}^m Y_j\varpi_j(\ex)[/math]. By the properties of the support function (see, e.g., [82](Theorem 1.7.5)), [math]h_{\E_{\sF_{\tilde\gamma}}(\eY_\theta|\ex)}(u) =\sum_{j=1}^m \varpi_j(\ex)h_{Y_j}(u)[/math]. For each [math]j=1,...,m,[/math] the vertices of [math]Y_j[/math] are a subset of the vertices of the [math](|\cY|-1)[/math]-dimensional simplex. Hence the supporting hyperplanes of [math]Y_j,j=1,...,m[/math], are a subset of the supporting hyperplanes of that simplex, which in turn are obtained through its support function evaluated in directions [math]u\in D[/math]. Finally, I show equivalence with the result in \eqref{eq:SIR:incomplete_info:0}. Because the vertices of [math]Y_j[/math] are a subset of the vertices of the [math](|\cY|-1)[/math]-dimensional simplex, each direction [math]u\in D[/math] determines a set [math]K_u\subset \cY[/math]. Given the choice of [math]u[/math], the value of [math]u^\top\ey(\et)[/math] equals one if [math]\ey(\et)\in K_u[/math] and zero otherwise. Hence, condition \eqref{eq:SIR:incomplete_info:2} reduces to

[[math]] \begin{align*} \sP(\ey\in K_u|\ex) = u^\top \cp(\ex) &\le \E_{\sF_{\tilde\gamma}}[h_{\eY_\vartheta}(u)|\ex] = \E_{\sF_{\tilde\gamma}}\left[\sup_{\ey(\et)\in\eY_\vartheta}u^\top\ey(\et)|\ex\right] \\ &= \E_{\sF_{\tilde\gamma}}[\one(\eY_\vartheta\cap K_u\neq \emptyset)|\ex]=\sT_{\eY_{\vartheta}(\ex,\eps)}(K_u;\sF_{\tilde\gamma}). \end{align*} [[/math]]
Observing that the collection [math]D[/math] comprises the [math]2^{|\cY|}[/math] vectors with entries equal to either 1 or 0, and that these determine all possible subsets [math]K_u[/math] of [math]\cY[/math], yields condition \eqref{eq:SIR:incomplete_info:0}.
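The capacity functional inequalities in \eqref{eq:SIR:incomplete_info:0} can be verified exactly when the set of equilibrium thresholds is finite. The Python sketch below does so for one value of [math]\ex[/math], under the illustrative assumptions that the private signals are independent standard Normal and that the list `equilibria` of threshold pairs is given. It partitions the space of signals into the rectangles on which [math]\eY_\vartheta[/math] is constant and then checks all non-empty [math]K\subset\cY[/math]. All names are hypothetical.

```python
from itertools import combinations
from statistics import NormalDist

OUTCOMES = [(0, 0), (1, 0), (0, 1), (1, 1)]  # same order as the entries of y(t)
INF = float("inf")


def cdf(x):
    """Standard normal CDF, extended to plus/minus infinity."""
    if x == -INF:
        return 0.0
    if x == INF:
        return 1.0
    return NormalDist().cdf(x)


def cells(cutoffs):
    """Intervals [lo, hi) between consecutive sorted cutoffs."""
    cuts = [-INF] + sorted(set(cutoffs)) + [INF]
    return list(zip(cuts, cuts[1:]))


def realized_set_distribution(equilibria):
    """Map each possible realization of the random set Y to its probability."""
    mass = {}
    for lo1, hi1 in cells([t1 for t1, _ in equilibria]):
        for lo2, hi2 in cells([t2 for _, t2 in equilibria]):
            prob = (cdf(hi1) - cdf(lo1)) * (cdf(hi2) - cdf(lo2))
            # Within a cell, eps_j >= t_j holds exactly when lo_j >= t_j.
            realized = frozenset(
                (int(lo1 >= t1), int(lo2 >= t2)) for t1, t2 in equilibria
            )
            mass[realized] = mass.get(realized, 0.0) + prob
    return mass


def satisfies_capacity_inequalities(p, equilibria, tol=1e-9):
    """Check P(y in K) <= P(Y hits K) for every non-empty subset K of outcomes."""
    mass = realized_set_distribution(equilibria)
    for size in range(1, len(OUTCOMES) + 1):
        for K in combinations(OUTCOMES, size):
            observed = sum(p[OUTCOMES.index(y)] for y in K)
            capacity = sum(m for S, m in mass.items() if S & set(K))
            if observed > capacity + tol:
                return False
    return True
```

A candidate [math]\vartheta[/math] belongs to the region only if the check passes for almost every value of [math]\ex[/math]; the function above covers a single value.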

One can use the same argument as in the proof of Theorem SIR- to show that the Aumann expectation/support function characterization of the sharp identification region in Theorem SIR- coincides with the characterization based on the capacity functional in Theorem SIR-, when only pure strategies are allowed for. This shows that in this class of models, the capacity functional based characterization is a special case of the Aumann expectation/support function based one. [75] also study the identification power of equilibrium in static entry games with incomplete information. They show that in the presence of multiple equilibria, assuming Bayesian Nash behavior yields more informative regions for the parameter vector [math]\theta[/math] than assuming only rational behavior, but at the price of a higher computational cost. [83] propose a procedure to test for the sign of the interaction effects (which here I have assumed to be non-positive) in discrete simultaneous games with incomplete information and (possibly) multiple equilibria. As a by-product of this procedure, they also provide a test for the presence of multiple equilibria in the DGP. The test does not require parametric specifications of players' payoffs, the distributions of their private signals, or the equilibrium selection mechanism. Rather, the test builds on the commonly invoked assumption that players' private signals are independent conditional on observed states. [84] introduces an important class of models with flexible information structure. Each player is assumed to have a vector of payoff shifters unobservable by the researcher composed of elements that are private information to the player, and elements that are known to all players. The results of [77] reported in this section apply to this set-up as well.

Auction Models with Independent Private Values

An Inference Approach Robust to Bidding Behavior Assumptions

[22] study what can be learned about the distribution of valuations in an open outcry English auction where symmetric bidders have independent private values for the object being auctioned. The standard theoretical model [85], called “button auction” model, posits that each bidder holds down a button while the object’s price rises continuously and exogenously, releasing it (in the dominant strategy equilibrium) when it reaches her valuation or all her opponents have left. In this case, the distribution of bidders' valuations can be learned exactly. [22] show that much can be learned about the distribution of valuations, even allowing for the fact that real-life auctions may depart from this stylized framework, as in the following identification problem.[Notes 28]

Identification Problem (Incomplete Auction Model with Independent Private Values)

For a given auction with [math]n \lt \infty[/math] participating bidders, let [math]\ev_i\sim\sQ,i=1,\dots,n,[/math] be bidder [math]i[/math]'s valuation for the object being auctioned and assume that [math]\ev_i\independent \ev_j[/math] for all [math]i\neq j[/math]. Assume that the support of [math]\sQ[/math] is [math][\underline{v},\bar{v}][/math] and that each bidder knows her own valuation but not that of her opponents. Let the auctioneer set a minimum bid increment [math]\delta\in [0,\bar{v})[/math], and for simplicity suppose there is no reserve price.[Notes 29] Suppose the researcher observes order statistics of the bids, [math]\vec{\eb}_n\equiv(\eb_{1:n},\dots,\eb_{n:n})\sim\sP[/math] in [math]\R^n_+[/math], with [math]\eb_{i:n}[/math] the [math]i[/math]-th lowest of the [math]n[/math] bids. Assume that: (1) Bidders do not bid more than they are willing to pay; (2) Bidders do not allow an opponent to win at a price they are willing to beat. In the absence of additional information, what can the researcher learn about [math]\sQ[/math]?

A realization of the model predicted ordered bids [math]\eB(\vec{\ev}_n)[/math] in \eqref{eq:RCS_auction} for [math]n=3,\vec{\ev}_n=v^0,\delta=0[/math].

The model in Identification Problem delivers set valued predictions because given valuations [math](\ev_1,\dots,\ev_n)[/math], the two fundamental assumptions about bidders' behavior yield

[[math]] \begin{align} \vec{\eb}_n \in \eB(\vec{\ev}_n)\equiv\left[\left\{\prod_{i=1}^{n-1}[\underline{v},\ev_{i:n}]\right\}\times [\ev_{n-1:n}-\delta,\ev_{n:n}]\right]\cap V_n,\label{eq:RCS_auction} \end{align} [[/math]]

where [math]\vec{\ev}_n\equiv(\ev_{1:n},\dots,\ev_{n:n})[/math] denotes the vector of order statistics of the valuations, and [math]V_n=\{v\in\R^n:\underline{v}\le v_1\le v_2\le\dots\le v_n\le \bar{v}\}[/math].[Notes 30] Figure provides a stylized depiction of a realization of this set for [math]\vec{\ev}_n=v^0[/math] when there are three bidders ([math]n=3[/math]), [math]\underline{v}=0[/math], and [math]\delta=0[/math]. In words, [math]\eB(\vec{\ev}_n)[/math] collects the model predicted values of ordered bids. The fact that [math]\eb_{i:n}\le \ev_{i:n}[/math] for all [math]i[/math] results from assumption (1): since each bidder bids at most an amount equal to her valuation, the [math]i[/math]-th lowest bid cannot exceed the [math]i[/math]-th lowest valuation [22](Lemma 1).[Notes 31] The fact that [math]\eb_{n:n}\ge \ev_{n-1:n}-\delta[/math] follows immediately from assumption (2) [22](Lemma 3). The fact that [math]\vec{\eb}_n[/math] has to lie in [math]V_n[/math] follows because it is a vector of ordered bids. Why does this set-valued prediction hinder point identification? The reason is that the distribution of the observable data relates to the model structure in an incomplete manner.[Notes 32] Define a bidding rule [math]\sB(\eb_{1:n},\dots,\eb_{n:n}|\ev_{1:n},\dots,\ev_{n:n})[/math] to be a conditional joint distribution for the order statistics of the bids conditional on the order statistics of the valuations. Then, for a given realization of the valuations [math]\ev_{1:n}=v_1,\dots,\ev_{n:n}=v_n[/math], the model requires that the support of [math]\sB(\cdot|v_1,\dots,v_n)[/math] is in [math]B(\vec{v})[/math] as defined in \eqref{eq:RCS_auction} with [math]\ev_{1:n}=v_1,\dots,\ev_{n:n}=v_n[/math], but imposes no other restriction on it. Hence, the model implied joint distribution of ordered bids is

[[math]] \begin{align} \sM_{1,\dots,n:n}(\cdot;\sB,\sQ)\equiv\int \sB(\cdot|v_1,\dots,v_n)\sQ_{1,\dots,n:n}(dv_1,\dots,dv_n),\label{eq:model:impl_sel_mech_auction} \end{align} [[/math]]

where [math]\sQ_{1,\dots,n:n}[/math] is the joint distribution of order statistics of the valuations implied by [math]\sQ[/math]. Since the bidding rule [math]\sB[/math] is left completely unspecified (other than requiring it to be a valid joint conditional probability distribution with support in [math]\eB[/math]), one can find multiple pairs [math](\sB ,\sQ)[/math] satisfying the assumptions of Identification Problem, such that [math]\sM_{1,\dots,n:n}(\cdot;\sB,\sQ)=\sG_{1,\dots,n:n}(\cdot)[/math], with [math]\sG_{1,\dots,n:n}[/math] the observed joint CDF of the order statistics of the bids associated with [math]\sP[/math]. [22] propose to use simple and tractable implications of the model to learn features of [math]\sQ[/math]. Recall that with i.i.d. valuations, the distribution of each order statistic uniquely determines [math]\sQ(v)[/math], with [math]\sQ(v)\equiv\sQ(\ev\le v)[/math] for any [math]v\ge\underline{v}[/math], through:

[[math]] \begin{align} \sQ(v)=\sq_{\cB}(\sQ_{i:n}(v);i,n-i+1),\label{eq:HT:beta} \end{align} [[/math]]

where [math]\sQ_{i:n}[/math] is the CDF of [math]\ev_{i:n}[/math] and [math]\sq_{\cB}(\cdot;i,n-i+1)[/math] is the quantile function of a Beta-distributed random variable with parameters [math]i[/math] and [math]n-i+1[/math]. Using this, their Lemmas 1 and 3 yield, respectively,

[[math]] \begin{align} \sQ(v) &\le \min_{n,i}\sq_{\cB}(\sG_{i:n}(v);i,n-i+1),\forall v\in[\underline{v},\bar{v}],\label{eq:HT_upper}\\ \sQ(v) &\ge \max_{n}\sq_{\cB}(\sG_{n:n}(v-\delta);n-1,2),\forall v\in[\underline{v},\bar{v}],\label{eq:HT_lower} \end{align} [[/math]]

where, for any [math]v\ge\underline{v}[/math], [math]\sG_{i:n}(v)\equiv\sP(\eb_{i:n}\le v)[/math] denotes the observed CDF of [math]\eb_{i:n}[/math] for [math]i=1,\dots,n[/math].
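A plug-in version of these pointwise bounds can be sketched in Python as follows. The sketch is illustrative only: it uses auctions with a single common number of bidders [math]n\ge 2[/math] (so no optimization over [math]n[/math]), replaces the population CDFs [math]\sG_{i:n}[/math] with empirical CDFs, and obtains the Beta quantile by inverting the order-statistic CDF with bisection. The lower bound inverts the distribution of the second-highest valuation, so it uses [math]i=n-1[/math], that is, Beta parameters [math]n-1[/math] and [math]2[/math]. All function names are hypothetical.

```python
from math import comb


def order_stat_cdf(q, i, n):
    """P(i-th lowest of n i.i.d. draws <= v) when each draw is <= v with probability q."""
    return sum(comb(n, k) * q**k * (1.0 - q) ** (n - k) for k in range(i, n + 1))


def beta_quantile(p, i, n):
    """Quantile at p of a Beta(i, n-i+1) variable, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if order_stat_cdf(mid, i, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def empirical_cdf(sample, v):
    """Share of observations in the sample that are <= v."""
    return sum(b <= v for b in sample) / len(sample)


def valuation_cdf_bounds(auctions, v, delta=0.0):
    """Plug-in (lower, upper) bounds on Q(v); every auction has the same n >= 2 bids."""
    ordered = [sorted(bids) for bids in auctions]
    n = len(ordered[0])
    # Upper bound: tightest inversion across the n order statistics of the bids.
    upper = min(
        beta_quantile(empirical_cdf([b[i - 1] for b in ordered], v), i, n)
        for i in range(1, n + 1)
    )
    # Lower bound: highest bid shifted by delta, inverted as the (n-1)-th order statistic.
    top_bids = [b[-1] for b in ordered]
    lower = beta_quantile(empirical_cdf(top_bids, v - delta), n - 1, n)
    return lower, upper
```

In finite samples the plug-in lower bound may exceed the plug-in upper bound, so the output should be read as an estimate of the bounds and not as the bounds themselves.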

Key Insight: The model and analysis put forward by [22] trade point identification of the distribution of valuations under stringent assumptions on the bidding rule, for a robust inference approach that yields informative bounds under weak and widely credible assumptions on bidding behavior. Remarkably, “nothing is lost” due to the use of their robust approach: point identification is recovered when the standard assumptions of the button auction model hold.[Notes 33] This is because in the dominant strategy equilibrium the top losing bidder exits at her valuation, followed immediately by the winning bidder. Hence, [math]\eb_{n-1:n}=\ev_{n-1:n}=\eb_{n:n}[/math] and [math]\delta=0[/math], so that the upper and the lower bound in \eqref{eq:HT_upper}-\eqref{eq:HT_lower} coincide and point identify the distribution of valuations.

[22] also provide sharp bounds on the optimal reserve price, which I do not discuss here. However, they leave open the question of whether the collection of CDFs satisfying \eqref{eq:HT_upper}-\eqref{eq:HT_lower} yields the sharp identification region for [math]\sQ[/math]. As discussed in Sections Selectively Observed Data-Interval Data, pointwise bounds on the CDF deliver tubes of admissible CDFs that in general yield outer regions on the CDF of interest. But in this identification problem, the issue of sharpness is even more subtle, and therefore addressed in the following subsection.

Before moving on to that discussion, I note that the work of [22] spurred a rich literature applying partial identification analysis to the study of auction models. [86] studies first price sealed bid auctions with equilibrium behavior, where affiliated valuations prevent --in the absence of parametric restrictions on the distribution of the model primitives-- point identification of the model. He derives bounds on seller revenue under various counterfactual scenarios on reserve prices and auction formats. [87] also studies first price sealed bid auctions with equilibrium behavior, but relaxes the independence assumption on symmetric valuations by requiring it to hold only conditional on unobserved heterogeneity. He derives bounds on various functionals of the distributions of interest, including the mean bid and mean valuation. [88] analyze second price auctions with correlated private values. In this case, the distribution of valuations is not point identified even under the assumptions of the button auction model [89](Theorem 4). Nonetheless, [88] show that interesting functionals of it (seller profits and bidder surplus) can be bounded, if one assumes that transaction prices are determined by the second highest valuation and imposes some restrictions on the joint distribution of the number of bidders and distribution of the valuations. [90] studies a related model of second-price ascending auctions with arbitrary dependence in bidders’ private values. She provides partial identification results for the joint distribution of values for any subset of bidders under various assumptions about what data the researcher observes. While in her framework the highest bid is never observed, she considers the case where only the winner's identity and the winning price are observed, and the case where all the identities and all the bids except for the highest bid are known. She also investigates the informational content of assuming positive dependence in bidders' values.
[91] are concerned with nonparametric identification of a two-stage entry and bidding game. Potential bidders are assumed to have private valuations and observe private signals before deciding whether to enter the auction. The dependence between signals and valuations is only minimally restricted. Hence, even with some excluded instruments that affect selection into the auction, the model primitives are only partially identified. The authors derive bounds on these primitives, and provide conditions under which point identification is restored. [92] provide partial identification results in private value and common value auctions under weak restrictions on the information available to the bidders. Their approach leverages a result in [93] yielding an equivalence between distributions of valuations that obey the restrictions imposed by a Bayesian Correlated Equilibrium and those that obey the restrictions imposed by Bayesian Nash Equilibrium under some information structure. Such equivalence is particularly helpful because the set of Bayesian Correlated Equilibria can be characterized through linear programming, so that the sharp identification region provided by [92] is given by the collection of parameter vectors [math]\vartheta[/math] for which a linear program is feasible. Related results leveraging the linear structure of correlated equilibria in the context of entry games include [94], [77](Supplementary Appendix E.2), and [95].

Characterization of Sharpness through Random Set Theory

The bounds in [22] exploit the information contained in the marginal CDFs [math]\sG_{i:n}[/math] for each [math]i[/math] and [math]n[/math]. However, in Identification Problem additional information can be extracted from the joint distribution of ordered bids. [31] obtain the sharp identification region [math]\idr{\sQ}[/math] using random set methods (Artstein's characterization in Theorem) applied to a quantile function representation of the order statistics. Here I provide an equivalent characterization that uses equation \eqref{eq:RCS_auction} directly, and which has not appeared in the literature before. Let [math]\cT[/math] denote the space of probability distributions with support on [math][\underline{v},\bar{v}][/math], so that [math]\sQ\in\cT[/math]. For a candidate distribution [math]\tilde{\sQ}\in\cT[/math], let [math]\tilde{\sQ}_{1,\dots,n:n}[/math] denote the implied distribution of order statistics of [math]n[/math] i.i.d. random variables distributed [math]\tilde{\sQ}[/math]. Let [math]\tilde{\eB}[/math] be a random closed set defined as in \eqref{eq:RCS_auction} with respect to order statistics of i.i.d. random variables with distribution [math]\tilde{\sQ}[/math]. For a given set [math]K\in\cK[/math], with [math]\cK[/math] the collection of compact subsets of [math]\R^n[/math], let [math]\sT_{\tilde\eB}(K;\tilde{\sQ})[/math] denote the probability of the event [math]\{\tilde\eB\cap K\neq \emptyset\}[/math] implied by [math]\tilde{\sQ}[/math].

Theorem (Distribution of Valuations in Incomplete Auction Model with Independent Private Values)


Under the assumptions of Identification Problem, the sharp identification region for [math]\sQ[/math] is

[[math]] \begin{align} \label{eq:SIR:auction} \idr{\sQ}= \left\{\tilde{\sQ}\in\cT: \sP(\vec{\eb}_n\in K) \le \sT_{\tilde\eB}(K;\tilde{\sQ})\;\; \forall K\in\cK \right\}. \end{align} [[/math]]

Proof

The sharp identification region for [math]\sQ[/math] is given by the collection of probability distributions [math]\tilde{\sQ}\in\cT[/math] for which one can find a bidding rule [math]\sB(\cdot|\cdot)[/math] with support in [math]\tilde{\eB}[/math] a.s. such that [math]\sG_{1,\dots,n:n}(\cdot)=\sM_{1,\dots,n:n}(\cdot;\sB,\tilde{\sQ})[/math]. Here [math]\sM_{1,\dots,n:n}(\cdot;\sB,\tilde{\sQ})[/math] is defined as in \eqref{eq:model:impl_sel_mech_auction} with [math]\tilde{\sQ}[/math] replacing [math]\sQ[/math]. Take a distribution [math]\tilde{\sQ}[/math] satisfying this definition of sharpness. Then there exists a selection of [math]\tilde{\eB}[/math] determined by the bidding rule associated with [math]\tilde{\sQ}[/math], such that its distribution matches that of [math]\vec{\eb}_n[/math]. But then Theorem implies that the inequalities in \eqref{eq:SIR:auction} hold. Conversely, take [math]\tilde{\sQ}[/math] satisfying the inequalities in \eqref{eq:SIR:auction}. Then, by Theorem, [math]\vec{\eb}_n[/math] and [math]\tilde{\eB}[/math] can be realized on the same probability space as random elements [math]\vec{\eb}_n^\prime[/math] and [math]\tilde{\eB}^\prime[/math], [math]\vec{\eb}_n\edis \vec{\eb}_n^\prime[/math], [math]\tilde{\eB}\edis\tilde{\eB}^\prime[/math], such that [math]\vec{\eb}_n^\prime \in \tilde{\eB}^\prime[/math] a.s. One can then complete the auction model with a bidding rule that picks [math]\vec{\eb}_n^\prime[/math] with probability [math]1[/math], and the result follows.

In \eqref{eq:SIR:auction}, [math]\sP(\vec{\eb}_n\in K)[/math] is determined by the joint distribution of the ordered bids and hence can be learned from the data. On the other hand, [math]\sT_{\tilde\eB}(K;\tilde{\sQ})[/math] is a function of the model and [math]\tilde{\sQ}\in\cT[/math]. Hence, it can be computed using \eqref{eq:RCS_auction}, with [math]\tilde\eB[/math] defined with respect to order statistics of i.i.d. random variables with distribution [math]\tilde{\sQ}\in\cT[/math]. To gain insight into the characterization of [math]\idr{\sQ}[/math], consider for example the set [math]K=\{\prod_{i=1}^{n-1}(-\infty,+\infty)\}\times(-\infty,v][/math]. Plugging it into the inequalities in \eqref{eq:SIR:auction}, one obtains

[[math]] \begin{align*} \sG_{n:n}(v) \le \sQ_{n-1:n}(v),\quad\text{for all } n, \end{align*} [[/math]]

which, using \eqref{eq:HT:beta}, yields \eqref{eq:HT_lower}. Similarly, plugging in the sets [math]K_j=\{\prod_{i=1}^{j-1}(-\infty,+\infty)\}\times[v,\infty)\times\{\prod_{i=j+1}^n(-\infty,+\infty)\}[/math], [math]j=1,\dots,n[/math], yields \eqref{eq:HT_upper}. So the inequalities proposed by [22] are a subset of the inequalities yielding the sharp identification region in Theorem SIR-. More information can be obtained by using additional sets [math]K[/math]. For instance, the set [math]K=[v_1,\infty)\times[v_2,\infty)\times\{\prod_{i=3}^{n}(-\infty,+\infty)\}[/math], [math]v_2\ge v_1[/math], yields [math]\sP(\eb_{1:n}\ge v_1,\eb_{2:n}\ge v_2)\le \sQ_{1,2:n}([v_1,\infty)\times[v_2,\infty))[/math], which further restricts [math]\sQ[/math]. Many further examples can be given. Characterization \eqref{eq:SIR:auction} is stated using inequality eq:domin-t for the collection of compact subsets of [math]\R^n[/math]. One can instead use the (equivalent) inequality eq:dom-c, and show that in fact it suffices to check it for a much smaller collection of sets, as shown by [31] (see also [30](Section 2.2)). Nonetheless, this collection remains extremely large.
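As a rough numerical companion to the first inequality above (the CDF of the highest bid is bounded above by the CDF of the second-highest valuation), the following sketch evaluates the CDF of the [math]i[/math]-th order statistic of [math]n[/math] i.i.d. draws with the standard binomial formula and checks the inequality on a grid. The function names, the uniform candidate distribution, and the made-up winning-bid CDF are illustrative assumptions and are not taken from [22] or [31].

```python
from math import comb

def order_stat_cdf(q, i, n):
    """CDF at v of the i-th smallest of n i.i.d. draws, given q = Q(v)."""
    return sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in range(i, n + 1))

def violates_lower_bound(g_top, q_candidate, n, grid):
    """Grid points where G_{n:n}(v) exceeds the candidate's CDF of the
    second-highest valuation, i.e. where the candidate is rejected."""
    return [v for v in grid
            if g_top(v) > order_stat_cdf(q_candidate(v), n - 1, n) + 1e-12]

# Hypothetical inputs: a uniform candidate on [0, 1] and an invented
# CDF for the winning bid (illustration only, not estimated from data).
n = 3
grid = [k / 10 for k in range(11)]
uniform_cdf = lambda v: min(max(v, 0.0), 1.0)
winning_bid_cdf = lambda v: uniform_cdf(v) ** 2
violations = violates_lower_bound(winning_bid_cdf, uniform_cdf, n, grid)
```

An empty `violations` list means that this single family of inequalities does not rule out the candidate; it says nothing about the many other sets [math]K[/math] that enter the sharp characterization.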

Key Insight (Random set theory and partial identification -- continued): As stated in the Introduction, constructing the (random) set of model predictions delivered by the maintained assumptions is an exercise typically carried out in identification analysis, regardless of whether random set theory is applied. Indeed, for the problem studied in this section, [22](equation D1) put forward the set of admissible bids in \eqref{eq:RCS_auction}.[Notes 34] With this set in hand, the tools of random set theory (in this case, Theorem) immediately deliver the sharp identification region of interest.

[96] further generalize the analysis in this section by dropping the requirement of independent private values. This allows them, for example, to consider affiliated private values. They show that even in this significantly more complex context, the key behavioral restrictions imposed by [22] to relate bids to valuations can be coupled with the use of random set theory, to characterize sharp identification regions.

Network Formation Models

Strategic models of network formation generalize the single-agent and multiple-agent discrete choice frameworks reviewed in Sections Discrete Choice in Single Agent Random Utility Models and Static, Simultaneous-Move Finite Games with Multiple Equilibria. They posit that pairs of agents (nodes) form, maintain, or sever connections (links) according to an explicit equilibrium notion and utility structure. Each individual's utility depends on the links formed by others (the network) and on utility shifters that may be pair-specific. One may conjecture that the results reported in Sections Discrete Choice in Single Agent Random Utility Models-Static, Simultaneous-Move Finite Games with Multiple Equilibria apply in this more general context too. While of course lessons can be carried over, network formation models present challenges that, combined, cannot be overcome without the development of new tools. These include the issue of equilibrium existence and the possibility of multiple equilibria when they exist, due to the interdependence in agents' choices (this problem was already discussed in Section Static, Simultaneous-Move Finite Games with Multiple Equilibria). Another challenge is the degree of correlation between linking decisions, which interacts with how the observable data is generated: one may observe a growing number of independent networks, or a growing number of agents on a single network. Yet another challenge, which substantially increases the difficulties associated with the previous two, is the combinatorial complexity of network formation problems. The purpose of this section is exclusively to discuss some recent papers that have made important progress in addressing these specific challenges and in carrying out partial identification analysis.
For a thorough treatment of the literature on network formation, I refer to the reviews in [97], [98], [99], and [100](Chapter XXX in this Volume).[Notes 35] Depending on whether the researcher observes data from a single network or multiple independent networks, the underlying population of agents may be represented as a continuum or as a countably infinite set in the first case, or as a finite set in the second case. Henceforth, I denote generic agents as [math]i[/math], [math]j[/math], [math]k[/math], and [math]m[/math]. I consider static models of undirected network formation with non-transferable utility.[Notes 36] The collection of all links among nodes forms the network, denoted [math]\ey[/math]. For any pair [math](i,j)[/math] with [math]i\neq j[/math], [math]\ey_{ij}=1[/math] if they are linked, and [math]\ey_{ij}=0[/math] otherwise ([math]\ey_{ii}=0[/math] for all [math]i[/math] by convention). The notation [math]\ey-\{ij\}[/math] denotes the network that results if a link present between nodes [math]i[/math] and [math]j[/math] is deleted, while [math]\ey+\{ij\}[/math] denotes the network that results if a link absent between nodes [math]i[/math] and [math]j[/math] is added. Denote agent [math]i[/math]'s payoff by [math]\bu_i(\ey,\ex,\epsilon)[/math]. This payoff depends on the network [math]\ey[/math] and the payoff shifters [math](\ex,\epsilon)[/math], with [math]\ex[/math] observable both to the agents and to the researcher, [math]\epsilon[/math] only to the agents, and [math](\ex,\epsilon)[/math] collecting [math](\ex_{ij},\epsilon_{ij})[/math] for all [math]i[/math] and [math]j[/math].[Notes 37] Following much of the literature, I employ pairwise stability [101] as the equilibrium notion: [math]\ey[/math] is a pairwise stable network if all linked agents prefer not to sever their links, and every absent link would be damaging to at least one of the two agents it would connect. Formally,

[[math]] \begin{align*} \forall(i,j):\ey_{ij}&=1,~\bu_i(\ey,\ex,\epsilon)\ge \bu_i(\ey-\{ij\},\ex,\epsilon)~\mathrm{and}~\bu_j(\ey,\ex,\epsilon)\ge \bu_j(\ey-\{ij\},\ex,\epsilon),\\ \forall(i,j):\ey_{ij}&=0,~\mathrm{if}~\bu_i(\ey+\{ij\},\ex,\epsilon) \gt \bu_i(\ey,\ex,\epsilon)~\mathrm{then}~\bu_j(\ey+\{ij\},\ex,\epsilon) \lt \bu_j(\ey,\ex,\epsilon). \end{align*} [[/math]]

Under this equilibrium notion, if equilibria exist, multiplicity is likely; see, among others, the examples in [97](p. 475), [99](p. 301), and [102](example 3.1). The model is therefore incomplete, because it does not specify how an equilibrium is selected in the region of multiplicity. For the same reasons as discussed in the context of finite games in Section Static, Simultaneous-Move Finite Games with Multiple Equilibria, the result is partial identification (unless one is willing to impose restrictions on the equilibrium selection mechanism). However, as I explain below, an immediate application of the identification analysis carried out there presents enormous practical challenges because there are [math]2^{n(n-1)/2}[/math] possible network configurations to be checked for stability (and the dimensionality of the space of unobservables is also very large). In what follows I consider two distinct frameworks that make different assumptions about the utility function and how the data is generated, and discuss what can be learned about the parameters of interest in these cases.

Data from Multiple Independent Networks

I first consider the case that the researcher observes data from multiple independent networks. I follow the set-up put forward by [102].

Identification Problem (Network Formation Model with Multiple Independent Networks)

Let there be [math]n\in\{2,3,\dots\},n \lt \infty[/math] agents, and let [math](\ex,\ey)\sim\sP[/math] be observable random variables in [math]\times_{j=1}^n\R^d\times\{0,1\}^{n(n-1)/2}[/math], [math]d \lt \infty[/math]. Suppose that [math]\ey[/math] is a pairwise stable network. For each agent [math]i[/math], let the utility function be known up to a finite dimensional parameter vector [math]\delta\in\Delta\subset\R^p[/math], and given by

[[math]] \begin{multline} \bu_i(\ey,\ex,\epsilon;\delta)=\sum_{j=1}^n \ey_{ij}(f(\ex_i,\ex_j;\delta_1)+\epsilon_{ij})\\ +\delta_2\frac{\sum_{j=1}^n\sum_{k\neq i,k=1}^n\ey_{ij}\ey_{jk}}{n-2}+\delta_3\frac{\sum_{j=1}^n\sum_{k=j+1}^n\ey_{ij}\ey_{ik}\ey_{jk}}{n-2}\label{eq:utility:network:1} \end{multline} [[/math]]
with [math]f(\cdot,\cdot;\cdot)[/math] a continuous function of its arguments.[Notes 38] Suppose that [math]\epsilon_{ij}[/math] are independent for all [math]i\neq j[/math] and identically distributed with CDF known up to parameter vector [math]\gamma\in\Gamma\subset\R^m[/math], denoted [math]\sF_\gamma[/math]. Assume that the support of [math]\sF_\gamma[/math] is [math]\R[/math], that [math]\sF_\gamma[/math] is absolutely continuous with respect to Lebesgue measure, and continuously differentiable with respect to [math]\gamma\in\Gamma[/math]. Let [math]\Theta=\Delta\times\Gamma[/math]. Assume that the researcher observes a random sample of networks and observable payoff shifters drawn from [math]\sP[/math]. In the absence of additional information, what can the researcher learn about [math]\theta\equiv[\delta_1\delta_2\delta_3\gamma][/math]?


[102] analyzes this problem. She establishes equilibrium existence provided that [math]\delta_2\ge 0[/math] and [math]\delta_3\ge 0[/math] [102](Proposition 2.2).[Notes 39] Given payoff shifters [math](\ex,\epsilon)[/math] and parameters [math]\vartheta\equiv[\tilde\delta_1\tilde\delta_2\tilde\delta_3\tilde\gamma]\in\Theta[/math], let [math]\eY_\vartheta(\ex,\epsilon)[/math] denote the collection of pairwise stable networks implied by the model. It is easy to show that [math]\eY_\vartheta(\ex,\epsilon)[/math] is a random closed set as in Definition. The networks in [math]\eY_\vartheta(\ex,\epsilon)[/math] are [math]n\times n[/math] symmetric adjacency matrices with diagonal elements equal to zero and off diagonal elements in [math]\{0,1\}[/math]. To ease notation, I omit [math]\eY_\vartheta[/math]'s dependence on [math](\ex,\epsilon)[/math] in what follows. Under the assumption that [math]\ey[/math] is a pairwise stable network, at the true data generating value of [math]\theta\in\Theta[/math], one has

[[math]] \begin{align} \ey\in\eY_\theta~\mathrm{a.s.} \label{eq:y_in_Y_network_multiple} \end{align} [[/math]]

Equation \eqref{eq:y_in_Y_network_multiple} exhausts the modeling content of Identification Problem. Theorem can be leveraged to extract its empirical content from the observed distribution [math]\sP(\ey,\ex)[/math]. Let [math]\cY[/math] be the collection of [math]n\times n[/math] symmetric matrices with diagonal elements equal to zero and all other entries in [math]\{0,1\}[/math], so that [math]|\cY|=2^{n(n-1)/2}[/math]. For a given set [math]K\subset\cY[/math], let [math]\sT_{\eY_{\vartheta}}(K;\sF_\gamma)[/math] denote the probability of the event [math]\{\eY_\vartheta\cap K\neq \emptyset\}[/math] implied when [math]\epsilon\sim\sF_\gamma[/math], [math]\ex[/math]-a.s.
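To make the set of pairwise stable networks concrete, the following is a minimal brute-force sketch for a very small [math]n[/math]. It codes the utility in \eqref{eq:utility:network:1}, checks the two pairwise stability conditions link by link, and enumerates all [math]2^{n(n-1)/2}[/math] networks. The function names, the particular [math]f[/math], and the numerical values of the shocks and parameters are illustrative assumptions, not part of [102]; the approach is only feasible for a handful of agents.

```python
from itertools import combinations, product

def utility(i, y, x, eps, delta, f):
    """Payoff of agent i in network y: direct links, friends of friends, triangles."""
    n = len(y)
    d1, d2, d3 = delta
    direct = sum(y[i][j] * (f(x[i], x[j], d1) + eps[i][j]) for j in range(n))
    friends_of_friends = sum(y[i][j] * y[j][k]
                             for j in range(n) for k in range(n) if k != i)
    triangles = sum(y[i][j] * y[i][k] * y[j][k]
                    for j in range(n) for k in range(j + 1, n))
    return direct + (d2 * friends_of_friends + d3 * triangles) / (n - 2)

def with_link(y, i, j, value):
    """Copy of y with the (undirected) link ij set to value."""
    z = [list(row) for row in y]
    z[i][j] = z[j][i] = value
    return z

def is_pairwise_stable(y, x, eps, delta, f):
    u = lambda i, z: utility(i, z, x, eps, delta, f)
    for i, j in combinations(range(len(y)), 2):
        if y[i][j] == 1:
            z = with_link(y, i, j, 0)
            if u(i, y) < u(i, z) or u(j, y) < u(j, z):
                return False      # someone prefers to sever the link
        else:
            z = with_link(y, i, j, 1)
            gain_i, gain_j = u(i, z) - u(i, y), u(j, z) - u(j, y)
            # Unstable if one agent strictly gains and the other does not strictly lose.
            if (gain_i > 0 and gain_j >= 0) or (gain_j > 0 and gain_i >= 0):
                return False
    return True

def stable_networks(x, eps, delta, f):
    """All pairwise stable networks, each coded as link indicators over `pairs`."""
    n = len(x)
    pairs = list(combinations(range(n), 2))
    stable = []
    for links in product([0, 1], repeat=len(pairs)):
        y = [[0] * n for _ in range(n)]
        for (i, j), v in zip(pairs, links):
            y[i][j] = y[j][i] = v
        if is_pairwise_stable(y, x, eps, delta, f):
            stable.append(links)
    return pairs, stable

# Hypothetical example with n = 3: links are costly on their own, triangles are valuable.
n = 3
x = [0.0, 0.0, 0.0]
eps = [[0.0 if i == j else -0.5 for j in range(n)] for i in range(n)]
f = lambda xi, xj, d1: -d1 * abs(xi - xj)
pairs, stable = stable_networks(x, eps, delta=(1.0, 0.0, 2.0), f=f)
```

With these made-up values both the empty and the complete network are pairwise stable, which illustrates the multiplicity that makes the model incomplete.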

Theorem (Structural Parameters in Network Formation Models with Multiple Independent Networks)


Under the assumptions of Identification Problem, the sharp identification region for [math]\theta[/math] is

[[math]] \begin{align} \idr{\theta}=\{\vartheta\in\Theta:\sP(\ey\in K|\ex)\le \sT_{\eY_{\vartheta}}(K;\sF_{\tilde\gamma})\,\forall K\subset\cY, \, \ex\text{-a.s.}\}.\label{eq:SIR:networks:1} \end{align} [[/math]]

Proof

Follows from similar arguments as for the proof of Theorem.
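The characterization in \eqref{eq:SIR:networks:1} requires one inequality for each set [math]K\subset\cY[/math]. A quick way to see how fast this grows is to count networks and nonempty proper subsets of [math]\cY[/math] exactly with integer arithmetic, as in the sketch below. The function names are illustrative.

```python
def num_networks(n):
    """Number of undirected networks on n labeled nodes: 2^(n(n-1)/2)."""
    return 2 ** (n * (n - 1) // 2)

def num_test_sets(n):
    """Number of nonempty proper subsets K of the set of networks."""
    return 2 ** num_networks(n) - 2

# Only tiny n are feasible for the second count.
for n in (2, 3, 4):
    print(n, num_networks(n), num_test_sets(n))

# For n = 20 the number of networks is 2^190; count its decimal digits.
digits = len(str(num_networks(20)))
```

For [math]n=20[/math] the number of networks already has 58 decimal digits, so the number of subsets is far beyond anything that can be enumerated.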

The characterization of [math]\idr{\theta}[/math] in Theorem SIR- is new to this chapter.[Notes 40] While technically it entails a finite number of conditional moment inequalities, in practice their number can be prohibitive as it can be as large as [math]2^{2^{n(n-1)/2}}-2[/math].[Notes 41] Even using only a subset of the inequalities in \eqref{eq:SIR:networks:1} to obtain an outer region, for example applying the insights in [21], may not be practical (with [math]n=20[/math], [math]|\cY|\approx 10^{57}[/math]). Moreover, computation of [math]\sT_{\eY_{\vartheta}}(K;\sF_\gamma)[/math] may require (depending on the set [math]K[/math]) evaluation of rather complex integrals. To circumvent these challenges, [102] proposes to analyze network formation through subnetworks. A subnetwork is the restriction of a network to a subset of the agents (i.e., a subset of nodes and the links between them). For given [math]A\subseteq\{1,2,\dots,n\}[/math], let [math]\ey^A=\{\ey_{ij}\}_{i,j\in A, i\neq j}[/math] be the submatrix in [math]\ey[/math] with rows and columns in [math]A[/math], and let [math]\ey^{-A}[/math] be the remaining elements of [math]\ey[/math] after [math]\ey^A[/math] is deleted. With some abuse of notation, let [math](\ey^A,\ey^{-A})[/math] denote the composition of [math]\ey^A[/math] and [math]\ey^{-A}[/math] that returns [math]\ey[/math]. Recall that [math]\eY_\vartheta\equiv\eY_\vartheta(\ex,\epsilon)[/math], and let

[[math]] \begin{align*} \eY_{\vartheta}^A=\{\ey^A\in\{0,1\}^{|A|}:\exists \ey^{-A}\in\{0,1\}^{|-A|}~\mathrm{such~that}~(\ey^A,\ey^{-A})\in\eY_{\vartheta}\} \end{align*} [[/math]]

be the collection of subnetworks with rows and columns in [math]A[/math] that can be part of a pairwise stable network in [math]\eY_\vartheta[/math]. Let [math]\ex^A[/math] denote the subset of [math]\ex[/math] collecting [math]\ex_{ij}[/math] for [math]i,j\in A[/math]. For a given [math]y^A\in\{0,1\}^{|A|}[/math], let [math]\sC_{\eY_{\vartheta}^A}(y^A;\sF_\gamma)[/math] and [math]\sT_{\eY_{\vartheta}^A}(y^A;\sF_\gamma)[/math] denote, respectively, the probability of the events [math]\{\eY_\vartheta^A=\{y^A\}\}[/math] and [math]\{y^A\in\eY_\vartheta^A\}[/math] implied when [math]\epsilon\sim\sF_\gamma[/math], [math]\ex[/math]-a.s. The first event means that only the subnetwork [math]y^A[/math] is part of a pairwise stable network, while the second event means that [math]y^A[/math] is a possible subnetwork that is part of a pairwise stable network but other subnetworks may be part of one too. [102](Proposition 4.1) provides the following outer region for [math]\theta[/math] by adapting the insight in [21] to subnetworks. In the theorem I abuse notation compared to Table by introducing a superscript, [math]A[/math], to make explicit the dependence of the outer region on it.

Theorem (Subnetworks-based Outer Region on Structural Parameters in Network Formation Models with Multiple Independent Networks)

Under the assumptions of Identification Problem, for any [math]A\subseteq\{1,2,\dots,n\}[/math], an [math]A[/math]-dependent outer region for [math]\theta[/math] is

[[math]] \begin{align} \mathcal{O}^A_\sP[\theta]=\{\vartheta\in\Theta:\sC_{\eY_{\vartheta}^A}(y^A;\sF_{\tilde\gamma})\le\sP(\ey^A=y^A|\ex^A)\le \sT_{\eY_{\vartheta}^A}(y^A;\sF_{\tilde\gamma})\,\forall y^A\in\cY^A, \, \ex^A\text{-a.s.}\},\label{eq:OR:networks:1} \end{align} [[/math]]

where [math]\cY^A[/math] is the collection of [math]|A|\times|A|[/math] symmetric matrices with diagonal elements equal to zero and all other elements in [math]\{0,1\}[/math] so that [math]|\cY^A|=2^{|A|(|A|-1)/2}[/math].

Proof

Let [math]\eu(\tilde\ey|\eY_\vartheta)[/math] be a random variable in the unit simplex in [math]\R^{n(n-1)/2}[/math] which assigns to each possible pairwise stable network [math]\tilde\ey[/math] that may realize given [math](\ex,\epsilon)[/math] and [math]\vartheta\in\Theta[/math] the probability that it is selected from [math]\eY_\vartheta[/math]. Given [math]y\in\cY[/math], denote by [math]\sM(y|\ex)[/math] the model predicted probability that the network realizes equal to [math]y[/math]. Then the model yields

[[math]] \begin{align} \sM(y|\ex)&=\int\eu(y| Y_\vartheta)d\sF_\gamma=\int_{y\in Y_\vartheta,| Y_\vartheta|=1}d\sF_\gamma+\int_{y\in Y_\vartheta,| Y_\vartheta|\ge 2}\eu( y| Y_\vartheta)d\sF_\gamma.\label{eq:model:distrib:network:1} \end{align} [[/math]]
The model implied distribution for subnetwork [math]\tilde\ey^A[/math] is obtained by taking the marginal of expression \eqref{eq:model:distrib:network:1} with respect to [math]\tilde\ey^{-A}[/math]

[[math]] \begin{align} \sM(y^A|\ex)&=\sum_{y^{-A}}\sM((y^A,y^{-A})|\ex)= \int_{y^A\in Y_\vartheta^A,| Y_\vartheta^A|=1}d\sF_\gamma+\int_{y^A\in Y_\vartheta^A,| Y_\vartheta^A|\ge 2}\sum_{y^{-A}}\eu((y^A,y^{-A})| Y_\vartheta)d\sF_\gamma.\label{eq:model:distrib:subnetwork:1} \end{align} [[/math]]
Replacing [math]\eu[/math] in \eqref{eq:model:distrib:subnetwork:1} with zero and one yields the bounds in \eqref{eq:OR:networks:1}.
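The lower and upper bounds in \eqref{eq:OR:networks:1} are the probabilities that a subnetwork is the only one, respectively one of possibly several, compatible with pairwise stability. The sketch below computes their sample analogues from a list of simulated realizations of the set of pairwise stable networks. The encoding of networks as link indicators over ordered pairs, the function names, and the four hand-written draws are assumptions made for illustration; in practice the draws would come from simulating [math]\epsilon[/math] at a candidate [math]\vartheta[/math].

```python
from itertools import combinations

def project(network, pairs, A):
    """Restrict a network (tuple of link indicators over `pairs`) to nodes in A."""
    return tuple(v for (i, j), v in zip(pairs, network) if i in A and j in A)

def subnetwork_bounds(stable_sets, pairs, A, y_A):
    """Frequencies of the events {Y^A = {y_A}} and {y_A in Y^A} across draws."""
    only, possible = 0, 0
    for Y in stable_sets:
        Y_A = {project(y, pairs, A) for y in Y}
        only += (Y_A == {y_A})
        possible += (y_A in Y_A)
    m = len(stable_sets)
    return only / m, possible / m

# Hypothetical draws of the set of pairwise stable networks for n = 3.
pairs = list(combinations(range(3), 2))      # (0,1), (0,2), (1,2)
draws = [
    {(0, 0, 0), (1, 1, 1)},   # two equilibria that disagree on the pair {0, 1}
    {(1, 0, 0)},              # unique equilibrium
    {(1, 0, 1), (1, 1, 0)},   # two equilibria that agree on the pair {0, 1}
    {(0, 0, 0)},              # unique equilibrium
]
lower, upper = subnetwork_bounds(draws, pairs, A={0, 1}, y_A=(1,))
```

The third draw shows why subnetworks help: the full network is not uniquely predicted, yet the link between agents 0 and 1 is, so that draw contributes to the lower bound.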

[102](Section 4.2) further assumes that the selection mechanism [math]\eu(\tilde\ey|\eY_\vartheta)[/math] is invariant to permutations of the labels of the players. Under this condition and the maintained assumptions on [math]\epsilon[/math], she shows that the inequalities in \eqref{eq:OR:networks:1} are invariant under permutations of labels, so subnetworks in any two subsets [math]A,A'\subseteq\{1,2,\dots,n\}[/math] with [math]|A|=|A'|[/math] and [math]\ex^A=\ex^{A'}[/math] yield the same inequalities for all [math]y^A=y^{A'}[/math]. It is therefore sufficient to consider subnetwork [math]A[/math] and the inequalities in \eqref{eq:OR:networks:1} associated with it. Leveraging this result, [102] proposes an outer region obtained by looking at unlabeled subnetworks of size [math]|A|\le\bar{a}[/math] and given by

[[math]] \begin{align*} \outr{\theta}=\bigcap_{|A|\le\bar{a}}\mathcal{O}^A_\sP[\theta]. \end{align*} [[/math]]

As long as the subnetworks are chosen to be small, e.g., [math]|A|=2,3,4[/math], the inequalities in \eqref{eq:OR:networks:1} can be computed even if the network is large. [102] shows that the inequalities in \eqref{eq:OR:networks:1} remain informative as [math]n[/math] grows. This fact highlights the importance of working with subnetworks. One could have applied the insight of [21] directly to the full network by setting [math]\eu[/math] equal to zero and to one in \eqref{eq:model:distrib:network:1}. The resulting bounds, however, would shrink to zero as [math]n[/math] grows and become uninformative for [math]\theta[/math]. The characterization in Theorem OR- can be refined to obtain a smaller region, adapting the results in [77](Supplementary Appendix Theorem D.1) to subnetworks. The size of this refined region is weakly decreasing in [math]|A|[/math].[Notes 42] However, the refinement does not yield [math]\idr{\theta}[/math] because it is applied only to subnetworks.

Key Insight: At the beginning of this section I highlighted some key challenges to inference in network formation models. Identification Problem bypasses the concern on the dependence among linking decisions through the independence assumption on [math]\epsilon_{ij}[/math] and the presumption that the researcher observes data from multiple independent networks, which allows for identification of [math]\sP(\ey,\ex)[/math]. [102] takes on the remaining challenges by formally establishing equilibrium existence and allowing for unrestricted selection among multiple equilibria. In order to overcome the computational complexity of the problem, she puts forward the important idea of inference based on subnetworks. While of course information is left on the table, the approach remains feasible even with large networks.

[103] considers a framework similar to the one laid out in Identification Problem. He assumes non-negative externalities, and shows that in this case the set of pairwise stable equilibria is a complete lattice with a smallest and a largest equilibrium.[Notes 43] He then uses moment functions that are monotone in the pairwise stable network (so that they take their extreme values at the smallest and largest equilibria), to obtain moment conditions that restrict [math]\theta[/math]. Examples of the moment functions used include the proportion of pairs with a link, the proportion of links belonging to triangles, and many more (see [103](Table 1)). [104] considers unilateral and bilateral directed network formation games, still under a sampling framework where the researcher observes many independent networks. The equilibrium notion that she uses is pure strategy Nash. She assumes that the payoff that player [math]i[/math] receives from forming link [math]ij[/math] is allowed to depend on the number of additional players forming a link pointing to [math]j[/math], but rules out other spillover effects. Under this assumption and some regularity conditions, [104] shows that the network formation game can be decomposed into local games (i.e., games whose sets of players and strategy profiles are subsets of the network formation game's ones), so that the network formation game is in equilibrium if and only if each local game is in equilibrium. She then obtains a characterization of [math]\idr{\theta}[/math] using elements of random set theory.

Data From a Single Network

When the researcher observes data from a single network, extra care has to be taken to restrict the dependence among linking decisions. This can be done in various ways (see, e.g., [98] for some examples). Here I consider a framework proposed by [105].

Identification Problem (Network Formation Model with a Single Network)

Let there be a continuum of agents [math]j\in\cI=[0,\mu][/math], with [math]\mu \gt 0[/math] their total measure, who choose whom to link to based on a utility function specified below.[Notes 44] Let [math]y:\cI\times\cI\to\{0,1\}[/math] be an adjacency mapping with [math]y_{jk}=1[/math] if nodes [math]j[/math] and [math]k[/math] are linked, and [math]y_{jk}=0[/math] otherwise. Assume that only connections up to distance [math]\bar{d}[/math] affect utility and that preferences are such that agents never choose to form more than a total of [math]\bar{l}[/math] links.[Notes 45] To simplify exposition, let [math]\bar{d}=2[/math]. Let each agent [math]j[/math] be endowed with characteristics [math]\ex_j\in\cX[/math], with [math]\cX[/math] a finite set in [math]\R^p[/math], that are observable to the researcher. Additionally, let each agent [math]j[/math] be endowed with [math]\bar{l}\times|\cX|[/math] preference shocks [math]\epsilon_{j\ell}(x)\in\R,\ell=1,\dots,\bar{l},x\in\cX[/math], that are unobservable to the researcher and correspond to the possible direct connections and their characteristics.[Notes 46] Suppose that the vector of preference shocks is independent of [math]\ex[/math] and has a distribution known up to parameter vector [math]\gamma\in\Gamma\subset\R^m[/math], denoted [math]\sQ_\gamma[/math]. Let [math]\cI(j)=\{k:y_{jk}=1\}[/math]. Assume that agents with characteristics and preference shocks [math](x,e)[/math] value links according to the utility function

[[math]] \begin{multline} \bu_j(y,x,e)=\sum_{k\in\cI(j)}(f(x_j,x_k)+e_{j\ell(k)}(x_k))\\ +\delta_1\left|\bigcup_{k\in\cI(j)}\cI(k)-\cI(j)-\{j\}\right| +\delta_2\sum_{k\in\cI(j)}\sum_{m\in\cI(j):m \gt k}y_{km}-\infty\one(|\cI(j)| \gt \bar{l})\label{eq:utility:network:2} \end{multline} [[/math]]
Assume that the network [math]\ey[/math] formed by agents with characteristics and shocks [math](\ex,\epsilon)[/math] is pairwise stable. Let [math]\Theta\equiv\Upsilon\times\Delta\times\Gamma[/math], with [math]\Upsilon[/math] the parameter space for [math]\cf\equiv\{f(x,w):x\in\cX,w\in\cX\}[/math]. In the absence of additional information, what can the researcher learn about [math]\theta\equiv[\cf\delta_1\delta_2\gamma][/math]?


Identification Problem enforces dimension reduction through the restrictions on depth and degree (the bounds [math]\bar{d}[/math] and [math]\bar{l}[/math]), so that it is applicable to frameworks with networks that have limited degree distribution (e.g., a network of close friendships, but not the Facebook network). It also requires that individual identities are irrelevant. This substantially reduces the richness of unobserved heterogeneity allowed for and the dimensionality of the space of unobservables. While the latter feature narrows the domain of applicability of the model, it is very beneficial to obtain a tractable characterization of what can be learned about [math]\theta[/math], and yields equilibria that may include isolated nodes, a feature often encountered in networks data. [105] study Identification Problem focusing on the payoff-relevant local subnetworks that result from the maintained assumptions. These are distinct from the subnetworks used by [102]: whereas [102] looks at subnetworks formed by arbitrary individuals and whose size is chosen by the researcher on the basis of computational tractability, [105] look at subnetworks among individuals that are within a certain distance of each other, as determined by the structure of the preferences. On the other hand, the analysis in [102] does not require that agents have a finite number of types, nor does it bound the number of links that they may form. To characterize the local subnetworks relevant for identification analysis in their framework, [105] propose the concepts of network type and preference class. A network type [math]t=(a,v)[/math] describes the local network up to distance [math]\bar{d}[/math] from the reference node. Here [math]a[/math] is a square matrix of size [math]1+\bar{l}\sum_{d=1}^{\bar{d}}(\bar{l}-1)^{d-1}[/math] that describes the local subnetwork that is utility relevant for an agent of type [math]t[/math].
It consists of the reference node, its direct potential neighbors ([math]\bar{l}[/math] elements), its second order neighbors ([math]\bar{l}(\bar{l}-1)[/math] elements), through its [math]\bar{d}[/math]-th order neighbors ([math]\bar{l}(\bar{l}-1)^{\bar{d}-1}[/math] elements). The other component of the type, [math]v[/math], is a vector of length equal to the size of [math]a[/math] that contains the observable characteristics of the reference node and her alters. The bounds [math]\bar{d}[/math] and [math]\bar{l}[/math] enforce dimension reduction by bounding the number of network types. The partial identification approach of [105] depends on this number, rather than on the number of agents. For example, the number of moment inequalities is determined by the number of network types, not by the number of agents. As such, the approach yields its highest dividends for dimension reduction in large networks. Let [math]\cT[/math] denote the collection of network types generated from a preference structure [math]\bu[/math] and set of characteristics [math]\cX[/math]. For given realization [math](x,e)[/math] of the observable characteristics and preference shocks of a reference agent, and for given [math]\vartheta\in\Theta[/math], define the collection of network types for which no agent wants to drop a link by

[[math]] \begin{align*} H_\vartheta(x,e)=\{(a,v)\in\cT:v_1=x~\mathrm{and}~\bu(a,v,e)\ge \bu(a_{-\ell},v,e)~\forall\ell=1,\dots,\bar{l}\}, \end{align*} [[/math]]

where [math]a_{-\ell}[/math] is equal to the local adjacency matrix [math]a[/math] but with the [math]\ell[/math]-th link removed (that is, it sets the [math](1,\ell+1)[/math] and [math](\ell+1,1)[/math] elements of [math]a[/math] equal to zero). Because [math](\ex,\epsilon)[/math] are random vectors, [math]\eH_\vartheta\equiv H_\vartheta(\ex,\epsilon)[/math] is a random closed set as per Definition. This random set takes on a finite number of realizations (equal to the possible subsets of [math]\cT[/math]), so that its distribution is completely determined by the probability with which it takes on each of these realizations. A preference class [math]H\subset\cT[/math] is one of the possible realizations of [math]\eH_\vartheta[/math] for some [math]\vartheta\in\Theta[/math]. The model implied probability that [math]\eH_\vartheta=H[/math] is given by

[[math]] \begin{align} \sM(H|\ex;\vartheta)\equiv\sQ_{\tilde\gamma}(\epsilon:\eH_\vartheta=H|\ex).\label{eq:model:prediction:network:class} \end{align} [[/math]]

Observation of data from one network allows the researcher, under suitable restrictions on the sampling process, to learn the distribution of network types in the data (type shares), denoted [math]\sP(t)[/math].[Notes 47] For example, in a network of best friends with [math]\bar{l}=1[/math] and [math]\bar{d}=2[/math], and [math]\cX=\{x^1,x^2\}[/math] (e.g., a simplified framework with only two possible races), agents are either isolated or in a pair. Network types are pairs collecting the agent's race and the best friend's race (with second element equal to zero if the agent is isolated). Type shares are the fraction of isolated blacks, the fraction of isolated whites, the fraction of blacks with a black best friend, the fraction of whites with a black best friend, and the fraction of whites with a white best friend. The preference classes for a black agent are [math]H^1(b,e)=\{(b,0)\}[/math], [math]H^2(b,e)=\{(b,0),(b,b)\}[/math], [math]H^3(b,e)=\{(b,0),(b,w)\}[/math], [math]H^4(b,e)=\{(b,0),(b,w),(b,b)\}[/math] (and similarly for whites). In each case, being alone is part of the preference class, as there are no links to sever. In the second class the agent has a preference for having a black friend, in the third class for a white friend, and in the last class for a friend of either race. It is easy to see that the model is incomplete, as for a given realization of [math]\epsilon[/math] it makes multiple predictions on the agent's network type. [105] propose to map the distribution of preference classes into the observed distribution of network types in the data through the use of allocation parameters, denoted [math]\alpha_H(t)\in[0,1][/math]. These are distinct from but play the same role as a selection mechanism, and they represent a candidate distribution for [math]t[/math] given [math]\eH_\vartheta=H[/math]. The model, augmented with them, implies a probability that an agent is of network type [math]t[/math]:

[[math]] \begin{align} \sM(t;\vartheta,\alpha)=\frac{1}{\mu}\sum_{H\subset\cT}\mu_{v_1(t)}\sM(H|v_1(t);\vartheta)\alpha_H(t),\label{eq:model:prediction:network:2} \end{align} [[/math]]

where [math]\mu_{v_1(t)}[/math] is the measure of reference agents with characteristics equal to the second component of the preference type [math]t[/math], [math]\ex=v_1(t)[/math], and [math]\alpha\equiv\{\alpha_H(t):t\in \cT, H\subset\cT\}[/math].
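To make the mapping from preference classes to type shares concrete, the following is a minimal sketch of the model-implied share for the two-race best-friend example above. It is an illustration under stated assumptions, not the implementation of [105]: the function names, the measures in `mu`, the class probabilities in `class_prob`, and the even-split allocation `alpha` are all hypothetical.

```python
# Hypothetical two-race "best friend" example: a network type is
# (own race, best friend's race), with "0" marking an isolated agent.

def preference_classes(x):
    """Return the four preference classes H^1..H^4 for an agent of race x."""
    other = "w" if x == "b" else "b"
    alone, same, cross = (x, "0"), (x, x), (x, other)
    return [(alone,), (alone, same), (alone, cross), (alone, cross, same)]

def type_share(t, mu, class_prob, alpha):
    """Model-implied share of network type t given class probabilities and allocations."""
    x = t[0]                  # characteristics of the reference agent, v_1(t)
    total = sum(mu.values())  # overall measure of agents, mu
    weighted = sum(p * alpha.get((H, t), 0.0) for H, p in class_prob[x].items())
    return mu[x] * weighted / total

# Illustrative numbers only (assumptions, not taken from any data set).
mu = {"b": 0.4, "w": 0.6}
class_prob = {x: dict(zip(preference_classes(x), [0.1, 0.3, 0.2, 0.4])) for x in mu}

# Candidate allocation: each class spreads its mass evenly over its own types,
# so alpha_H(t) = 0 whenever t is not in H and each class sums to one.
alpha = {(H, t): 1.0 / len(H) for x in mu for H in preference_classes(x) for t in H}

types = [t for x in mu for t in preference_classes(x)[-1]]
shares = {t: type_share(t, mu, class_prob, alpha) for t in types}
```

Because each class allocates a total mass of one and the class probabilities sum to one for each race, the implied shares in `shares` add up to one, as a distribution over network types should.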

[105] provide a characterization of an outer region for [math]\theta[/math] based on two key implications of pairwise stability that deliver restrictions on [math]\alpha[/math]. They also show that under some additional assumptions, this characterization yields [math]\idr{\theta}[/math] [105](Appendix B). Here I focus on their more general result. The first implication that they use is that existing links should not be dropped:

[[math]] \begin{align} t\notin H\Rightarrow\alpha_H(t)=0.\label{eq:networks:2:PS1} \end{align} [[/math]]

The condition in \eqref{eq:networks:2:PS1} is embodied in [math]\bar\alpha\equiv\{\alpha_H(t):t\in H, H\subset\cT\}[/math]. The second implication is that it should not be possible to establish mutually beneficial links among nodes that are far from each other. Let [math]t^\prime[/math] and [math]s^\prime[/math] denote the network types that are generated if one adds a link in networks of types [math]t[/math] and [math]s[/math] between two nodes that are at distance at least [math]2\bar{d}[/math] from each other and each have fewer than [math]\bar{l}[/math] links. Then the requirement is

[[math]] \begin{align} \left(\sum_{H\subset\cT}\mu_{v_1(t)}\sM(H|v_1(t);\vartheta)\alpha_H(t)\one(t^\prime\in H)\right)\left(\sum_{H\subset\cT}\mu_{v_1(s)}\sM(H|v_1(s);\vartheta)\alpha_H(s)\one(s^\prime\in H)\right)=0\label{eq:networks:2:PS2} \end{align} [[/math]]

In words, if a positive measure of agents of type [math]t[/math] prefer [math]t^\prime[/math] (i.e., [math]\alpha_H(t) \gt 0[/math] for some [math]H[/math] such that [math]t^\prime\in H[/math]), there must be zero measure of type [math]s[/math] individuals who prefer [math]s^\prime[/math], because otherwise the network is unstable. [105] show that the conditions in \eqref{eq:networks:2:PS2} can be embodied in a square matrix [math]q[/math] of size equal to the length of [math]\bar{\alpha}[/math]. The entries of [math]q[/math] are constructed as follows. Let [math]H[/math] and [math]\tilde{H}[/math] be two preference classes with [math]t\in H[/math] and [math]s\in\tilde{H}[/math]. With some abuse of notation, let [math]q_{\alpha_H(t),\alpha_{\tilde{H}}(s)}[/math] denote the element of [math]q[/math] corresponding to the index of the entry in [math]\bar\alpha[/math] equal to [math]\alpha_H(t)[/math] for the row, and to [math]\alpha_{\tilde{H}}(s)[/math] for the column. Then set [math]q_{\alpha_H(t),\alpha_{\tilde{H}}(s)}(\vartheta)=\one(t^\prime\in H)\one(s^\prime\in\tilde{H})[/math]. It follows that this element yields the term [math]\big(\alpha_H(t)\one(t^\prime\in H)\big)\big(\alpha_{\tilde{H}}(s)\one(s^\prime\in \tilde{H})\big)[/math] in the quadratic form [math]\bar{\alpha}^\top q \bar{\alpha}[/math]. As long as [math]\mu_{v_1(\cdot)}[/math] and [math]\sM(\cdot|\ex;\vartheta)[/math] in \eqref{eq:model:prediction:network:class} are strictly positive, this term is equal to zero if and only if condition \eqref{eq:networks:2:PS2} holds for types [math]t[/math] and [math]s[/math].[Notes 48] With this background, the theorem below provides an outer region for [math]\theta[/math]. The proof of this result follows from the arguments laid out above (see [105](Theorems 1 and 2) for the full details).
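The construction of [math]q[/math] can be sketched for the two-race best-friend example with [math]\bar{l}=1[/math]. This is a hedged illustration, not the code of [105]. The helper `linked_types` encodes an assumption specific to this example: only two isolated agents can add a link (each has fewer than [math]\bar{l}=1[/math] links, and they are not connected to each other), and linking them turns types `(x, "0")` and `(y, "0")` into `(x, y)` and `(y, x)`. All names are hypothetical.

```python
def preference_classes(x):
    """Return the four preference classes H^1..H^4 for an agent of race x."""
    other = "w" if x == "b" else "b"
    alone, same, cross = (x, "0"), (x, x), (x, other)
    return [(alone,), (alone, same), (alone, cross), (alone, cross, same)]

def linked_types(t, s):
    """Types (t', s') after linking agents of types t and s, or None if no link can be added."""
    if t[1] == "0" and s[1] == "0":   # with l-bar = 1, only isolated agents can add a link
        return (t[0], s[0]), (s[0], t[0])
    return None

# Index of bar-alpha: one entry per pair (H, t) with t in H.
index = [(H, t) for x in "bw" for H in preference_classes(x) for t in H]

def build_q(index):
    """q[i][j] = 1(t' in H) * 1(s' in H-tilde) for row (H, t) and column (H-tilde, s)."""
    n = len(index)
    q = [[0] * n for _ in range(n)]
    for i, (H, t) in enumerate(index):
        for j, (H_tilde, s) in enumerate(index):
            new = linked_types(t, s)
            if new is not None and new[0] in H and new[1] in H_tilde:
                q[i][j] = 1
    return q

def quadratic_form(q, a):
    """Evaluate a' q a for a square matrix q given as nested lists."""
    n = len(a)
    return sum(a[i] * q[i][j] * a[j] for i in range(n) for j in range(n))

q = build_q(index)

# Allocation 1: each class puts all its mass on its last listed type,
# so the only isolated agents are those in class H^1, who want no friend.
stable = [1.0 if t == H[-1] else 0.0 for H, t in index]

# Allocation 2: mass spread evenly within each class, so some agents who
# would accept a link remain isolated.
uniform = [1.0 / len(H) for H, t in index]

value_stable = quadratic_form(q, stable)
value_uniform = quadratic_form(q, uniform)
```

In this sketch `value_stable` is zero, because no isolated agent would accept a new link, while `value_uniform` is strictly positive, flagging pairs of isolated agents who would both gain from linking.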

Theorem (Outer Region on Parameters of a Network Formation Model with a Single Network)

Under the assumptions of Identification Problem,

[[math]] \begin{align} \outr{\theta}=\left\{\vartheta\in\Theta: \left(\begin{array}{rl} \min_{\bar{\alpha}} & \bar{\alpha}^\top q \bar{\alpha} \\ \text{s.t.} & \sM(t;\vartheta,\bar{\alpha})=\sP(t)\ \forall t\in\cT \\ & \sum_{t\in H}\alpha_H(t)=1\ \forall H\subset \cT \\ & \alpha_H(t)\ge 0\ \forall t\in H,\ \forall H\subset \cT \end{array} \right)=0 \right\}.\label{eq:OR:networks:2} \end{align} [[/math]]


The set in \eqref{eq:OR:networks:2} does not equal [math]\idr{\theta}[/math] in all models allowed for in Identification Problem because condition \eqref{eq:networks:2:PS2} does not embody all implications of pairwise stability on non-existing links. While the optimization problem in \eqref{eq:OR:networks:2} is quadratic, it is not necessarily convex because [math]q[/math] may not be positive semidefinite. Nonetheless, the simulations reported by [105] suggest that [math]\outr{\theta}[/math] can be computed rapidly, at least for the examples they considered.
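The lack of convexity can be seen in a toy case. The sketch below uses a hypothetical [math]2\times 2[/math] matrix with ones off the diagonal, the pattern that arises when two distinct allocation entries conflict with each other. It is an assumption-laden illustration rather than a matrix taken from [105].

```python
def quadratic_form(q, a):
    """Evaluate a' q a for a square matrix q given as nested lists."""
    n = len(a)
    return sum(a[i] * q[i][j] * a[j] for i in range(n) for j in range(n))

# Hypothetical conflict between two allocation entries: a' q a = 2 * a[0] * a[1].
q = [[0, 1],
     [1, 0]]

a = [1.0, 0.0]      # all mass on the first entry
b = [0.0, 1.0]      # all mass on the second entry
mid = [0.5, 0.5]    # midpoint of a and b

f_a = quadratic_form(q, a)
f_b = quadratic_form(q, b)
f_mid = quadratic_form(q, mid)

# A convex function would satisfy f(mid) <= (f(a) + f(b)) / 2.
violates_convexity = f_mid > 0.5 * (f_a + f_b)
```

Both endpoints attain the value zero while their midpoint does not, so the objective is not convex along this segment. Solvers that rely on convexity may therefore stop at a local rather than a global minimum.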

Key Insight: At the beginning of this section I highlighted some key challenges to inference in network formation models. When data is observed from a single network, as in Identification Problem, the proposal of [105] to base inference on local networks achieves two main benefits. First, it delivers consistently estimable features of the game, namely the probability that an agent belongs to one of a finite collection of network types. Second, it achieves dimension reduction, so that computation of outer regions on [math]\theta[/math] remains feasible even with large networks and allowing for unrestricted selection among multiple equilibria.

[23] and [24] propose to embed revealed preference-based inequalities into structural models of both demand and supply in markets where firms face discrete choices of product configuration or of location. Revealed preference arguments are a trademark of the literature on discrete choice analysis. [23] and [24] use these arguments to leverage a subset of the model's implications to obtain easy-to-compute moment inequalities. For example, in the context of entry games such as the ones discussed in Section Static, Simultaneous-Move Finite Games with Multiple Equilibria, they propose to base inference on the implication that a player enters the market if and only if (s)he expects to make non-negative profits. This condition can be exploited even when players have heterogeneous (unobserved to the researcher) information sets, and it implies that the expected profits for entrants should be non-negative. Nonetheless, the condition does not suffice to obtain moment inequalities that include only observed payoff shifters and preference parameters. This is because the expected value of unobserved payoff shifters for entrants is not equal to zero, as the group of entrants is selected. 
The authors require the availability of valid (monotone) instrumental variables to solve this problem (see Section Treatment Effects with and without Instrumental Variables for uses of instrumental variables and monotone instrumental variables in partial identification analysis of treatment effects). Interesting features of their approach include that the researcher does not need to solve for the set of equilibria, nor require that the distribution of unobservable payoff shifters be known up to a finite-dimensional parameter vector. Moreover, the same basic ideas can be applied to single agent models (with or without heterogeneous information sets). A shortcoming of the method is that the set of parameter vectors satisfying the moment inequalities may be wider than the sharp identification region under the maintained assumptions.
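The selection problem behind the need for instruments can be illustrated with a small simulation. This is a deliberately simplified sketch, not the model of [23] and [24]. It assumes a single potential entrant who observes the unobserved shifter `eps` perfectly, a linear profit function, and hypothetical values for `theta` and the distributions.

```python
import random

random.seed(0)       # fixed seed so the run is reproducible
theta = 1.0          # hypothetical payoff parameter
n = 20000            # number of simulated markets

entrant_eps = []
entrant_profit = []
for _ in range(n):
    x = random.uniform(-1.0, 1.0)   # observed payoff shifter
    eps = random.gauss(0.0, 1.0)    # unobserved shifter, mean zero in the population
    profit = theta * x + eps
    if profit >= 0:                 # revealed preference: enter iff profit is non-negative
        entrant_eps.append(eps)
        entrant_profit.append(profit)

mean_eps_entrants = sum(entrant_eps) / len(entrant_eps)
mean_profit_entrants = sum(entrant_profit) / len(entrant_profit)
```

Average profits among entrants are non-negative by construction, but the average of `eps` among entrants is strictly positive even though it is zero in the population. A moment inequality that drops the unobserved shifter would therefore be misspecified without further restrictions such as the instruments discussed above.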

The breadth of applications of the approach proposed by [23] and [24] is vast.[Notes 49] For example, [106] uses it to model the formation of the hospital networks offered by US health insurers, and [107] and [108] use it to obtain bounds on firm fixed costs as an input to modeling product choices in the movie industry and in the US video game industry, respectively. [109] estimates the effects of Wal-Mart's strategy of creating a high density network of stores. While the close proximity of stores implies cannibalization in sales, Wal-Mart is willing to bear it to achieve density economies, which in turn yield savings in distribution costs. His results suggest that Wal-Mart substantially benefits from high store density. [110] measure the effects of chain economies, business stealing, and heterogeneous firms' comparative advantages in the discount retail industry. [111] estimate a model of strategic voting and quantify the impact it has on election outcomes. As in other models analyzed in this section, the one they study yields multiple predicted outcomes, so that partial identification methods are required to carry out the empirical analysis if one does not assume a specific selection mechanism to resolve the multiplicity. They estimate their model on Japanese general-election data, and uncover a sizable fraction of strategic voters. They also estimate that only a small fraction of voters are misaligned (voting for a candidate other than their most preferred one). [112] studies whether the rapid removal from the market for personal computers of existing central processing units upon creation of new ones through innovation reduces surplus. He finds that a limited group of price-insensitive consumers enjoys the largest share of the welfare gains from innovation. 
A policy that kept older technologies on the shelf would allow for the benefits from innovation to reach price-sensitive consumers thanks to improved access to mobile computing, but total welfare would not increase because consumer welfare gains would be largely offset by producer losses. [113] analyze hospital referrals for labor and birth episodes in California in 2003, for patients enrolled with six health insurers that use, to differing extents, incentives to referring physician groups to reduce hospital costs (capitation contracts). The aim is to learn whether enrollees with high-capitation insurers tend to be referred to lower-priced hospitals (ceteris paribus) compared to other patients with same-severity conditions, and whether quality of care was affected. Their model allows for an insurer-specific preference function that is additively separable in the hospital price paid by the insurer (which is allowed to be measured with error), the distance traveled, and plan and severity-specific hospital fixed effects. Importantly, unobserved heterogeneity entering the preference function is not assumed to be drawn from a distribution known up to a finite-dimensional parameter vector. The results of the empirical analysis indicate that the price paid by insurers to hospitals has an impact on referrals, with higher elasticity to price for insurers whose physician groups are more highly capitated. [114] study how the information that potential exporters have to predict the profits they will earn when serving a foreign market influences their decisions to export. They propose a model where the researcher specifies and observes a subset of the variables that agents use to form their expectations, but may not observe other variables that affect firms' expectations heterogeneously (across firms and markets, and over time). Because only a subset of the variables entering the firms' information set is observed, partial identification results. 
They show that, under rational expectations, they can test whether potential exporters know and use specific variables to predict their export profits. They also use their model's estimates to quantify the value of information. [115] studies the implications of the \$85 billion automotive industry bailout in 2009 on the commercial vehicle segment. He finds that had Chrysler and GM been liquidated (or acquired by a major competitor) rather than bailed out, the surviving firms would have experienced a rise in profits high enough to induce them to introduce new products.

A different use of revealed preference arguments appears in the contributions of [116], [117], [118], [119], [120], [121], [122], [123], and many others. For example, [120] proposes a method to partially identify income-leisure preferences and to evaluate the associated effects of tax policies. He starts from basic revealed-preference analysis performed under the assumption that individuals prefer more income and leisure, and no other restriction. The analysis shows that observing an individual's time allocation under a status quo tax policy yields bounds on his allocation that may or may not be informative, depending on how the person allocates his time under the status quo policy and on the tax schedules. He then explores what more can be learned if one additionally imposes restrictions on the distribution of income-leisure preferences, using the method put forward by [55]. One assumption restricts groups of individuals facing different choice sets to have the same distribution of preferences. The other assumption restricts this distribution to a parametric family. [124] build on and expand [120]'s framework to evaluate the effect of Connecticut's Jobs First welfare reform experiment on women's labor supply and welfare participation decisions.

[121] propose a method to learn features of households' risk preferences in a random utility model that nests expected utility theory plus a range of non-expected utility models.[Notes 50] They allow for unobserved heterogeneity in preferences (that may enter the utility function non-separably) and leave completely unspecified their distribution. The authors use revealed preference arguments to infer, for each household, a set of values for its unobserved heterogeneity terms that are consistent with the household's choices in the three lines of insurance coverage. As their core restriction, they assume that each household's preferences are stable across contexts: the household's utility function is the same when facing distinct but closely related choice problems. This allows them to use the inferred set valued data to partially identify features of the distribution of preferences, and to classify households into preference types. They apply their proposed method to analyze data on households' deductible choices across three lines of insurance coverage (home all perils, auto collision, and auto comprehensive).[Notes 51] Their results show that between 70 and 80 percent of the households make choices that can be rationalized by a model with linear utility and monotone, quadratic, or even linear probability distortions. These probability distortions substantially overweight small probabilities. By contrast, fewer than 40 percent can be rationalized by a model with concave utility but no probability distortions.

[122] propose a method to carry out demand analysis while allowing for general forms of unobserved heterogeneity. Preferences and linear budget sets are assumed to be statistically independent (conditional on covariates and control functions). [122] show that for continuous demand, average surplus is generally not identified from the distribution of demand for a given price and income, and therefore propose a partial identification approach. They use bounds on income effects to derive bounds on average surplus. They apply the bounds to gasoline demand, using data from the 2001 U.S. National Household Travel Survey.

Another strand of empirical applications pertains to the analysis of discrete games. [21] use the method they develop, described in Section An Inference Approach Robust to the Presence of Multiple Equilibria, to study market structure in the US airline industry and the role that firm heterogeneity plays in shaping it. Their findings suggest that the competitive effects of each carrier increase in that carrier's airport presence, but also that the competitive effects of large carriers (American, Delta, United) are different from those of low cost ones (Southwest). They also evaluate the effect of a counterfactual policy repealing the Wright Amendment, and find that doing so would see an increase in the number of markets served out of Dallas Love.

[84] proposes a model of static entry that extends the one in Section Static, Simultaneous-Move Finite Games with Multiple Equilibria by allowing individuals to have flexible information structures, where players' payoffs depend on both a common-knowledge unobservable payoff shifter and a private-information one. His characterization of [math]\idr{\theta}[/math] is based on using an unrestricted selection mechanism, as in [61] and [21]. He applies the model to study the impact of supercenters such as Wal-Mart, which sell both food and groceries, on the profitability of rural grocery stores. He finds that entry by a supercenter outside, but within 20 miles, of a local monopolist's market has a smaller impact on firm profits than entry by a local grocer. Entry by supercenters has a small negative effect on the number of grocery stores in surrounding markets as well as on their profits. The results suggest that location and format-based differentiation partially insulate rural stores from competition with supercenters.

A larger class of information structures is considered in the analysis of static discrete games carried out by [95]. They allow for all information structures consistent with the players knowing their own payoffs and the distribution of opponents' payoffs. As solution concept they adopt the Bayes Correlated Equilibrium recently developed by [93]. Also with this solution concept multiple equilibria are possible. The authors leave completely unspecified the selection mechanism picking the equilibrium played in the regions of multiplicity, so that partial identification attains. [95] use the random sets approach to characterize [math]\idr{\theta}[/math]. They apply the method to estimate a model of entry in the Italian supermarket industry and quantify the effect of large malls on local grocery stores. [125] provide partial identification results (and Bayesian inference methods) for semiparametric dynamic binary choice models without imposing distributional assumptions on the unobserved state variables. They carry out an empirical application using [126]'s model of bus engine replacement. Their results suggest that parametric assumptions about the distribution of the unobserved states can have a considerable effect on the estimates of per-period payoffs, but not a noticeable one on the counterfactual conditional choice probabilities. [127] use the random sets approach to partially identify and estimate dynamic discrete choice models with serially correlated unobservables, under instrumental variables restrictions. They extend two-step dynamic estimation methods to characterize a set of structural parameters that are consistent with the dynamic model, the instrumental variables restrictions, and the data.[Notes 52] [104] uses the random sets approach and a network formation model, to learn about Italian firms' incentives for having their executive directors sitting on the board of their competitors.

[48] use the method described in Section Unobserved Heterogeneity in Choice Sets and/or Consideration Sets to partially identify the distribution of risk preferences using data on deductible choices in auto collision insurance.[Notes 53] They posit an expected utility theory model and allow for unobserved heterogeneity in households' risk aversion and choice sets, with unrestricted dependence between them. Motivation for why unobserved heterogeneity in choice sets might be an important factor in this empirical framework comes from the earlier analysis of [121] and from novel findings that are part of the contribution of [48]. They show that commonly used models that make strong assumptions about choice sets (e.g., the mixed logit model with each individual's choice set assumed equal to the feasible set, and various models of choice set formation) can be rejected in their data. With regard to risk aversion, their key finding is that their estimated lower bounds are significantly smaller than the point estimates obtained in the related literature. This suggests that the data can be explained by expected utility theory with lower and more homogeneous levels of risk aversion than had been uncovered before. This provides new evidence on the importance of developing models that differ in their specification of which alternatives agents evaluate (rather than or in addition to models focusing on how they evaluate them), and of data collection efforts that seek to directly measure agents' heterogeneous choice sets [44].

[128] study the effect of pre-vote deliberation on the decisions of US appellate courts. The question of interest is whether deliberation increases or reduces the probability of an incorrect decision. They use a model where communication equilibrium is the solution concept, and only observed heterogeneity in payoffs is allowed for. In the model, multiple equilibria are again possible, and the authors leave the selection mechanism completely unspecified. They characterize [math]\idr{\theta}[/math] through an optimization problem, and structurally estimate the model on US Courts of Appeal data. [128] compare the probability of making incorrect decisions under the pre-vote deliberation mechanism to that in a counterfactual environment where no deliberation occurs. The results suggest that there is a range of parameters in [math]\idr{\theta}[/math], for which judges have ex-ante disagreement or imprecise prior information, for which deliberation is beneficial. Otherwise deliberation leads to lower effectiveness for the court.

[129] propose a test for the hypothesis of rational expectations for the case that one observes only the marginal distributions of realizations and subjective beliefs, but not their joint distribution (e.g., when subjective beliefs are observed in one dataset, and realizations in a different one, and the two cannot be matched). They establish that the hypothesis of rational expectations can be expressed as testing that a continuum of moment inequalities is satisfied, and they leverage the results in [130] to provide a simple-to-compute test for this hypothesis. They apply their method to test for and quantify deviations from rational expectations about future earnings, and examine the consequences of such departures in the context of a life-cycle model of consumption.

[131] estimate the demand for health insurance under the Affordable Care Act using data from California. Methodologically, they use a discrete choice model that allows for endogeneity in insurance premiums (which enter as explanatory variables in the model) and dispenses with parametric assumptions about the unobserved components of utility leveraging the availability of instrumental variables, similarly to the framework presented in Section Endogenous Explanatory Variables. The authors provide a characterization of sharp bounds on the effects of changing premium subsidies on coverage choices, consumer surplus, and government spending, as solutions to linear programming problems, rendering their method computationally attractive.

Another important strand of theoretical literature is concerned with partial identification of panel data models. [132] consider a dynamic random effects probit model, and use partial identification analysis to obtain bounds on the model parameters that circumvent the initial conditions problem. [133] considers a fixed effect panel data model where he imposes a conditional quantile restriction on time varying unobserved heterogeneity. Differencing out inequalities resulting from the conditional quantile restriction delivers inequalities that depend only on observable variables and parameters to be estimated, but not on the fixed effects, so that they can be used for estimation. [134] obtain bounds on average and quantile treatment effects in nonparametric and semiparametric nonseparable panel data models. [135] provide partial identification results in linear panel data models with censored outcomes, with unrestricted dependence between censoring and observable and unobservable variables. Their results are derived for two classes of models, one where the unobserved heterogeneity terms satisfy a stationarity restriction, and one where they are nonstationary but satisfy a conditional independence restriction. [136] provides a method to partially identify state dependence in panel data models where individual unobserved heterogeneity need not be time invariant. [137] study semiparametric multinomial choice panel models with fixed effects where the random utility function is assumed additively separable in unobserved heterogeneity, fixed effects, and a linear covariate index. The key semiparametric assumption is a group stationarity condition on the disturbances which places no restrictions on either the joint distribution of the disturbances across choices or the correlation of disturbances across time. 
[137] propose a within-group comparison that delivers a collection of conditional moment inequalities that they use to provide point and partial identification results. [138] proposes a related method, where partial identification relies on the observation of individuals whose outcome changes in two consecutive time periods, and leverages shape restrictions to reduce the number of between-alternative comparisons needed to determine the optimal choice.

General references

Molinari, Francesca (2020). "Microeconometrics with Partial Identification". arXiv:2004.11751 [econ.EM].

Notes

  1. Of course, this is not always the case, as exemplified by the bounds in [1].
  2. [2] study also partial identification (and estimation) of nonparametric, semiparametric, and parametric conditional expectation functions that are well defined in the absence of a structural model, when one of the conditioning variables is interval valued. I refer to Section for a discussion.
  3. [3] consider more general multi-player entry games.
  4. Figure is based on Figure 1 in [4]. See [5](Chapter XXX in this Volume) for an extensive discussion of the duality between the model's set valued predictions for [math]\ey[/math] as a function of [math]\epsilon[/math] and for [math]\epsilon[/math] as a function of [math]\ey[/math], in both cases given the observed covariates.
  5. In the definition of [math]\Eps_\vartheta(1,\ew,\xL,\xU)[/math] I exploit the fact that under the maintained assumptions [math]\P(\epsilon=-\ew\vartheta-\xU|\ew,\ex,\xL,\xU)=0[/math] to enforce its closedness.
  6. There are no [math](\ew,\xL,\xU)[/math]-cross restrictions.
  7. This Corollary is related in spirit to the analysis in [6].
  8. This was confirmed in personal communication with Chuck Manski and Elie Tamer.
  9. The proof closes a gap in the argument in [7] connecting their Proposition 2 and Lemma 1, due to the fact that for a given [math]\vartheta[/math] the sets
    [[math]]\{(\ew,\xL,\xU):\, \{\ew\theta+\xU\le 0 \lt \ew\vartheta+\xL\} \cup \{\ew\vartheta+\xU\le 0 \lt \ew\theta+\xL\}\}[[/math]]
    and
    [[math]]\begin{split}\{(\ew,\xL,\xU):\, \{0 \lt \ew\vartheta+\xL\cap \sP(\ey=1|\ew,\xL,\xU)\le 1-\alpha\} \\ \cup \{\ew\vartheta+\xU\le 0\cap \sP(\ey=1|\ew,\xL,\xU) \gt 1-\alpha\}\}\end{split}[[/math]]
    need not coincide, with the former being a subset of the latter due to part (c) of the proof of Proposition 2 in [8].
  10. This distinction echoes the distinction drawn by [9](Section 1.1.1) between point identification and uniform point identification. [10] considers a scenario where a parameter vector of interest [math]\theta[/math] is defined as the solution to an equation of the form [math]\crit_\sP(\theta)=0[/math] for some criterion function [math]\crit_\sP:\Theta\mapsto\R_+[/math]. Then [math]\theta[/math] is point identified relative to [math](\sP,\Theta)[/math] if it is the unique solution to [math]\crit_\sP(\theta)=0[/math]. It is uniformly point identified relative to [math](\cP,\Theta)[/math], with [math]\cP[/math] a space of probability distributions to which [math]\sP[/math] belongs, if for every [math]\tilde\sP\in\cP[/math], [math]\crit_{\tilde\sP}(\vartheta)=0[/math] has a unique solution.
  11. [11](Supplementary Appendix F) extend the analysis of [12] to multinomial choice models with interval covariates.
  12. The estimator that they propose extends the minimum distance estimator put forward by [13], see Section Consistent Estimation, so that if the conditions required for point identification do not hold, it estimates the parameter's identification region (under regularity conditions). [14] carry out a similar analysis for the binary choice model with endogenous explanatory variables.
  13. Compared to the general model put forward in Section Discrete Choice in Single Agent Random Utility Models, in this model there are no preference heterogeneity terms [math]\zeta[/math] (random coefficients) that vary only across decision makers.
  14. Of course, under these conditions one can work directly with utility differences. To try and economize on notation, I do not explicitly do so here.
  15. This figure is based on Figures 1-3 in [15].
  16. The specific model in [16](Section II-A) is often used in applications. It posits that each alternative [math]c\in\cY[/math] enters the decision maker’s choice set with probability [math]\phi_c[/math], independently of the other alternatives. The probability [math]\phi_c[/math] may depend on observable individual characteristics, and [math]\phi_c=1[/math] for at least one option [math]c\in\cY[/math] (the “default” good).
  17. These assumptions are akin to assumptions about selection mechanisms in models with multiple equilibria. The latter are discussed further below in Section An Inference Approach Robust to the Presence of Multiple Equilibria, along with their criticisms.
  18. This assumption can be relaxed as discussed in [17]. The procedure proposed here can also be adapted to allow for endogenous explanatory variables as in Section Endogenous Explanatory Variables by combining the results in [18] with those in [19].
  19. Here I omit observable covariates [math]\ex[/math] for simplicity.
  20. Specifically, [math]\succ[/math] is an asymmetric, transitive and complete binary relation.
  21. Here I suppress covariates for simplicity.
  22. Completeness of information is motivated by the idea that firms in the industry have settled in a long-run equilibrium, and have detailed knowledge of both their own and their rivals' profit functions.
  23. This figure is based on Figure 1 in [20].
  24. The same reasoning given here applies if instead of mixed strategy Nash the solution concept is correlated equilibrium, by replacing the set of MSNE below with the set of correlated equilibria.
  25. This figure is based on Figure 1 in [21].
  26. See [22](Section 3) and [23] for a thorough discussion of the literature on identification problems in games of incomplete information with multiple Bayesian Nash equilibria (BNE). [24] explain how to extend the approach proposed by [25] to obtain outer regions on [math]\theta[/math] when no restrictions are imposed on the equilibrium selection mechanism that chooses among the multiple BNE.
  27. Both the independence assumption and the correct common prior assumption are maintained here to simplify exposition. Both could be relaxed with no conceptual difficulty, though computation of the set of Bayesian Nash equilibria, for example, would become more cumbersome.
  28. Examples of departures from the standard model include the case where active bidding by a player's opponents may eliminate her incentives to bid close to her valuation or at all; the econometrician does not precisely observe the point at which each bidder drops out; there are discrete bid increments; etc.
  29. If there is a reserve price [math]r \gt \underline{v}[/math], nothing can be learned about [math]\sQ(\ev\in [\underline{v},v])[/math] for any [math]v \lt r[/math]. In that case, one can learn features of the truncated distribution of valuations using the same insights summarized here.
  30. Using the same convention as for the bids, [math]\ev_{i:n}[/math] denotes the [math]i[/math]-th lowest of the [math]n[/math] valuations.
  31. Note that [math]\eb_{i:n}[/math] need not be the bid made by the bidder with valuation [math]\ev_{i:n}[/math].
  32. [26] (Appendix D) provide the discussion summarized here. Additionally, in their Appendix B, they give a simple example of a two-bidder auction satisfying all assumptions in Identification Problem, where two different distributions [math]\sQ[/math] and [math]\tilde{\sQ}[/math] yield the same distribution of ordered bids.
  33. The button auction model yields bidding behavior consistent with Identification Problem.
  34. Equation D1 in [27] and equation \eqref{eq:RCS_auction} here differ in that the latter also requires bids to be ordered. This observation was beside the point in the discussion in [28] that led to equation D1.
  35. For a review of the literature on peer group effect analysis, see, e.g., [29], [30], [31], and [32].
  36. Undirected means that if a link from node [math]i[/math] to node [math]j[/math] exists, then the link from [math]j[/math] to [math]i[/math] exists. The discussion that follows can be generalized to the case of models with transferable utility.
  37. Here I consider a framework where the agents have complete information.
  38. The effects of having friends in common and of friends of friends in \eqref{eq:utility:network:1} are normalized by [math]n-2[/math]. This enforces that the marginal utility that [math]i[/math] receives from linking with [math]j[/math] is affected by [math]j[/math] having an additional link with [math]k[/math] to a smaller degree as [math]n[/math] grows. This does not result in diminishing network effects.
  39. With transferable utility, [33] (Proposition 2.1) establishes existence for any [math]\delta_2,\delta_3\in\R[/math]. See [34] for an earlier analysis of existence and uniqueness of pairwise stable networks.
  40. [35] has previously used Theorem D.1 in [36], as I do here, to characterize sharp identification regions in unilateral and bilateral directed network formation games.
  41. This number may be reduced drastically using the notion of core determining class of sets, see Definition and the discussion on Basic Definitions and Facts from Random Set Theory. Nonetheless, even with relatively few agents, the number of inequalities in \eqref{eq:SIR:networks:1} may remain overwhelming.
  42. The idea of using random set methods on subnetworks to obtain the refined region was put forward in an earlier version of [37]. She provided a proof that the refined region's size decreases weakly in [math]|A|[/math].
  43. This approach exploits supermodularity, and is related to [38] and [39].
  44. This is an approximation to a framework with a large but finite number of agents. The utility function can be less restrictive than the one considered here (see Assumptions 1 and 2 in [40]).
  45. The distance measure used here is the shortest path between two nodes.
  46. Under this assumption, the preference shocks do not depend on the individual identities of the agents. Hence, if agents [math]k[/math] and [math]m[/math] have the same observable characteristics, then [math]j[/math] is indifferent between them.
  47. Full observation of the network is not required (and in practice it often does not occur). Sampling uncertainty results from it because in this model there is a continuum of agents.
  48. The possibility that [math]\mu_{v_1(\cdot)}[/math] or [math]\sM(\cdot|\ex;\vartheta)[/math] are equal to zero can be accommodated by setting [math]q_{\alpha_H(t),\alpha_{\tilde{H}}(s)}(\vartheta)=(\mu_{v_1(t)}\sM(H|v_1(t);\vartheta)\one(t^\prime\in H))(\mu_{v_1(s)}\sM(H|v_1(s);\vartheta)\one(s^\prime\in\tilde{H}))[/math]. However, in that case [math]q[/math] depends on [math]\vartheta[/math] and its computational cost increases.
  49. Statistical inference in these papers is often carried out using the methods proposed by [41], [42], and [43]. Model specification tests, if carried out, are based on the method proposed by [44]. See Sections Confidence Sets Satisfying Various Coverage Notions and, respectively, for a discussion of confidence sets and specification tests.
  50. Their model is based on the one put forward by [45]. See [46] for a review of these and other non-expected utility models in the context of estimation of risk preferences.
  51. Auto collision coverage pays for damage to the insured vehicle caused by a collision with another vehicle or object, without regard to fault. Auto comprehensive coverage pays for damage to the insured vehicle from all other causes, without regard to fault. Home all perils (or simply home) coverage pays for damage to the insured home from all causes, except those that are specifically excluded (e.g., flood, earthquake, or war).
  52. Statistical inference on [math]\theta[/math] is carried out using [47]'s method.
  53. Statistical inference on projections of [math]\theta[/math] is carried out using [48]'s method.

References

  1. Block, H.D., and J.Marschak (1960): “Random Orderings and Stochastic Theories of Responses” in Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling, ed. by I.Olkin, pp. 97--132. Stanford University Press.
  2. Marschak, J. (1960): “Binary Choice Constraints on Random Utility Indicators” in Stanford Symposium on Mathematical Methods in the Social Sciences, ed. by K.Arrow. Stanford University Press.
  3. Hall, R.E. (1973): “On the statistical theory of unobserved components” MIT Working Paper 117, available at https://dspace.mit.edu/bitstream/handle/1721.1/63972/onstatisticalthe00hall.pdf?sequence=1.
  4. McFadden, D.L. (1975): “Tchebyscheff bounds for the space of agent characteristics” Journal of Mathematical Economics, 2(2), 225 -- 242.
  5. Falmagne, J. (1978): “A representation theorem for finite random scale systems” Journal of Mathematical Psychology, 18(1), 52 -- 72.
  6. McFadden, D.L., and M.K. Richter (1991): “Stochastic rationality and revealed stochastic preference” in Preferences, Uncertainty and Rationality, ed. by J.S. Chipman, D.L. McFadden, and M.K. Richter, pp. 161--186. Westview Press.
  7. Marschak, J., and W.H. Andrews (1944): “Random Simultaneous Equations and the Theory of Production” Econometrica, 12(3/4), 143--205.
  8. Markowitz, H. (1952): “Portfolio selection” Journal of Finance, 7, 77--91.
  9. Fisher, F.M. (1966): The Identification Problem in Econometrics. McGraw-Hill Book Company.
  10. Harrison, J., and D.M. Kreps (1979): “Martingales and arbitrage in multiperiod securities markets” Journal of Economic Theory, 20(3), 381 -- 408.
  11. Kreps, D.M. (1981): “Arbitrage and equilibrium in economies with infinitely many commodities” Journal of Mathematical Economics, 8(1), 15 -- 35.
  12. Leamer, E.E. (1981): “Is it a Demand Curve, Or Is It A Supply Curve? Partial Identification through Inequality Constraints” The Review of Economics and Statistics, 63(3), 319--327.
  13. Manski, C.F. (1988b): “Identification of Binary Response Models” Journal of the American Statistical Association, 83(403), 729--738.
  14. Jovanovic, B. (1989): “Observable Implications of Models with Multiple Equilibria” Econometrica, 57(6), 1431--1437.
  15. Phillips, P. C.B. (1989): “Partially Identified Econometric Models” Econometric Theory, 5(2), 181--240.
  16. Hansen, L.P., and R.Jagannathan (1991): “Implications of Security Market Data for Models of Dynamic Economies” Journal of Political Economy, 99(2), 225--262.
  17. Hansen, L.P., J.Heaton, and E.G.J. Luttmer (1995): “Econometric Evaluation of Asset Pricing Models” The Review of Financial Studies, 8(2), 237--274.
  18. Luttmer, E. G.J. (1996): “Asset Pricing in Economies with Frictions” Econometrica, 64(6), 1439--1467.
  19. Manski, C.F., and E.Tamer (2002): “Inference on Regressions with Interval Data on a Regressor or Outcome” Econometrica, 70(2), 519--546.
  20. Tamer, E. (2003): “Incomplete Simultaneous Discrete Response Model with Multiple Equilibria” The Review of Economic Studies, 70(1), 147--165.
  21. Ciliberto, F., and E.Tamer (2009): “Market Structure and Multiple Equilibria in Airline Markets” Econometrica, 77(6), 1791--1828.
  22. Haile, P.A., and E.Tamer (2003): “Inference with an Incomplete Model of English Auctions” Journal of Political Economy, 111(1), 1--51.
  23. Pakes, A. (2010): “Alternative models for moment inequalities” Econometrica, 78(6), 1783--1822.
  24. Pakes, A., J.Porter, K.Ho, and J.Ishii (2015): “Moment Inequalities and Their Application” Econometrica, 83(1), 315--334.
  25. Manski, C.F. (2007a): Identification for Prediction and Decision. Harvard University Press.
  26. Matzkin, R.L. (2007): “Chapter 73 -- Nonparametric identification” in Handbook of Econometrics, ed. by J.J. Heckman, and E.E. Leamer, vol.6, chap.73, pp. 5307 -- 5368. Elsevier.
  27. McFadden, D.L. (1974): “Conditional Logit Analysis of Qualitative Choice Behavior” in Frontiers in Econometrics, ed. by P.Zarembka. Academic Press.
  28. Manski, C.F. (1975): “Maximum score estimation of the stochastic utility model of choice” Journal of Econometrics, 3(3), 205 -- 228.
  29. Manski, C.F. (1985): “Semiparametric analysis of discrete response: Asymptotic properties of the maximum score estimator” Journal of Econometrics, 27(3), 313 -- 333.
  30. Molchanov, I., and F.Molinari (2018): Random Sets in Econometrics. Econometric Society Monograph Series, Cambridge University Press, Cambridge UK.
  31. Chesher, A., and A.M. Rosen (2017a): “Generalized instrumental variable models” Econometrica, 85, 959--989.
  32. Chesher, A., and A.M. Rosen (2019): “Generalized instrumental variable models, methods, and applications” in Handbook of Econometrics. Elsevier.
  33. Manski, C.F. (2010): “Random Utility Models with Bounded Ambiguity” in Structural Econometrics, ed. by B.Dutta, pp. 272--284. Oxford University Press, 1 edn.
  34. Magnac, T., and E.Maurin (2008): “Partial Identification in Monotone Binary Models: Discrete Regressors and Interval Data” The Review of Economic Studies, 75(3), 835--864.
  35. Lewbel, A. (2000): “Semiparametric qualitative response model estimation with unknown heteroscedasticity or instrumental variables” Journal of Econometrics, 97(1), 145 -- 177.
  36. Beresteanu, A., and F.Molinari (2008): “Asymptotic Properties for a Class of Partially Identified Models” Econometrica, 76(4), 763--814.
  37. Chandrasekhar, A., V.Chernozhukov, F.Molinari, and P.Schrimpf (2018): “Best linear approximations to set identified functions: with an application to the gender wage gap” CeMMAP working paper CWP09/19, available at https://www.cemmap.ac.uk/publication/id/13913.
  38. Matzkin, R.L. (1993): “Nonparametric identification and estimation of polychotomous choice models” Journal of Econometrics, 58(1), 137 -- 168.
  39. Berry, S.T., J.Levinsohn, and A.Pakes (1995): “Automobile Prices in Market Equilibrium” Econometrica, 63(4), 841--890.
  40. Petrin, A., and K.Train (2010): “A Control Function Approach to Endogeneity in Consumer Choice Models” Journal of Marketing Research, 47(1), 3--13.
  41. Hong, H., and E.Tamer (2003b): “Inference in Censored Models with Endogenous Regressors” Econometrica, 71(3), 905--932.
  42. Chesher, A., A.M. Rosen, and K.Smolinski (2013): “An instrumental variable model of multiple discrete choice” Quantitative Economics, 4(2), 157--196.
  43. Manski, C.F. (1977): “The structure of random utility models” Theory and Decision, 8(3), 229--254.
  44. Caplin, A. (2016): “Measuring and Modeling Attention” Annual Review of Economics, 8(1), 379--403.
  45. Simon, H.A. (1959): “Theories of Decision-Making in Economics and Behavioral Science” The American Economic Review, 49(3), 253--283.
  46. Howard, J.A. (1963): Consumer behavior: application of theory. New York: McGraw-Hill.
  47. Tversky, A. (1972): “Elimination by aspects: A theory of choice” Psychological review, 79(4), 281.
  48. Barseghyan, L., M.Coughlin, F.Molinari, and J.C. Teitelbaum (2019): “Heterogeneous Choice Sets and Preferences” available at https://arxiv.org/abs/1907.02337.
  49. Masatlioglu, Y., D.Nakajima, and E.Y. Ozbay (2012): “Revealed Attention” American Economic Review, 102(5), 2183--2205.
  50. Manzini, P., and M.Mariotti (2014): “Stochastic Choice and Consideration Sets” Econometrica, 82(3), 1153--1176.
  51. Cattaneo, M.D., X.Ma, Y.Masatlioglu, and E.Suleymanov (2019): “A Random Attention Model” Journal of Political Economy, forthcoming, available at https://arxiv.org/abs/1712.03448.
  52. Luce, R.D., and P.Suppes (1965): “Chapter 19: Preference, Utility, and Subjective Probability” in Handbook of Mathematical Psychology, vol.3, pp. 249--410.
  53. Abaluck, J., and A.Adams (2018): “What Do Consumers Consider Before They Choose? Identification from Asymmetric Demand Responses” available at https://abiadams.com/wp-content/uploads/2018/06/DiscreteChoiceInattention_master.pdf.
  54. Barseghyan, L., F.Molinari, and M.Thirkettle (2019): “Discrete Choice under Risk with Limited Consideration” available at https://arxiv.org/abs/1902.06629.
  55. Manski, C.F. (2007b): “Partial Identification of Counterfactual Choice Probabilities” International Economic Review, 48(4), 1393--1410.
  56. Kitamura, Y., and J.Stoye (2019): “Nonparametric Counterfactuals in Random Utility Models” available at https://arxiv.org/abs/1902.08350.
  57. Kitamura, Y., and J.Stoye (2018): “Nonparametric Analysis of Random Utility Models” Econometrica, 86(6), 1883--1909.
  58. McFadden, D.L. (2005): “Revealed Stochastic Preference: A Synthesis” Economic Theory, 26(2), 245--264.
  59. Imbens, G.W., and W.K. Newey (2009): “Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity” Econometrica, 77(5), 1481--1512.
  60. Kamat, V. (2018): “Identification with Latent Choice Sets” available at https://arxiv.org/abs/1711.02048.
  61. Berry, S.T., and E.Tamer (2006): “Identification in Models of Oligopoly Entry” in Advances in Economics and Econometrics: Theory and Applications, Ninth World Congress, ed. by R.Blundell, W.K. Newey, and T.E. Persson, vol.2 of Econometric Society Monographs, pp. 46--85. Cambridge University Press.
  62. Heckman, J.J. (1978): “Dummy Endogenous Variables in a Simultaneous Equation System” Econometrica, 46(4), 931--959.
  63. Gourieroux, C., J.J. Laffont, and A.Monfort (1980): “Coherency Conditions in Simultaneous Linear Equation Models with Endogenous Switching Regimes” Econometrica, 48, 675--695.
  64. Schmidt, P. (1981): “Constraints on the Parameters in Simultaneous Tobit and Probit Models” in Structural Analysis of Discrete Data and Econometric Applications, ed. by C.F. Manski, and D.McFadden, chap.12, pp. 422--434. MIT Press.
  65. Maddala, G.S. (1983): Limited-Dependent and Qualitative Variables in Econometrics. Cambridge University Press, New York.
  66. Blundell, R., and J.R. Smith (1994): “Coherency and Estimation in Simultaneous Models with Censored or Qualitative Dependent Variables” Journal of Econometrics, 64, 355--373.
  67. Bjorn, P.A., and Q.H. Vuong (1984): “Simultaneous Equations Models for Dummy Endogenous Variables: A Game Theoretic Formulation with an Application to Labor Force Participation” CIT working paper SSWP 537, California Institute of Technology, available at http://resolver.caltech.edu/CaltechAUTHORS:20170919-140310752.
  68. Bresnahan, T.F., and P.C. Reiss (1988): “Do Entry Conditions Vary Across Markets?” Brookings Papers on Economic Activity, pp. 833--871.
  69. Bresnahan, T.F., and P.C. Reiss (1990): “Entry in Monopoly Markets” The Review of Economic Studies, 57(4), 531--553.
  70. Bresnahan, T.F., and P.C. Reiss (1991): “Empirical models of discrete games” Journal of Econometrics, 48(1), 57--81.
  71. Berry, S.T. (1992): “Estimation of a Model of Entry in the Airline Industry” Econometrica, 60(4), 889--917.
  72. Bajari, P., H.Hong, and S.P. Ryan (2010): “Identification and estimation of a discrete game of complete information” Econometrica, 78(5), 1529--1568.
  73. de Paula, A. (2013): “Econometric Analysis of Games with Multiple Equilibria” Annual Review of Economics, 5(1), 107--131.
  74. Kline, B., and E.Tamer (2012): “Bounds for best response functions in binary games” Journal of Econometrics, 166(1), 92 -- 105.
  75. Aradillas-Lopez, A., and E.Tamer (2008): “The Identification Power of Equilibrium in Simple Games” Journal of Business & Economic Statistics, 26(3), 261--283.
  76. Molinari, F., and A.M. Rosen (2008): “The Identification Power of Equilibrium in Games: The Supermodular Case (Comment on Aradillas-Lopez and Tamer, 2008)” Journal of Business and Economic Statistics, 26(3), 297--302.
  77. Beresteanu, A., I.Molchanov, and F.Molinari (2011): “Sharp identification regions in models with convex moment predictions” Econometrica, 79(6), 1785--1821.
  78. Galichon, A., and M.Henry (2011): “Set Identification in Models with Multiple Equilibria” The Review of Economic Studies, 78(4), 1264--1298.
  79. Rockafellar, R. (1970): Convex Analysis, Princeton landmarks in mathematics and physics. Princeton University Press.
  80. Grant, M., and S.Boyd (2010): “CVX: Matlab Software for Disciplined Convex Programming, Version 1.21” available at http://cvxr.com/cvx.
  81. Molchanov, I. (2017): Theory of Random Sets. Springer, London, 2 edn.
  82. Schneider, R. (1993): Convex Bodies: The Brunn-Minkowski Theory, Encyclopedia of Mathematics and its Applications. Cambridge University Press, 1 edn.
  83. de Paula, A., and X.Tang (2012): “Inference of Signs of Interaction Effects in Simultaneous Games With Incomplete Information” Econometrica, 80(1), 143--172.
  84. Grieco, P. L.E. (2014): “Discrete games with flexible information structures: an application to local grocery markets” The RAND Journal of Economics, 45(2), 303--340.
  85. Milgrom, P.R., and R.J. Weber (1982): “A Theory of Auctions and Competitive Bidding” Econometrica, 50(5), 1089--1122.
  86. Tang, X. (2011): “Bounds on revenue distributions in counterfactual auctions with reserve prices” The RAND Journal of Economics, 42(1), 175--203.
  87. Armstrong, T.B. (2013): “Bounds in auctions with unobserved heterogeneity” Quantitative Economics, 4(3), 377--415.
  88. Aradillas-López, A., A.Gandhi, and D.Quint (2013): “Identification and Inference in Ascending Auctions With Correlated Private Values” Econometrica, 81(2), 489--534.
  89. Athey, S., and P.A. Haile (2002): “Identification of Standard Auction Models” Econometrica, 70(6), 2107--2140.
  90. Komarova, T. (2013): “Partial identification in asymmetric auctions in the absence of independence” The Econometrics Journal, 16(1), S60--S92.
  91. Gentry, M., and T.Li (2014): “Identification in auctions with selective entry” Econometrica, 82(1), 315--344.
  92. Syrgkanis, V., E.Tamer, and J.Ziani (2018): “Inference on auctions with weak assumptions on information” available at https://arxiv.org/abs/1710.03830.
  93. Bergemann, D., and S.Morris (2016): “Bayes correlated equilibrium and the comparison of information structures in games” Theoretical Economics, 11(2), 487--522.
  94. Yang, Z. (2006): “Correlated equilibrium and the estimation of static discrete games with complete information” available at https://ideas.repec.org/p/pra/mprapa/79395.html.
  95. Magnolfi, L., and C.Roncoroni (2017): “Estimation of Discrete Games with Weak Assumptions on Information” available at http://lorenzomagnolfi.com/estimdiscretegames.
  96. Chesher, A., and A.M. Rosen (2017b): “Incomplete English auction models with heterogeneity” CeMMAP working paper CWP27/17, available at https://www.cemmap.ac.uk/publication/id/9277.
  97. Graham, B.S. (2015): “Methods of Identification in Social Networks” Annual Review of Economics, 7(1), 465--485.
  98. Chandrasekhar, A. (2016): “Econometrics of Network Formation” in Oxford Handbook on the Economics of Networks, ed. by Y.Bramoulle, A.Galeotti, and B.Rogers, chap.13. Oxford University Press.
  99. de Paula, A. (2017): “Econometrics of Network Models” in Advances in Economics and Econometrics: Eleventh World Congress, ed. by B.Honoré, A.Pakes, M.Piazzesi, and L.Samuelson, vol.1 of Econometric Society Monographs, pp. 268--323. Cambridge University Press.
  100. Graham, B.S. (2019): “The Econometric Analysis of Networks” in Handbook of Econometrics. Elsevier.
  101. Jackson, M.O., and A.Wolinsky (1996): “A Strategic Model of Social and Economic Networks” Journal of Economic Theory, 71(1), 44 -- 74.
  102. Sheng, S. (2018): “A structural econometric analysis of network formation games through subnetworks” Econometrica, accepted for publication.
  103. Miyauchi, Y. (2016): “Structural estimation of pairwise stable networks with nonnegative externality” Journal of Econometrics, 195(2), 224 -- 235.
  104. Gualdani, C. (2019): “An Econometric Model of Network Formation with an Application to Board Interlocks Between Firms” available at http://docs.wixstatic.com/ugd/063589_b751c9f9c4e34d51b4da7ed7e007080a.pdf.
  105. de Paula, A., S.Richards-Shubik, and E.Tamer (2018): “Identifying Preferences in Networks With Bounded Degree” Econometrica, 86(1), 263--288.
  106. Ho, K. (2009): “Insurer-Provider Networks in the Medical Care Market” The American Economic Review, 99(1), 393--430.
  107. Ho, K., J.Ho, and J.H. Mortimer (2012): “The Use of Full-Line Forcing Contracts in the Video Rental Industry” The American Economic Review, 102(2), 686--719.
  108. Lee, R.S. (2013): “Vertical Integration and Exclusivity in Platform and Two-Sided Markets” The American Economic Review, 103(7), 2960--3000.
  109. Holmes, T.J. (2011): “The diffusion of Wal-mart and economies of density” Econometrica, 79(1), 253--302.
  110. Ellickson, P.B., S.Houghton, and C.Timmins (2013): “Estimating network economies in retail chains: a revealed preference approach” The RAND Journal of Economics, 44(2), 169--193.
  111. Kawai, K., and Y.Watanabe (2013): “Inferring Strategic Voting” American Economic Review, 103(2), 624--62.
  112. Eizenberg, A. (2014): “Upstream Innovation and Product Variety in the U.S. Home PC Market” The Review of Economic Studies, 81(3), 1003--1045.
  113. Ho, K., and A.Pakes (2014): “Hospital Choices, Hospital Prices, and Financial Incentives to Physicians” The American Economic Review, 104(12), 3841--3884.
  114. Dickstein, M.J., and E.Morales (2018): “What do Exporters Know?” The Quarterly Journal of Economics, 133(4), 1753--1801.
  115. Wollmann, T.G. (2018): “Trucks without Bailouts: Equilibrium Product Characteristics for Commercial Vehicles” American Economic Review, 108(6), 1364--1406.
  116. Blundell, R., M.Browning, and I.Crawford (2008): “Best Nonparametric Bounds on Demand Responses” Econometrica, 76(6), 1227--1262.
  117. Blundell, R., D.Kristensen, and R.Matzkin (2014): “Bounding quantile demand functions using revealed preference inequalities” Journal of Econometrics, 179(2), 112 -- 127.
  118. Hoderlein, S., and J.Stoye (2014): “Revealed Preferences in a Heterogeneous Population” Review of Economics and Statistics, 96(2), 197--213.
  119. Hoderlein, S., and J.Stoye (2015): “Testing stochastic rationality and predicting stochastic demand: the case of two goods” Economic Theory Bulletin, 3(2), 313--328.
  120. Manski, C.F. (2014): “Identification of income–leisure preferences and evaluation of income tax policy” Quantitative Economics, 5(1), 145--174.
  121. Barseghyan, L., F.Molinari, and J.C. Teitelbaum (2016): “Inference under stability of risk preferences” Quantitative Economics, 7(2), 367--409.
  122. Hausman, J.A., and W.K. Newey (2016): “Individual Heterogeneity and Average Welfare” Econometrica, 84(3), 1225--1248.
  123. Adams, A. (2019): “Mutually Consistent Revealed Preference Demand Predictions” American Economic Journal: Microeconomics, forthcoming.
  124. Kline, P., and M.Tartari (2016): “Bounding the Labor Supply Responses to a Randomized Welfare Experiment: A Revealed Preference Approach” American Economic Review, 106(4), 972--1014.
  125. Norets, A., and X.Tang (2014): “Semiparametric Inference in Dynamic Binary Choice Models” The Review of Economic Studies, 81(3), 1229--1262.
  126. Rust, J. (1987): “Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher” Econometrica, 55(5), 999--1033.
  127. Berry, S.T., and G.Compiani (2019): “An Instrumental Variable Approach to Dynamic Models” available at https://drive.google.com/file/d/1pl1PW1w8eh3gnrTMKUBuS6T6TIKtvf9c/view.
  128. Iaryczower, M., X.Shi, and M.Shum (2018): “Can Words Get in the Way? The Effect of Deliberation in Collective Decision Making” Journal of Political Economy, 126(2), 688--734.
  129. D'Haultfoeuille, X., C.Gaillac, and A.Maurel (2018): “Rationalizing Rational Expectations? Tests and Deviations” NBER working paper 25274, available at https://www.nber.org/papers/w25274.
  130. Andrews, D. W.K., and X.Shi (2017): “Inference based on many conditional moment inequalities” Journal of Econometrics, 196(2), 275 -- 287.
  131. Tebaldi, P., A.Torgovitsky, and H.Yang (2019): “Nonparametric Estimates of Demand in the California Health Insurance Exchange” NBER Working Paper No. 25827, available at https://www.nber.org/papers/w25827.
  132. Honoré, B.E., and E.Tamer (2006): “Bounds on Parameters in Panel Dynamic Discrete Choice Models” Econometrica, 74(3), 611--629.
  133. Rosen, A.M. (2012): “Set identification via quantile restrictions in short panels” Journal of Econometrics, 166(1), 127 -- 137.
  134. Chernozhukov, V., I.Fernández-Val, J.Hahn, and W.Newey (2013): “Average and quantile effects in nonseparable panel models” Econometrica, 81(2), 535--580.
  135. Khan, S., M.Ponomareva, and E.Tamer (2016): “Identification of panel data models with endogenous censoring” Journal of Econometrics, 194(1), 57 -- 75.
  136. Torgovitsky, A. (2019a): “Nonparametric Inference on State Dependence in Unemployment” Econometrica, forthcoming.
  137. Pakes, A., and J.Porter (2016): “Moment Inequalities for Multinomial Choice with Fixed Effects” Working Paper 21893, National Bureau of Economic Research.
  138. Aristodemou, E. (2019): “Semiparametric Identification in Panel Data Discrete Response Models” available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3420016.