As a rule of thumb, the difficulty in computing estimators of identification regions and confidence sets depends on whether a closed form expression is available for the boundary of the set. For example, often nonparametric bounds on functionals of a partially identified distribution are known functionals of observed conditional distributions, as in Section. Then “plug in” estimation is possible, and the computational cost is the same as for estimation and construction of confidence intervals (or confidence bands) for point-identified nonparametric regressions (incurred twice, once for the lower bound and once for the upper bound).
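As a concrete illustration, the following minimal sketch (Python, simulated data) computes plug-in estimates of the textbook worst-case bounds on E[y] when an outcome y ∈ [0,1] is sometimes missing, together with a naive percentile-bootstrap interval for each endpoint. The data-generating process and the bootstrap are purely illustrative assumptions, not one of the uniformly valid procedures discussed in this chapter.

```python
# Plug-in estimation of worst-case bounds on E[y] with missing outcomes,
# y in [0,1]: the lower bound imputes missing y at 0, the upper bound at 1.
# The naive percentile bootstrap below is illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
y = rng.uniform(size=n)                  # outcomes in [0, 1]
obs = rng.uniform(size=n) < 0.8          # observation indicator, 20% missing

def bounds(y, obs):
    lower = np.mean(np.where(obs, y, 0.0))   # missing y set to 0
    upper = np.mean(np.where(obs, y, 1.0))   # missing y set to 1
    return lower, upper

lb, ub = bounds(y, obs)

# the computational cost is incurred twice: one resample pass per endpoint
draws = []
for _ in range(500):
    idx = rng.integers(0, n, n)
    draws.append(bounds(y[idx], obs[idx]))
lo90 = np.quantile([b[0] for b in draws], 0.05)
hi90 = np.quantile([b[1] for b in draws], 0.95)
print(f"bounds [{lb:.3f}, {ub:.3f}], naive 90% interval [{lo90:.3f}, {hi90:.3f}]")
```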
Similarly, support function based inference is easy to implement when [math]\idr{\theta}[/math] is convex. Sometimes the extreme points of [math]\idr{\theta}[/math] can be expressed as known functionals of observed distributions. Even if not, level sets of convex functions are easy to compute.
But as shown in Section, many problems of interest yield a set [math]\idr{\theta}[/math] that is not convex. In this case, [math]\idr{\theta}[/math] is obtained as a level set of a criterion function. Because [math]\idr{\theta}[/math] (or its associated confidence set) is often a subset of [math]\R^d[/math] (rather than [math]\R[/math]), even a moderate value of [math]d[/math], e.g., 8 or 10, can lead to extremely challenging computational problems. This is because if one wants to compute [math]\idr{\theta}[/math], or a set that covers it or its elements with a prespecified asymptotic probability (possibly uniformly over [math]\sP\in\cP[/math]), one has to map out a level set in [math]\R^d[/math]. If one is interested in confidence intervals for scalar projections or other smooth functions of [math]\vartheta\in\idr{\theta}[/math], one needs to solve complex nonlinear optimization problems, as for example in eq:CI:BCS and eq:KMS:proj. This can be difficult to do, especially because [math]c_{1-\alpha}(\vartheta)[/math] is typically an unknown function of [math]\vartheta[/math] whose gradients are not available in closed form.
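To see why mapping out a level set becomes hard as [math]d[/math] grows, consider this minimal sketch; the criterion is a toy moment-inequality objective (an assumption, not one from the text). A grid with m points per axis requires m^d criterion evaluations, already astronomical at d = 10.

```python
# Mapping out a criterion level set on a grid: m**d evaluations are needed,
# so the cost explodes exponentially in the dimension d of theta.
import itertools
import numpy as np

def Q(theta):                        # toy criterion: squared inequality violations
    g = np.array([theta[0] + theta[1] - 1.0, -theta[0] - 1.0])
    return np.sum(np.maximum(g, 0.0) ** 2)

m, d, level = 50, 2, 1e-6
axes = [np.linspace(-2.0, 2.0, m)] * d
level_set = [th for th in itertools.product(*axes) if Q(np.array(th)) <= level]
print(f"{m**d} evaluations for d={d}; the same grid with d=10 needs {m**10:.1e}")
```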
Mirroring the fact that computation is easier when the boundary of [math]\idr{\theta}[/math] is a known function of observed conditional distributions, several portable software packages are available to carry out estimation and inference in this case. For example, [1] provide STATA and MatLab packages implementing the methods proposed by [2][3][4][5][6], [7][8], and [9].
[10] provides a STATA package to implement the bounds proposed by [11]. [12] provide a STATA package to implement bounds on treatment effects with endogenous and misreported treatment assignment and under the assumptions of monotone treatment selection, monotone treatment response, and monotone instrumental variables as in [6], [9], [13], [14], and [15]. The code computes the confidence intervals proposed by [16].
In the more general context of inference for a one-dimensional parameter defined by intersection bounds, as for example the one in eq:intersection:bounds, [17] and [18] provide portable STATA code implementing, respectively, methods to test hypotheses and build confidence intervals in [19] and in [20]. [21] provide portable STATA code implementing [22]'s method for estimation and inference for best linear prediction with interval outcome data as in Identification Problem.
[23] provide R code implementing [24]'s method for estimation and inference for best linear approximations of set identified functions.

On the other hand, there is a paucity of portable software implementing the theoretical methods for inference in structural partially identified models discussed in Section.
[25] compute [26] confidence sets for a parameter vector in [math]\R^d[/math] in an entry game with six players, with [math]d[/math] in the order of [math]20[/math] and with tens of thousands of inequalities, through a “guess and verify” algorithm based on simulated annealing (with no cooling) that visits many candidate values [math]\vartheta\in\Theta[/math], evaluates [math]\crit_n(\vartheta)[/math], and builds [math]\CS[/math] by retaining the visited values [math]\vartheta[/math] that satisfy [math]n\crit_n(\vartheta)\le c_{1-\alpha}(\vartheta)[/math] with [math]c_{1-\alpha}[/math] defined to satisfy eq:CS_coverage:point:pw.
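A minimal sketch of the guess-and-verify idea follows, with a toy criterion and a constant critical value standing in for [math]\crit_n[/math] and [math]c_{1-\alpha}[/math], and a plain random-walk chain standing in for simulated annealing with no cooling; none of this is the original implementation.

```python
# "Guess and verify": wander over Theta, evaluate the sample criterion at each
# visited point, and retain the points passing n * Q_n(theta) <= c_{1-alpha}.
# Criterion, critical value, and proposal rule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, d = 1000, 2
lo, hi = -2.0, 2.0                        # Theta = [-2, 2]^d

def crit_n(th):                           # toy criterion: squared violations
    g = np.array([th[0] + th[1] - 1.0, -th[0] - 1.5])
    return np.sum(np.maximum(g, 0.0) ** 2) / n

def c_alpha(th):                          # toy critical value, constant in theta
    return 3.0

CS, th = [], rng.uniform(lo, hi, d)
for _ in range(20000):
    th = np.clip(th + 0.1 * rng.standard_normal(d), lo, hi)  # random-walk visit
    if n * crit_n(th) <= c_alpha(th):     # verify: retain points passing the test
        CS.append(th.copy())

print(f"retained {len(CS)} of 20000 visited points")
```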
Given the computational resources commonly available at this point in time, this is a tremendously hard task, due to the dimension of [math]\theta[/math] and the number of moment inequalities employed. As explained in Section An Inference Approach Robust to the Presence of Multiple Equilibria, these inequalities, which in a game of entry with [math]J[/math] players and discrete observable payoff shifters are [math]2^J|\cX|[/math] (with [math]\cX[/math] the support of the observable payoff shifters), yield an outer region [math]\outr{\theta}[/math].
It is natural to wonder what additional challenges one faces in computing [math]\idr{\theta}[/math] as described in Section Characterization of Sharpness through Random Set Theory. A definitive answer to this question is hard to obtain. If one employs all inequalities listed in Theorem, the number of inequalities jumps to [math](2^{2^J}-2)|\cX|[/math], increasing the computational cost. However, as suggested by [27] and extended by other authors (e.g., [28][29][30][31]), often many moment inequalities are redundant, substantially reducing the number of inequalities to be checked. Specifically, [27] propose the notion of core determining sets, a collection of compact sets such that if the inequality in Theorem holds for these sets, it holds for all sets in [math]\cK[/math]; see Definition and the surrounding discussion in Appendix. This often yields a number of restrictions similar to the one incurred to obtain outer regions. For example, [28](Section 4.2) analyze a four player, two type entry game with pure strategy Nash equilibrium as solution concept, originally proposed by [32], and show that while a direct application of Theorem entails [math]512|\cX|[/math] inequality restrictions, [math]26|\cX|[/math] suffice. In this example, [25]'s outer region is based on checking [math]18|\cX|[/math] inequalities.
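The arithmetic behind these counts, per point of support of [math]\cX[/math], is easy to tabulate for a game with [math]J[/math] players:

```python
# Moment inequality counts per support point of X: 2^J for the outer region
# versus 2^(2^J) - 2 for a direct application of the sharp characterization
# (one inequality per nontrivial subset of outcomes).
for J in (2, 4, 6):
    outer = 2 ** J
    sharp = 2 ** (2 ** J) - 2
    print(f"J={J}: outer {outer:>3}, sharp {sharp:.3e}")
```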
A related but separate question is how best to allocate the computational effort. As one moves from partial identification analysis to finite sample considerations, one may face a trade-off between sharpness of the identification region and statistical efficiency. This is because inequalities that are redundant from the perspective of identification analysis might nonetheless be estimated with high precision, and hence improve the finite sample statistical properties of a confidence set or of a test of hypothesis.
Recent contributions by [33], [34], and [35] provide methods to build confidence sets, respectively, with a continuum of conditional moment inequalities, and with a number of moment inequalities that may exceed sample size. These contributions, however, do not yet answer the question of how to optimally select inequalities to yield confidence sets with best finite sample properties, according to some specified notion of “best”.
A different approach, proposed by [36], directly uses a quasi-likelihood criterion function. In the context of, e.g., entry games, this entails assuming that the selection mechanism depends only on observable payoff shifters, using it to obtain the exact model implied distribution as in eq:games_model:pred, and partially identifying an enlarged parameter vector that includes [math]\theta[/math] and the selection mechanism. In an empirical application with discrete covariates, [36] apply their method to a two player entry game with correlated errors, where [math]\theta\in\R^9[/math] and the selection mechanism is a vector in [math]\R^8[/math], for a total of 17 parameters. Another empirical application, to the analysis of trade flows, includes 46 parameters.
In terms of general purpose portable code that can be employed in moment inequality models, I am only aware of the MatLab package provided by [37] to implement the inference method of [38] for projections and smooth functions of parameter vectors in models defined by a finite number of unconditional moment (in)equalities. More broadly, their method can be used to compute confidence intervals for optimal values of optimization problems with estimated constraints. Here I summarize their approach to further highlight why the computational task is challenging even in the case of projections.
The confidence interval in eq:def:CI-eq:KMS:proj requires solving two nonlinear programs, each with a linear objective and nonlinear constraints involving a critical value which in general is an unknown function of [math]\vartheta[/math], with unknown gradient. When the dimension of the parameter vector is large, directly solving optimization problems with such constraints can be expensive even if evaluating the critical value at each [math]\vartheta[/math] is cheap.[Notes 1] Hence, [38] propose to use an algorithm (called E-A-M for Evaluation-Approximation-Maximization) to solve these nonlinear programs, which belongs to the family of expected improvement algorithms (see e.g. [39][40][41](and references therein)). Given a constrained optimization problem of the form

[math]\max_{\vartheta\in\Theta}u^\top\vartheta\text{ s.t. }g_j(\vartheta)\le c(\vartheta),~j=1,\dots,J,[/math]

to which eq:KMS:proj belongs,[Notes 2] the algorithm attempts to solve it by cycling over three steps (a schematic code sketch follows the list):
1. The true critical level function [math]c[/math] is evaluated at an initial set of points [math]\vartheta^1,\dots,\vartheta^k[/math], drawn uniformly at random from [math]\Theta[/math]. These values are used to compute a current guess for the optimal value, [math]u^\top\vartheta^{*,k}=\max\{u^\top\vartheta:~\vartheta\in\{\vartheta^1,\dots,\vartheta^k\}\text{ and }\bar g(\vartheta)\le c(\vartheta)\}[/math], where [math]\bar g(\vartheta)=\max_{j=1,\dots,J}g_j(\vartheta)[/math]. The “training data” [math](\vartheta^{\ell},c(\vartheta^{\ell}))_{\ell=1}^k[/math] is used to compute an approximating surface [math]c_k[/math] through a Gaussian-process regression model (kriging), as described in [42](Section 4.1.3);
2. For [math]L\ge k+1[/math], with probability [math]1-\epsilon[/math] the next evaluation point [math]\vartheta^L[/math] for the true critical level function [math]c[/math] is chosen by finding the point that maximizes expected improvement with respect to the approximating surface, [math]\mathbb{EI}_{L-1}(\vartheta)=(u^\top\vartheta-u^\top\vartheta^{*,L-1})_+\{1-\Phi([\bar g(\vartheta)-c_{L-1}(\vartheta)]/[\hat\varsigma s_{L-1}(\vartheta)])\}[/math]. Here [math]c_{L-1}(\vartheta)[/math] and [math]\hat\varsigma^2 s_{L-1}^2(\vartheta)[/math] are estimators of the posterior mean and variance of the approximating surface. To aim for global search, with probability [math]\epsilon[/math], [math]\vartheta^L[/math] is instead drawn uniformly from [math]\Theta[/math]. The approximating surface is then recomputed using [math](\vartheta^{\ell},c(\vartheta^{\ell}))_{\ell=1}^L[/math]. Steps 1 and 2 are repeated until a convergence criterion is met.
3. The extreme point of [math]\CI[/math] is reported as the value [math]u^\top\vartheta^{*,L}[/math] that maximizes [math]u^\top\vartheta[/math] among the evaluation points that satisfy the true constraints, i.e. [math]u^\top\vartheta^{*,L}=\max\{u^\top\vartheta:~\vartheta\in\{\vartheta^1,\dots,\vartheta^L\}\text{ and }\bar g(\vartheta)\le c(\vartheta)\}[/math].
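To make the loop concrete, here is a minimal, self-contained sketch of the E-A-M cycle. Everything model-specific is a toy stand-in: `c_true` plays the role of the expensive “black box” critical level function (calibrated by bootstrap in [38]'s implementation), `g_bar` of the maximum of the [math]J[/math] moment inequalities, and the M-step screens random candidates rather than running a dedicated nonlinear solver.

```python
# Schematic E-A-M loop for max u'theta s.t. g_j(theta) <= c(theta).
# c_true, g_bar, and the candidate-screening M-step are toy assumptions.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
d = 2
u = np.array([1.0, 0.0])                       # direction of the projection
box = np.array([[-2.0, 2.0], [-2.0, 2.0]])     # Theta, a hyperrectangle

def g_bar(th):      # toy \bar g(theta) = max_j g_j(theta); an assumption
    return max(th[0] + th[1] - 1.0, th[0] - th[1] - 1.0)

def c_true(th):     # toy critical level; in practice this is the costly call
    return 0.1 + 0.05 * np.cos(th[0]) * np.sin(th[1])

def draw(n):        # uniform draws from Theta
    return rng.uniform(box[:, 0], box[:, 1], size=(n, d))

pts = np.vstack([np.array([[-1.0, 0.0]]), draw(9)])  # include a feasible start
cvals = np.array([c_true(th) for th in pts])         # initial E-step evaluations
eps, best = 0.1, -np.inf
for L in range(50):
    # A-step: kriging surface fitted to the training data (theta^l, c(theta^l))
    gp = GaussianProcessRegressor(kernel=RBF(1.0), normalize_y=True).fit(pts, cvals)
    best = max((u @ th for th, cv in zip(pts, cvals) if g_bar(th) <= cv),
               default=best)                   # current guess u' theta^{*,L}
    if rng.uniform() < eps:                    # global search with probability eps
        new = draw(1)[0]
    else:                                      # M-step: maximize expected improvement
        cand = draw(2000)
        mu, sd = gp.predict(cand, return_std=True)
        gb = np.array([g_bar(th) for th in cand])
        ei = np.maximum(cand @ u - best, 0.0) * (1.0 - norm.cdf((gb - mu) / (sd + 1e-12)))
        new = cand[np.argmax(ei)]
    pts = np.vstack([pts, new])                # E-step: evaluate the true c
    cvals = np.append(cvals, c_true(new))

print("reported endpoint u'theta^*:", best)   # max over points passing true constraints
```

Note that, as in the description above, the surrogate surface only steers the search: the reported endpoint is computed from evaluations of the true constraint.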
The only place where the approximating surface is used is in Step 2, to choose a new evaluation point. In particular, the reported extreme points of [math]\CI[/math] in eq:def:CI are the extreme values of [math]u^\top\vartheta[/math] that are consistent with the true surface where this surface was computed, not with the approximating surface. [38] establish convergence of their algorithm and obtain a convergence rate, as the number of evaluation points increases, for constrained optimization problems in which the constraints are sufficiently smooth “black box” functions, building on an earlier contribution of [43]. [43] establishes convergence of an expected improvement algorithm for unconstrained optimization problems where the objective is a “black box” function. The rate of convergence that [43] derives depends on the smoothness of the black box objective function. The rate of convergence obtained by [38] depends on the smoothness of the black box constraints, and is slightly slower than [43]’s rate. [38]'s Monte Carlo experiments suggest that the E-A-M algorithm is fast and accurate at computing their confidence intervals. The E-A-M algorithm also allows for very rapid computation of projections of the confidence set proposed by [44], and for a substantial improvement in the computational time of the profiling-based confidence intervals proposed by [45].[Notes 3] In all cases, the speed improvement results from a reduced number of evaluation points required to approximate the optimum. In an application to a point identified setting, [46](Supplement Section S.3) use [38]'s E-A-M method to construct uniform confidence bands for an unknown function of interest under (nonparametric) shape restrictions. They benchmark it against gridding and find it to be accurate at considerably improved speed.
General references
Molinari, Francesca (2020). "Microeconometrics with Partial Identification". arXiv:2004.11751 [econ.EM].
Notes
1. [1] propose a linearization method whereby [math]c_{1-\alpha}[/math] is calibrated through repeatedly solving bootstrap linear programs, so it is reasonably cheap to compute.
2. To see this it suffices to set [math]g_j(\vartheta)=\frac{\sqrt{n}\bar{m}_{n,j}(\vartheta)}{\hat{\sigma}_{n,j}(\vartheta)}[/math] and [math]c(\vartheta)= c_{1-\alpha}(\vartheta)[/math].
3. [2]'s method does not require solving a nonlinear program such as the one in eq:KMS:proj. Rather, it obtains [math]\CI[/math] as in eq:CI:BCS. However, it approximates [math]c_{1-\alpha}[/math] by repeatedly solving bootstrap nonlinear programs, thereby incurring a very high computational cost at that stage.
References

1. ber:man00
2. man89
3. man90
4. man94
5. man95
6. man97:monotone
7. hor:man98
8. hor:man00
9. man:pep00
10. tau14
11. lee09
12. mcc:mil:roy15
13. kre:pep07
14. gun:kre:pep12
15. kre:pep:gun:jol12
16. imb:man04
17. che:kim:lee:ros15
18. and:kim:shi17
19. che:lee:ros13
20. and:shi13
21. ber:mol:mor10
22. ber:mol08
23. cha:che:mol:sch12_code
24. cha:che:mol:sch18
25. cil:tam09
26. che:hon:tam07
27. gal:hen06
28. ber:mol:mol08
29. ber:mol:mol11
30. che:ros:smo13
31. che:ros17
32. ber:tam06
33. and:shi17
34. che:che:kat18
35. bel:bug:che18
36. che:chr:tam18
37. kai:mol:sto:thi17
38. kai:mol:sto19
39. jon:sch:wel98
40. sch:wel:jon98
41. jon01
42. san:wil:not13
43. bul11
44. and:soa10
45. bug:can:shi17
46. fre:rev17