Bühlmann credibility theory offers linear estimation methods suitable for a class of random effects models. While not as predictive as Bayesian credibility estimates, Bühlmann credibility estimates have a simple formulaic representation that depends on only two fundamental parameters, both of which can be easily estimated from available claims data.
==Linear Approximation to MMSE==
Suppose <math>X_1,\ldots, X_n</math> represents data and we wish to use such data to estimate an unobservable random variable <math>Y</math>. The Bayesian credibility estimate is usually difficult to compute without adding conditions on the random variables <math>X_i</math> and <math>Y</math>. Instead of putting additional constraints on the random variables, we consider the best linear approximation to the minimum mean square estimator (the Bayesian credibility estimate):
<math display="block">
\begin{equation}\label{least-squares-linear-gen} \min \operatorname{E}\left[(Z -Y)^2 \right],\, Z = a_0 + \sum_{i}a_{i}X_{i}. \end{equation}
</math>
The estimator arising from solving \ref{least-squares-linear-gen} isn't as good as the minimum mean square estimator, but it is usually far easier to compute: there is a trade-off between predictive power and computability.
===The Normal Equations===
Notice that we haven't specified any special attributes for the random variables other than requiring that they have finite variances (or simply finite second raw moments). The (unique) solution to \ref{least-squares-linear-gen}, denoted by <math>\hat{Y}</math>, is characterized by the following ''normal equations'':
#<math>\operatorname{E}[Y] = \hat{a}_0 + \sum_{i}\hat{a}_i \operatorname{E}[X_i]=\operatorname{E}[\hat{Y}]</math> (unbiasedness condition)
#<math> \operatorname{E}[Y X_k]=\operatorname{E}[\hat{Y}X_k]=\hat{a}_0 \operatorname{E}[X_k] + \sum_{i}\hat{a}_i \operatorname{E}[X_i X_k] \,\, \text{for all} \,\, k</math>.
When the random variables in question all have the same expectation, the normal equations reduce to
#<math>\operatorname{E}[Y] = \hat{a}_0 + \sum_{i}\hat{a}_i \operatorname{E}[Y]</math> (unbiasedness condition)
#<math> \operatorname{Cov}(Y,X_k)=\operatorname{Cov}(\hat{Y},X_k)=\sum_{i}\hat{a}_i \operatorname{Cov}(X_i,X_k) \,\, \text{for all} \,\, k</math>.
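The reduced normal equations can be solved directly for small examples. The sketch below uses made-up second moments for two observations (the values of <code>mu</code>, the covariance matrix <code>C</code>, and the vector <code>b</code> are purely illustrative, not from the text):

```python
# Illustrative solve of the equal-mean normal equations for two data
# points; all numerical values are made up for the example.
mu = 5.0                       # common mean of Y, X1 and X2
C = [[2.0, 1.0], [1.0, 2.0]]   # Cov(X_i, X_k)
b = [1.0, 1.0]                 # Cov(Y, X_k)

# Equation 2: solve C a = b (Cramer's rule for the 2x2 case).
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
a1 = (b[0] * C[1][1] - b[1] * C[0][1]) / det
a2 = (C[0][0] * b[1] - C[1][0] * b[0]) / det
# Equation 1 (unbiasedness) then pins down the intercept.
a0 = mu * (1.0 - a1 - a2)
print(a0, a1, a2)
```

By symmetry of the assumed moments, the two slope weights come out equal, which anticipates the equal-weight structure used in the Bühlmann model below.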
==The Bühlmann Model ==
The Bühlmann model generates data as follows:
*<math>\Theta_i,\, i=1,\dots,I</math> are mutually independent, identically distributed random variables representing risk classes.
*The data is represented by the random variables <math>X_{ij}\,(i=1,\dots,I,\, j=1,\dots,n)</math>, and the pairs <math>(X_i,\Theta_i)\, (i=1,\dots,I)</math>, with <math>X_i = (X_{i1},\dots,X_{in})</math>, are mutually independent and identically distributed. Furthermore, the random variables <math>X_{i1},\dots,X_{in}</math> are mutually independent and identically distributed conditional on knowing the value of <math>\Theta_i</math>. In other words, the data is generated in a two-step process: generate <math>I</math> risk classes, then generate for each risk class an i.i.d. sequence of <math>n</math> random variables with a common distribution that depends on the corresponding risk class.
*The conditional mean (expectation) of <math>X_{ij}</math> given <math>\Theta_i</math> is denoted by <math>\mu(\Theta_i)</math> and the unconditional mean (the expectation of the conditional mean) is denoted by <math>\mu</math>.
*The random variables <math>X_{ij}</math> all have finite variances,
<math display="block">
\begin{equation}
\label{epv}
\sigma^2 = \operatorname{E}[\sigma^2(\Theta_i)],\,\, \sigma^2(\Theta_i) = \operatorname{Var}(X_{ij}\,|\,\Theta_i)
\end{equation}
</math>
denotes the expected process variance ('''EPV''') and
<math display="block">
\begin{equation}
\label{vhm}
\rho^2 = \operatorname{Var}[\mu(\Theta_i)]
\end{equation}
</math>
denotes the variance of the hypothetical mean ('''VHM''').
=== Credibility Estimator ===
We are interested in estimating <math>\mu(\Theta_i)</math> for each <math>i</math> via credibility estimators (see [[#Linear Approximation to MMSE|Linear Approximation to MMSE]] and set <math>Y=\mu(\Theta_i)</math>). We start with the simple Bühlmann model and set <math>I = 1</math>. Since the optimization problem \ref{least-squares-linear-gen} is invariant under permutations of the (exchangeable) data <math>X_i</math>, the optimal weights must be equal: <math>\hat{a}_i=\hat{a}_j</math> for <math>i,j >0</math>. By equation 1 of the [[#The Normal Equations|normal equations]] for this model, we must have
<math display="block">
\hat{\mu}(\Theta) = \mu (1- \alpha) + \frac{\alpha}{n} \sum_{i}X_i = \mu (1- \alpha) + \alpha \overline{X}
</math>
for some <math>\alpha</math> with <math>0 \leq \alpha \leq 1</math>. By equation 2 of the [[#The Normal Equations|normal equations]], we must also have
<math display="block">
\begin{equation}
\label{simple-normaleqn}
\frac{\alpha}{n}\sum_{i}\operatorname{Cov}(X_i,X_k) = \operatorname{Cov}(\mu(\Theta),X_k) \quad \text{for all} \,\, k
\end{equation}
</math>
with
<math display="block">
\begin{eqnarray}
\label{simple-coveqn-1}\operatorname{Cov}(X_i,X_k) =
\begin{cases}
\operatorname{E}[\operatorname{Var}(X_k|\Theta)] + \operatorname{Var}[\mu(\Theta)] & i=k \\
\operatorname{Var}[\mu(\Theta)] & i\neq k
\end{cases}
\\
\label{simple-coveqn-2}\operatorname{Cov}(\mu(\Theta),X_k) = \operatorname{Cov}(\mu(\Theta),\operatorname{E}[X_k|\Theta]) = \operatorname{Var}[\mu(\Theta)].
\end{eqnarray}
</math>
Using \ref{simple-coveqn-1} and \ref{simple-coveqn-2} in \ref{simple-normaleqn} and then solving for <math>\alpha</math> yields the following credibility estimator:
<math display="block">
\begin{equation}
\hat{\mu}(\Theta) = (1- \alpha) \mu + \alpha \overline{X},\quad \alpha = \frac{n}{n + \kappa},\quad \kappa = \frac{\sigma^2}{\rho^2}.
\end{equation}
</math>
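A minimal numeric sketch of this estimator; the structural parameters <code>mu</code>, <code>sigma2</code> (EPV) and <code>rho2</code> (VHM), and the loss data, are made-up values assumed known for the example:

```python
# Simple Buhlmann credibility estimate for a single risk; the structural
# parameters and losses below are made up for illustration.
mu, sigma2, rho2 = 100.0, 400.0, 25.0
x = [92.0, 110.0, 105.0, 98.0]   # observed losses X_1, ..., X_n

n = len(x)
kappa = sigma2 / rho2            # sigma^2 / rho^2
alpha = n / (n + kappa)          # credibility factor n / (n + kappa)
x_bar = sum(x) / n
estimate = (1 - alpha) * mu + alpha * x_bar
print(estimate)
```

With a large EPV relative to the VHM, <code>kappa</code> is large and the estimate stays close to the collective mean <code>mu</code>, as the formula suggests.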
The approach used to derive the credibility estimator for the simple Bühlmann model can be imitated to derive the credibility estimator for the general Bühlmann model (arbitrary number of risk classes). The credibility estimator equals
<math display="block">
\hat{\mu}(\Theta_i) = (1- \alpha) \mu + \alpha \overline{X}_i,\quad \alpha = \frac{n}{n + \kappa},\quad \kappa = \frac{\sigma^2}{\rho^2}
</math>
with <math>\overline{X}_i</math> denoting the average value for the data generated from risk class <math>i</math>:
<math display="block">
\overline{X}_i = \frac{1}{n} \sum_{j=1}^n X_{ij}.
</math>
{{alert-info|Since data from distinct risk classes are mutually independent, it shouldn't be surprising that <math>\hat{\mu}(\Theta_i)</math> only depends on the data corresponding to risk class <math>\Theta_i</math>.}}
== Bühlmann-Straub Model ==
The Bühlmann-Straub model is similar to the Bühlmann model except that we introduce time-varying exposure levels and allow the number of observations to vary from one risk class to another. More precisely, think of the index <math>j</math> in the Bühlmann model as a time index and let <math>v_{ij}</math> denote the exposure level (volume measure) associated with the ''average loss per exposure unit'' <math>X_{ij}</math>. The total loss associated with risk <math>i</math> at time <math>j</math> is thus <math>v_{ij}X_{ij}</math>. You can think of the Bühlmann model as a special case of the Bühlmann-Straub model obtained by setting the exposure levels to 1 at all times and for all risk classes, i.e., <math>v_{ij} = 1</math> for all <math>i</math> and <math>j</math>. We also impose the following two conditional moment conditions:
#<math>\operatorname{E}[X_{ij}\,|\,\Theta_i] = \mu(\Theta_i)</math> (conditional mean is exposure invariant)
#<math>\operatorname{Var}(X_{ij}\,|\,\Theta_i) = \sigma^2(\Theta_i)/v_{ij}</math> (conditional variance scales inversely with exposure).
{{alert-info|Note that <math>\mu(\Theta_i)</math> and <math>\sigma^2(\Theta_i)</math> are defined as the conditional mean and conditional variance, respectively, when the exposure level is 1.}}
===Credibility Estimator ===
As with the Bühlmann model, we can derive the credibility estimator for the Bühlmann-Straub model by solving the [[#The Normal Equations|normal equations]]. The credibility estimator is
<math display="block">
\begin{equation}
\label{bs-cred-estimator}
\hat{\mu}(\Theta_i) = (1- \alpha_i) \mu + \alpha_i \overline{X}_i,\quad \alpha_i = \frac{v_i}{v_i + \kappa},\quad \kappa = \frac{\sigma^2}{\rho^2}
\end{equation}
</math>
with <math>\overline{X}_i</math> denoting the exposure weighted average of the data (losses) generated from risk class <math>\Theta_i</math>:
<math display="block">
\overline{X}_i = v_i^{-1} \sum_{j=1}^{n_i} v_{ij} X_{ij},\,\, v_i = \sum_{j=1}^{n_i} v_{ij}.
</math>
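The same recipe with exposures, sketched for a single risk class; the structural parameters, exposures and losses below are made-up values:

```python
# Buhlmann-Straub estimate for one risk class; all numerical inputs are
# made up for illustration.
mu, sigma2, rho2 = 0.10, 0.08, 0.02
v = [50.0, 75.0, 100.0]   # exposures v_{ij} for each period j
x = [0.12, 0.08, 0.11]    # average loss per exposure unit X_{ij}

v_i = sum(v)                                          # total exposure
x_bar_i = sum(vj * xj for vj, xj in zip(v, x)) / v_i  # exposure-weighted mean
kappa = sigma2 / rho2
alpha_i = v_i / (v_i + kappa)                         # credibility factor
estimate = (1 - alpha_i) * mu + alpha_i * x_bar_i
```

Note that the credibility factor now grows with total exposure <code>v_i</code> rather than with the number of periods.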
=== Application to Claim Frequencies ===
We consider the special case when the data represent claim frequencies. More precisely, we assume the same setup as the Bühlmann-Straub model but change notation slightly, writing <math>F_{ij}</math> instead of <math>X_{ij}</math> to emphasize that we're modelling ''average claim frequency per exposure unit''. We also add the following assumption to the model:
*The total number of claims, <math>N_{ij} = v_{ij}F_{ij}</math>, for risk class <math>\Theta_i</math> at time <math>j</math> is conditionally (conditional on <math>\Theta_i</math>) Poisson distributed with mean <math>v_{ij}\Theta_i</math>.
{{alert-info|1= Straightforward calculations show that the moment conditions imposed by the Bühlmann-Straub model are not violated:
<math display="block">
\operatorname{E}[F_{ij}\,|\,\Theta_i] = \Theta_i,\quad\operatorname{Var}(F_{ij}\,|\,\Theta_i) = \Theta_i/v_{ij}.
</math>
}}
Since this model is a special case of the Bühlmann-Straub model, the credibility estimator is (see \ref{bs-cred-estimator})
<math display="block">
\begin{equation}
\label{bs-cred-estimator-freq}
\hat{\mu}(\Theta_i) = (1- \alpha_i) \mu + \alpha_i \overline{F}_i
\end{equation}
</math>
with
<math display="block">
\begin{align}
\label{claim-freq-kappa}
\kappa &= \sigma^2/\rho^2
= \operatorname{E}[\Theta_i]/\operatorname{Var}(\Theta_i)
= \mu/\operatorname{Var}(\Theta_i) \\
\overline{F}_i &= v_i^{-1} \sum_{j=1}^{n_i} N_{ij}.
\end{align}
</math>
</math>
====Poisson Gamma Model====
If <math>\Theta_i</math> is Gamma distributed with shape parameter <math>\alpha</math> and scale parameter <math>\beta</math>, then
<math display="block">
\mu = \alpha \beta,\, \kappa = \beta^{-1}, \, \alpha_i = \frac{v_i}{v_i + \beta^{-1}}.
</math>
The Bayesian credibility estimator for the Poisson-Gamma Model equals:
<math display="block">
\frac{\alpha + v_i\overline{F}_i}{v_i + \beta^{-1}} = \mu (1 - \alpha_i) + \alpha_i \overline{F}_i.
</math>
For the Poisson-Gamma model for claim frequencies, the Bühlmann credibility estimator equals the Bayesian credibility estimator. It should be noted that we didn't need to perform any algebraic manipulations to show the equality of the estimators: the Bühlmann credibility estimator is the best ''linear'' approximation to the Bayesian credibility estimator, so if the Bayesian credibility estimator is already linear then it must equal the Bühlmann credibility estimator.
{{alert-info|If the Bayesian credibility estimator is linear, then it must equal the Bühlmann credibility estimator.}}
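A quick numerical check of this equality; the Gamma parameters and claim data below are made-up values:

```python
# Check that the Bayesian and Buhlmann estimates coincide in the
# Poisson-Gamma frequency model; all inputs are made up.
a, b = 3.0, 0.5          # Gamma shape and scale of Theta_i
v = [40.0, 60.0]         # exposures per period
claims = [5, 9]          # claim counts N_{ij}

v_i = sum(v)
f_bar = sum(claims) / v_i            # exposure-weighted claim frequency
mu = a * b                           # unconditional mean
alpha_i = v_i / (v_i + 1.0 / b)      # kappa = 1/beta
buhlmann = (1 - alpha_i) * mu + alpha_i * f_bar
bayes = (a + sum(claims)) / (v_i + 1.0 / b)   # Gamma posterior mean
print(abs(buhlmann - bayes) < 1e-12)          # True
```

Both expressions reduce to the same ratio, so the agreement is exact up to floating-point rounding.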
==Estimating Parameters of Interest ==
The credibility estimators presented so far depend on the parameters <math>\kappa</math> and <math>\mu</math>, which could be unknown or difficult to compute; consequently, it is useful to estimate these parameters from the available data. In what follows, we assume the Bühlmann-Straub model holds for the data/observations.
===Estimating μ ===
Since <math>\mu</math> is the unconditional mean and the expectation of each observation equals <math>\mu</math>, a suitable unbiased estimator for <math>\mu</math> is
<math display = "block">
\begin{equation}
\hat{\mu} =\sum_{i=1}^I \frac{v_i}{v}\overline{X}_i= \sum_{i=1}^{I}\sum_{j=1}^{n_i}\frac{v_{ij}}{v} X_{ij} \,, \,\, v = \sum_{i=1}^I v_i = \sum_{i=1}^I\sum_{j=1}^{n_i}v_{ij}\,.
\end{equation}
</math>
Even though the estimator above is unbiased, the recommended estimator for the unconditional mean is
<math display = "block">
\hat \mu = \frac{\sum_{i=1}^I \overline{X}_i Z_i}{\sum_{i=1}^I Z_i}
</math>
where <math>Z_i = v_i/(v_i + \kappa)</math> are the credibility weights. The estimator above is the best estimator in the following sense:
<math display = "block">
\hat \mu = \underset{Y \in \mathcal{F}}{\operatorname{argmin}} \operatorname{E}[\left(Y - \mu \right)^2],\, \mathcal{F} = \{Y = \sum_{i,j}a_{i,j}X_{i,j} \, | \, \sum_{i,j} a_{i,j} = 1\}.
</math>
In other words, the estimator minimizes the mean square error among all linear combinations of the data <math>X_{i,j}</math> whose coefficients sum to one. Since we don't know the parameters <math>\sigma^2 </math> and <math>\rho^2</math>, we replace <math>Z_i</math> with <math>\hat Z_i = v_i/(v_i + \hat\kappa)</math>:
<math display = "block">
\begin{equation}
\label{uncond-mean-est}\hat \mu = \frac{\sum_{i=1}^I \overline{X}_i \hat Z_i}{\sum_{i=1}^I \hat Z_i}.
\end{equation}
</math>
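A sketch of this credibility-weighted estimator; <code>kappa_hat</code> and the per-class summaries below are made-up values (in practice <code>kappa_hat</code> comes from the estimates of σ² and ρ²):

```python
# Credibility-weighted estimate of the unconditional mean mu; all
# numerical inputs are made up for illustration.
kappa_hat = 4.0
v_i = [30.0, 60.0, 10.0]     # total exposure per risk class
x_bar_i = [1.0, 0.85, 1.3]   # exposure-weighted mean per risk class

z_hat = [vi / (vi + kappa_hat) for vi in v_i]  # estimated credibility weights
mu_hat = sum(z * x for z, x in zip(z_hat, x_bar_i)) / sum(z_hat)
```

Since the weights sum to one after normalization, <code>mu_hat</code> always lies between the smallest and largest class means.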
===Estimating σ<sup>2</sup> ===
Recall that <math>\sigma^2</math> is the expectation of the conditional variance; consequently, if we can estimate <math>\sigma^2(\Theta_i)</math> for each <math>i</math>, then we can average these estimates to get an estimate for <math>\sigma^2</math>. An [[wikipedia:Bias of an estimator|unbiased]] estimator for <math>\sigma^2(\Theta_i)</math> is
<math display="block">
\hat{\sigma}^2(\Theta_i) = \frac{1}{n_i -1}\sum_{j=1}^{n_i}v_{ij}(X_{ij} - \overline{X}_i)^2
</math>
and thus an unbiased estimator for <math>\sigma^2</math> is given by
<math display="block">
\hat{\sigma}^2 = \frac{1}{N - I}\sum_{i=1}^I\sum_{j=1}^{n_i}v_{ij}(X_{ij} - \overline{X}_i)^2 \, , \,\, N = \sum_{i=1}^In_i \,\,.
</math>
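This estimator is easy to compute directly; a sketch on made-up data with two risk classes:

```python
# Unbiased EPV estimator sigma2_hat on made-up Buhlmann-Straub data
# (I = 2 risk classes with unequal numbers of periods).
v = [[10.0, 20.0], [15.0, 15.0, 30.0]]   # exposures v_{ij}
x = [[1.2, 0.9], [0.7, 1.1, 0.8]]        # losses per exposure X_{ij}

I = len(x)
N = sum(len(row) for row in x)           # total number of observations
sse = 0.0
for vi, xi in zip(v, x):
    v_tot = sum(vi)
    x_bar = sum(vij * xij for vij, xij in zip(vi, xi)) / v_tot
    sse += sum(vij * (xij - x_bar) ** 2 for vij, xij in zip(vi, xi))
sigma2_hat = sse / (N - I)               # divide by N - I, not N
```

The divisor <code>N - I</code> reflects that one degree of freedom is spent per class on estimating its mean.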
===Estimating ρ<sup>2</sup> ===
Recall that <math>\rho^2</math> is the variance of the conditional means, or simply <math>\operatorname{Var}[\mu(\Theta_i)]</math>. The natural approach is to estimate <math>\mu(\Theta_i)</math> for each <math>i</math> and then calculate a kind of weighted variance of these estimates to get an estimate for <math>\rho^2</math>. Following [[#Estimating μ|estimating μ]], we use <math>\overline{X}_i</math> to estimate <math>\mu(\Theta_i)</math> and use <math>\overline{X}</math> to estimate <math>\mu</math> (or the expectation of each <math>\overline{X}_i</math>); consequently, if the <math>v_i</math> are equal across risk classes, then the following seems to be a natural choice as an estimator for <math>\rho^2</math>:
<math display="block">
\hat{\rho}_1^2 = \frac{1}{I - 1} \sum_{i=1}^{I}\left(\overline{X}_i - \overline{X} \right)^2.
</math>
Unfortunately, a straightforward calculation shows that <math>\hat{\rho}_1^2</math> is a biased estimator for <math>\rho^2</math>:
<math display="block">
\begin{equation}
\label{rho-est-exp}
\operatorname{E}[\hat{\rho}^2_1] = \operatorname{Var}[\,\overline{X}_i\,] = \frac{\sigma^2}{v_1} + \rho^2.
\end{equation}
</math>
Since <math>\hat{\sigma}^2</math> is an unbiased estimator for <math>\sigma^2</math>, equation \ref{rho-est-exp} shows that
<math display="block">
\hat{\rho}^2 = \hat{\rho}_1^2 - \frac{\hat{\sigma}^2}{v_1}
</math>
is an unbiased estimator for <math>\rho^2</math>. The general case is a little more complicated and requires some delicate calculations. We have the following proposition:
<div class="card mb-4"><div class="card-header"> Proposition (Unbiasedness of <math>\hat{\rho}_2^2</math>)</div><div class="card-body">
<p class="card-text">
<math display="block">
\hat{\rho}_2^2 = c \left (\hat{\rho}_1^2 - \frac{I \hat{\sigma}^2}{v} \right)\,, \,
\hat{\rho}_1^2 = \frac{I}{I-1} \sum_{i=1}^{I}\frac{v_i}{v}\left(\overline{X}_i - \overline{X} \right)^2
</math>
with
<math display="block">
c = \frac{I-1}{I} \left[ \sum_{i=1}^I \frac{v_i}{v} \left(1 - \frac{v_i}{v} \right) \right]^{-1}
</math>
is an unbiased estimator for <math>\rho^2</math>.
</p>
<span class="mw-customtoggle-theo.lassoPiecewiseLinear btn btn-primary" >Show Proof</span><div class="mw-collapsible mw-collapsed" id="mw-customcollapsible-theo.lassoPiecewiseLinear"><div class="mw-collapsible-content p-3">
</gr-replace>
We have (using <math>\sum_{i=1}^I v_i\overline{X}_i = v\overline{X}</math> and, by independence of distinct risk classes, <math>\operatorname{E}[\,\overline{X}_i\overline{X}_j\,]=\mu^2</math> for <math>i\neq j</math>)
<math display="block">
\begin{align*}
\operatorname{E}\left [ \sum_{i=1}^Iv_i (\,\overline{X}_i - \overline{X} \,)^2 \right] &=
\operatorname{E}\left [\sum_{i=1}^I v_i \overline{X}_i^2 -2\overline{X}\sum_{i=1}^Iv_i \overline{X}_i + v \overline{X}^2 \right]
\\
&= \operatorname{E}\left [\sum_{i=1}^I v_i \overline{X}_i^2 - v^{-1}\Big(\sum_{i=1}^I v_i \overline{X}_i\Big)^2 \right] \\
&= \sum_{i=1}^I v_i(1 - \frac{v_i}{v})\operatorname{E}[\,\overline{X}_i^2\,] - \mu^2\sum_{i=1}^I v_i(1 - \frac{v_i}{v}) \\
&= \sum_{i=1}^I v_i(1 - \frac{v_i}{v})\operatorname{Var}(\,\overline{X}_i\,) \\
&= \sum_{i=1}^I v_i(1 - \frac{v_i}{v})(\frac{\sigma^2}{v_i} + \rho^2) \\
&= (I-1)\sigma^2+\rho^2 \sum_{i=1}^I v_i(1 - \frac{v_i}{v})
\end{align*}
</math>
and thus if
<math display="block">
\hat \rho^2_1 = \frac{I}{I-1} \sum_{i=1}^{I}\frac{v_i}{v}\left(\overline{X}_i - \overline{X} \right)^2
</math>
then
<math display="block">
\operatorname{E}[\hat \rho^2_1] = \frac{I\sigma^2}{v}+\rho^2 c^{-1}\,,\, c = \frac{I-1}{I} \left[ \sum_{i=1}^I \frac{v_i}{v} \left(1 - \frac{v_i}{v} \right) \right]^{-1}.
</math>
Hence
<math display="block">
c \left[\hat \rho^2_1 - \frac{I\hat{\sigma}^2}{v}\right]
</math>
is an unbiased estimator for <math>\rho^2</math>. ■</div></div></div></div>
Since an estimator for <math>\rho^2</math> should always be nonnegative, the estimator is defined as
<math display="block">
\hat{\rho}^2 = \max(0,\hat{\rho}_2^2).
</math>
{{alert-warning|If <math>\rho^2</math> is estimated to be 0, then the credibility estimator is estimated to be <math>\mu</math> (or <math>\hat{\mu}</math> when the unconditional mean isn't known).}}
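Putting the pieces together, a sketch of the bias-corrected (and truncated) VHM estimate; the per-class summaries and the EPV estimate <code>sigma2_hat</code> are made-up values:

```python
# Bias-corrected VHM estimator from the proposition, truncated at zero;
# all numerical inputs are made up for illustration.
v_i = [30.0, 60.0, 10.0]     # per-class total exposures
x_bar_i = [1.0, 0.85, 1.3]   # per-class exposure-weighted means
sigma2_hat = 0.65            # assumed EPV estimate

I = len(v_i)
v = sum(v_i)
x_bar = sum(vi * xi for vi, xi in zip(v_i, x_bar_i)) / v
rho2_1 = (I / (I - 1)) * sum((vi / v) * (xi - x_bar) ** 2
                             for vi, xi in zip(v_i, x_bar_i))
c = ((I - 1) / I) / sum((vi / v) * (1 - vi / v) for vi in v_i)
rho2_2 = c * (rho2_1 - I * sigma2_hat / v)
rho2_hat = max(0.0, rho2_2)  # truncate at zero
```

If the between-class spread is small relative to the process variance, <code>rho2_2</code> can come out negative, which is exactly the case the truncation handles.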
===Estimating κ ===
An estimate for <math>\kappa</math> can be obtained by dividing the estimate for <math>\sigma^2</math> by the estimate for <math>\rho^2</math>:
<math display="block">
\hat{\kappa} = \hat{\sigma}^2/\hat{\rho}^2\,\,.
</math>
===Semiparametric Estimation ===
As we have seen (recall [[#Application to Claim Frequencies|Application to Claim Frequencies]]), the conditional distribution of <math>X_{ij}</math> given <math>\Theta_i</math> sometimes has an explicit (parametric) representation, and this can yield a simpler representation for <math>\kappa</math>. Consider the standard example covered in [[#Application to Claim Frequencies|Application to Claim Frequencies]] where the claim frequencies are Poisson distributed with mean <math>\Theta_i</math>. Equation \ref{claim-freq-kappa} shows that a suitable estimator for <math>\kappa</math> is given by
<math display="block">
\hat{\kappa} = \hat{\mu}/\hat{\rho}^2
</math>
and thus we don't have to estimate <math>\sigma^2</math>. The estimator <math>\hat \mu </math> is set to <math>\overline{X}</math> since we can't use the usual estimator (\ref{uncond-mean-est}): that estimator needs <math>\hat \kappa </math>.
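A semiparametric sketch for the Poisson frequency case; the claim counts, exposures, and the <code>rho2_hat</code> value below are made up (<code>rho2_hat</code> would come from the ρ² estimator applied to the frequencies):

```python
# Semiparametric estimate of kappa for Poisson claim frequencies:
# kappa_hat = mu_hat / rho2_hat, with mu_hat the overall frequency.
# All data values and rho2_hat are made up for illustration.
v = [[40.0, 60.0], [80.0, 20.0]]   # exposures v_{ij} per class/period
n = [[5, 9], [12, 2]]              # claim counts N_{ij}

v_tot = sum(sum(row) for row in v)
mu_hat = sum(sum(row) for row in n) / v_tot   # overall frequency (X-bar)
rho2_hat = 0.003                              # assumed VHM estimate
kappa_hat = mu_hat / rho2_hat                 # no EPV estimate needed
```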
==Wikipedia References==
*{{cite journal | last = Bühlmann | first = Hans | year = 1967 | title = Experience rating and credibility | url = http://www.casact.org/library/astin/vol4no3/199.pdf | journal = ASTIN Bulletin | volume = 4 | issue = 3 | pages = 199–207}}
*{{cite web |url = https://en.wikipedia.org/w/index.php?title=B%C3%BChlmann_model&oldid=958791017 | title= Bühlmann model | author = Wikipedia contributors | website= Wikipedia |publisher= Wikipedia |access-date = 23 October 2020 }}