==Mack-Method==
The '''Mack chain ladder''' method is a statistical method to estimate developmental factors in the chain ladder method. The method assumes the following:
*Distinct rows of the array/matrix <math>C_{i,j}</math> are independent.
*<math>\operatorname{E}[C_{i,j+1} \mid C_{i,0},\ldots,C_{i,j}] = f_j C_{i,j}</math> with <math>f_j</math> a constant.
*<math>\operatorname{Var}[C_{i,j+1} \mid C_{i,0},\ldots,C_{i,j}] = \sigma_{j}^2 C_{i,j}</math> with <math>\sigma_j</math> a constant.
The goal of the Mack-method is to estimate the ''factors'' <math>f_j</math> using the observable <math>C_{i,j}</math>. The estimators, denoted <math>\hat{f}_j</math>, then become the selected age-to-age factors. The estimators are defined as follows:
<math display="block">
\hat{f}_j = \frac{\sum_{i=0}^{I-j}C_{i,j+1}}{\sum_{i=0}^{I-j}C_{i,j}}.
</math>
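As an illustration, the following sketch computes the volume-weighted factors <math>\hat{f}_j</math> from a small cumulative triangle (the triangle values are illustrative assumptions, and the sums run over the accident years for which both <math>C_{i,j}</math> and <math>C_{i,j+1}</math> have been observed):
<syntaxhighlight lang="python">
import numpy as np

# Illustrative cumulative claims triangle C[i, j]; np.nan marks the
# unobserved cells with i + j > I.
C = np.array([
    [100., 160., 184., 193.],
    [110., 170., 195., np.nan],
    [125., 185., np.nan, np.nan],
    [130., np.nan, np.nan, np.nan],
])
I = C.shape[0] - 1

# Volume-weighted age-to-age factors: column sums restricted to the
# accident years observed in both development years j and j + 1.
f_hat = np.array([C[:I - j, j + 1].sum() / C[:I - j, j].sum() for j in range(I)])
print(f_hat)  # approximately [1.537, 1.148, 1.049]
</syntaxhighlight>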
These estimators have the following desirable properties:
*<math>\hat{f}_j</math> is an unbiased estimator for <math>f_j</math>: <math>\operatorname{E}[\hat{f}_j] = f_j</math>.
*The estimator <math>\hat{f}_j</math> is a ''minimum variance estimator'' in the following sense:
<math display="block">
\hat{f}_j = \underset{X \in S_j}{\operatorname{argmin}} \operatorname{Var}[ X \mid A_j ],\quad S_j = \left\{\sum_{i=0}^{I-j}w_i C_{i,j+1}/C_{i,j} \,\middle|\, \sum_{i=0}^{I-j}w_i = 1\right\}
</math>
with <math>A_j = \cup_{i=0}^{I-j}\{C_{i,0},\ldots,C_{i,j}\}</math> the claims ''information'' contained in the first <math>j</math> development periods.
<div class="text-right">
<proofs page="guide_proofs:09c903cda7" section="mack-minvar" label="Mack-Method Estimator" />
</div>
== Bühlmann-Straub Credibility Model ==
Recalling the Bornhuetter-Ferguson method, the expected ultimate claim <math>\mu_i</math> for accident year <math>i</math> can be unknown. In the Bühlmann-Straub credibility model, we estimate <math>\mu_i</math> using Bühlmann-Straub credibility estimates applied to the developmental triangle <math>C_{i,j}</math>. To implement this method, we recall the Bühlmann-Straub credibility model assumptions when applied to the incremental claim triangle <math>X_{i,j} = C_{i,j+1}-C_{i,j}</math>:
#<math>\operatorname{E}[X_{i,j}\,|\,\Theta_i] = \gamma_j\mu(\Theta_i)</math>
#<math>\operatorname{Var}[X_{i,j}\,|\,\Theta_i] = \gamma_j\sigma^2(\Theta_i)</math>
#<math>\Theta_i</math> are independent and identically distributed
#Conditional on <math>\Theta_i</math>, the <math>X_{i,j}</math> are independent
#<math>\sum_{j}\gamma_j = 1</math>
The <math>\gamma_j</math> can be interpreted as ''relative exposure levels'' that depend on the developmental year <math>j</math>. To implement this method, we either specify the relative exposure levels in advance or estimate them using, say, [[guide:|standard estimation techniques]].
Notice that because the <math>\Theta_i</math> are assumed to be identically distributed, the unconditional expected ultimate claim amount is the same from accident year to accident year. The Bühlmann-Straub credibility model generally allows the exposure levels to depend on both indices <math>i,j</math>. For stochastic reserving, we typically assume that the relative exposure levels <math>\gamma_j</math> are the same from accident year to accident year, but the base exposure level can vary from accident year to accident year. A typical base exposure is the earned premium for the accident year. If <math>P_i</math> denotes the earned premium for accident year <math>i</math>, then we simply scale the data, <math>C_{i,j}/P_i</math>, to get claims per exposure unit. The estimate of the ultimate claim for accident year <math>i</math> is then <math>P_i</math> multiplied by the estimate of the ultimate claim for accident year <math>i</math> obtained from the scaled data.
<proc label="Bühlmann-Straub Credibility Model">
#Estimate the developmental factors <math>f_j</math> using the volume weighted method (Mack method) and set <math>\hat{\beta}_j = \prod_{k=j}^{J-1}\hat{f}_k^{-1}</math>, the estimated fraction of the ultimate claim that has emerged through developmental year <math>j</math> (with <math>\hat{\beta}_J = 1</math>).
#Set <math>\hat{\gamma}_j = \hat{\beta}_j - \hat{\beta}_{j-1}</math> for <math>j>0</math> and <math>\hat{\gamma}_0 = \hat{\beta}_0</math>.
#The estimate for the ultimate claims for accident year <math>i</math> equals <math display="block">\hat{C}^{\textrm{BS2}}_{i,J} = \hat{\beta}_{I-i}\hat{C}_{i,J} + (1-\hat{\beta}_{I-i})\hat{C}^{\textrm{BS}}_{i,J}</math> where <math>\hat{C}_{i,J}</math> is the estimate for ultimate claims for accident year <math>i</math> based on the developmental factors <math>\hat{f}_j</math> and <math>\hat{C}^{\textrm{BS}}_{i,J}</math> is the Bühlmann-Straub credibility estimate for <math>\mu_i</math> defined in the next step.
#The Bühlmann-Straub credibility estimate for <math>\mu_i</math> equals <math display="block">\hat{C}^{\textrm{BS}}_{i,J} = Z_i\hat{C}_{i,J} + (1-Z_i)\hat{\mu}</math> where <math display="block">Z_i = \frac{\hat{\beta}_{I-i}}{\hat{\beta}_{I-i} + \hat{v}/\hat{a}}.</math> Here <math>\hat{v}</math> is an estimate for <math>\operatorname{E}[\sigma^2(\Theta_i)]</math>, and <math>\hat{a}</math> is an estimate for <math>\operatorname{Var}[\mu(\Theta_i)]</math>.
#Compute the estimates <math>\hat{\mu} = \overline{C}</math>, <math>\hat{a}</math>, and <math>\hat{v}</math> using the formulas below: <math display="block">\begin{align*}m_i = \hat{\beta}_{I-i}, & \,\, s_i^2 = \frac{1}{I-i}\sum_{j=0}^{I-i}\hat{\gamma}_j \left(\frac{X_{i,j}}{\hat{\gamma}_j} - \hat{C}_{i,J}\right)^2 \\ m = \sum_i m_i, &\,\, \overline{C} = \frac{\sum_{i=0}^I C_{i,I-i}}{m} \\ \hat{v} = \frac{1}{I}\sum_{i=0}^{I-1} s_i^2, &\, \, \hat{a} = \frac{\sum_{i=0}^Im_i(\hat{C}_{i,J}-\overline{C})^2 - I\hat{v}}{m - \frac{1}{m}\sum_{i=0}^I m_i^2}.\end{align*}</math>
</proc>
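The following is a minimal numerical sketch of the procedure above, under some stated assumptions: the triangle values are illustrative, we take <math>\hat{\mu} = \overline{C}</math>, increments are formed with <math>X_{i,0} = C_{i,0}</math>, and the most recent accident year contributes no <math>s_i^2</math> term:
<syntaxhighlight lang="python">
import numpy as np

# Illustrative cumulative triangle; np.nan marks unobserved cells (i + j > I).
C = np.array([
    [100., 160., 184., 193.],
    [110., 170., 195., np.nan],
    [125., 185., np.nan, np.nan],
    [130., np.nan, np.nan, np.nan],
])
I = C.shape[0] - 1
J = C.shape[1] - 1

# Step 1: volume-weighted factors and emergence pattern beta_j = 1/(f_j ... f_{J-1}).
f = np.array([C[:I - j, j + 1].sum() / C[:I - j, j].sum() for j in range(J)])
beta = np.append(1.0 / np.cumprod(f[::-1])[::-1], 1.0)

# Step 2: relative exposure levels.
gamma = np.diff(beta, prepend=0.0)

# Chain-ladder ultimates obtained by developing the latest diagonal.
diag = np.array([C[i, I - i] for i in range(I + 1)])
C_ult = diag / beta[I - np.arange(I + 1)]

# Step 5: structural parameters (mu-hat = C-bar, v-hat, a-hat).
X = np.diff(C, prepend=0.0, axis=1)          # increments, X[i, 0] = C[i, 0]
m_i = beta[I - np.arange(I + 1)]
m = m_i.sum()
C_bar = diag.sum() / m
s2 = np.array([
    (gamma[:I - i + 1] * (X[i, :I - i + 1] / gamma[:I - i + 1] - C_ult[i]) ** 2).sum() / (I - i)
    for i in range(I)
])
v_hat = s2.mean()
a_hat = ((m_i * (C_ult - C_bar) ** 2).sum() - I * v_hat) / (m - (m_i ** 2).sum() / m)

# Steps 3-4: credibility weights and the credibility-weighted ultimates.
Z = m_i / (m_i + v_hat / a_hat)
C_BS = Z * C_ult + (1 - Z) * C_bar
C_BS2 = m_i * C_ult + (1 - m_i) * C_BS
print(C_BS2)
</syntaxhighlight>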
==Poisson Model== | |||
The Poisson model assumes that the incremental claim counts <math>Y_{i,j+1} = N_{i,j+1}-N_{i,j} </math> have the following properties: | |||
*<math>Y_{i,j}</math>, <math> j = 0,\ldots, J</math>, are independent random variables for all <math>i,j</math>. | |||
*<math>Y_{i,j}</math> is Poisson distributed with mean <math>\gamma_j \mu_i </math> where <math>\sum_j \gamma_j = 1 </math>. | |||
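As a minimal simulation sketch of this model (the values of <math>\mu_i</math> and <math>\gamma_j</math> are illustrative assumptions):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([200., 210., 220., 230.])    # illustrative expected ultimates mu_i
gamma = np.array([0.5, 0.3, 0.15, 0.05])   # illustrative pattern with sum(gamma) = 1

# Independent Poisson increments with E[Y_ij] = gamma_j * mu_i,
# and the corresponding cumulative counts N_ij.
Y = rng.poisson(np.outer(mu, gamma))
N = Y.cumsum(axis=1)
</syntaxhighlight>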
Under this model, it is a simple matter to check that the sequence <math>N_{i,j}</math> satisfies the classical assumptions for the Bornhuetter-Ferguson method: | |||
* <math>\mu_i = \operatorname{E}[N_{i,J}] </math> for all <math> 0 \leq i \leq I </math> | |||
*<math>N_{i,J} - N_{i,j} </math> is independent of <math>N_{i,j}</math> for any <math>0 \leq j < J </math> | |||
*<math>\operatorname{E}[N_{i,j}] = F_j \mu_i </math> where <math>F_j = \sum_{k=0}^j \gamma_k </math>
*<math>\operatorname{Var}[N_{i,j}] = F_j \operatorname{Var}[N_{i,J}]</math>.
Following the Bornhuetter-Ferguson method, the projection for <math>N_{i,J}</math> equals
<math display="block">
\hat{N}_{i,J} = N_{i,I-i} + (1 - F_{I-i})\mu_i.
</math>
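A sketch of this projection, with an assumed pattern <math>\gamma_j</math>, prior ultimates <math>\mu_i</math>, and an observed diagonal (all values illustrative):
<syntaxhighlight lang="python">
import numpy as np

gamma = np.array([0.5, 0.3, 0.15, 0.05])    # assumed development pattern
F = gamma.cumsum()                           # cumulative fraction emerged, F_J = 1
mu = np.array([200., 210., 220., 230.])      # prior expected ultimate counts
N_diag = np.array([198., 204., 170., 110.])  # latest observed diagonal N[i, I - i]
I = len(mu) - 1

# Bornhuetter-Ferguson: observed counts plus the expected unreported part.
N_proj = N_diag + (1.0 - F[I - np.arange(I + 1)]) * mu
print(N_proj)  # [198.0, 214.5, 214.0, 225.0]
</syntaxhighlight>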
When the parameters <math>\mu_i</math> and <math>\gamma_j</math> are unknown, they can be estimated using [[wikipedia:maximum_likelihood | maximum likelihood estimation]]. | |||
{{Alert-info| When <math>\mu_i</math> are known, the maximum likelihood estimators for <math>\gamma_j</math> yield estimated development factors <math display = "block">\hat{f}_j = \frac{\sum_{k=0}^{j+1}\hat{\gamma}_k}{\sum_{k=0}^j\hat{\gamma}_k}</math> that are equal to those obtained via the Mack-method. }} | |||
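As a sketch of the estimation (assuming the <code>statsmodels</code> and <code>pandas</code> packages are available), one can fit the log-linear Poisson model to the observed increments and read off <math>\hat{\mu}_i</math> and <math>\hat{\gamma}_j</math> from the fitted multiplicative means; the implied development factors then match the volume-weighted factors, consistent with the note above:
<syntaxhighlight lang="python">
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Observed increments Y[i, j] for i + j <= I (illustrative values).
Y = np.array([
    [100., 60., 24., 9.],
    [110., 60., 25., np.nan],
    [125., 60., np.nan, np.nan],
    [130., np.nan, np.nan, np.nan],
])
I = Y.shape[0] - 1
obs = [(i, j, Y[i, j]) for i in range(I + 1) for j in range(I + 1 - i)]
df = pd.DataFrame(obs, columns=["acc", "dev", "y"])

# Log-linear Poisson GLM with accident-year and development-year factors.
fit = smf.glm("y ~ C(acc) + C(dev)", data=df, family=sm.families.Poisson()).fit()

# Fitted means on the full rectangle factor as mu_i * gamma_j.
grid = pd.DataFrame([(i, j) for i in range(I + 1) for j in range(I + 1)],
                    columns=["acc", "dev"])
mean = fit.predict(grid).to_numpy().reshape(I + 1, I + 1)
mu_hat = mean.sum(axis=1)        # MLEs of the expected ultimates mu_i
gamma_hat = mean[0] / mu_hat[0]  # MLEs of the pattern gamma_j
</syntaxhighlight>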
==Overdispersed Poisson Model (ODP)==
Unlike the Poisson model, the overdispersed Poisson model doesn't assume anything about the probability distribution of the incremental claim count developmental triangle <math>Y_{i,j}</math>. The overdispersed Poisson model is a special case of a [[wikipedia:Generalized_linear_model|generalized linear model]]:
*<math>Y_{i,j}</math>, <math> j = 0,\ldots, J</math>, are independent random variables, for all <math> i, j </math>
*<math>\operatorname{E}[Y_{i,j}] = \gamma_je^{c + \alpha_i + \beta_j}</math>, where <math> c > 0, \alpha_0 = 0, \beta_0 = 0</math> and <math>\sum_{j}\gamma_j = 1</math>.
*<math>\operatorname{Var}[Y_{i,j}] = \phi \operatorname{E}[Y_{i,j}]</math>, where <math>\phi > 0</math> is the dispersion parameter.
===Generalized Linear Models=== | |||
In a generalized linear model (GLM), each outcome <math>Y</math> of the [[wikipedia:dependent variable|dependent variable]]s is assumed to be generated from a particular [[wikipedia:probability distribution|distribution]] in an [[wikipedia:exponential family|exponential family]], a large class of [[wikipedia:probability distributions|probability distributions]] that includes the [[wikipedia:normal distribution|normal]], [[wikipedia:binomial distribution|binomial]], [[wikipedia:poisson distribution|Poisson]] and [[wikipedia:gamma distribution|gamma]] distributions, among others. The mean, <math>\mu</math>, of the distribution depends on the independent variables, <math>\mathbf{X}</math>, through: | |||
<math display = "block">\operatorname{E}[Y|\mathbf{X}] = \mu = g^{-1}(\mathbf{X}\boldsymbol{\beta}) </math> | |||
where <math>\operatorname{E}[Y|\mathbf{X}]</math> is the [[wikipedia:expected value|expected value]] of <math>Y</math> [[wikipedia:conditional expectation|conditional]] on <math>\mathbf{X}</math>; <math>\mathbf{X}\boldsymbol{\beta}</math> is the ''linear predictor'', a linear combination of unknown parameters <math>\boldsymbol{\beta}</math>; <math>g</math> is the link function. In this framework, the variance is typically a function, <math>V</math>, of the mean: | |||
<math display="block"> \operatorname{Var}[Y|\mathbf{X}] = \operatorname{V}(g^{-1}(\mathbf{X}\boldsymbol{\beta})). </math> | |||
The unknown parameters, <math>\boldsymbol{\beta}</math>, are typically estimated with [[wikipedia:maximum likelihood|maximum likelihood]], maximum [[wikipedia:quasi-likelihood|quasi-likelihood]], or [[wikipedia:Bayesian probability|Bayesian]] techniques. | |||
====Model components====
The GLM consists of three elements: | |||
#A particular distribution for modeling <math> Y </math> from among the exponential families of probability distributions,
#A linear predictor <math>\eta = \mathbf{X} \boldsymbol{\beta}</math>, and
#A link function <math>g</math> such that <math>\operatorname{E}[Y \mid \mathbf{X}] = \mu = g^{-1}(\eta)</math>.
====Probability distribution====
An '''overdispersed exponential family''' of distributions is a generalization of an [[wikipedia:exponential family|exponential family]] and the [[wikipedia:exponential dispersion model|exponential dispersion model]] of distributions and includes those families of probability distributions, parameterized by <math>\theta</math> and <math>\phi</math>, whose density functions <math>f</math> can be expressed in the form
<math display="block"> f_Y(y \mid \theta, \phi) = h(y,\phi) \exp \left(\frac{b(\theta)T(y) - A(\theta)}{d(\phi)} \right). \,\!</math> | |||
The ''dispersion parameter'', <math>\phi</math>, is typically known and is usually related to the variance of the distribution. The functions <math>h(y,\phi)</math>, <math>b(\theta)</math>, <math>T(y)</math>, <math>A(\theta)</math>, and <math>d(\phi)</math> are known. Many common distributions are in this family, including the normal, exponential, gamma, Poisson, Bernoulli, and (for fixed number of trials) binomial, multinomial, and negative binomial.
If <math>b(\theta)</math> is the identity function, then the distribution is said to be in [[wikipedia:canonical form|canonical form]] (or ''natural form''). Note that any distribution can be converted to canonical form by rewriting <math>\boldsymbol\theta</math> as <math>\theta'</math> and then applying the transformation <math>\theta = b(\theta')</math>. It is always possible to express <math>A(\theta)</math> in terms of the new parametrization, even if <math>b(\theta')</math> is not a [[wikipedia:one-to-one function|one-to-one function]]; see comments in the page on [[wikipedia:exponential families|exponential families]]. If, in addition, <math>T(y)</math> is the identity and <math>\phi</math> is known, then <math>\theta</math> is called the ''canonical parameter'' (or ''natural parameter'') and is related to the mean through
<math display="block"> \mu = A'(\theta). \,\!</math> | |||
Under this scenario, the variance of the distribution can be shown to be<ref>{{harvnb|McCullagh|Nelder|1989}}, Chapter 2.</ref> | |||
<math display="block"> A''(\theta) d(\phi). \,\!</math> | |||
====Linear predictor====
The linear predictor is the quantity which incorporates the information about the independent variables into the model. The symbol ''η'' ([[wikipedia:Greek alphabet|Greek]] "[[wikipedia:Eta (letter)|eta]]") denotes a linear predictor. It is related to the [[wikipedia:expected value|expected value]] of the data through the link function. | |||
<math>\eta</math> is expressed as a linear combination (thus, "linear") of unknown parameters <math>\boldsymbol{\beta}</math>. The coefficients of the linear combination are represented as the matrix of independent variables <math>\boldsymbol{X}</math>. <math>\eta</math> can thus be expressed as <math> \eta = \mathbf{X}\boldsymbol{\beta}.\,</math>
====Link function====
The link function provides the relationship between the linear predictor and the [[wikipedia:Expected value|mean]] of the distribution function. There are many commonly used link functions, and their choice is informed by several considerations. There is always a well-defined ''canonical'' link function which is derived from the exponential of the response's [[wikipedia:density function|density function]]. However, in some cases it makes sense to try to match the [[wikipedia:Domain of a function|domain]] of the link function to the [[wikipedia:range of a function|range]] of the distribution function's mean, or use a non-canonical link function for algorithmic purposes, for example [[wikipedia:Probit model#Gibbs sampling|Bayesian probit regression]]. | |||
When using a distribution function with a canonical parameter <math>\theta</math>, the canonical link function is the function that expresses <math>\theta</math> in terms of <math>\mu</math>, i.e. <math>\theta = b(\mu)</math>. For the most common distributions, the mean <math>\mu</math> is one of the parameters in the standard form of the distribution's [[wikipedia:density function|density function]], and then <math>b(\mu)</math> is the function as defined above that maps the density function into its canonical form. When using the canonical link function, <math>b(\mu) = \theta = \mathbf{X}\boldsymbol{\beta}</math>, which allows <math>\mathbf{X}^{\rm T} \mathbf{Y}</math> to be a [[wikipedia:sufficiency (statistics)|sufficient statistic]] for <math>\boldsymbol{\beta}</math>. | |||
Following is a table of several exponential-family distributions in common use and the data they are typically used for, along with the canonical link functions and their inverses (sometimes referred to as the mean function, as done here). | |||
{| class="table table-bordered" style="background:white;" | |||
|+ Common distributions with typical uses and canonical link functions | |||
! Distribution !! Support of distribution !! Typical uses !! Link name !! Link function, <math>\mathbf{X}\boldsymbol{\beta}=g(\mu)\,\!</math> !! Mean function | |||
|- | |||
| [[wikipedia:normal distribution|Normal]] | |||
| real: <math>(-\infty,+\infty)</math> || Linear-response data || Identity | |||
| <math>\mathbf{X}\boldsymbol{\beta}=\mu\,\!</math> || <math>\mu=\mathbf{X}\boldsymbol{\beta}\,\!</math> | |||
|- | |||
| [[wikipedia:exponential distribution|Exponential]] | |||
| rowspan="2" | real: <math>(0,+\infty)</math> || rowspan="2" | Exponential-response data, scale parameters | |||
| rowspan="2" | [[wikipedia:Multiplicative inverse|Negative inverse]] | |||
| rowspan="2" | <math>\mathbf{X}\boldsymbol{\beta}=-\mu^{-1}\,\!</math> | |||
| rowspan="2" | <math>\mu=-(\mathbf{X}\boldsymbol{\beta})^{-1}\,\!</math> | |||
|- | |||
| [[wikipedia:gamma distribution|Gamma]] | |||
|- | |||
| [[wikipedia:Inverse Gaussian distribution|Inverse <br>Gaussian]] | |||
| real: <math>(0, +\infty)</math> || || Inverse <br>squared || <math>\mathbf{X}\boldsymbol{\beta}=\mu^{-2}\,\!</math> || <math>\mu=(\mathbf{X}\boldsymbol{\beta})^{-1/2}\,\!</math> | |||
|- | |||
| [[wikipedia:Poisson distribution|Poisson]] | |||
| integer: <math>0,1,2,\ldots</math> || Count of occurrences in a fixed amount of time/space || [[wikipedia:Natural logarithm|Log]] || <math>\mathbf{X}\boldsymbol{\beta} = \ln(\mu) \,\!</math> || <math>\mu=\exp (\mathbf{X}\boldsymbol{\beta}) \,\!</math>
|} | |||
===GLMs in Stochastic Reserving=== | |||
The generalized linear models applicable to a triangle of random variables <math>Y_{i,j}</math> are usually restricted to GLMs satisfying the following properties: | |||
*<math>\operatorname{E}[Y_{i,j}] = \mu_{i,j}</math> | |||
*<math>\operatorname{Var}[Y_{i,j}] = \phi \mu_{i,j}^p</math>, where <math>p \geq 0</math>
*<math>\eta_{i,j} = g(\mu_{i,j}) = \mathbf{X}_{i,j}\boldsymbol{\beta}</math> | |||
The following table lists three families of distributions satisfying the properties above: | |||
{| class="table table-bordered" style="background:white;" | |||
! Distribution !! Support of distribution !! Link function, <math>\mathbf{X}\boldsymbol{\beta}=g(\mu)\,\!</math> !! Mean function !! <math>p</math> !! <math>\phi</math> | |||
|- | |||
| [[wikipedia:normal distribution|Normal]] | |||
| real: <math>(-\infty,+\infty)</math> | |||
| <math>\mathbf{X}\boldsymbol{\beta}=\mu\,\!</math> || <math>\mu=\mathbf{X}\boldsymbol{\beta}\,\!</math> || 0 || <math>\sigma^2</math> | |||
|- | |||
| [[wikipedia:exponential distribution|Exponential]] | |||
| rowspan="2" | real: <math>(0,+\infty)</math> || rowspan="2" | <math>\mathbf{X}\boldsymbol{\beta}=-\mu^{-1}\,\!</math> | |||
| rowspan="2" | <math>\mu=-(\mathbf{X}\boldsymbol{\beta})^{-1}\,\!</math> || rowspan="2" | 2 || rowspan="2" | <math>\phi </math> | |||
|- | |||
| [[wikipedia:gamma distribution|Gamma]] | |||
|- | |||
| [[wikipedia:Poisson distribution|Poisson]] | |||
| integer: <math>0,1,2,\ldots</math> || <math>\mathbf{X}\boldsymbol{\beta} = \ln(\mu) \,\!</math> || <math>\mu=\exp (\mathbf{X}\boldsymbol{\beta}) \,\!</math> || 1 || 1 | |||
|} | |||
The [[#Overdispersed_Poisson_model | overdispersed Poisson model]], introduced above, is a special case with log-link function <math display="block">\mathbf{X}_{i,j}\boldsymbol{\beta} = \log(\gamma_j) + c + \alpha_i + \beta_j = \log(\mu_{i,j})</math>, <math>p = 1 </math>, and a free dispersion parameter <math>\phi > 0 </math>. To make the parameters identifiable, we typically set <math>\alpha_0 </math> and <math>\beta_0 </math> to zero.
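As a sketch (again assuming <code>statsmodels</code> and <code>pandas</code> are available), the ODP model can be fit with the same log-linear mean structure as the Poisson model, with the dispersion <math>\phi</math> estimated from the Pearson chi-square statistic rather than fixed at one:
<syntaxhighlight lang="python">
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Observed increments Y[i, j] for i + j <= I (illustrative values).
Y = np.array([
    [100., 60., 24., 9.],
    [110., 60., 25., np.nan],
    [125., 60., np.nan, np.nan],
    [130., np.nan, np.nan, np.nan],
])
I = Y.shape[0] - 1
obs = [(i, j, Y[i, j]) for i in range(I + 1) for j in range(I + 1 - i)]
df = pd.DataFrame(obs, columns=["acc", "dev", "y"])

# Quasi-likelihood fit: Poisson mean structure, with scale="X2" estimating
# the dispersion parameter phi by the Pearson chi-square statistic.
odp = smf.glm("y ~ C(acc) + C(dev)", data=df,
              family=sm.families.Poisson()).fit(scale="X2")
print(odp.scale)  # estimated dispersion parameter phi
</syntaxhighlight>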
===Estimation of Model Parameters=== | |||
Suppose we have a random sample <math>y_1,\ldots,y_n</math> where <math>y_i</math> is sampled from a distribution with density function | |||
<math display = "block"> | |||
f(y; \, \theta_i, \phi) = \exp\left(\frac{y\theta_i - b(\theta_i)}{\phi/p_i} + c(y,\phi)\right).
</math> | |||
The log-likelihood function for the random sample, where the <math>p_i</math> are known prior weights, equals
<math display = "block">\begin{equation}\label{glm-log-lik} l = \sum_{i=1}^n \frac{y_i\theta_i - b(\theta_i)}{(\phi/p_i)} + c(y_i,\phi).\end{equation}</math> | |||
If we assume a ''canonical'' link function of the form <math>\theta_i = g(\mu_i) </math> where <math>g</math> denotes the link function, then <math>\theta_i = \mathbf{x}_i\boldsymbol{\beta}</math> and the log-likelihood function depends on the unknown parameters <math>\boldsymbol{\beta}</math>. To estimate these unknown parameters, we use the maximum likelihood estimator. The [[wikipedia:maximum likelihood|maximum likelihood]] estimates can be found using an [[wikipedia:iteratively reweighted least squares|iteratively reweighted least squares]] algorithm or [[wikipedia:Newton's method in optimization|Newton's method]] with updates of the form:
<math display = "block"> \boldsymbol\beta^{(t+1)} = \boldsymbol\beta^{(t)} + \mathcal{J}^{-1}(\boldsymbol\beta^{(t)}) u(\boldsymbol\beta^{(t)}), </math> | |||
where <math>\mathcal{J}(\boldsymbol\beta^{(t)})</math> is the [[wikipedia:observed information|observed information matrix]] (the negative of the [[wikipedia:Hessian matrix|Hessian matrix]]) and <math>u(\boldsymbol\beta^{(t)})</math> is the [[wikipedia:score (statistics)|score function]]; or a [[wikipedia:Scoring algorithm|Fisher's scoring]] method: | |||
<math display="block"> \boldsymbol\beta^{(t+1)} = \boldsymbol\beta^{(t)} + \mathcal{I}^{-1}(\boldsymbol\beta^{(t)}) u(\boldsymbol\beta^{(t)}), </math> | |||
where <math>\mathcal{I}(\boldsymbol\beta^{(t)})</math> is the [[wikipedia:Fisher information|Fisher information]] matrix. When the canonical link function is in effect, the following algorithm is equivalent to the Fisher scoring algorithm given above: | |||
<proc label="MLE estimates for GLM: iterative least squares method"> | |||
We iterate the following algorithm: | |||
#Create the sequence <math display="block">z_i = \hat{\eta}_i + (y_i - \hat{\mu}_i)\frac{d\eta_i}{d\mu_i}</math> where <math>\hat{\eta}_i = g(\hat{\mu}_i) </math> and <math>\hat{\mu}_i</math> is the current best estimate for <math>\mu_i.</math>
#Compute the weights <math display = "block">w_i = \frac{p_i}{b''(\theta_i) \left(\frac{d\eta_i}{d\mu_i}\right)^2}.</math> | |||
#Estimate <math>\boldsymbol{\beta}</math> using weighted least-squares regression where <math>z_i</math> is the dependent variable, <math>\mathbf{x}_i</math> are the predictor variables, and <math>w_i</math> are the weights: <math display = "block">\hat{\boldsymbol{\beta}} = (X^T W X)^{-1}X^TWz, \, W_{ii} = w_i. </math> | |||
</proc> | |||
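A minimal sketch of this iteration follows (a generic implementation under the assumptions above; the Poisson regression example at the end is illustrative):
<syntaxhighlight lang="python">
import numpy as np

def irls(X, y, g, g_inv, deta_dmu, b2, p=None, n_iter=25):
    """Iterative least squares for a GLM: g is the link, g_inv its inverse,
    deta_dmu the derivative d(eta)/d(mu), b2 the variance function b''(theta)
    written as a function of mu, and p the known prior weights."""
    p = np.ones(len(y)) if p is None else p
    mu = y + 0.5                                # crude starting values
    for _ in range(n_iter):
        z = g(mu) + (y - mu) * deta_dmu(mu)     # step 1: working response
        w = p / (b2(mu) * deta_dmu(mu) ** 2)    # step 2: working weights
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, X.T @ (w * z))  # step 3: weighted LS
        mu = g_inv(X @ beta)
    return beta

# Illustration: Poisson regression with the canonical log link.
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = rng.poisson(np.exp(X @ np.array([1.0, 0.5])))
beta_hat = irls(X, y, np.log, np.exp, lambda m: 1.0 / m, lambda m: m)
</syntaxhighlight>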
For the [[#Overdispersed_Poisson_model | overdispersed Poisson model]] we do not assume anything about the distribution of the random sample; consequently, we can't obtain maximum likelihood estimates. Instead we obtain [[wikipedia:Quasi-maximum_likelihood_estimate| quasi-maximum likelihood]] estimates by maximizing the log-likelihood \ref{glm-log-lik} associated with the log-link function <math>\eta_i = g(\mu_i) = \log(\mu_i)</math>. In this special case, we have <math>b''(\theta_i) = \mu_i</math> and <math>\frac{d\eta_i}{d\mu_i} = 1/\mu_i</math>.
<proc label="QMLE estimates for overdispersed Poisson model"> | |||
We iterate the following algorithm: | |||
#Create the sequence <math display="block">z_i = \log(\hat{\mu}_i) + \frac{y_i - \hat{\mu}_i}{\hat{\mu}_i}</math> where <math>\hat{\mu}_i</math> is the current best estimate for <math>\mu_i.</math>
#Estimate <math>\boldsymbol{\beta}</math> using weighted least-squares regression where <math>z_i</math> is the dependent variable and <math>\mathbf{x}_i</math> are the predictor variables: <math display = "block">\hat{\boldsymbol{\beta}} = (X^T W X)^{-1}X^TWz, \, W_{ii} = \hat{\mu}_i. </math> | |||
</proc> | |||
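A corresponding sketch for the ODP case, where the working weights reduce to <math>W_{ii} = \hat{\mu}_i</math>; the Pearson estimate of <math>\phi</math> at the end is a standard moment estimate:
<syntaxhighlight lang="python">
import numpy as np

def odp_irls(X, y, n_iter=25):
    """QMLE for the overdispersed Poisson model: IRLS with the log link."""
    mu = y + 0.5                                # crude starting values
    for _ in range(n_iter):
        z = np.log(mu) + (y - mu) / mu          # working response
        WX = X * mu[:, None]                    # working weights W_ii = mu_i
        beta = np.linalg.solve(X.T @ WX, X.T @ (mu * z))
        mu = np.exp(X @ beta)
    # Pearson estimate of the dispersion parameter phi.
    phi = ((y - mu) ** 2 / mu).sum() / (len(y) - X.shape[1])
    return beta, phi
</syntaxhighlight>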
==References== | |||
{{reflist}} | |||
==Wikipedia References== | |||
*{{cite web |url = https://en.wikipedia.org/w/index.php?title=Generalized_linear_model&oldid=1118341387 | title = Generalized linear model | author = Wikipedia contributors | website= Wikipedia | publisher= Wikipedia |access-date = 18 February 2023 }} |