guide:915b4a5603: Difference between revisions
'''Survival analysis''' is a branch of [[wikipedia:statistics|statistics]] for analyzing the expected duration of time until one event occurs, such as death in biological organisms and failure in mechanical systems. This topic is called '''reliability theory''' or '''reliability analysis''' in [[wikipedia:engineering|engineering]], '''duration analysis''' or '''duration modelling''' in [[wikipedia:economics|economics]], and '''event history analysis''' in [[wikipedia:sociology|sociology]]. Survival analysis attempts to answer certain questions, such as what is the proportion of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the probability of [[wikipedia:survival|survival]]? | |||
To answer such questions, it is necessary to define "lifetime". In the case of biological survival, [[wikipedia:death|death]] is unambiguous, but for mechanical reliability, [[wikipedia:failure|failure]] may not be well-defined, for there may well be mechanical systems in which failure is partial, a matter of degree, or not otherwise localized in [[wikipedia:time|time]]. Even in biological problems, some events (for example, [[wikipedia:myocardial infarction|heart attack]] or other organ failure) may have the same ambiguity. The [[wikipedia:theory|theory]] outlined below assumes well-defined events at specific times; other cases may be better treated by models which explicitly account for ambiguous events. | |||
More generally, survival analysis involves the modelling of time to event data; in this context, death or failure is considered an "event" in the survival analysis literature – traditionally only a single event occurs for each subject, after which the organism or mechanism is dead or broken. ''Recurring event'' or ''repeated event'' models relax that assumption. The study of recurring events is relevant in [[wikipedia:systems reliability|systems reliability]], and in many areas of social sciences and medical research. | |||
==Survival Function== | |||
The '''survival function''' is a [[wikipedia:function (mathematics)|function]] that gives the [[wikipedia:probability|probability]] that a patient, device, or other object of interest will [[wikipedia:survival analysis|survive]] past a certain time.<ref name="KleinbaumKlein2012">{{Citation | |||
|last1=Kleinbaum | |||
|first1=David G. | |||
|last2=Klein | |||
|first2= Mitchel | |||
|title= Survival analysis: A Self-learning text | |||
|edition=Third | |||
|year=2012 | |||
|publisher= Springer | |||
|isbn= 978-1441966452 | |||
}} | |||
</ref> | |||
The survival function is also known as the '''survivor function'''<ref name="TablemanKim2003">{{Citation | |||
|last1= Tableman | |||
|first1= Mara | |||
|last2= Kim | |||
|first2= Jong Sung | |||
|title= Survival Analysis Using S | |||
|edition=First | |||
|year=2003 | |||
|publisher= Chapman and Hall/CRC | |||
|isbn= 978-1584884088 | |||
}} | |||
</ref> or '''reliability function'''.<ref name="Ebeling2010">{{Citation | |||
|last1= Ebeling | |||
|first1= Charles | |||
|title= An Introduction to Reliability and Maintainability Engineering | |||
|edition=Second | |||
|year=2010 | |||
|publisher= Waveland Press | |||
|isbn= 978-1577666257 | |||
}} | |||
</ref> | |||
The term ''reliability function'' is common in [[wikipedia:engineering|engineering]] while the term ''survival function'' is used in a broader range of applications, including human mortality. The survival function is the [[wikipedia:complementary cumulative distribution function|complementary cumulative distribution function]] of the lifetime. Sometimes complementary cumulative distribution functions are called survival functions in general. | |||
==Definition== | |||
Let the lifetime <math>T</math> be a continuous random variable with [[wikipedia:cumulative distribution function|cumulative distribution function]] <math>F(t)</math> and [[wikipedia:probability density function|probability density function]] <math>f(t)</math> on the interval <math>[0,\infty)</math>. Its ''survival function'' or ''reliability function'' is:
<math display = "block">S(t) = P(\{T > t\}) = \int_t^{\infty} f(u)\,du = 1-F(t).</math>
== Examples of survival functions == | |||
The graphs below show examples of hypothetical survival functions. The x-axis is time. The y-axis is the proportion of subjects surviving. The graphs show the probability that a subject will survive beyond time t. | |||
[[File:Four survival functions.svg|thumb|600px|Four survival functions]] | |||
For example, for survival function 1, the probability of surviving longer than t = 2 months is 0.37. That is, 37% of subjects survive more than 2 months. | |||
[[File:Survival function 1.svg|thumb|400px|Survival function 1]] | |||
For survival function 2, the probability of surviving longer than t = 2 months is 0.97. That is, 97% of subjects survive more than 2 months. | |||
[[File:Survival function 2.svg|thumb|400px|Survival function 2]] | |||
Median survival may be determined from the survival function: the median survival is the time at which the survival function crosses the value 0.5.<ref>Machin, D., Cheung, Y. B., Parmar, M. (2006). Survival Analysis: A Practical Approach. Deutschland: Wiley. Page 36 and following [https://books.google.com/books?id=z6_Hr9NGjr0C&pg=PA36 Google Books]</ref> For example, for survival function 2, 50% of the subjects survive longer than 3.72 months, so the median survival is 3.72 months.
[[File:Survival function 2 median survival.svg|thumb|400px|Survival function with indicated median survival]] | |||
In some cases, median survival cannot be determined from the graph. For example, for survival function 4, more than 50% of the subjects survive longer than the observation period of 10 months. | |||
[[File:Median survival greater than 10 months.svg|thumb|400px|Median survival greater than 10 months]] | |||
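As a minimal sketch of reading median survival off a curve, the Python snippet below evaluates a survival function on a time grid and returns the first time at which it drops to 0.5 or below. The two exponential curves and their rates (0.5 and 0.05 per month) are hypothetical, chosen only so that one curve crosses 0.5 within a 10-month observation window and the other does not; they are not taken from the figures.

```python
import math

def median_survival(times, survival, level=0.5):
    """Return the first grid time where the curve is <= level, or None if it never is."""
    for t, s in zip(times, survival):
        if s <= level:
            return t
    return None

# Observation window of 10 months on a 0.01-month grid (assumed for illustration).
times = [i / 100 for i in range(1001)]

# Hypothetical exponential survival curves S(t) = exp(-rate * t).
fast = [math.exp(-0.5 * t) for t in times]   # crosses 0.5 inside the window
slow = [math.exp(-0.05 * t) for t in times]  # stays above 0.5 for all 10 months

median_fast = median_survival(times, fast)
median_slow = median_survival(times, slow)

print(median_fast)  # close to ln(2) / 0.5, up to the grid spacing
print(median_slow)  # None: the median is not reached within the window
```

Returning <code>None</code> rather than the last grid time keeps the "median not reached" case explicit, which mirrors the situation where more than half of the subjects outlive the observation period.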
The survival function is one of several ways to describe and display survival data. Another useful way to display data is a graph showing the distribution of survival times of subjects. Olkin,<ref name="OlkinGleserDerman1994">{{Citation | |||
|last1=Olkin | |||
|first1=Ingram | |||
|last2=Gleser | |||
|first2= Leon | |||
|last3=Derman | |||
|first3= Cyrus | |||
|title= Probability Models and Applications | |||
|edition=Second | |||
|year=1994 | |||
|publisher= Macmillan | |||
|isbn= 0-02-389220-X | |||
}} | |||
</ref> page 426, gives the following example of survival data. The number of hours between successive failures of an air-conditioning system was recorded. The times between successive failures are 1, 3, 5, 7, 11, 11, 11, 12, 14, 14, 14, 16, 16, 20, 21, 23, 42, 47, 52, 62, 71, 71, 87, 90, 95, 120, 120, 225, 246, and 261 hours. The mean time between failures is 59.6 hours. This mean value will be used shortly to fit a theoretical curve to the data. The figure below shows the distribution of the time between failures. The blue tick marks beneath the graph are the actual hours between successive failures.
[[File:Distribution of AC failure times.svg|thumb|400px|Distribution of AC failure times]] | |||
The distribution of failure times is overlaid with a curve representing an [[wikipedia:exponential distribution|exponential distribution]], which approximates the distribution of failure times in this example. The exponential curve is a theoretical distribution fitted to the actual failure times. It is specified by the rate parameter λ = 1/(mean time between failures) = 1/59.6 = 0.0168 per hour. If time can take any positive value, the distribution of failure times is described by a probability density function (pdf), written f(t) in equations. If time can only take discrete values (such as 1 day, 2 days, and so on), the distribution of failure times is described by a [[wikipedia:probability mass function|probability mass function]] (pmf). Most survival analysis methods assume that time can take any positive value, so that f(t) is a pdf. If the time between observed air conditioner failures is approximated using the exponential distribution, then the exponential curve gives the probability density function, f(t), for air conditioner failure times.
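The fit described above can be reproduced in a few lines of Python. This is only a sketch: the variable and function names are illustrative, and it assumes the rate is estimated as the reciprocal of the sample mean, which is what the text does (it is also the maximum-likelihood estimate for an exponential distribution).

```python
import math

# Hours between successive air-conditioning failures, as listed in the text.
hours = [1, 3, 5, 7, 11, 11, 11, 12, 14, 14, 14, 16, 16, 20, 21,
         23, 42, 47, 52, 62, 71, 71, 87, 90, 95, 120, 120, 225, 246, 261]

mean_tbf = sum(hours) / len(hours)  # mean time between failures, in hours
lam = 1.0 / mean_tbf                # fitted exponential rate, per hour

def exp_pdf(t):
    """Exponential probability density f(t) = lam * exp(-lam * t) for t >= 0."""
    return lam * math.exp(-lam * t) if t >= 0 else 0.0

print(round(mean_tbf, 1))  # 59.6
print(round(lam, 4))       # 0.0168
```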
Another useful way to display the survival data is a graph showing the cumulative failures up to each time point. These data may be displayed as either the cumulative number or the cumulative proportion of failures up to each time. The graph below shows the cumulative probability (or proportion) of failures at each time for the air conditioning system. The stairstep line in black shows the cumulative proportion of failures. For each step there is a blue tick at the bottom of the graph indicating an observed failure time. The smooth red line represents the exponential curve fitted to the observed data. | |||
[[File:CDF for AC failures.svg|400px|CDF for AC failures]] | |||
A graph of the cumulative probability of failures up to each time point is called the [[wikipedia:cumulative distribution function|cumulative distribution function]], or CDF. In survival analysis, the cumulative distribution function gives the probability that the survival time is less than or equal to a specific time, t. | |||
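The stairstep curve is the empirical cumulative distribution of the observed failure times. A minimal Python sketch of that calculation follows; the function name <code>ecdf</code> is an illustrative choice, not something defined in the source.

```python
# Hours between successive air-conditioning failures, as listed in the text.
hours = [1, 3, 5, 7, 11, 11, 11, 12, 14, 14, 14, 16, 16, 20, 21,
         23, 42, 47, 52, 62, 71, 71, 87, 90, 95, 120, 120, 225, 246, 261]

def ecdf(sample, t):
    """Proportion of observations less than or equal to t."""
    return sum(1 for value in sample if value <= t) / len(sample)

# 25 of the 30 observed times are at or below 100 hours.
print(ecdf(hours, 100))
```

Each observed failure time raises the curve by 1/30, with larger jumps where several failures share the same time.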
Let <math>T</math> be survival time, which is any positive number. A particular time is designated by the lower case letter <math>t</math>. The cumulative distribution function of <math>T</math> is the function | |||
<math display = "block">F(t) = \operatorname{P}(T\leq t),</math> | |||
where the right-hand side represents the [[wikipedia:probability|probability]] that the random variable <math>T</math> is less than or equal to <math>t</math>. If time can take on any positive value, then the cumulative distribution function <math>F(t)</math> is the integral of the probability density function <math>f(u)</math> from 0 to <math>t</math>, that is, <math>F(t) = \int_0^t f(u)\,du</math>.
For the air conditioning example, the graph of the CDF below illustrates that the probability that the time to failure is less than or equal to 100 hours is 0.81, as estimated using the exponential curve fit to the data. | |||
[[File:AC Time to failure LT 100 hours.svg|400px|AC Time to failure LT 100 hours]] | |||
An alternative to graphing the probability that the failure time is ''less'' than or equal to 100 hours is to graph the probability that the failure time is ''greater'' than 100 hours. The probability that the failure time is greater than 100 hours must be 1 minus the probability that the failure time is less than or equal to 100 hours, because total probability must sum to 1. | |||
This gives | |||
<math display = "block">
P(\textrm{failure time} > 100 \, \textrm{hours}) = 1 - P(\textrm{failure time} \leq 100 \, \textrm{hours}) = 1 - 0.81 = 0.19.
</math>
This relationship generalizes to all failure times: | |||
<math display = "block">
P(T > t) = 1 - P(T \leq t) = 1 - F(t).
</math>
This relationship is shown on the graphs below. The graph on the left is the cumulative distribution function, <math>F(t) = P(T \leq t)</math>. The graph on the right is the survival function, <math>S(t) = P(T > t) = 1 - P(T \leq t)</math>. Because <math>S(t) = 1 - F(t)</math>, another name for the survival function is the complementary cumulative distribution function.
[[File:Survival function is 1 - CDF.svg|400px|Survival function is 1 - CDF]] | |||
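The 100-hour figures for the air conditioning example can be checked numerically. The sketch below assumes the exponential fit with a mean time between failures of 59.6 hours, as stated in the text; the function names are illustrative.

```python
import math

mean_tbf = 59.6       # hours, mean time between failures from the text
lam = 1.0 / mean_tbf  # fitted exponential rate, per hour

def cdf(t):
    """F(t) = P(T <= t) for the fitted exponential distribution."""
    return 1.0 - math.exp(-lam * t)

def survival(t):
    """S(t) = P(T > t) = 1 - F(t)."""
    return 1.0 - cdf(t)

print(round(cdf(100), 2))       # 0.81
print(round(survival(100), 2))  # 0.19
```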
==Force of Mortality== | |||
In [[wikipedia:actuarial science|actuarial science]], '''force of mortality''' represents the instantaneous [[wikipedia:Mortality rate|rate of mortality]] at a certain age measured on an annualized basis. It is identical in concept to [[wikipedia:failure rate|failure rate]], also called [[wikipedia:hazard function|hazard function]], in [[wikipedia:reliability theory|reliability theory]]. | |||
===Motivation and definition=== | |||
In a [[wikipedia:life table|life table]], we consider the probability of a person dying from age <math>x</math> to <math>x + 1</math>, called <math>q_x</math>. In the continuous case, we could also consider the [[wikipedia:conditional probability|conditional probability]] of a person who has attained age (<math>x</math>) dying between ages <math>x</math> and <math>x + \Delta x</math>, which is | |||
<math display = "block">P_{x}(\Delta x)=P(x < X < x+\Delta x \mid X > x)=\frac{F_X(x+\Delta x)-F_X(x)}{1-F_X(x)}</math>
where <math>F_X</math> is the [[wikipedia:cumulative distribution function|cumulative distribution function]] of the continuous age-at-death [[wikipedia:random variable|random variable]], <math>X</math>. As <math>\Delta x</math> tends to zero, so does this probability in the continuous case. The approximate force of mortality is this probability divided by <math>\Delta x</math>. If we let <math>\Delta x</math> tend to zero, we get the function for '''force of mortality''', denoted by <math>\mu(x)</math>:
<math display = "block">\mu(x)= \lim_{\Delta x \rightarrow 0} \frac{F_X(x+\Delta x)-F_X(x)}{\Delta x \, (1-F_X(x))} = \frac{F'_X(x)}{1-F_X(x)}</math>
Since <math>f_X(x) = F'_X(x)</math> is the probability density function of <math>X</math>, and <math>S(x)= 1 - F_X(x)</math> is the survival function, the force of mortality can also be expressed variously as: | |||
<math display = "block">\mu\,(x)=\frac{f_X(x)}{1-F_X(x)}=-\frac{S'(x)}{S(x)}=-{\frac{d}{dx}}\ln[S(x)].</math> | |||
To understand conceptually how the force of mortality operates within a population, consider that at ages <math>x</math> where the probability density function <math>f_X(x)</math> is zero, there is no chance of dying. Thus the force of mortality at these ages is zero. The force of mortality <math>\mu(x)</math> uniquely defines a probability density function <math>f_X(x)</math>.
The force of mortality <math>\mu(x)</math> can be interpreted as the ''conditional'' density of failure at age <math>x</math>, while <math>f_X(x)</math> is the ''unconditional'' density of failure at age <math>x</math>.<ref name=MQR>R. Cunningham, T. Herzog, R. London (2008). ''Models for Quantifying Risk, 3rd Edition'', Actex.</ref> The unconditional density of failure at age <math>x</math> is the product of the probability of survival to age <math>x</math> and the conditional density of failure at age <math>x</math>, given survival to age <math>x</math>.
This is expressed in symbols as | |||
<math display = "block">\,\mu(x)S(x) = f_X(x)</math> | |||
or equivalently | |||
<math display = "block">\mu(x) = \frac{f_X(x)}{S(x)}.</math> | |||
In many instances, it is also desirable to determine the survival probability function when the force of mortality is known. To do this, integrate the force of mortality over the interval from <math>x</math> to <math>x + t</math>:
<math display = "block"> \int_{x}^{x+t} \mu(y) \, dy = \int_{x}^{x+t} -\frac{d}{dy} \ln[S(y)]\, dy. </math>
By the [[wikipedia:fundamental theorem of calculus|fundamental theorem of calculus]], the right-hand side equals <math>-(\ln[S(x+t)] - \ln[S(x)])</math>, so
<math display = "block"> -\int_{x}^{x+t} \mu(y) \, dy = \ln[S(x + t)] - \ln[S(x)]. </math>
Let us denote
<math display = "block"> S_x(t) = \frac{S(x+t)}{S(x)}, </math>
so that the right-hand side above is <math>\ln[S_x(t)]</math>. Exponentiating both sides, the survival probability of an individual of age <math>x</math> in terms of the force of mortality is
<math display = "block"> S_x(t) = \exp \left(-\int_x^{x+t}\mu(y)\, dy\, \right). </math>
===Examples=== | |||
{| class="table table-bordered" | |||
! Type !! Force of mortality !! Survival function | |||
|- | |||
| Exponential || <math display = "block">\mu(y) = \lambda</math> || <math display = "block">S_x(t) = e^{-\int_x^{x+t} \lambda dy} = e^{-\lambda t}</math> | |||
|- | |||
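| Gamma || <math display = "block">\mu(y) = \frac{y^{\alpha-1} e^{-y}}{\Gamma(\alpha) - \gamma(\alpha, y)}</math> where <math>\gamma(\alpha, y)</math> is the lower incomplete gamma function || <math display = "block">S_x(t) = \frac{\Gamma(\alpha) - \gamma(\alpha, x+t)}{\Gamma(\alpha) - \gamma(\alpha, x)}</math> with density <math>f(x) = \frac{x^{\alpha - 1} e^{-x}}{\Gamma(\alpha)}</math>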
|- | |||
| Weibull || <math display = "block"> \mu(y) = \alpha \lambda^\alpha y^{\alpha-1},</math> || <math display = "block">S_x(t) = e^{-\int_x^{x+t}\mu(y) dy} = A(x) e^{ - (\lambda (x+t))^\alpha }, </math> where <math>A(x) = e^{(\lambda x)^{\alpha}}.</math> | |||
|} | |||
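The relation <math>S_x(t) = \exp\left(-\int_x^{x+t}\mu(y)\,dy\right)</math> can be checked numerically against the closed forms in the table. The Python sketch below integrates a force of mortality with the trapezoidal rule; the parameter values (<code>alpha</code>, <code>lam</code>, the age <code>x</code> and the horizon <code>t</code>) are hypothetical and chosen only for illustration.

```python
import math

def survival_from_hazard(mu, x, t, steps=10_000):
    """Approximate S_x(t) = exp(-integral of mu over [x, x + t]) with the trapezoidal rule."""
    h = t / steps
    total = 0.5 * (mu(x) + mu(x + t))
    for i in range(1, steps):
        total += mu(x + i * h)
    return math.exp(-total * h)

# Hypothetical Weibull parameters and ages, for illustration only.
alpha, lam = 1.5, 0.1
x, t = 5.0, 10.0

def weibull_mu(y):
    """Weibull force of mortality: alpha * lam**alpha * y**(alpha - 1)."""
    return alpha * lam**alpha * y**(alpha - 1)

numeric = survival_from_hazard(weibull_mu, x, t)
closed_form = math.exp((lam * x)**alpha - (lam * (x + t))**alpha)

print(numeric, closed_form)  # the two values should agree closely
```

The same helper applied to a constant force of mortality reproduces the exponential row, since the integral is then just the rate multiplied by <code>t</code>.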
==Gompertz–Makeham law of mortality== | |||
{{Probability distribution | |||
| name = Gompertz–Makeham | |||
| type = continuous | |||
| pdf_image = | |||
| cdf_image = | |||
| notation = | |||
| parameters = <math>\alpha \in \mathbb{R}^+</math><br/><math>\beta \in \mathbb{R}^+</math><br/> <math>\lambda \in \mathbb{R}^+</math> | |||
| support = <math>x \in \mathbb{R}^+</math> | |||
| pdf = <math>\left( \alpha e^{\beta x} + \lambda \right) \cdot \exp \left[ -\lambda x-\frac{\alpha}{\beta} \left( e^{\beta x} -1\right) \right]</math> | |||
| cdf = <math>1-\exp \left[-\lambda x-\frac{\alpha}{\beta} \left( e^{\beta x}-1\right) \right]</math> | |||
| mean = | |||
| median = | |||
| mode = | |||
| variance = | |||
| skewness = | |||
| kurtosis = | |||
| entropy = | |||
| mgf = | |||
| cf = | |||
| pgf = | |||
| fisher = | |||
}} | |||
The '''Gompertz–Makeham law''' states that the human death rate is the sum of an age-dependent component (the [[wikipedia:Gompertz function|Gompertz function]], named after [[wikipedia:Benjamin Gompertz|Benjamin Gompertz]]),<ref name="Gompertz1825">{{cite journal |last=Gompertz |first=B. |year=1825 |title=On the Nature of the Function Expressive of the Law of Human Mortality, and on a New Mode of Determining the Value of Life Contingencies |journal=[[wikipedia:Philosophical Transactions of the Royal Society|Philosophical Transactions of the Royal Society]] |volume=115 |pages=513–585 |url=http://visualiseur.bnf.fr/Visualiseur?Destination=Gallica&O=NUMM-55920 |doi=10.1098/rstl.1825.0026|jstor=107756|s2cid=145157003 }}</ref> which [[wikipedia:exponential growth|increases exponentially]] with age<ref name="Leonid">{{citation |last1=Gavrilov|first1=Leonid A.|last2=Gavrilova|first2=Natalia S.|year=1991 |title=The Biology of Life Span: A Quantitative Approach. |publisher=Harwood Academic Publisher |location=New York|isbn=3-7186-4983-7}}</ref> and an age-independent component (the Makeham term, named after [[wikipedia:William Makeham|William Makeham]]).<ref name="Makeham1860">{{cite journal|last=Makeham|first=W. M.|year=1860|title=On the Law of Mortality and the Construction of Annuity Tables|url=https://archive.org/details/jstor-41134925|journal=J. Inst. Actuaries and Assur. Mag.|volume=8|issue=6|pages=301–310|doi=10.1017/S204616580000126X|jstor=41134925}}</ref> In a protected environment where external causes of death are rare (laboratory conditions, low mortality countries, etc.), the age-independent mortality component is often negligible. In this case the formula simplifies to a Gompertz law of mortality. In 1825, Benjamin Gompertz proposed an exponential increase in death rates with age. | |||
The Gompertz–Makeham law of mortality describes the age dynamics of human mortality rather accurately in the age window from about 30 to 80 years of age. At more advanced ages, some studies have found that death rates increase more slowly – a phenomenon known as the [[wikipedia:late-life mortality deceleration|late-life mortality deceleration]]<ref name="Leonid" /> – but more recent studies disagree.<ref>{{cite journal|last1=Gavrilov|first1=Leonid A.|last2=Gavrilova|first2=Natalia S.|title=Mortality Measurement at Advanced Ages: A Study of the Social Security Administration Death Master File|journal=North American Actuarial Journal|date=2011|volume=15|issue=3|pages=432–447|url=http://longevity-science.org/pdf/Mortality-NAAJ-2011.pdf|doi=10.1080/10920277.2011.10597629|pmid=22308064|pmc=3269912}}</ref> | |||
[[Image:USGompertzCurve.svg|thumb|Estimated probability of a person dying at each age, for the U.S. in 2003 [https://www.cdc.gov/nchs/data/nvsr/nvsr54/nvsr54_14.pdf]. Mortality rates increase exponentially with age after age 30.]] | |||
The decline in the human [[wikipedia:mortality rate|mortality rate]] before the 1950s was mostly due to a decrease in the age-independent (Makeham) mortality component, while the age-dependent (Gompertz) mortality component was surprisingly stable.<ref name="Leonid" /><ref>{{cite journal |last1=Gavrilov |first1=L. A. |last2=Gavrilova |first2=N. S. |last3=Nosov |first3=V. N. |year=1983 |title=Human life span stopped increasing: Why? |journal=[[wikipedia:Gerontology (journal)|Gerontology]] |volume=29 |issue=3 |pages=176–180 |doi=10.1159/000213111 |pmid=6852544 }}</ref> Since the 1950s, a new mortality trend has started in the form of an unexpected decline in mortality rates at advanced ages and "rectangularization" of the survival curve.<ref name="Gavrilov1985">{{cite journal |last=Gavrilov |first=L. A. |author2=Nosov, V. N. |year=1985 |title=A new trend in human mortality decline: derectangularization of the survival curve [Abstract]|journal=Age |volume=8 |issue=3 |pages=93|doi=10.1007/BF02432075|s2cid=41318801 }}</ref><ref>{{cite journal |last1=Gavrilova |first1=N. S. |last2=Gavrilov |first2=L. A. |year=2011 |trans-title=Ageing and Longevity: Mortality Laws and Mortality Forecasts for Ageing Populations |language=cs |title=Stárnutí a dlouhovekost: Zákony a prognózy úmrtnosti pro stárnoucí populace |journal=Demografie |volume=53 |issue=2 |pages=109–128 |pmid=25242821 |pmc=4167024 }}</ref> | |||
The [[#Force_of_Mortality|hazard function]] for the Gompertz–Makeham distribution is most often characterised as <math>h(x)=\alpha e^{\beta x} + \lambda </math>. The empirical magnitude of the parameter <math>\beta</math> is about 0.085 per year of age, implying that the age-dependent mortality component doubles every <math>\ln 2/\beta \approx 0.69/0.085 \approx 8</math> years (Denmark, 2006).
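A short Python sketch of the Gompertz–Makeham hazard, survival function and density follows, using the formulas from the infobox. Only the value 0.085 for the beta parameter comes from the text; the values of <code>ALPHA</code> and <code>LAMBDA</code> are hypothetical placeholders, not empirical estimates.

```python
import math

BETA = 0.085   # per year, the empirical value quoted in the text
ALPHA = 5e-5   # hypothetical level of the age-dependent (Gompertz) term
LAMBDA = 5e-4  # hypothetical age-independent (Makeham) term

def hazard(x):
    """h(x) = alpha * exp(beta * x) + lambda."""
    return ALPHA * math.exp(BETA * x) + LAMBDA

def survival(x):
    """S(x) = exp(-lambda * x - (alpha / beta) * (exp(beta * x) - 1))."""
    return math.exp(-LAMBDA * x - (ALPHA / BETA) * (math.exp(BETA * x) - 1.0))

def density(x):
    """f(x) = h(x) * S(x), matching the pdf given in the infobox."""
    return hazard(x) * survival(x)

# Time for the age-dependent term alpha * exp(beta * x) to double.
doubling_time = math.log(2) / BETA

print(round(doubling_time, 2))  # about 8.15 years
```

Note that only the Gompertz term doubles over this period; the total hazard doubles slightly more slowly whenever the Makeham term is non-zero.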
The [[wikipedia:quantile function|quantile function]] can be expressed in a [[wikipedia:closed-form expression|closed-form expression]] using the [[wikipedia:Lambert W function|Lambert W function]]:<ref name="Jodra2009">{{cite journal |last=Jodrá |first=P. |year=2009 |title=A closed-form expression for the quantile function of the Gompertz–Makeham distribution |journal=Mathematics and Computers in Simulation |volume=79 |issue= 10|pages=3069–3075 |doi=10.1016/j.matcom.2009.02.002}}</ref> | |||
<math display = "block">Q(u)=\frac{\alpha}{\beta\lambda}-\frac{1}{\lambda} \ln(1-u)-\frac{1}{\beta}W_0\left[\frac{\alpha e^{\alpha/\lambda}(1-u)^{-(\beta/\lambda)}}{\lambda}\right]</math> | |||
The Gompertz law is the same as a [[wikipedia:Fisher–Tippett distribution|Fisher–Tippett distribution]] for the negative of age, restricted to negative values for the [[wikipedia:random variable|random variable]] (positive values for age).
==Wikipedia References== | |||
*{{cite web |url = https://en.wikipedia.org/w/index.php?title=Survival_analysis&oldid=1194580049|title=Survival analysis | author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 14 January 2024 }} | |||
*{{cite web |url = https://en.wikipedia.org/w/index.php?title=Failure_rate&oldid=1194580971|title=Failure rate | author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 14 January 2024 }} | |||
*{{cite web |url = https://en.wikipedia.org/w/index.php?title=Force_of_mortality&oldid=962004802|title=Force of mortality | author = Wikipedia contributors |website= Wikipedia |publisher= Wikipedia |access-date = 14 January 2024 }} | |||
==References== |
Revision as of 03:10, 15 January 2024
Survival analysis is a branch of statistics for analyzing the expected duration of time until one event occurs, such as death in biological organisms and failure in mechanical systems. This topic is called reliability theory or reliability analysis in engineering, duration analysis or duration modelling in economics, and event history analysis in sociology. Survival analysis attempts to answer certain questions, such as what is the proportion of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the probability of survival?
To answer such questions, it is necessary to define "lifetime". In the case of biological survival, death is unambiguous, but for mechanical reliability, failure may not be well-defined, for there may well be mechanical systems in which failure is partial, a matter of degree, or not otherwise localized in time. Even in biological problems, some events (for example, heart attack or other organ failure) may have the same ambiguity. The theory outlined below assumes well-defined events at specific times; other cases may be better treated by models which explicitly account for ambiguous events.
More generally, survival analysis involves the modelling of time to event data; in this context, death or failure is considered an "event" in the survival analysis literature – traditionally only a single event occurs for each subject, after which the organism or mechanism is dead or broken. Recurring event or repeated event models relax that assumption. The study of recurring events is relevant in systems reliability, and in many areas of social sciences and medical research.
Survival Function
The survival function is a function that gives the probability that a patient, device, or other object of interest will survive past a certain time.[1] The survival function is also known as the survivor function[2] or reliability function.[3] The term reliability function is common in engineering while the term survival function is used in a broader range of applications, including human mortality. The survival function is the complementary cumulative distribution function of the lifetime. Sometimes complementary cumulative distribution functions are called survival functions in general.
Definition
Let the lifetime [math]T[/math] be a continuous random variable with cumulative hazard function [math]F(t)[/math] and hazard function [math]f(t)[/math] on the interval [math][0,\infty)[/math]. Its survival function or reliability function is:
Examples of survival functions
The graphs below show examples of hypothetical survival functions. The x-axis is time. The y-axis is the proportion of subjects surviving. The graphs show the probability that a subject will survive beyond time t.
For example, for survival function 1, the probability of surviving longer than t = 2 months is 0.37. That is, 37% of subjects survive more than 2 months.
For survival function 2, the probability of surviving longer than t = 2 months is 0.97. That is, 97% of subjects survive more than 2 months.
Median survival may be determined from the survival function: The median survival is the point where the survival function intersects the value 0.5.[4] For example, for survival function 2, 50% of the subjects survive 3.72 months. Median survival is thus 3.72 months.
In some cases, median survival cannot be determined from the graph. For example, for survival function 4, more than 50% of the subjects survive longer than the observation period of 10 months.
The survival function is one of several ways to describe and display survival data. Another useful way to display data is a graph showing the distribution of survival times of subjects. Olkin,[5] page 426, gives the following example of survival data. The number of hours between successive failures of an air-conditioning system were recorded. The time between successive failures are 1, 3, 5, 7, 11, 11, 11, 12, 14, 14, 14, 16, 16, 20, 21, 23, 42, 47, 52, 62, 71, 71, 87, 90, 95, 120, 120, 225, 246, and 261 hours. The mean time between failures is 59.6. This mean value will be used shortly to fit a theoretical curve to the data. The figure below shows the distribution of the time between failures. The blue tick marks beneath the graph are the actual hours between successive failures.
The distribution of failure times is over-laid with a curve representing an exponential distribution. For this example, the exponential distribution approximates the distribution of failure times. The exponential curve is a theoretical distribution fitted to the actual failure times. This particular exponential curve is specified by the parameter lambda, λ= 1/(mean time between failures) = 1/59.6 = 0.0168. The distribution of failure times is called the probability density function (pdf), if time can take any positive value. In equations, the pdf is specified as f(t). If time can only take discrete values (such as 1 day, 2 days, and so on), the distribution of failure times is called the probability mass function (pmf). Most survival analysis methods assume that time can take any positive value, and f(t) is the pdf. If the time between observed air conditioner failures is approximated using the exponential function, then the exponential curve gives the probability density function, f(t), for air conditioner failure times.
Another useful way to display the survival data is a graph showing the cumulative failures up to each time point. These data may be displayed as either the cumulative number or the cumulative proportion of failures up to each time. The graph below shows the cumulative probability (or proportion) of failures at each time for the air conditioning system. The stairstep line in black shows the cumulative proportion of failures. For each step there is a blue tick at the bottom of the graph indicating an observed failure time. The smooth red line represents the exponential curve fitted to the observed data.
A graph of the cumulative probability of failures up to each time point is called the cumulative distribution function, or CDF. In survival analysis, the cumulative distribution function gives the probability that the survival time is less than or equal to a specific time, t.
Let [math]T[/math] be survival time, which is any positive number. A particular time is designated by the lower case letter [math]t[/math]. The cumulative distribution function of [math]T[/math] is the function
where the right-hand side represents the probability that the random variable [math]T[/math] is less than or equal to [math]t[/math]. If time can take on any positive value, then the cumulative distribution function [math]F(t)[/math] is the integral of the probability density function [math]f(t)[/math].
For the air conditioning example, the graph of the CDF below illustrates that the probability that the time to failure is less than or equal to 100 hours is 0.81, as estimated using the exponential curve fit to the data.
An alternative to graphing the probability that the failure time is less than or equal to 100 hours is to graph the probability that the failure time is greater than 100 hours. The probability that the failure time is greater than 100 hours must be 1 minus the probability that the failure time is less than or equal to 100 hours, because total probability must sum to 1.
This gives
This relationship generalizes to all failure times:
This relationship is shown on the graphs below. The graph on the left is the cumulative distribution function, which is [math]P(T \lt t)[/math]. The graph on the right is [math]P(T \gt t) = 1 - P(T \lt t)[/math]. The graph on the right is the survival function, [math]S(t)[/math]. The fact that the [math]S(t) = 1 – CDF[/math] is the reason that another name for the survival function is the complementary cumulative distribution function.
Force of Mortality
In actuarial science, force of mortality represents the instantaneous rate of mortality at a certain age measured on an annualized basis. It is identical in concept to failure rate, also called hazard function, in reliability theory.
Motivation and definition
In a life table, we consider the probability of a person dying between ages [math]x[/math] and [math]x + 1[/math], called [math]q_x[/math]. In the continuous case, we could also consider the conditional probability of a person who has attained age ([math]x[/math]) dying between ages [math]x[/math] and [math]x + \Delta x[/math], which is
[[math]]P_x(\Delta x) = P(x \lt X \lt x + \Delta x \mid X \gt x) = \frac{F_X(x + \Delta x) - F_X(x)}{1 - F_X(x)},[[/math]]
where [math]F_X[/math] is the cumulative distribution function of the continuous age-at-death random variable, [math]X[/math]. As [math]\Delta x[/math] tends to zero, so does this probability in the continuous case. The approximate force of mortality is this probability divided by [math]\Delta x[/math]. If we let [math]\Delta x[/math] tend to zero, we get the function for force of mortality, denoted by [math]\mu(x)[/math]:
[[math]]\mu(x) = \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x) - F_X(x)}{\Delta x \, (1 - F_X(x))} = \frac{F'_X(x)}{1 - F_X(x)}.[[/math]]
Since [math]f_X(x) = F'_X(x)[/math] is the probability density function of [math]X[/math], and [math]S(x)= 1 - F_X(x)[/math] is the survival function, the force of mortality can also be expressed variously as:
[[math]]\mu(x) = \frac{f_X(x)}{1 - F_X(x)} = -\frac{S'(x)}{S(x)} = -\frac{d}{dx} \ln[S(x)].[[/math]]
To understand conceptually how the force of mortality operates within a population, consider the ages [math]x[/math] at which the probability density function [math]f_X(x)[/math] is zero: at these ages there is no chance of dying, so the force of mortality is zero. The force of mortality [math]\mu(x)[/math] uniquely defines a probability density function [math]f_X(x)[/math].
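As a numerical illustration of the limit definition, the sketch below uses a Weibull age-at-death distribution with illustrative parameters (`ALPHA` and `LAM` are assumptions, not values from the text). It compares the conditional probability of death in a short interval, divided by the interval length, with the ratio of density to survival function.

```python
import math

ALPHA, LAM = 2.0, 0.1  # illustrative Weibull shape and rate (assumed)


def survival(x):
    """S(x) = exp(-(LAM * x) ** ALPHA)."""
    return math.exp(-((LAM * x) ** ALPHA))


def cdf(x):
    """F(x) = 1 - S(x)."""
    return 1.0 - survival(x)


def density(x):
    """f(x) = F'(x) for the Weibull distribution."""
    return ALPHA * LAM**ALPHA * x ** (ALPHA - 1) * survival(x)


def approx_force(x, dx):
    """Conditional probability of death in [x, x + dx], divided by dx."""
    return (cdf(x + dx) - cdf(x)) / (dx * (1.0 - cdf(x)))


def force(x):
    """mu(x) = f(x) / S(x)."""
    return density(x) / survival(x)


x = 5.0
for dx in (1.0, 0.1, 0.001):
    # The approximation approaches f(x) / S(x) as dx shrinks.
    print(f"dx={dx:<6} approx={approx_force(x, dx):.6f}")
print(f"f(x)/S(x) = {force(x):.6f}")
```

With these assumed parameters the ratio of density to survival function at age 5 is 0.1, and the finite-interval approximation moves toward that value as the interval shortens.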
The force of mortality [math]\mu(x)[/math] can be interpreted as the conditional density of failure at age [math]x[/math], while [math]f(x)[/math] is the unconditional density of failure at age [math]x[/math].[6] The unconditional density of failure at age [math]x[/math] is the product of the probability of survival to age [math]x[/math], and the conditional density of failure at age [math]x[/math], given survival to age [math]x[/math].
This is expressed in symbols as
[[math]]\mu(x) S(x) = f_X(x)[[/math]]
or equivalently
[[math]]\mu(x) = \frac{f_X(x)}{S(x)}.[[/math]]
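In many instances, it is also desirable to determine the survival probability function when the force of mortality is known. To do this, integrate the force of mortality over the interval [math]x[/math] to [math]x + t[/math]:
[[math]]\int_x^{x+t} \mu(y) \, dy = \int_x^{x+t} -\frac{d}{dy} \ln[S(y)] \, dy.[[/math]]
By the fundamental theorem of calculus, this is simply
[[math]]-\int_x^{x+t} \mu(y) \, dy = \ln[S(x+t)] - \ln[S(x)].[[/math]]
Let us denote
[[math]]S_x(t) = \frac{S(x+t)}{S(x)},[[/math]]
then, exponentiating both sides, the survival probability of an individual of age [math]x[/math] in terms of the force of mortality is
[[math]]S_x(t) = \exp\left(-\int_x^{x+t} \mu(y) \, dy\right).[[/math]]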
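When the integral of the force of mortality has no convenient closed form, the survival probability can be approximated numerically. The following is a minimal sketch that uses the composite Simpson rule; the function name `survival_from_hazard` and the two hazard functions are hypothetical choices made for illustration.

```python
import math


def survival_from_hazard(mu, x, t, n=1000):
    """Approximate exp(-integral of mu(y) for y from x to x + t) with Simpson's rule."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = t / n
    total = mu(x) + mu(x + t)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd) and weight 2 (even).
        total += (4 if i % 2 else 2) * mu(x + i * h)
    return math.exp(-total * h / 3.0)


def constant_hazard(y):
    """Hypothetical constant force of mortality."""
    return 0.02


def linear_hazard(y):
    """Hypothetical force of mortality that grows linearly with age."""
    return 0.001 * y


# Constant hazard: the integral over 10 years is 0.2.
print(survival_from_hazard(constant_hazard, 40.0, 10.0), math.exp(-0.2))
# Linear hazard: the integral from 40 to 50 is 0.001 * (50**2 - 40**2) / 2 = 0.45.
print(survival_from_hazard(linear_hazard, 40.0, 10.0), math.exp(-0.45))
```

Simpson's rule is exact for these two low-degree hazards, so each printed pair should agree to floating-point precision; for a general hazard the accuracy depends on `n`.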
Examples
Type | Force of mortality | Survival function |
---|---|---|
Exponential | [[math]]\mu(y) = \lambda[[/math]] | [[math]]S_x(t) = e^{-\int_x^{x+t} \lambda dy} = e^{-\lambda t}[[/math]] |
Gamma | [[math]]\mu(y) = \frac{y^{\alpha-1} e^{-y}}{\Gamma(\alpha) - \gamma(\alpha, y)}[[/math]] | Given through the density: [[math]]f(x) = \frac{x^{\alpha - 1} e^{-x}}{\Gamma(\alpha)}[[/math]] |
Weibull | [[math]]\mu(y) = \alpha \lambda^\alpha y^{\alpha-1}[[/math]] | [[math]]S_x(t) = e^{-\int_x^{x+t}\mu(y) dy} = A(x) e^{ - (\lambda (x+t))^\alpha }, [[/math]] where [math]A(x) = e^{(\lambda x)^{\alpha}}.[/math] |
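The Weibull and exponential rows of the table can be cross-checked against the ratio of unconditional survival functions. The sketch below is illustrative only: the parameter values are assumptions, and the function names are hypothetical.

```python
import math


def weibull_survival(x, alpha, lam):
    """Unconditional Weibull survival function S(x) = exp(-(lam * x) ** alpha)."""
    return math.exp(-((lam * x) ** alpha))


def weibull_conditional_survival(x, t, alpha, lam):
    """Table form: A(x) * exp(-(lam * (x + t)) ** alpha), with A(x) = exp((lam * x) ** alpha)."""
    a = math.exp((lam * x) ** alpha)
    return a * math.exp(-((lam * (x + t)) ** alpha))


alpha, lam = 1.5, 0.05  # illustrative values (assumed)
x, t = 30.0, 10.0

ratio = weibull_survival(x + t, alpha, lam) / weibull_survival(x, alpha, lam)
print(ratio, weibull_conditional_survival(x, t, alpha, lam))

# With shape alpha = 1 the Weibull hazard is constant and the exponential row is recovered.
print(weibull_conditional_survival(x, t, 1.0, lam), math.exp(-lam * t))
```

Both printed pairs should agree up to rounding, which shows that the factor A(x) is simply the reciprocal of the survival function at the attained age.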
Gompertz–Makeham law of mortality
Parameters | [math]\alpha \in \mathbb{R}^+[/math], [math]\beta \in \mathbb{R}^+[/math], [math]\lambda \in \mathbb{R}^+[/math] |
---|---|
Support | [math]x \in \mathbb{R}^+[/math] |
PDF | [math]\left( \alpha e^{\beta x} + \lambda \right) \cdot \exp \left[ -\lambda x-\frac{\alpha}{\beta} \left( e^{\beta x} -1\right) \right][/math] |
CDF | [math]1-\exp \left[-\lambda x-\frac{\alpha}{\beta} \left( e^{\beta x}-1\right) \right][/math] |
The Gompertz–Makeham law states that the human death rate is the sum of an age-dependent component (the Gompertz function, named after Benjamin Gompertz),[7] which increases exponentially with age[8] and an age-independent component (the Makeham term, named after William Makeham).[9] In a protected environment where external causes of death are rare (laboratory conditions, low mortality countries, etc.), the age-independent mortality component is often negligible. In this case the formula simplifies to a Gompertz law of mortality. In 1825, Benjamin Gompertz proposed an exponential increase in death rates with age.
The Gompertz–Makeham law of mortality describes the age dynamics of human mortality rather accurately in the age window from about 30 to 80 years of age. At more advanced ages, some studies have found that death rates increase more slowly – a phenomenon known as the late-life mortality deceleration[8] – but more recent studies disagree.[10]
The decline in the human mortality rate before the 1950s was mostly due to a decrease in the age-independent (Makeham) mortality component, while the age-dependent (Gompertz) mortality component was surprisingly stable.[8][11] Since the 1950s, a new mortality trend has started in the form of an unexpected decline in mortality rates at advanced ages and "rectangularization" of the survival curve.[12][13]
The hazard function for the Gompertz–Makeham distribution is most often characterised as [math]h(x)=\alpha e^{\beta x} + \lambda [/math]. The empirical magnitude of the parameter [math]\beta[/math] is about 0.085 per year, implying that the age-dependent component of mortality doubles every [math]\ln 2 / 0.085 \approx 0.69/0.085 \approx 8[/math] years (Denmark, 2006).
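The doubling-time arithmetic and the link between hazard, density and distribution function can be sketched as follows. Only the value 0.085 comes from the text; the values of `ALPHA` and `LAM` are illustrative assumptions, and the doubling applies to the Gompertz term alone, since the Makeham term does not change with age.

```python
import math

BETA = 0.085   # per year, as quoted in the text
ALPHA = 5e-5   # illustrative Gompertz scale (assumed)
LAM = 5e-4     # illustrative Makeham term (assumed)


def gm_hazard(x, alpha=ALPHA, beta=BETA, lam=LAM):
    """h(x) = alpha * exp(beta * x) + lam."""
    return alpha * math.exp(beta * x) + lam


def gm_cdf(x, alpha=ALPHA, beta=BETA, lam=LAM):
    """F(x) = 1 - exp(-lam * x - (alpha / beta) * (exp(beta * x) - 1))."""
    return 1.0 - math.exp(-lam * x - (alpha / beta) * (math.exp(beta * x) - 1.0))


def gm_pdf(x, alpha=ALPHA, beta=BETA, lam=LAM):
    """Density as hazard times survival function."""
    return gm_hazard(x, alpha, beta, lam) * (1.0 - gm_cdf(x, alpha, beta, lam))


def doubling_time(beta=BETA):
    """Years needed for the Gompertz term alpha * exp(beta * x) to double."""
    return math.log(2.0) / beta


print(round(doubling_time(), 2))  # 8.15
```

The density is written as hazard times survival function, which reproduces the PDF listed for the distribution.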
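The quantile function can be expressed in closed form using the Lambert W function:[14]
[[math]]Q(u) = \frac{\alpha}{\beta \lambda} - \frac{1}{\lambda} \ln(1-u) - \frac{1}{\beta} W_0\left[\frac{\alpha e^{\alpha/\lambda} (1-u)^{-\beta/\lambda}}{\lambda}\right],[[/math]]
where [math]W_0[/math] is the principal branch of the Lambert W function.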
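A quantile of this kind can be checked by feeding it back into the CDF. The sketch below assumes the closed form obtained by solving the CDF equation for age, written out in the docstring of `gm_quantile`, and uses a small Newton iteration for the principal branch of the Lambert W function instead of a library routine. All function names and parameter values are illustrative assumptions.

```python
import math


def lambert_w0(z, tol=1e-14, max_iter=100):
    """Principal branch of the Lambert W function for z >= 0, by Newton's method."""
    if z < 0:
        raise ValueError("this sketch only handles z >= 0")
    w = math.log1p(z)  # starting guess at or above the root
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def gm_cdf(x, alpha, beta, lam):
    """F(x) = 1 - exp(-lam * x - (alpha / beta) * (exp(beta * x) - 1))."""
    return 1.0 - math.exp(-lam * x - (alpha / beta) * (math.exp(beta * x) - 1.0))


def gm_quantile(u, alpha, beta, lam):
    """Q(u) = alpha/(beta*lam) - ln(1-u)/lam - W0(arg)/beta,
    with arg = (alpha/lam) * exp(alpha/lam) * (1-u) ** (-beta/lam)."""
    arg = (alpha / lam) * math.exp(alpha / lam) * (1.0 - u) ** (-beta / lam)
    return alpha / (beta * lam) - math.log(1.0 - u) / lam - lambert_w0(arg) / beta


alpha, beta, lam = 0.1, 0.5, 0.2  # illustrative values (assumed)
for u in (0.1, 0.5, 0.9):
    x = gm_quantile(u, alpha, beta, lam)
    print(f"u={u}  Q(u)={x:.4f}  F(Q(u))={gm_cdf(x, alpha, beta, lam):.4f}")
```

When the Makeham term is very small relative to the Gompertz rate, the argument passed to the Lambert W function can overflow double precision for probabilities near 1, so a production implementation would likely need to work with logarithms; this sketch does not attempt that.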
The Gompertz law is the same as a Fisher–Tippett distribution for the negative of age, restricted to negative values for the random variable (positive values for age).
Wikipedia References
- Wikipedia contributors. "Survival analysis". Wikipedia. Wikipedia. Retrieved 14 January 2024.
- Wikipedia contributors. "Failure rate". Wikipedia. Wikipedia. Retrieved 14 January 2024.
- Wikipedia contributors. "Force of mortality". Wikipedia. Wikipedia. Retrieved 14 January 2024.
References
- Kleinbaum, David G.; Klein, Mitchel (2012), Survival analysis: A Self-learning text (Third ed.), Springer, ISBN 978-1441966452
- Tableman, Mara; Kim, Jong Sung (2003), Survival Analysis Using S (First ed.), Chapman and Hall/CRC, ISBN 978-1584884088
- Ebeling, Charles (2010), An Introduction to Reliability and Maintainability Engineering (Second ed.), Waveland Press, ISBN 978-1577666257
- Machin, D.; Cheung, Y. B.; Parmar, M. (2006), Survival Analysis: A Practical Approach, Wiley, pp. 36 ff.
- Olkin, Ingram; Gleser, Leon; Derman, Cyrus (1994), Probability Models and Applications (Second ed.), Macmillan, ISBN 0-02-389220-X
- R. Cunningham, T. Herzog, R. London (2008). Models for Quantifying Risk, 3rd Edition, Actex.
- Gompertz, B. (1825). "On the Nature of the Function Expressive of the Law of Human Mortality, and on a New Mode of Determining the Value of Life Contingencies". Philosophical Transactions of the Royal Society 115: 513–585.
- Gavrilov, Leonid A.; Gavrilova, Natalia S. (1991), The Biology of Life Span: A Quantitative Approach, New York: Harwood Academic Publisher, ISBN 3-7186-4983-7
- Makeham, W. M. (1860). "On the Law of Mortality and the Construction of Annuity Tables". J. Inst. Actuaries and Assur. Mag. 8 (6): 301–310.
- "Mortality Measurement at Advanced Ages: A Study of the Social Security Administration Death Master File" (2011). North American Actuarial Journal 15 (3): 432–447. PMID 22308064. PMC:3269912.
- "Human life span stopped increasing: Why?" (1983). Gerontology 29 (3): 176–180. PMID 6852544.
- Gavrilov, L. A. (1985). "A new trend in human mortality decline: derectangularization of the survival curve [Abstract]". Age 8 (3): 93.
- "Stárnutí a dlouhovekost: Zákony a prognózy úmrtnosti pro stárnoucí populace" (in Czech) (2011). Demografie 53 (2): 109–128. PMID 25242821.
- Jodrá, P. (2009). "A closed-form expression for the quantile function of the Gompertz–Makeham distribution". Mathematics and Computers in Simulation 79 (10): 3069–3075.