Elements of Combinatorial Analysis and Simple Random Walks

[math] \newcommand{\R}{\mathbb{R}} \newcommand{\A}{\mathcal{A}} \newcommand{\B}{\mathcal{B}} \newcommand{\N}{\mathbb{N}} \newcommand{\C}{\mathbb{C}} \newcommand{\Rbar}{\overline{\mathbb{R}}} \newcommand{\Bbar}{\overline{\mathcal{B}}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\E}{\mathbb{E}} \newcommand{\p}{\mathbb{P}} \newcommand{\one}{\mathds{1}} \newcommand{\0}{\mathcal{O}} \newcommand{\mat}{\textnormal{Mat}} \newcommand{\sign}{\textnormal{sign}} \newcommand{\CP}{\mathcal{P}} \newcommand{\CT}{\mathcal{T}} \newcommand{\CY}{\mathcal{Y}} \newcommand{\F}{\mathcal{F}} \newcommand{\mathds}{\mathbb}[/math]

In this chapter we give an introduction to the combinatorial aspects of probability theory and describe several simple counting methods that are needed to compute the probabilities of various events. We also introduce some of the most basic examples of discrete distributions and consider combinatorial aspects of random events, including a first look at the simple random walk. We then discuss the reflection principle and the basic terminology of random walks as a combinatorial theory. Finally, we compute the probabilities of certain events for special random walks.

Summability conditions

Let [math]I[/math] be an index set and let [math](a_i)_{i\in I}\in \Rbar_+^I[/math] be a family of nonnegative numbers (i.e. [math]a_i\in[0,\infty][/math] for all [math]i\in I[/math]). We will write

[[math]] \sum_{i\in I}a_i:=\sup_{J\subset I\atop J\text{finite}}\sum_{j\in J}a_j. [[/math]]

The family [math](a_i)_{i\in I}[/math] is said to be summable if [math]\sum_{i\in I}a_i[/math] is finite. The following elementary properties are important for such a family [math](a_i)_{i\in I}[/math].

  • For [math]H\subseteq I[/math] we have
    [[math]] \sum_{h\in H}a_h\leq \sum_{i\in I}a_i [[/math]]
    and [math](a_h)_{h\in H}[/math] is summable if [math](a_i)_{i\in I}[/math] is summable.
  • For a family [math](a_i)_{i\in I}[/math] to be summable, it is necessary and sufficient that the Cauchy criterion is satisfied, i.e. for all [math]\epsilon \gt 0[/math] there is a finite set [math]J\subset I[/math] such that for all finite subsets [math]K\subset I[/math] with [math]K\cap J=\varnothing[/math] we get
    [[math]] \sum_{i\in K}a_i \lt \epsilon. [[/math]]
  • If the family [math](a_i)_{i\in I}[/math] is summable, the set [math]\{i\in I\mid a_i\not=0\}[/math] is at most countable, since it is the union of the sets [math]I_n=\left\{i\in I\mid a_i\geq \frac{1}{n}\right\}[/math]. Moreover, [math]I_n[/math] has at most [math]\left[n\left(\sum_{i\in I}a_i\right)\right][/math] elements, since each [math]i\in I_n[/math] contributes at least [math]\frac{1}{n}[/math] to the sum; here [math][x][/math] denotes the integer part of [math]x[/math].

To avoid confusion we want to emphasize that if [math](A_n)_{n\geq 1}[/math] are pairwise disjoint sets, we will sometimes write [math]\sum_{n\geq 1}A_n[/math] instead of [math]\bigsqcup_{n\geq 1}A_n[/math].

Lemma

Let [math](a_i)_{i\in I}\in\bar\R_+^I[/math] be a family of numbers in [math]\bar\R_+[/math], let [math]L[/math] be an index set and let [math](H_\lambda)_{\lambda\in L}[/math] be a partition of [math]I[/math], i.e. [math]I=\sum_{\lambda\in L}H_{\lambda}[/math]. If [math]q_\lambda=\sum_{i\in H_{\lambda}}a_i[/math], then

[[math]] \sum_{i\in I}a_i=\sum_{\lambda\in L}q_\lambda. [[/math]]


Show Proof

Exercise.

In the case [math]I=\N[/math], we say that [math]\sum_{n\in\N}u_n[/math] converges absolutely if [math]\sum_{n\in\N}\vert u_n\vert[/math] converges. In this case we may rearrange the order in which the terms are taken without changing either the convergence or the sum of the series.

Theorem (Stirling)

For all [math]n\in\N\setminus\{0\}[/math] we have

[[math]] n!=\kappa n^{n+\frac{1}{2}}\exp\left(-n+\frac{\theta(n)}{12n}\right), [[/math]]
where [math]1-\frac{1}{12n+1}\leq \theta(n)\leq 1[/math] and [math]\kappa=\int_{-\infty}^\infty e^{-\frac{1}{2}w^2}dw=\sqrt{2\pi}[/math].
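To illustrate, the following Python sketch (an addition for illustration, not part of the original text) checks the approximation and the stated bounds on [math]\theta(n)[/math] numerically:

    import math

    def stirling(n):
        # Leading-order Stirling approximation: sqrt(2*pi) * n^(n+1/2) * e^(-n)
        return math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)

    for n in (1, 5, 10, 20):
        exact = math.factorial(n)
        # Solve n! = stirling(n) * exp(theta/(12n)) for theta and check the bounds
        theta = 12 * n * math.log(exact / stirling(n))
        assert 1 - 1 / (12 * n + 1) <= theta <= 1
        print(n, exact / stirling(n), theta)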

Finite Probability Spaces

General Probability Spaces

Definition (Probability measure)

Let [math](\Omega,\A)[/math] be a measurable space. A probability measure defined on [math]\A[/math] is a map

[[math]] \p:\A\to[0,1], [[/math]]

such that

  • [math]\p[\Omega]=1[/math].
  • for all [math](A_n)_{n\in \N}\subset\A[/math] with [math]A_n\cap A_m=\varnothing[/math] for [math]n\not=m[/math], we have
    [[math]] \p\left[\bigcup_{n\in\N}A_n\right]=\sum_{n\in \N}\p[A_n]. [[/math]]

We want to recall the following properties without proof.

  • For [math]A,B\in\A[/math] with [math]A\subset B[/math] we get [math]\p[A]\leq \p[B][/math].
  • For [math]A\in\A[/math] we get [math]\p[A^C]=1-\p[A][/math].
  • If [math](A_n)_{n\in\N}\subset\A[/math] and [math]A_n\uparrow A[/math] as [math]n\to\infty[/math], then [math]\p[A_n]\uparrow \p[A][/math] as [math]n\to\infty[/math].
  • If [math](A_n)_{n\in\N}\subset\A[/math] and [math]A_n\downarrow A[/math] as [math]n\to\infty[/math], then [math]\p[A_n]\downarrow \p[A][/math] as [math]n\to\infty[/math].
Theorem

Let [math]\mu[/math] be a measure on a measurable space [math](\Omega,\A)[/math].

  • If [math](A_n)_{n\in\N}\subset\A[/math] is an increasing family of measurable sets, then
    [[math]] \mu\left( \bigcup_{n\in\N}A_n \right)=\lim_{n\to\infty}\mu(A_n). [[/math]]
    Conversely, if an additive function satisfies the property above, then it is a measure.
  • Let [math]\mu[/math] be an additive, positive and bounded function on [math]\A[/math]. Then [math]\sigma[/math]-additivity is equivalent to saying that for all decreasing sequences [math](B_n)_{n\in\N}\subset\A[/math] we get the implication
    [[math]] \bigcap_{n\in\N}B_n=\varnothing\Longrightarrow \lim_{n\to\infty}\mu(B_n)=0. [[/math]]

Probability measures on finite spaces

We consider two cases of a measurable space [math](\Omega,\A)[/math]: either [math]\Omega[/math] is finite or [math]\A[/math] is finite. An important question is how to define probability measures on finite or countable spaces. If [math]I[/math] is at most countable, then giving a probability measure on [math]\mathcal{P}(I)[/math] is equivalent to giving a family [math](a_i)_{i\in I}[/math] of nonnegative real numbers such that [math]\sum_{i\in I}a_i=1[/math]. Indeed, for [math]A\subset I[/math] we can define a measure [math]\mu[/math] by

[[math]] \mu(A)=\sum_{i\in A}a_i, [[/math]]

and this measure satisfies [math]\mu(\{i\})=a_i[/math] for all [math]i\in I[/math].

Example

We have the following examples:

The Poisson distribution

The Poisson distribution with parameter [math]\lambda \gt 0[/math] is a probability measure [math]\Pi_\lambda[/math] on [math](\N,\mathcal{P}(\N))[/math], defined for all [math]n\geq0[/math] by

[[math]] \Pi_{\lambda}(\{ n\})=e^{-\lambda}\frac{\lambda^n}{n!}. [[/math]]

For [math]A\subset \N[/math] we thus get

[[math]] \Pi_\lambda(A)=\sum_{n\in A}e^{-\lambda}\frac{\lambda^n}{n!}. [[/math]]

In order to check that it is indeed a probability distribution, we need to check that [math]\sum_{i\geq 0}\Pi_\lambda(\{i\})=1[/math]. Indeed, we have

[[math]] \sum_{i\geq 0}e^{-\lambda}\frac{\lambda^i}{i!}=e^{-\lambda}\sum_{i\geq 0}\frac{\lambda^i}{i!}=1. [[/math]]
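One can also confirm this numerically. The following Python lines (an illustration, not part of the original notes) evaluate a truncation of the series and a few individual weights:

    import math

    lam = 2.5  # an arbitrary choice of the parameter lambda > 0
    weights = [math.exp(-lam) * lam ** n / math.factorial(n) for n in range(100)]
    print(sum(weights))                        # approximately 1.0
    print(sum(weights[n] for n in (0, 1, 2)))  # Pi_lambda({0, 1, 2})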

The Riemann [math]\zeta[/math]-function

Another example can be obtained by considering the function [math]\zeta(s)=\sum_{n\geq 1}n^{-s}[/math] for [math]s \gt 1[/math], which we can normalize to obtain a probability measure. Indeed, if we set [math]a_n=\frac{1}{\zeta(s)}\cdot \frac{1}{n^s}[/math] for all [math]n\geq 1[/math], we get

[[math]] \sum_{n\geq 1}a_n=\frac{1}{\zeta(s)}\sum_{n\geq 1}n^{-s}=\frac{1}{\zeta(s)}\cdot \zeta(s)=1. [[/math]]

The Geometric distribution

Similarly one can define the geometric distribution with parameter [math]r\in(0,1)[/math] on [math](\N\setminus\{0\},\mathcal{P}(\N\setminus\{0\}))[/math] by

[[math]] \gamma_r(\{n\})=(1-r)r^{n-1}. [[/math]]
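Indeed, the geometric series shows that these weights sum to one:

[[math]] \sum_{n\geq 1}\gamma_r(\{n\})=(1-r)\sum_{n\geq 1}r^{n-1}=(1-r)\cdot\frac{1}{1-r}=1. [[/math]]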

The Uniform distribution

For a finite state space [math]\Omega[/math], we can define the uniform distribution for any measurable set [math]A\in\A[/math] by

[[math]] \p[A]=\frac{\vert A\vert}{\vert\Omega\vert}. [[/math]]

For [math]\omega\in\Omega[/math] we get [math]\p[\{\omega\}]=\frac{1}{\vert\Omega\vert}.[/math] Take for example [math]\Omega=\{1,2,...,N\}[/math]. Then [math]\p[\{k\}]=\frac{1}{N}[/math].

There is no uniform measure on [math]\N[/math], since [math]\N[/math] is not finite. Thus from now on we shall only consider the case of a finite state space [math]\Omega[/math] or a finite [math]\sigma[/math]-algebra [math]\A[/math].

Theorem

Let [math]\A[/math] be a finite [math]\sigma[/math]-algebra on a state space [math]\Omega[/math]. Then there exists a partition [math]A_1,...,A_n[/math] of [math]\Omega[/math] such that [math]A_k\in\A[/math] for all [math]k\in\{1,...,n\}[/math] and such that every [math]A\in \A[/math] can be written as a union of some of the sets [math]A_1,...,A_n[/math]. These sets are called the atoms of [math]\A[/math].


Show Proof

Fix [math]\omega\in\Omega[/math] and define the set [math]C(\omega):=\bigcap_{B\in\A,\atop\omega\in B}B[/math]. Since [math]\A[/math] is finite, this is a finite intersection and therefore [math]C(\omega)\in\A[/math]. In fact, [math]C(\omega)[/math] is the smallest element of [math]\A[/math] which contains [math]\omega[/math]. Moreover, it is easy to see that [math]\omega\sim\omega':\Longleftrightarrow C(\omega)=C(\omega')[/math] defines an equivalence relation on [math]\Omega[/math] and that [math]\omega'\in C(\omega)[/math] if and only if [math]C(\omega)=C(\omega')[/math]. Hence if [math]A[/math] is an equivalence class and [math]\omega\in A[/math], then [math]A=C(\omega)[/math]. Therefore one can take the sets [math]A_1,...,A_n[/math] to be the equivalence classes.
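The construction in the proof is easily carried out explicitly. The following Python sketch (an illustration, using a small hypothetical finite [math]\sigma[/math]-algebra as input) computes the sets [math]C(\omega)[/math] and recovers the atoms:

    # A hypothetical finite sigma-algebra on Omega = {1, 2, 3, 4}, given as a
    # collection of frozensets closed under complement and union.
    Omega = frozenset({1, 2, 3, 4})
    A = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), Omega}

    def C(omega):
        # Intersection of all B in A containing omega: the smallest such set
        return frozenset.intersection(*[B for B in A if omega in B])

    print({C(omega) for omega in Omega})  # the atoms {1, 2} and {3, 4}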

Basics of Combinatorial Analysis

Sampling

Let us consider a population of size [math]n\in\N[/math], i.e. a set [math]S=\{S_1,...,S_n\}[/math] with [math]n[/math] elements. We call any ordered sequence [math](S_{i_1},...,S_{i_r})[/math] of [math]r[/math] elements of [math]S[/math] a sample of size [math]r[/math], drawn from this population. Then there are two possible procedures.

  • Sampling with replacement, i.e. the same element can be drawn more than once.
  • Sampling without replacement. Here an element, once chosen, is removed from the population. In this case [math]r\leq n[/math].

In either case, our experiment is described by a sample space [math]\Omega[/math] in which each individual point represents a sample of size [math]r[/math]. In the first case, we get [math]\vert\Omega\vert=n^r[/math], where an element [math]\omega\in\Omega[/math] is of the form [math]\omega=(S_{i_1},...,S_{i_r})[/math]. In the second case we get

[[math]] \vert\Omega\vert=n(n-1)\dotsm(n-r+1)=\frac{n!}{(n-r)!}. [[/math]]

Example

We have the following examples:

  • If by a "ten letter word" is meant a (possibly meaningless) sequence of ten letters, then such a word represents a sample from the population of 26 letters. There are [math]26^{10}[/math] such words.
  • Tossing a coin [math]r[/math] times is one way of obtaining a sample of size [math]r[/math] drawn from the population of the letters [math]H[/math] and [math]T[/math]. Usually we assign equal probabilities to all samples, namely [math]n^{-r}[/math] in sampling with replacement and [math]\frac{1}{n(n-1)\dotsm(n-r+1)}[/math] in sampling without replacement.
  • A random sample of size [math]r[/math] with replacement is taken from a population of size [math]n[/math], and we assume that [math]r\leq n[/math]. The probability [math]p[/math] that in the sample no element appears twice is
    [[math]] p=\frac{n(n-1)\dotsm (n-r+1)}{n^r}\left(\p[A]=\frac{\vert A\vert}{\vert\Omega\vert}\right). [[/math]]
    Consequently, the probability of having four different numbers chosen from [math]\{0,1,2,3,...,9\}[/math] is (here [math]n=10[/math], [math]r=4[/math])
    [[math]] \frac{10\times 9\times 8\times 7}{10^4}\approx 0.5. [[/math]]
    In sampling without replacement the probability for any fixed element of the population to be included in a random sample of size [math]r[/math] is
    [[math]] 1-\underbrace{\frac{(n-1)(n-2)\dotsm (n-r)}{n(n-1)\dotsm (n-r+1)}}_{\text{probability of the complementary event}}=1-\frac{n-r}{n}=\frac{r}{n}. [[/math]]
    We have used that for [math]A\in\A[/math], [math]\p[A]=1-\p[A^C][/math]. Similarly, in sampling with replacement the probability that a fixed element of the population is included in a random sample of size [math]r[/math] is
    [[math]] 1-\left(\frac{n-1}{n}\right)^r=1-\left(1-\frac{1}{n}\right)^r. [[/math]]
    These probabilities are checked numerically in the sketch following this list.
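The following Python lines (a small sketch, not part of the original notes) check the sampling probabilities above by direct computation and brute-force enumeration for small [math]n[/math] and [math]r[/math]:

    from itertools import permutations, product
    from math import perm

    n, r = 10, 4
    # (i) No element appears twice, sampling with replacement:
    print(perm(n, r) / n ** r)  # 10*9*8*7 / 10^4 = 0.504

    # (ii) A fixed element is included, sampling without replacement: r/n.
    samples = list(permutations(range(n), r))
    print(sum(0 in s for s in samples) / len(samples), r / n)  # both 0.4

    # (iii) A fixed element is included, sampling with replacement:
    samples = list(product(range(n), repeat=r))
    print(sum(0 in s for s in samples) / len(samples), 1 - (1 - 1 / n) ** r)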

Subpopulations

Let [math]S=\{S_1,...,S_n\}[/math] be a population of size [math]n[/math]. We call a subpopulation of size [math]r[/math] any set of [math]r[/math] elements, distinct or not, chosen from [math]S[/math]. Elements of a subpopulation are not ordered, i.e. two samples of size [math]r[/math] from [math]S[/math] correspond to the same subpopulation if they only differ by the order of their elements. Here we have two cases as well. The case [math]\underline{without}[/math] replacement: we must have [math]r\leq n[/math], and there are [math]\binom{n}{r}=\frac{n!}{r!(n-r)!}[/math] such subpopulations. Note that [math]\binom{n}{r}=\binom{n}{n-r}[/math]; we always use the convention [math]0!=1[/math].

Example

We have the following examples:

Poker

There are [math]\binom{52}{5}=2'598'960[/math] hands at poker. Let us calculate the probability [math]p[/math] that a hand at poker contains 5 different face values. The face values can be chosen in [math]\binom{13}{5}[/math] different ways, and for each of the five face values we can choose one of the four suits. It follows that

[[math]] p=\frac{4^5\cdot\binom{13}{5}}{\binom{52}{5}}\approx 0.5. [[/math]]
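A one-line check of this value in Python (an illustration only):

    from math import comb

    print(4 ** 5 * comb(13, 5) / comb(52, 5))  # approximately 0.507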

Senator problem

Each of the 50 states has two senators. We consider the event that, in a committee of 50 senators chosen at random,

  • a given state is represented.
  • all states are represented.

In the first case, it is better to calculate the probability [math]q[/math] of the complementary event that the given state is not represented:

[[math]] q=\frac{\binom{98}{50}}{\binom{100}{50}}=\frac{50\times 49}{100\times 99}. [[/math]]

For the second case we note that a committee including one senator from each state can be chosen in [math]2^{50}[/math] different ways. The probability that all states are included is

[[math]] \frac{2^{50}}{\binom{100}{50}}\approx 1.116\cdot 10^{-14}. [[/math]]
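Both probabilities are easy to confirm numerically (a Python sketch, not part of the original notes):

    from math import comb

    print(comb(98, 50) / comb(100, 50), 50 * 49 / (100 * 99))  # both approx. 0.2475
    print(2 ** 50 / comb(100, 50))                             # approx. 1.116e-14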

An occupancy problem

Consider a random distribution of [math]r[/math] balls in [math]n[/math] cells. We want to find the probability [math]p_k[/math] that a specified cell contains exactly [math]k[/math] balls [math](k=0,1,...,r)[/math]. We note that the [math]k[/math] balls can be chosen in [math]\binom{r}{k}[/math] different ways and the remaining [math](r-k)[/math] balls can be placed into the remaining [math](n-1)[/math] cells in [math](n-1)^{r-k}[/math] ways, so we get

[[math]] p_k=\binom{r}{k}(n-1)^{r-k}\frac{1}{n^r}\left(\p[A]=\frac{\vert A\vert}{\vert \Omega\vert}\right) [[/math]]

If we set [math]\tilde p=\frac{1}{n}[/math], then

[[math]] p_k=\binom{r}{k}\tilde p^k(1-\tilde p)^{r-k}. [[/math]]

This is called the binomial distribution with parameters [math]\tilde p[/math] and [math]r[/math]. It is a probability distribution on [math]\{0,1,...,r\}[/math] and

[[math]] \sum_{k=0}^rp_k=\sum_{k=0}^r\binom{r}{k}\tilde p^k(1-\tilde p)^{r-k}=(\tilde p+1-\tilde p)^r=1. [[/math]]

For example in a coin tossing game we have [math]r[/math] tosses and we set [math]H\leftrightarrow 1[/math] and [math]T\leftrightarrow 0[/math]. We have the state space

[[math]] \Omega=\{(\omega_j)_{1\leq j\leq r}\mid \omega_j\in\{0,1\}\}. [[/math]]

Moreover, we have that

[[math]] S_r=\sum_{j=1}^r\omega_j, [[/math]]

where [math]S_r[/math] is the number of heads. Assume that [math]\p(H)=p\in[0,1][/math], then

[[math]] \p[S_r=k]=\binom{r}{k}p^k(1-p)^{r-k}. [[/math]]
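The binomial formula for [math]\p[S_r=k][/math] can be checked against a brute-force enumeration of all [math]2^r[/math] outcome sequences (a Python sketch, illustration only; the values of [math]r[/math] and [math]p[/math] are arbitrary):

    from itertools import product
    from math import comb

    r, p = 6, 0.3
    for k in range(r + 1):
        # Total probability of all sequences with exactly k heads
        brute = sum(p ** sum(w) * (1 - p) ** (r - sum(w))
                    for w in product((0, 1), repeat=r) if sum(w) == k)
        print(k, brute, comb(r, k) * p ** k * (1 - p) ** (r - k))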

Theorem

Let [math]r_1,...,r_k[/math] be integers such that [math]r_1+...+r_k=n[/math] [math](r_i\geq 0,r_i\in\N)[/math]. The number of ways in which a population of [math]n[/math] elements can be partitioned into [math]k[/math] subpopulations of which the first contains [math]r_1[/math] elements, the second contains [math]r_2[/math] elements etc., is given by

[[math]] \frac{n!}{r_1!r_2!\dotsm r_k!}. [[/math]]


Show Proof

For the first part, we choose the [math]r_1[/math] elements of the first subpopulation among the [math]n[/math] elements, then the [math]r_2[/math] elements of the second subpopulation among the remaining [math]n-r_1[/math] elements, and so on. The resulting product of binomial coefficients telescopes:

[[math]] \frac{n!}{r_1!\dotsm r_k!}=\binom{n}{r_1}\binom{n-r_1}{r_2}\binom{n-r_1-r_2}{r_3}\dotsm\binom{n-r_1-r_2-...-r_{k-2}}{r_{k-1}}. [[/math]]
For the second part we see that an induction on [math]n[/math] shows that

[[math]] \begin{equation} (U_1+...+U_k)^n=\sum_{\substack{r_1\geq 0,...,r_k\geq 0\\ r_1+...+r_k=n}}\frac{n!}{r_1!\dotsm r_k!}U_1^{r_1}\dotsm U_k^{r_k}. \end{equation} [[/math]]
But the left hand side is also

[[math]] \sum_{1\leq k_1,...,k_n\leq k}U_{k_1}\dotsm U_{k_n}. [[/math]]
Now ordering the terms and comparing with [math](1)[/math] yields the result.

Example

A throw of twelve dice can result in [math]6^{12}[/math] different outcomes. The probability of the event that each face appears exactly twice is

[[math]] \frac{12!}{2^6\cdot 6^{12}}. [[/math]]
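Numerically (a quick Python evaluation, illustration only):

    from math import factorial

    print(factorial(12) / (2 ** 6 * 6 ** 12))  # approximately 0.0034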

The case [math]\underline{with}[/math] replacement: We take the point of view of occupancy problems. Our model is that of placing [math]r[/math] balls randomly into [math]n[/math] cells. Such an event is completely described by its occupancy numbers [math]r_1,...,r_n[/math], where [math]r_k[/math] stands for the number of balls in the [math]k[/math]-th cell. Every [math]n[/math]-tuple of integers satisfying [math]r_1+...+r_n=r[/math] describes a possible configuration. Two distributions are distinguishable if their occupancy numbers are different. We denote by [math]A_{r,n}[/math] the number of distinguishable distributions. It is also the number of subpopulations of size [math]r[/math] with replacement from a population [math]S=\{S_1,...,S_n\}[/math] of size [math]n[/math]: such a subpopulation is characterized by the numbers [math]r_k[/math] of appearances of the individuals [math]S_k[/math], with the restriction [math]r_1+...+r_n=r[/math].

Theorem


[[math]] A_{r,n}=\binom{n+r-1}{r}=\binom{n+r-1}{n-1}. [[/math]]


Show Proof

There are two different proofs.

  • We represent the balls by [math]\bigotimes[/math] and the cells by the [math]n[/math] spaces between [math]n+1[/math] bars. Then
    [[math]] \mid \bigotimes \bigotimes \bigotimes \mid\bigotimes\mid\mid\mid\mid \bigotimes \bigotimes \bigotimes \bigotimes\mid [[/math]]
    is used as a symbol for a distribution of [math]r=8[/math] balls in [math]n=6[/math] cells. The occupancy numbers are
    [[math]] 3,1,0,0,0,4\Longrightarrow 3+1+0+0+0+4=8. [[/math]]
    Such a symbol necessarily starts and ends with a bar, but the remaining [math](n-1)[/math] bars and [math]r[/math] balls can appear in an arbitrary order. We have that
    [[math]] \binom{n+r-1}{r} [[/math]]
    is the number of ways of selecting [math]r[/math] places (for the balls) out of [math]n+r-1[/math].
  • For [math]|t_i| \lt 1[/math] with [math]1\leq i\leq n[/math]
    [[math]] \begin{align*} \prod_{1\leq i\leq n}\frac{1}{1-t_i}=\prod_{1\leq i\leq n}\left(\sum_{m_i\geq 0}t_i^{m_i}\right)&=\sum_{0\leq m_1,...,m_n}t_1^{m_1}\cdot t_2^{m_2}\dotsm t_n^{m_n}\\ &=\sum_{r\geq 0}\sum_{m_1+...+m_n=r\atop m_1\geq 0,...,m_n\geq 0}t_1^{m_1}\cdot t_2^{m_2}\dotsm t_n^{m_n} \end{align*} [[/math]]
    For [math]t_1=t_2=\dotsm=t_n=t[/math] we have
    [[math]] \frac{1}{(1-t)^n}=\sum_{r\geq 0}\sum_{m_1+...+m_n=r\atop m_1\geq 0,...,m_n\geq 0}t^r=\sum_{r\geq 0}A_{r,n}\cdot t^r. [[/math]]
    Taylor's formula gives us that
    [[math]] \frac{1}{(1-t)^n}=\sum_{r\geq 0}\binom{n+r-1}{r}t^r\Longrightarrow A_{r,n}=\binom{n+r-1}{r}. [[/math]]
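For small values one can confirm the count by listing all occupancy vectors directly (a Python sketch, illustration only):

    from itertools import product
    from math import comb

    def A(r, n):
        # Count n-tuples (r_1, ..., r_n) of nonnegative integers summing to r
        return sum(1 for occ in product(range(r + 1), repeat=n) if sum(occ) == r)

    for r, n in [(3, 2), (4, 4), (8, 6)]:
        print(A(r, n), comb(n + r - 1, r))  # the two counts agree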

Example

The partial derivatives of order [math]r[/math] of a [math]C^{\infty}[/math] function [math]f(x_1,...,x_n)[/math] of [math]n[/math] variables do not depend on the order of differentiation but only on the number of times that each variable appears. Hence there exist [math]\binom{n+r-1}{r}[/math] different partial derivatives of order [math]r[/math].

[math]A_{r,n}[/math] is also the number of different nonnegative integer solutions to the equation [math]r_1+...+r_n=r[/math].

Combination of events

We write a permutation [math]\sigma\in\Sigma_N[/math] in the following way

[[math]] \sigma=\begin{pmatrix}1&2&\dotsm &N\\\sigma(1)&\sigma(2)&\dotsm&\sigma(N)\end{pmatrix}. [[/math]]

We know that the permutation group has cardinality [math]\vert\Sigma_N\vert=N![/math]. Under the uniform distribution, the probability of any single permutation is given by

[[math]] \p[\sigma]=\frac{1}{N!}. [[/math]]

We go back to a probability space [math](\Omega,\A,\p)[/math]. If [math]A_1,A_2\in\A[/math] with [math]A=A_1\cup A_2[/math], we get

[[math]] \p[A_1\cup A_2]=\p[A_1]+\p[A_2]-\p[A_1\cap A_2]. [[/math]]

What about [math]\p[A][/math] when [math]A=\bigcup_{i=1}^NA_i[/math], where [math]A_1,...,A_N\in \A[/math]? We shall write [math]\p_i=\p[A_i][/math] for all [math]1\leq i\leq N[/math] and [math]\p_{ij}=\p[A_i\cap A_j][/math] for [math]i \lt j[/math]. More generally, for [math]i_1 \lt ... \lt i_k[/math] we set

[[math]] \p_{i_1,...,i_k}=\p\left[\bigcap_{j=1}^kA_{i_j}\right]. [[/math]]

We further set

[[math]] \begin{align*} S_1&=\sum_{1\leq i\leq N}\p_i\\ S_2&=\sum_{1\leq i \lt j\leq N}\p_{ij}\\ &\vdots\\ S_r&=\sum_{1\leq i_1 \lt ... \lt i_r\leq N}\p_{i_1,...,i_r} \end{align*} [[/math]]

Moreover, [math]S_r[/math] has [math]\binom{N}{r}[/math] terms. For [math]N=2[/math] we get

[[math]] \p[A]=S_1-S_2. [[/math]]

Theorem (Inclusion-Exclusion)


[[math]] \p\left[\bigcup_{i=1}^NA_i\right]=S_1-S_2+S_3-S_4+...\pm S_N=\sum_{r=1}^N(-1)^{r-1}S_r. [[/math]]

Example

Consider two identical decks of [math]N[/math] cards each. Both are put into random order and matched against each other. If a card occupies the same place in both decks we speak of a match. We want to compute the probability of having at least one match. Let us number the cards [math]1,...,N[/math] and write

[[math]] \sigma=\begin{pmatrix}1&2&\dotsm &N\\\sigma(1)&\sigma(2)&\dotsm&\sigma(N)\end{pmatrix}, [[/math]]

where the first line denotes the first deck and the second line denotes the second deck. A match at place [math]k[/math] corresponds to [math]k=\sigma(k)[/math]. In that case we call [math]k[/math] a fixed point of the permutation [math]\sigma[/math]. We look for the number of permutations of [math]\{1,...,N\}[/math] (out of the [math]N![/math]) which have at least one fixed point. Let [math]A_k[/math] be the event [math]\{k=\sigma(k)\}[/math]. Clearly

[[math]] \p[A_k]=\p_k=\frac{(N-1)!}{N!}=\frac{1}{N}. [[/math]]

Similarly, if [math]i \lt j[/math],

[[math]] \p_{ij}=\p[A_i\cap A_j]=\frac{(N-2)!}{N!}=\frac{1}{N(N-1)}. [[/math]]

More generally

[[math]] \p_{i_1,...,i_r}=\frac{(N-r)!}{N!} [[/math]]

and hence

[[math]] S_r=\sum_{1\leq i_1 \lt ... \lt i_r\leq N}\p_{i_1,...,i_r}=\sum_{1\leq i_1 \lt ... \lt i_r\leq N}\frac{(N-r)!}{N!}=\frac{(N-r)!}{N!}\sum_{1\leq i_1 \lt ... \lt i_r\leq N}1=\frac{(N-r)!}{N!}\binom{N}{r}=\frac{1}{r!}. [[/math]]

If [math]\p_1[/math] denotes the probability of having at least one fixed point, then

[[math]] \p_1=\p\left[\bigcup_{i=1}^NA_i\right]=1-\frac{1}{2!}+\frac{1}{3!}-...\pm\frac{1}{N!}. [[/math]]
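As [math]N\to\infty[/math] this probability converges to [math]1-e^{-1}\approx 0.632[/math]. For small [math]N[/math] the inclusion-exclusion value is easily confirmed by enumerating all permutations (a Python sketch, illustration only):

    from itertools import permutations
    from math import factorial

    N = 7
    brute = sum(any(s[k] == k for k in range(N))
                for s in permutations(range(N))) / factorial(N)
    series = sum((-1) ** (r - 1) / factorial(r) for r in range(1, N + 1))
    print(brute, series)  # both approximately 0.6321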

Random Walks

The Reflection Principle

From a formal point of view, we shall be concerned with arrangements of finitely many [math]+1[/math]'s and [math]-1[/math]'s. Consider [math]n=p+q[/math] symbols [math]\epsilon_1,...,\epsilon_n[/math], where [math]\epsilon_j\in\{-1,+1\}[/math] for all [math]1\leq j\leq n[/math]. Suppose that there are [math]p[/math] [math]\{+1\}[/math]'s and [math]q[/math] [math]\{-1\}[/math]'s. Then [math]S_k=\epsilon_1+...+\epsilon_k[/math] represents the difference between the number of [math]\{+1\}[/math]'s and [math]\{-1\}[/math]'s among the first [math]k[/math] places, so that

[[math]] \begin{equation} S_k-S_{k-1}=\epsilon_k=\pm1,\qquad S_0=0,\qquad S_n=p-q. \end{equation} [[/math]]

Example for a random walk

The arrangement [math](\epsilon_1,...,\epsilon_n)[/math] will be represented by a polygonal line whose [math]k[/math]'th side has slope [math]\epsilon_k[/math] and whose [math]k[/math]'th vertex has ordinate [math]S_k[/math]. Such a line will be called a path. We shall use [math](t,x)[/math] for coordinates.

Definition

Let [math]n \gt 0[/math] and [math]x[/math] be integers. A path [math](S_0,S_1,...,S_n)[/math] from the origin to the point [math](n,x)[/math] is a polygonal line whose vertices have abscissas [math]0,1,2,...,n[/math] and ordinates [math]S_0,S_1,...,S_n[/math] satisfying the relations above with [math]S_n=x[/math]. We shall refer to [math]n[/math] as the length of the path. There are [math]2^n[/math] paths of length [math]n[/math]. Moreover, we have

[[math]] \begin{align*} n&=p+q\\ x&=p-q \end{align*} [[/math]]

A path from the origin to an arbitrary point [math](n,x)[/math] exists only if [math]n[/math] and [math]x[/math] are of the form as in the definition. In this case the [math]p[/math] places for the positive [math]\epsilon_k[/math]'s can be chosen from the [math]n=p+q[/math] available places in

[[math]] N_{n,x}=\binom{p+q}{p}=\binom{p+q}{q} [[/math]]

different ways. Convention: [math]N_{n,x}=0[/math] whenever [math]n[/math] and [math]x[/math] are not of the form as in the definition. With this convention, [math]N_{n,x}[/math] represents the number of different paths from the origin to an arbitrary point [math](n,x)[/math].

Example

[Ballot theorem] Suppose that in a ballot candidate [math]P[/math] scores [math]p[/math] votes and candidate [math]Q[/math] scores [math]q[/math] votes, where [math]p \gt q[/math]. The probability that throughout the counting there are always more votes for [math]P[/math] than for [math]Q[/math] equals [math]\frac{p-q}{p+q}[/math]. The whole voting record may be represented by a path of length [math]p+q[/math] in which [math]\epsilon_k=+1[/math] if the [math]k[/math]'th vote is for [math]P[/math] and [math]\epsilon_k=-1[/math] otherwise. Conversely, every path from the origin to the point [math](p+q,p-q)[/math] can be interpreted as a voting record with the given totals [math]p[/math] and [math]q[/math]. Here [math]S_k[/math] is the number of votes by which [math]P[/math] leads just after the [math]k[/math]'th vote. The candidate [math]P[/math] leads throughout the voting if [math]S_1 \gt 0,S_2 \gt 0,...,S_n \gt 0[/math] (in the ballot theorem it is implicitly assumed that all paths are equally probable).

Example for the reflection principle

Let [math]A=(a,\alpha)[/math] and [math]B=(b,\beta)[/math] be two integer points with [math]b \gt a\geq 0[/math], [math]a,b\in\N[/math], and [math]\alpha \gt 0[/math], [math]\beta \gt 0[/math], [math]\alpha,\beta\in\N[/math]. By the reflection of [math]A[/math] on the [math]t[/math]-axis we mean the point [math]A'=(a,-\alpha)[/math].

Lemma (The reflection principle)

The number of paths from [math]A[/math] to [math]B[/math] which touch or cross the [math]t[/math]-axis equals the number of paths from [math]A'[/math] to [math]B[/math].


Show Proof

Consider a path [math](S_a=\alpha,S_{a+1},...,S_b=\beta)[/math] from [math]A[/math] to [math]B[/math] having one or more vertices on that axis. Let [math]t[/math] be the abscissa of the first such vertex, that is [math]S_a \gt 0,S_{a+1} \gt 0,...,S_{t-1} \gt 0,S_t=0[/math]. Then [math](-S_a,-S_{a+1},...,-S_{t-1},S_t,S_{t+1},...,S_b)[/math] is a path leading from [math]A'[/math] to [math]B[/math] and having [math]T=(t,0)[/math] as its first vertex on the [math]t[/math]-axis. This gives a one-to-one correspondence between all paths from [math]A'[/math] to [math]B[/math] and the paths from [math]A[/math] to [math]B[/math] that have a vertex on the [math]t[/math]-axis.

Let us now prove the ballot theorem. Let [math]n[/math] and [math]x[/math] be positive integers. There are exactly [math]\frac{x}{n}N_{n,x}[/math] paths [math](S_1,...,S_n=x)[/math] such that

[[math]] S_1 \gt 0,S_2 \gt 0,...,S_n \gt 0. [[/math]]

Indeed, there exist exactly as many admissible paths as there are paths from [math](1,1)[/math] to [math](n,x)[/math] which neither touch nor cross the [math]t[/math]-axis. By the previous lemma the number of such paths equals

[[math]] N_{n-1,x-1}-N_{n-1,x+1}=\binom{p+q-1}{p-1}-\binom{p+q-1}{p}, [[/math]]

where we used that [math]N_{n,x}=\binom{p+q}{p}[/math]. So we get

[[math]] \begin{align*} \binom{p+q-1}{p-1}-\binom{p+q-1}{p}&=\frac{(p+q-1)!}{(p-1)!q!}-\frac{(p+q-1)!}{p!(q-1)!}\\ &=\frac{p}{p+q}\cdot\frac{(p+q)!}{p!q!}-\frac{q}{p+q}\cdot \frac{(p+q)!}{p!q!}\\ &=\frac{p-q}{p+q}N_{n,x}=\frac{x}{n}N_{n,x}. \end{align*} [[/math]]
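The count [math]\frac{x}{n}N_{n,x}[/math] can be verified by brute force for small parameters (a Python sketch, illustration only; the totals [math]p,q[/math] are arbitrary):

    from itertools import accumulate, product
    from math import comb

    p, q = 5, 3
    n, x = p + q, p - q
    # Count paths of length n ending at x whose partial sums stay positive
    good = sum(1 for steps in product((-1, 1), repeat=n)
               if sum(steps) == x
               and all(s > 0 for s in accumulate(steps)))
    print(good, x / n * comb(n, p))  # both equal 14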

Random Walk terminology

We set [math]S_0=0[/math], [math]S_n=X_1+...+X_n[/math], [math]X_j\in\{-1,+1\}[/math]. Our state space is [math]\Omega_n[/math], containing all possible paths, and therefore [math]\vert\Omega_n\vert=2^n[/math]. As our [math]\sigma[/math]-algebra we take [math]\A=\mathcal{P}(\Omega_n)[/math], and we let [math]\p[/math] be the uniform probability measure. Consider the event [math]\{S_n=r\}[/math] (at time [math]n[/math] the particle is at the point [math]r[/math]). We shall also speak of a visit to [math]r[/math] at time [math]n[/math]. The number [math]N_{n,r}[/math] of paths from the origin to the point [math](n,r)[/math] is given by

[[math]] \binom{n}{\frac{n+r}{2}}, [[/math]]

with [math]n=p+q[/math], [math]r=p-q[/math] and hence [math]p=\frac{n+r}{2}[/math]. Here we interpret [math]\binom{n}{\frac{n+r}{2}}[/math] as [math]0[/math] unless [math]\frac{n+r}{2}[/math] is an integer between [math]0[/math] and [math]n[/math]. Hence we get that

[[math]] \p_{n,r}=\p[S_n=r]=\binom{n}{\frac{n+r}{2}}\cdot 2^{-n}. [[/math]]

A return to the origin occurs at time [math]k[/math] if [math]S_k=0[/math]. Here [math]k[/math] is necessarily an even integer, which we write as [math]k=2\nu[/math] with [math]\nu[/math] an integer. The probability of a return to the origin at time [math]2\nu[/math] is

[[math]] \p_{2\nu,0}. [[/math]]

We shall denote it by [math]U_{2\nu}[/math], which is now given by

[[math]] \boxed{U_{2\nu}=\p[S_{2\nu}=0]=\binom{2\nu}{\nu}\cdot 2^{-2\nu}} [[/math]]
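A quick Python evaluation of this formula (illustration only), compared with the asymptotics from Stirling's formula stated just below:

    from math import comb, pi, sqrt

    for nu in (1, 5, 10, 50, 100):
        U = comb(2 * nu, nu) / 4 ** nu
        print(nu, U, 1 / sqrt(pi * nu))  # U_{2 nu} ~ 1/sqrt(pi * nu)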

Stirling's formula implies that [math]U_{2\nu}\sim\frac{1}{\sqrt{\pi \nu}}[/math] as [math]\nu\to\infty[/math]. Among the returns to the origin, the first return receives special attention. A first return occurs at [math]2\nu[/math] if

[[math]] S_1\not=0,S_2\not=0,...,S_{2\nu-1}\not=0,S_{2\nu}=0. [[/math]]

We denote the probability of this event by [math]f_{2\nu}[/math].

Lemma

For all [math]n\geq 0[/math] we have

[[math]] \p[S_1\not=0,S_2\not=0,...,S_{2n-1}\not=0,S_{2n}\not=0]=\p[S_{2n}=0]=U_{2n}. [[/math]]

When the event on the left hand side occurs, either all [math]S_j \gt 0[/math] or all [math]S_j \lt 0[/math]. Since these events are equally probable it follows that

[[math]] \p[S_1 \gt 0,...,S_{2n} \gt 0]=\frac{1}{2}U_{2n}. [[/math]]


Show Proof

We have

[[math]] \p[S_1 \gt 0,...,S_{2n} \gt 0]=\sum_{r=1}^\infty\p[S_1 \gt 0,...,S_{2n}=2r], [[/math]]
where all the terms with [math]r \gt n[/math] are zero. By the ballot theorem, the number of paths satisfying [math]S_1 \gt 0,...,S_{2n-1} \gt 0,S_{2n}=2r[/math] is equal to [math]N_{2n-1,2r-1}-N_{2n-1,2r+1}[/math] and thus

[[math]] \begin{align*} \p[S_1 \gt 0,...,S_{2n-1} \gt 0,S_{2n}=2r]&=\frac{1}{2}\left(\p_{2n-1,2r-1}-\p_{2n-1,2r+1}\right),\\ \p[S_1 \gt 0,S_2 \gt 0,...,S_{2n} \gt 0]&=\frac{1}{2}\sum_{r=1}^n\left(\p_{2n-1,2r-1}-\p_{2n-1,2r+1}\right)\\ &=\frac{1}{2}\left(\p_{2n-1,1}-\p_{2n-1,3}+\p_{2n-1,3}-\p_{2n-1,5}\pm...+\p_{2n-1,2n-1}-\underbrace{\p_{2n-1,2n+1}}_{=0}\right)\\ &=\frac{1}{2}\p_{2n-1,1}. \end{align*} [[/math]]
The claim now follows from

[[math]] \p_{2n-1,1}=\p[S_{2n-1}=1]=\binom{2n-1}{\frac{2n-1+1}{2}}2^{-(2n-1)}=2^{-2n}\cdot 2\cdot \frac{(2n-1)!}{n!(n-1)!}=2^{-2n}\binom{2n}{n}=U_{2n}, [[/math]]

and the statement of the lemma follows by symmetry between positive and negative paths.

Saying that the first return to the origin occurs at time [math]2n[/math] amounts to saying that [math]S_1\not=0,S_2\not=0,...,S_{2n-1}\not=0,S_{2n}=0[/math]. Hence we have

[[math]] \p[S_1\not=0,S_2\not=0,...,S_{2n-1}\not=0,S_{2n}=0]=f_{2n}, [[/math]]

and

[[math]] \{S_1\not=0,S_2\not=0,...,S_{2n-1}\not=0\}=\{S_1\not=0,...,S_{2n-1}\not=0,S_{2n}\not=0\}\cup\{S_1\not=0,...,S_{2n-1}\not=0,S_{2n}=0\}, [[/math]]

Since [math]S_{2n-1}[/math] is odd and hence never zero, the left hand side equals [math]\{S_1\not=0,...,S_{2n-2}\not=0\}[/math] and has probability [math]U_{2n-2}[/math] by the previous lemma; the union on the right hand side is disjoint, which implies that

[[math]] U_{2n-2}=U_{2n}+f_{2n}, [[/math]]

and hence for all [math]n\geq 1[/math] we have

[[math]] f_{2n}=U_{2n-2}-U_{2n}. [[/math]]

Therefore we get the relation

[[math]] \boxed{f_{2n}=\frac{1}{2n-1}U_{2n}} [[/math]]
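Both relations can be checked by enumerating all paths of length [math]2n[/math] for small [math]n[/math] (a Python sketch, illustration only):

    from itertools import accumulate, product
    from math import comb

    def U(two_n):
        return comb(two_n, two_n // 2) / 2 ** two_n

    for n in (1, 2, 3, 4):
        f = 0.0
        for steps in product((-1, 1), repeat=2 * n):
            partial = list(accumulate(steps))
            # First return to the origin exactly at time 2n
            if all(s != 0 for s in partial[:-1]) and partial[-1] == 0:
                f += 2 ** (-2 * n)
        print(n, f, U(2 * n - 2) - U(2 * n), U(2 * n) / (2 * n - 1))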

Theorem

The probability that up to time [math]2n[/math], the last visit to the origin occurs at [math]2k[/math] is

[[math]] \alpha_{2k,2n}=U_{2k}U_{2n-2k},\qquad k=0,1,...,n. [[/math]]


Show Proof

We are concerned with paths satisfying [math]S_{2k}=0,S_{2k+1}\not=0,...,S_{2n}\not=0[/math]. The first [math]2k[/math] vertices of such a path can be chosen in [math]2^{2k}U_{2k}[/math] different ways. Taking the point [math](2k,0)[/math] as a new origin and using the previous lemma, we see that the remaining [math](2n-2k)[/math] vertices can be chosen in [math]2^{2n-2k}U_{2n-2k}[/math] different ways. Therefore we get

[[math]] \alpha_{2k,2n}=\frac{1}{2^{2n}}\left(2^{2k}U_{2k}\right)\left(2^{2n-2k}U_{2n-2k}\right)=U_{2k}U_{2n-2k}. [[/math]]
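Again this can be confirmed by enumeration for small [math]n[/math]; note the resulting U-shape: the extreme values [math]k=0[/math] and [math]k=n[/math] are the most probable (a Python sketch, illustration only):

    from itertools import accumulate, product
    from math import comb

    def U(two_n):
        return comb(two_n, two_n // 2) / 2 ** two_n

    n = 4
    counts = [0] * (n + 1)
    for steps in product((-1, 1), repeat=2 * n):
        partial = [0] + list(accumulate(steps))
        # Last even time at which the path visits the origin
        last = max(t for t in range(0, 2 * n + 1, 2) if partial[t] == 0)
        counts[last // 2] += 1
    for k in range(n + 1):
        print(k, counts[k] / 4 ** n, U(2 * k) * U(2 * n - 2 * k))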

General references

Moshayedi, Nima (2020). "Lectures on Probability Theory". arXiv:2010.16280 [math.PR].