Lecture 24: Gamma Distribution and Poisson Process

Harvard Statistics 110 (Joe Blitzstein)
Watch on YouTube

1. Extending the Factorial

The lecture opens with a puzzle: what comes next in the sequence $0, 1, 2, \ldots$? Different people guess different things — $3$, $4$, $720$, $42/e$ — and that is the point. A "next number" puzzle has no well-defined answer without a complexity measure that picks the simplest continuation, and two people rarely agree on what is simplest.

The continuation Blitzstein had in mind: $0, 1, 2, 6, 24, 120, 720, \ldots$ — that is, $0, 1!, 2!, 3!, 4!, 5!, 6!, \ldots$ (zero with no factorials, one with one factorial, and so on). The broader goal is to interpolate the factorial function: it is defined only on non-negative integers, so how do we "connect the dots" and ask, say, what $\pi!$ should mean?

Stirling's formula

Before interpolating, it helps to know how fast $n!$ grows. Stirling's formula gives an approximation everyone should know:

$$n! \approx \sqrt{2\pi n}\,\left(\frac{n}{e}\right)^{n}$$

This is an excellent approximation even for modest $n$ (around $12$ or $20$). More than that, the ratio of the two sides converges to $1$ as $n \to \infty$. The dominant growth comes from the $(n/e)^n$ term: roughly $n^n$ behavior, discounted by $e^n$. (The formula can also be proved using probability, via a Poisson interpretation — not done in this lecture.)

The interpolation problem

There are infinitely many smooth curves through the points $(n, n!)$. The question is whether one is canonical. There is:

Key idea

The gamma function is the standard, most widely applicable extension of the factorial to the positive reals. It is the function we need to define the gamma distribution.

· · ·

2. The Gamma Function

Definition — Gamma function

For any real $a > 0$,

$$\Gamma(a) = \int_0^{\infty} x^{a}\, e^{-x}\, \frac{dx}{x}.$$

The factor $dx/x$ means the integrand is $x^{a-1} e^{-x}$; the two forms are identical. Keeping one $x$ "on hold" in the $dx/x$ is a notational convenience that pays off repeatedly below.

Why $a > 0$ is required

The integral exists only for real $a > 0$. Two limits matter:

Near $x = 0$: $e^{-x} \approx 1$, so it contributes nothing. The integrand behaves like $x^{a-1}$. If $a = 0$, this is $1/x$, whose integral is $\log x$, and $\log 0 = -\infty$ — the integral diverges. For $a > 0$, $x^{a-1}$ integrates cleanly near zero and converges.
Near $x = \infty$: $e^{-x}$ (exponential decay) dominates any power $x^{a-1}$, even for huge $a$. So the tail converges.

Key properties

Property	Statement
Factorial link	$\Gamma(n) = (n-1)!$ for positive integer $n$
Recursion	$\Gamma(x+1) = x\,\Gamma(x)$
Base case	$\Gamma(1) = 1$
Half-integer	$\Gamma(1/2) = \sqrt{\pi}$

The offset in $\Gamma(n) = (n-1)!$ is a historical convention, nothing deeper.

The recursion is the heart of the function. Check it against the factorials:

$\Gamma(1) = 1$ (the integral of $e^{-x}$ from $0$ to $\infty$ is $1$, seen several times before).
$\Gamma(2) = 1 \cdot \Gamma(1) = 1 = 1!$.
$\Gamma(3) = 2 \cdot \Gamma(2) = 2 = 2!$.
$\ldots$ and so on.

This is exactly the recursion the factorials obey, with the same starting point, so the two sequences agree forever. The recursion itself follows from a single integration by parts on the defining integral — the calculation reproduces the gamma function again and hands back the identity for free.

The half-integer value and the normal

$\Gamma(1/2) = \sqrt{\pi}$ connects to the normalizing constant of the normal distribution, where $\sqrt{2\pi}$ appears. The link is a substitution. Take the (un-normalized) normal integrand $e^{-x^2/2}$ and substitute $u = x^2/2$, so $du = x\,dx$. There is no spare $x$ in the integrand, so $x$ must be re-expressed as roughly $\sqrt{u}$, and the integral reduces to gamma-function form with $a = 1/2$. So either:

knowing $\Gamma(1/2) = \sqrt{\pi}$ gives another route to the normal's normalizing constant, or
knowing the normal's normalizing constant (from the standard "write it down twice" trick) proves $\Gamma(1/2) = \sqrt{\pi}$.

From $\Gamma(1/2)$, the recursion generates all half-integers: $\Gamma(3/2) = \tfrac{1}{2}\Gamma(1/2) = \tfrac{\sqrt{\pi}}{2}$, then $\Gamma(5/2)$, $\Gamma(7/2)$, and so on.

· · ·

3. From Function to Distribution

The gamma function immediately suggests a probability density. The defining integral equals $\Gamma(a)$; divide both sides by $\Gamma(a)$ and the integrand integrates to $1$. Since it is also non-negative, it is a valid PDF:

$$1 = \int_0^{\infty} \frac{1}{\Gamma(a)}\, x^{a}\, e^{-x}\, \frac{dx}{x}.$$

Definition — Gamma$(a, 1)$ density

$$f(x) = \frac{1}{\Gamma(a)}\, x^{a}\, e^{-x}\, \frac{1}{x}, \qquad x > 0.$$

Writing the trailing $1/x$ (rather than folding it into $x^{a-1}$) avoids the common slip of confusing whether the exponent should be $a$ or $a-1$.

This construction is "a cute math trick," not yet a probabilistic story. By itself, normalizing a famous function does not explain why this distribution matters. The real motivation comes from its connection to the exponential distribution.

Adding the rate parameter

The two-parameter family Gamma$(a, \lambda)$ comes from rescaling, exactly as a general Exponential$(\lambda)$ comes from an Exponential$(1)$. Here $\lambda$ is a rate; $1/\lambda$ is a scale.

Let $X \sim$ Gamma$(a, 1)$ and define $Y = X/\lambda$ (equivalently $X = \lambda Y$). By the change-of-variables formula, $f_Y(y) = f_X(x)\,\bigl|\tfrac{dx}{dy}\bigr|$ with $\tfrac{dx}{dy} = \lambda$. Substituting $x = \lambda y$ into the Gamma$(a, 1)$ density:

$$f_Y(y) = \frac{1}{\Gamma(a)}\,(\lambda y)^{a}\, e^{-\lambda y}\, \frac{1}{\lambda y}\,\lambda.$$

The two factors of $\lambda$ from $1/(\lambda y)$ and $dx/dy$ cancel one each, giving the Gamma$(a, \lambda)$ density on $y > 0$. Like the exponential, the gamma is a continuous distribution on the positive reals.

· · ·

4. The Poisson Process

The substantive story behind the gamma is the Poisson process, the standard model for arrivals over time — emails landing in an inbox, phone calls, radioactive decays. Picture a timeline with an $\times$ marked at each arrival. Let $N_t$ = number of arrivals up to time $t$.

× — × —— × — × ———→

arrivals on a timeline; gaps between $\times$'s are the inter-arrival times

Definition — Poisson process of rate $\lambda$

Two assumptions, and only two:

Counts are Poisson. The number of arrivals in any time interval of length $t$ is Poisson$(\lambda t)$. (Stated for $[0, t]$, but it holds for any interval of that length.)
Disjoint intervals are independent. The counts in non-overlapping time intervals are independent.

Inter-arrival times are exponential

A few weeks earlier the class established the link to the exponential — before the exponential had even been defined. Let $T_1$ be the time of the first arrival. Then $T_1 \sim$ Exponential$(\lambda)$, shown by a one-line calculation:

$$P(T_1 > t) = P(N_t = 0) = e^{-\lambda t}$$

No arrival by time $t$ is the same event as $N_t = 0$, and the Poisson$(\lambda t)$ PMF at $0$ is $e^{-\lambda t}$. That tail probability is exactly $1$ minus the Exponential$(\lambda)$ CDF, so $T_1 \sim$ Exponential$(\lambda)$.

The same argument repeats for every gap. By the memoryless property of the exponential, once the first arrival happens, waiting for the second is a fresh copy of the same problem, independent of the first. So every gap between consecutive arrivals is Exponential$(\lambda)$, and the gaps are independent. These gaps are the inter-arrival times.

Equivalent definition

A Poisson process of rate $\lambda$ is equivalently a process whose inter-arrival times are i.i.d. Exponential$(\lambda)$.

Arrival times are gamma

Let $T_n$ be the actual time of the $n$-th arrival (not a gap, but the cumulative time on the line). Summing the gaps:

$$T_n = X_1 + X_2 + \cdots + X_n, \qquad X_j \stackrel{\text{i.i.d.}}{\sim} \text{Exponential}(\lambda).$$

For integer $n$, this sum is Gamma$(n, \lambda)$. This is the central story of the gamma distribution. (The parameter $a$ in the density need not be an integer in general, but the arrival-time interpretation requires an integer $n$.)

The discrete-time analogy makes it memorable:

Discrete time	Continuous time
Geometric — wait for $1$ success	Exponential — wait for $1$ arrival
Negative Binomial — wait for $n$ successes	Gamma — wait for $n$ arrivals

A Negative Binomial is a sum of i.i.d. Geometrics; a Gamma$(n, \lambda)$ is a sum of $n$ i.i.d. Exponentials. "Success" here just means an arrival.

· · ·

5. Proof: Sum of Exponentials is Gamma

The claim "sum of $n$ i.i.d. exponentials is Gamma$(n, \lambda)$" is asserted, not yet proved. Two routes are available:

Convolution. Convolve $X_1$ with $X_2$, then that with $X_3$, and so on. It is workable (terms cancel) and a pattern emerges that could be finished by induction — but it gets tedious.
MGF (used here). Since the $X_j$ are i.i.d. and the exponential's MGF is known, multiply MGFs. Much cleaner.

It suffices to prove the $\lambda = 1$ case; the general $\lambda$ follows by rescaling. The exponential MGF (derived earlier in the course, where it gave all the moments at once via the geometric series) is:

$$M_{X_j}(t) = \frac{1}{1 - t}, \qquad t < 1.$$

Since $T_n$ is a sum of $n$ independent $X_j$ with identical MGFs, the MGF of the sum is the product:

$$M_{T_n}(t) = \left(\frac{1}{1 - t}\right)^{n} = \frac{1}{(1 - t)^{n}}, \qquad t < 1.$$

That was almost no work. Now confirm this is the MGF of a Gamma$(n, 1)$. Let $Y \sim$ Gamma$(n, 1)$ (fresh notation, to avoid circular reasoning) and compute its MGF by LOTUS:

$$M_Y(t) = E\!\left(e^{tY}\right) = \frac{1}{\Gamma(n)}\int_0^{\infty} e^{ty}\, y^{n}\, e^{-y}\, \frac{dy}{y}.$$

Collect the two exponentials, $e^{ty} e^{-y} = e^{-(1-t)y}$:

$$M_Y(t) = \frac{1}{\Gamma(n)}\int_0^{\infty} y^{n}\, e^{-(1-t)y}\, \frac{dy}{y}.$$

This is almost the gamma integral, just with an extra constant in the exponent. The skill the course actually rewards is pattern recognition, not heavy calculus: substitute to turn the exponent back into $e^{-x}$. Let $x = (1-t)y$, so $dx = (1-t)\,dy$. Since $t < 1$, the factor $1 - t$ is positive and the limits stay $0$ to $\infty$. Then $y = x/(1-t)$, so $y^{n}$ contributes $(1-t)^{-n}$, and the convenient $dy/y$ equals $dx/x$ (the substitution factor cancels):

$$M_Y(t) = \frac{(1-t)^{-n}}{\Gamma(n)}\int_0^{\infty} x^{n}\, e^{-x}\, \frac{dx}{x} = (1-t)^{-n}\,\frac{\Gamma(n)}{\Gamma(n)} = \frac{1}{(1 - t)^{n}}.$$

Conclusion

The MGFs match. Since an MGF determines the distribution (no "masquerading" distributions share an MGF without being equal), $T_n \sim$ Gamma$(n, 1)$, and by rescaling the sum of $n$ i.i.d. Exponential$(\lambda)$ is Gamma$(n, \lambda)$.

A bonus: nothing in this computation actually used that $n$ is an integer. So $1/(1-t)^{a}$ is the correct MGF of Gamma$(a, 1)$ for any positive real $a$ — the integer assumption is only needed for the sum-of-exponentials interpretation.

· · ·

6. Moments of the Gamma

The MGF could deliver the moments, but LOTUS is even quicker. Let $X \sim$ Gamma$(a, 1)$ and find $E(X^c)$ for an arbitrary constant $c$ (not necessarily an integer; existence must be checked):

$$E(X^c) = \frac{1}{\Gamma(a)}\int_0^{\infty} x^{c}\, x^{a}\, e^{-x}\, \frac{dx}{x} = \frac{1}{\Gamma(a)}\int_0^{\infty} x^{a+c}\, e^{-x}\, \frac{dx}{x}.$$

The remaining integral is again a gamma integral — it equals $\Gamma(a+c)$. (Equivalently, multiply and divide by $\Gamma(a+c)$ and recognize the Gamma$(a+c, 1)$ density, which integrates to $1$.) No real calculus needed; it is algebra plus recognition:

$$E(X^c) = \frac{\Gamma(a+c)}{\Gamma(a)}, \qquad \text{valid when } a + c > 0.$$

The constraint $a + c > 0$ is required because the gamma function is defined only on positive arguments; otherwise the moment does not exist.

Mean and variance

Set $c = 1$ and $c = 2$ and apply the recursion $\Gamma(a+1) = a\,\Gamma(a)$:

Mean: $E(X) = \dfrac{\Gamma(a+1)}{\Gamma(a)} = \dfrac{a\,\Gamma(a)}{\Gamma(a)} = a$.
Second moment: $E(X^2) = \dfrac{\Gamma(a+2)}{\Gamma(a)}$. Since $\Gamma(a+2) = (a+1)\,\Gamma(a+1) = (a+1)\,a\,\Gamma(a)$, this is $a^2 + a$.
Variance: $\operatorname{Var}(X) = E(X^2) - (E(X))^2 = (a^2 + a) - a^2 = a$.

Key result

A Gamma$(a, 1)$ has mean $a$ and variance $a$. (Independent check in the integer case: an Exponential$(1)$ has mean $1$, and summing $n$ of them gives mean $n$ by linearity.)

The coincidence that mean equals variance here is not a deep gamma property — it is an artifact of setting $\lambda = 1$. (The Poisson is the distribution where mean always equals variance.) Restore the rate by $Y = X/\lambda$:

$$E(Y) = \frac{a}{\lambda}, \qquad \operatorname{Var}(Y) = \frac{a}{\lambda^2}.$$

The variance picks up $\lambda^2$ because multiplying by a constant pulls the constant out squared in the variance. So Gamma$(a, \lambda)$ has mean $a/\lambda$ and variance $a/\lambda^2$. Doing the algebra at $\lambda = 1$ first and restoring $\lambda$ at the end keeps the work uncluttered.

· · ·

7. Where the Gamma Sits

The gamma is a hub among the named distributions. The course has now covered all the discrete distributions it needs and nearly all the continuous ones; the remaining continuous distributions are essentially variations built from the gamma and the normal. The gamma relates to:

the exponential (Gamma$(1, \lambda)$ is Exponential$(\lambda)$; a sum of exponentials is gamma),
the Poisson (via the Poisson process and the mean-equals-variance echo),
the normal (the half-integer values; deeper connections come next lecture), and
the beta (closely linked; covered next lecture).

Because the gamma and normal together generate the rest, once both are in hand the remaining distributions follow.