Lecture 16: Exponential Distribution

Harvard Statistics 110 (Joe Blitzstein)

1. Setting the Stage

All the discrete distributions the course needs for the semester are now in hand, but only two continuous distributions have appeared so far: the Uniform and the Normal. The exponential distribution is the next one to add, and it is one of the most important distributions overall.

It has shown up informally already — e.g., modeling the wait for a book to be released, and as an example of the universality of the uniform — but this is its formal introduction.

· · ·

2. Definition: PDF and CDF

The exponential distribution has a single parameter, $\lambda$, traditionally called the rate parameter. Intuitively, $\lambda$ is the rate at which some type of event occurs. The notation is $X \sim \text{Expo}(\lambda)$.

PDF

The probability density function is:

$$f(x) = \lambda e^{-\lambda x} \quad \text{for } x > 0, \qquad f(x) = 0 \text{ otherwise.}$$

An $\text{Expo}(\lambda)$ random variable is continuous and strictly positive, and its density is an exponentially decaying function scaled by the constant $\lambda$. The name comes directly from the exponential function in the density.

It is a valid PDF

Integrating the density over its support gives $1$:

$$\int_0^{\infty} \lambda e^{-\lambda x}\, dx = 1.$$

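This is an easy exponential integral: the antiderivative of the integrand is $-e^{-\lambda x}$, which goes from $-1$ at $x = 0$ to $0$ as $x \to \infty$, so the integral is $0 - (-1) = 1$ and the density is valid.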

CDF

Integrating the PDF gives the cumulative distribution function. Starting the integral at $0$ (since the variable is positive):

$$F(x) = \int_0^{x} \lambda e^{-\lambda t}\, dt = 1 - e^{-\lambda x} \quad \text{for } x > 0.$$

Checks that this is a valid CDF:

Differentiating brings $\lambda$ back down and recovers the PDF.
It is increasing.
As $x \to \infty$, $e^{-\lambda x}$ vanishes and $F(x) \to 1$.
As $x \to 0$, $F(x) \to 0$ (and stays $0$ for $x < 0$).
It is continuous.
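
The PDF and CDF checks above can also be sanity-checked numerically. The following is a minimal sketch, not part of the lecture: the helper names (`expo_pdf`, `expo_cdf`, `midpoint_integral`), the rate `lam = 2.0`, and the finite upper limit standing in for infinity are all illustrative assumptions.

```python
import math

def expo_pdf(x, lam):
    """Density of Expo(lam): lam * exp(-lam * x) for x > 0, else 0."""
    return lam * math.exp(-lam * x) if x > 0 else 0.0

def expo_cdf(x, lam):
    """CDF of Expo(lam): 1 - exp(-lam * x) for x > 0, else 0."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0

def midpoint_integral(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

lam = 2.0  # illustrative rate, not from the lecture

# Total area under the density. The upper limit 20 stands in for infinity;
# the neglected tail has probability exp(-40).
total_area = midpoint_integral(lambda x: expo_pdf(x, lam), 0.0, 20.0)

# Area under the density up to x = 0.7, to compare with the closed-form CDF.
area_to_point = midpoint_integral(lambda x: expo_pdf(x, lam), 0.0, 0.7)

print(total_area)                            # should be close to 1
print(area_to_point, expo_cdf(0.7, lam))     # the two values should agree closely
```

The midpoint rule is used only to keep the sketch dependency-free; any numerical integrator would serve the same purpose.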

· · ·

3. Standardization to Exponential(1)

Before computing the mean and variance, it helps to standardize, much as any normal reduces to a standard normal. The normal has two parameters ($\mu$ and $\sigma^2$) and standardizing subtracts the mean and divides by the standard deviation. The exponential has only one parameter, so standardization is simpler: just rescale.

Claim

If $Y = \lambda X$, then $Y \sim \text{Expo}(1)$.

This is worth doing once, because afterward many calculations can be done with the cleaner $\text{Expo}(1)$ and $\lambda$ reinserted only when needed.

Proof via the CDF

$$P(Y \le y) = P(\lambda X \le y) = P\!\left(X \le \tfrac{y}{\lambda}\right) = F_X\!\left(\tfrac{y}{\lambda}\right)$$

Plugging $y / \lambda$ into the exponential CDF:

$$F_X\!\left(\tfrac{y}{\lambda}\right) = 1 - e^{-\lambda \cdot (y / \lambda)} = 1 - e^{-y}.$$

The $\lambda$'s cancel, and $1 - e^{-y}$ is exactly the CDF of an $\text{Expo}(1)$ random variable. One short equation proves the claim.
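
A simulation can illustrate the claim. The sketch below assumes Python's standard `random.expovariate` as the source of $\text{Expo}(\lambda)$ draws; the rate, seed, sample size, and the helper `empirical_cdf` are arbitrary choices made for illustration. It rescales each draw by $\lambda$ and compares the empirical CDF of the result with $1 - e^{-y}$ at a few points.

```python
import math
import random

rng = random.Random(110)  # fixed seed so the run is repeatable
lam = 3.0                 # illustrative rate, not from the lecture
n = 200_000

# Draw X ~ Expo(lam) and rescale: Y = lam * X.
ys = [lam * rng.expovariate(lam) for _ in range(n)]

def empirical_cdf(samples, y):
    """Fraction of samples that are <= y."""
    return sum(1 for s in samples if s <= y) / len(samples)

# Largest gap between the empirical CDF of Y and the Expo(1) CDF
# at a few test points.
max_gap = max(
    abs(empirical_cdf(ys, y) - (1.0 - math.exp(-y)))
    for y in (0.5, 1.0, 2.0)
)
print(max_gap)  # should be small if Y behaves like Expo(1)
```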

· · ·

4. Mean and Variance

Work first with $Y \sim \text{Expo}(1)$, then transform back.

Mean of Expo(1)

By definition:

$$E(Y) = \int_0^{\infty} y\, e^{-y}\, dy.$$

This is a standard integration by parts (the kind seen in AP Calculus). Let $u = y$ and $dv = e^{-y}\, dy$, so $du = dy$ and $v = -e^{-y}$. Then $E(Y) = [uv] - \int v\, du$:

$$E(Y) = \Big[-y\, e^{-y}\Big]_0^{\infty} + \int_0^{\infty} e^{-y}\, dy.$$

The boundary term is $0$: at $y = 0$ it is $0$, and as $y \to \infty$, $e^{-y}$ decays exponentially while $y$ only grows linearly, so the product goes to $0$.
The remaining integral equals $1$, either as an easy exponential integral or — more elegantly — by recognizing it as the integral of the $\text{Expo}(1)$ PDF, which must equal $1$.

Therefore $E(Y) = 1$.

Recurring shortcut

Recognizing an integrand as a PDF (so its integral is automatically $1$) is a recurring trick in this course, letting you skip the calculus.

Variance of Expo(1)

Use $\operatorname{Var}(Y) = E(Y^2) - \big(E(Y)\big)^2$. By LOTUS, the second moment is:

$$E(Y^2) = \int_0^{\infty} y^2\, e^{-y}\, dy.$$

($E(Y)$ is the first moment; $E(Y^2)$ is the second moment.) This integration by parts with $u = y^2$ lowers the power from $2$ to $1$, reducing it to the integral already done. Carrying it out gives $E(Y^2) = 2$, so:

$$\operatorname{Var}(Y) = 2 - 1^2 = 1.$$
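
For reference, the integration by parts behind $E(Y^2) = 2$ can be written out as follows, taking $u = y^2$ and $dv = e^{-y}\, dy$, so that $du = 2y\, dy$ and $v = -e^{-y}$:

$$E(Y^2) = \Big[-y^2 e^{-y}\Big]_0^{\infty} + \int_0^{\infty} 2y\, e^{-y}\, dy = 0 + 2\, E(Y) = 2.$$

The boundary term vanishes for the same reason as before (exponential decay beats polynomial growth), and the remaining integral is twice the mean already computed.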

(Better methods for getting moments without repeated integration by parts come later in the course.)
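
As a rough numerical cross-check of the two moments, the integrals can be approximated directly. This is only a sketch: `midpoint_integral` is a hypothetical helper written for this example, and the upper limit of 50 is an assumed stand-in for infinity.

```python
import math

def midpoint_integral(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

upper = 50.0  # stands in for infinity; the tail beyond 50 is negligible

# E(Y) and E(Y^2) for Y ~ Expo(1), whose density is exp(-y) on y > 0.
first_moment = midpoint_integral(lambda y: y * math.exp(-y), 0.0, upper)
second_moment = midpoint_integral(lambda y: y * y * math.exp(-y), 0.0, upper)
variance = second_moment - first_moment ** 2

print(first_moment, second_moment, variance)  # should be close to 1, 2, 1
```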

General Expo(lambda)

With $Y = \lambda X$, we have $X = Y / \lambda$. Pulling out the constant $1/\lambda$:

Quantity	Result	Why
$E(X)$	$\dfrac{1}{\lambda}$	$E(Y/\lambda) = \tfrac{1}{\lambda}\, E(Y) = \tfrac{1}{\lambda}$, by linearity
$\operatorname{Var}(X)$	$\dfrac{1}{\lambda^2}$	$\operatorname{Var}(Y/\lambda) = \tfrac{1}{\lambda^2}\operatorname{Var}(Y) = \tfrac{1}{\lambda^2}$, since a constant comes out of a variance squared
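
A quick simulation can sanity-check these scaling results. This sketch assumes Python's standard `random.expovariate`; the rate `lam = 4.0`, the seed, and the sample size are arbitrary illustrative choices.

```python
import random

rng = random.Random(16)  # arbitrary fixed seed for repeatability
lam = 4.0                # illustrative rate, not from the lecture
n = 200_000

xs = [rng.expovariate(lam) for _ in range(n)]

# Sample mean and (population-style) sample variance.
sample_mean = sum(xs) / n
sample_var = sum((x - sample_mean) ** 2 for x in xs) / n

print(sample_mean, 1 / lam)      # sample mean vs. 1/lambda
print(sample_var, 1 / lam ** 2)  # sample variance vs. 1/lambda^2
```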

· · ·

5. The Memoryless Property

The mean and variance are known, but they do not yet explain why the exponential is so important. The most important reason is the memoryless property. (It was seen in the discrete case — the geometric — in a practice midterm problem; here is the continuous version.)

Intuition

Think of $X$ as a waiting time: you are waiting for something to happen — say, a phone call that can arrive at any time in continuous time. (The geometric is the discrete analogue: waiting for a success across discrete Bernoulli trials.) Memorylessness says that no matter how long you have already waited, the elapsed time gives you no progress toward the event — it is as if you start fresh each moment.

Definition — Memoryless property

A random variable $X$ is memoryless if, for all $s, t \ge 0$:

$$P(X \ge s + t \mid X \ge s) = P(X \ge t).$$

Here $s$ is the time already waited with no event yet; the equation says the chance of having to wait at least $t$ more is the same as the original chance of waiting at least $t$ — you have restarted with a fresh exponential of the same parameter.

The survival function

To prove the exponential is memoryless, first compute $P(X \ge s)$. Because $X$ is continuous, strict vs. non-strict inequalities do not matter, so this is just $1 - F(s)$:

$$P(X \ge s) = 1 - \left(1 - e^{-\lambda s}\right) = e^{-\lambda s}.$$

This is called the survival function: thinking of $X$ as a lifetime, it is the probability of living longer than $s$. Survival functions are central in survival analysis in biostatistics and in engineering reliability (e.g., how long a component will last).

Proof of memorylessness

By the definition of conditional probability:

$$P(X \ge s + t \mid X \ge s) = \frac{P(X \ge s + t \text{ and } X \ge s)}{P(X \ge s)}.$$

The numerator's second condition is redundant: if $X \ge s + t$ (with $s, t \ge 0$), then automatically $X \ge s$. So the numerator is just $P(X \ge s + t)$, and the whole expression is a ratio of survival functions:

$$\frac{e^{-\lambda (s + t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(X \ge t).$$

This holds for all $s, t \ge 0$, so the exponential is memoryless.
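
The ratio-of-survival-functions argument can be checked both in closed form and by simulation. The sketch below is illustrative only: the function `survival`, the values of `lam`, `s`, and `t`, and the seed are assumptions made for this example, and `random.expovariate` from the standard library supplies the draws.

```python
import math
import random

def survival(x, lam):
    """P(X >= x) = exp(-lam * x) for X ~ Expo(lam) and x >= 0."""
    return math.exp(-lam * x)

lam, s, t = 0.5, 2.0, 1.0  # illustrative values, not from the lecture

# Closed form: conditional probability as a ratio of survival functions.
exact_conditional = survival(s + t, lam) / survival(s, lam)

# Simulation: among draws that already exceeded s, what fraction exceed s + t?
rng = random.Random(2024)
draws = [rng.expovariate(lam) for _ in range(200_000)]
waited = [x for x in draws if x >= s]
simulated_conditional = sum(1 for x in waited if x >= s + t) / len(waited)

# All three numbers should be close to exp(-lam * t).
print(exact_conditional, survival(t, lam), simulated_conditional)
```

Changing `s` while holding `t` fixed should leave the conditional probability essentially unchanged, which is the memoryless property in action.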

Uniqueness

One might expect many memoryless distributions, but the exponential is the only continuous one — memorylessness completely characterizes it. (Proof deferred to next time.)

· · ·

6. Corollary: Conditional Expectation $E(X \mid X > a)$

A useful consequence of memorylessness. For $X \sim \text{Expo}(\lambda)$, what is the expected value of $X$ given $X > a$?

Conditional expectation is defined just like ordinary expectation but using conditional probability in place of probability. Add and subtract $a$, then use linearity:

$$E(X \mid X > a) = a + E(X - a \mid X > a).$$

Given $X > a$, the surplus $X - a$ is the additional waiting time. By memorylessness, once you have waited $a$, you start over — so $X - a$ is a fresh $\text{Expo}(\lambda)$. Its expectation is therefore $1/\lambda$, giving:

Result

$$E(X \mid X > a) = a + \frac{1}{\lambda}.$$

The memoryless property delivers this immediately; otherwise it would require setting up and evaluating an integral.
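
For comparison, the integral route can be carried out numerically: $E(X \mid X > a)$ is the integral of $x\, \lambda e^{-\lambda x}$ over $x > a$, divided by $P(X > a) = e^{-\lambda a}$. The sketch below is an illustration under stated assumptions: `midpoint_integral` is a hypothetical helper, `lam` and `a` are arbitrary values, and `a + 40` stands in for infinity.

```python
import math

def midpoint_integral(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

lam, a = 2.0, 1.5  # illustrative values, not from the lecture

# Numerator: integral of x * f(x) over x > a, truncated at a + 40.
numerator = midpoint_integral(
    lambda x: x * lam * math.exp(-lam * x), a, a + 40.0
)
# Denominator: P(X > a), the survival function evaluated at a.
denominator = math.exp(-lam * a)

conditional_mean = numerator / denominator
print(conditional_mean, a + 1 / lam)  # the two values should agree closely
```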