All the discrete distributions the course needs for the semester are now in hand, but only two continuous distributions have appeared so far: the Uniform and the Normal. The exponential distribution is the next one to add, and it is one of the most important distributions overall.
It has shown up informally already — e.g., modeling the wait for a book to be released, and as an example of the universality of the uniform — but this is its formal introduction.
The exponential distribution has a single parameter, $\lambda$, traditionally called the rate parameter. Intuitively, $\lambda$ is the rate at which some type of event occurs. The notation is $X \sim \text{Expo}(\lambda)$.
The probability density function is:
$$f(x) = \lambda e^{-\lambda x}, \quad x > 0,$$
with $f(x) = 0$ otherwise. So $X$ is a continuous, strictly positive random variable whose density is an exponentially decaying function with a couple of constants in place. The name comes directly from the exponential function in the density.
Integrating the density over its support gives $1$:
$$\int_0^\infty \lambda e^{-\lambda x}\, dx = \Big[-e^{-\lambda x}\Big]_0^\infty = 1.$$
This is an easy exponential integral, confirming the density is valid.
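The same check can be done numerically. The sketch below is only an illustration: the rate `lam = 2.0`, the helper names `expo_pdf` and `midpoint_integral`, and the choice to truncate the infinite upper limit at `40 / lam` are all assumptions made for the example, not part of the course material.

```python
import math

def expo_pdf(x, lam):
    """Density of Expo(lam): lam * exp(-lam * x) for x > 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x > 0 else 0.0

def midpoint_integral(f, a, b, n=200_000):
    """Midpoint Riemann sum of f over [a, b] using n equal slices."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

lam = 2.0  # hypothetical rate, chosen only for illustration

# The support is (0, infinity); the tail beyond 40 / lam has mass e^{-40},
# which is negligible, so the integral is truncated there (an assumption).
total = midpoint_integral(lambda x: expo_pdf(x, lam), 0.0, 40.0 / lam)
print(total)  # should be very close to 1
```

Changing `lam` rescales the curve but should leave the total area at $1$, which is what validity of the density requires.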
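Integrating the PDF gives the cumulative distribution function. Starting the integral at $0$ (since the variable is positive):
$$F(x) = \int_0^x \lambda e^{-\lambda t}\, dt = 1 - e^{-\lambda x}, \quad x > 0,$$
with $F(x) = 0$ for $x \le 0$.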
Checks that this is a valid CDF: $F(0) = 0$, $F(x) \to 1$ as $x \to \infty$, and $F$ is continuous and increasing.
Before computing the mean and variance, it helps to standardize, much as any normal reduces to a standard normal. The normal has two parameters ($\mu$ and $\sigma^2$) and standardizing subtracts the mean and divides by the standard deviation. The exponential has only one parameter, so standardization is simpler: just rescale.
If $Y = \lambda X$, then $Y \sim \text{Expo}(1)$.
This is worth doing once, because afterward many calculations can be done with the cleaner $\text{Expo}(1)$ and $\lambda$ reinserted only when needed.
Since $\lambda > 0$, we have $P(Y \le y) = P(X \le y / \lambda)$, so the CDF of $Y$ comes from plugging $y / \lambda$ into the exponential CDF:
$$F_X\!\left(\tfrac{y}{\lambda}\right) = 1 - e^{-\lambda \cdot (y / \lambda)} = 1 - e^{-y}.$$
The $\lambda$'s cancel, and $1 - e^{-y}$ is exactly the CDF of an $\text{Expo}(1)$ random variable. One short equation proves the claim.
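A quick simulation can make the rescaling claim concrete. This is a sketch under stated assumptions: the rate `lam = 3.0`, the seed, the sample size, and the helper `empirical_cdf` are all chosen for illustration. It draws $X \sim \text{Expo}(\lambda)$ with the standard library's `random.expovariate`, forms $Y = \lambda X$, and compares the empirical CDF of $Y$ with $1 - e^{-y}$ at a few points.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
lam = 3.0               # hypothetical rate, chosen only for illustration
n = 200_000

# Draw X ~ Expo(lam) and rescale: Y = lam * X.
ys = [lam * rng.expovariate(lam) for _ in range(n)]

def empirical_cdf(samples, y):
    """Fraction of samples that are <= y."""
    return sum(s <= y for s in samples) / len(samples)

# Largest gap between the empirical CDF of Y and the Expo(1) CDF, 1 - e^{-y}.
max_gap = max(
    abs(empirical_cdf(ys, y) - (1 - math.exp(-y))) for y in (0.5, 1.0, 2.0)
)
print(max_gap)  # should be small if Y behaves like Expo(1)
```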
Work first with $Y \sim \text{Expo}(1)$, then transform back.
By definition:
$$E(Y) = \int_0^\infty y\, e^{-y}\, dy.$$
This is a standard integration by parts (the kind seen in AP Calculus). Let $u = y$ and $dv = e^{-y}\, dy$, so $du = dy$ and $v = -e^{-y}$. Then $E(Y) = [uv] - \int v\, du$:
$$E(Y) = \Big[-y\, e^{-y}\Big]_0^\infty + \int_0^\infty e^{-y}\, dy = 0 + 1.$$
Therefore $E(Y) = 1$.
The remaining integral $\int_0^\infty e^{-y}\, dy$ needs no calculus: its integrand is the $\text{Expo}(1)$ PDF, so it equals $1$ automatically. Recognizing an integrand as a PDF is a recurring trick in this course.
Use $\operatorname{Var}(Y) = E(Y^2) - \big(E(Y)\big)^2$. By LOTUS, the second moment is:
$$E(Y^2) = \int_0^\infty y^2\, e^{-y}\, dy.$$
($E(Y)$ is the first moment; $E(Y^2)$ is the second moment.) Integrating by parts with $u = y^2$ lowers the power from $2$ to $1$, reducing the problem to the integral already done. Carrying it out gives $E(Y^2) = 2$, so:
$$\operatorname{Var}(Y) = E(Y^2) - \big(E(Y)\big)^2 = 2 - 1^2 = 1.$$
(Better methods for getting moments without repeated integration by parts come later in the course.)
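For reference, the integration by parts for the second moment can be written out explicitly. This is a sketch of the step the text summarizes, taking $u = y^2$ and $dv = e^{-y}\, dy$, so that $du = 2y\, dy$ and $v = -e^{-y}$:
$$E(Y^2) = \Big[-y^2 e^{-y}\Big]_0^\infty + 2\int_0^\infty y\, e^{-y}\, dy = 0 + 2\, E(Y) = 2.$$
The boundary term vanishes because $y^2 e^{-y} \to 0$ as $y \to \infty$, and the remaining integral is exactly the one computed for $E(Y)$.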
With $Y = \lambda X$, we have $X = Y / \lambda$. Pulling out the constant $1/\lambda$:
| Quantity | Result | Why |
|---|---|---|
| $E(X)$ | $\dfrac{1}{\lambda}$ | $\tfrac{1}{\lambda}\, E(Y) = \tfrac{1}{\lambda}$ |
| $\operatorname{Var}(X)$ | $\dfrac{1}{\lambda^2}$ | $\tfrac{1}{\lambda^2}\operatorname{Var}(Y) = \tfrac{1}{\lambda^2}$; a constant factor comes out of the variance squared |
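
The two results in the table can be sanity-checked by simulation. The sketch below is illustrative only: the rate `lam = 4.0`, the seed, and the sample size are assumptions picked for the example. It compares the sample mean and sample variance of simulated draws with $1/\lambda$ and $1/\lambda^2$.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
lam = 4.0               # hypothetical rate, chosen only for illustration
n = 200_000

xs = [rng.expovariate(lam) for _ in range(n)]

# Sample mean and (population-style) sample variance.
sample_mean = sum(xs) / n
sample_var = sum((x - sample_mean) ** 2 for x in xs) / n

print(sample_mean, 1 / lam)     # sample mean next to the theoretical 1/lam
print(sample_var, 1 / lam**2)   # sample variance next to the theoretical 1/lam^2
```

Doubling `lam` should roughly halve the mean and quarter the variance, matching the way the constant comes out of each formula.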
The mean and variance are known, but they do not yet explain why the exponential is so important. The most important reason is the memoryless property. (It was seen in the discrete case — the geometric — in a practice midterm problem; here is the continuous version.)
Think of $X$ as a waiting time: you are waiting for something to happen — say, a phone call that can arrive at any time in continuous time. (The geometric is the discrete analogue: waiting for a success across discrete Bernoulli trials.) Memorylessness says that no matter how long you have already waited, the elapsed time gives you no progress toward the event — it is as if you start fresh each moment.
A random variable $X$ is memoryless if, for all $s, t \ge 0$:
$$P(X \ge s + t \mid X \ge s) = P(X \ge t).$$
Here $s$ is the time already waited with no event yet; the equation says the chance of having to wait at least $t$ more is the same as the original chance of waiting at least $t$ — you have restarted with a fresh exponential of the same parameter.
To prove the exponential is memoryless, first compute $P(X \ge s)$. Because $X$ is continuous, strict vs. non-strict inequalities do not matter, so this is just $1 - F(s)$:
$$P(X \ge s) = 1 - F(s) = e^{-\lambda s}.$$
This is called the survival function: thinking of $X$ as a lifetime, it is the probability of living longer than $s$. Survival functions are central in survival analysis in biostatistics and in engineering reliability (e.g., how long a component will last).
By the definition of conditional probability:
$$P(X \ge s + t \mid X \ge s) = \frac{P(X \ge s + t,\ X \ge s)}{P(X \ge s)}.$$
The numerator's second condition is redundant: if $X \ge s + t$ (with $s, t \ge 0$), then automatically $X \ge s$. So the numerator is just $P(X \ge s + t)$, and the whole expression is a ratio of survival functions:
$$\frac{e^{-\lambda (s + t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(X \ge t).$$
This holds for all $s, t \ge 0$, so the exponential is memoryless.
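The memoryless identity can also be checked empirically. The following is a minimal sketch, assuming illustrative values `lam = 1.0`, `s = 1.0`, `t = 0.5` and a fixed seed. Among simulated draws that have already reached `s`, it measures how often they also reach `s + t`, and compares that with the unconditional frequency of reaching `t` and with the exact value $e^{-\lambda t}$.

```python
import math
import random

rng = random.Random(2)     # fixed seed so the run is repeatable
lam, s, t = 1.0, 1.0, 0.5  # hypothetical values, chosen only for illustration
n = 200_000

xs = [rng.expovariate(lam) for _ in range(n)]

# Condition on having already waited s with no event: keep draws with X >= s.
survivors = [x for x in xs if x >= s]

# Empirical P(X >= s + t | X >= s) and empirical P(X >= t).
cond_freq = sum(x >= s + t for x in survivors) / len(survivors)
uncond_freq = sum(x >= t for x in xs) / n

exact = math.exp(-lam * t)  # survival function evaluated at t
print(cond_freq, uncond_freq, exact)  # all three should be close
```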
One might expect many memoryless distributions, but the exponential is the only continuous one — memorylessness completely characterizes it. (Proof deferred to next time.)
A useful consequence of memorylessness. For $X \sim \text{Expo}(\lambda)$, what is the expected value of $X$ given $X > a$?
Conditional expectation is defined just like ordinary expectation but using conditional probability in place of probability. Write, by linearity:
$$E(X \mid X > a) = E\big(a + (X - a) \mid X > a\big) = a + E(X - a \mid X > a).$$
Given $X > a$, the surplus $X - a$ is the additional waiting time. By memorylessness, once you have waited $a$, you start over — so $X - a$ is a fresh $\text{Expo}(\lambda)$. Its expectation is therefore $1/\lambda$, giving:
$$E(X \mid X > a) = a + \frac{1}{\lambda}.$$
The memoryless property delivers this immediately; otherwise it would require setting up and evaluating an integral.
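As a rough numerical check of $E(X \mid X > a) = a + 1/\lambda$, the sketch below averages only the simulated draws that exceed `a`. The values `lam = 2.0` and `a = 1.0`, the seed, and the sample size are assumptions made for illustration.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable
lam, a = 2.0, 1.0       # hypothetical values, chosen only for illustration
n = 200_000

xs = [rng.expovariate(lam) for _ in range(n)]

# Keep only the draws with X > a, then average them.
beyond = [x for x in xs if x > a]
cond_mean = sum(beyond) / len(beyond)

expected = a + 1 / lam  # the value predicted by memorylessness
print(cond_mean, expected)
```

Only about a fraction $e^{-\lambda a}$ of the draws survive the conditioning, so the estimate gets noisier as `a` grows unless `n` is increased.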