Lecture 11: The Poisson Distribution

Harvard Statistics 110 (Joe Blitzstein)

1. Sympathetic Magic: Confusing a Random Variable with Its Distribution

The single most common and most fundamental mistake in probability is confusing a random variable with its distribution. Blitzstein calls this "sympathetic magic." It is subtle at first, so it takes repeated practice to internalize.

How the mistake shows up

A typical instance: given a sum of two random variables $X + Y$, blindly writing the PMF of the sum as the sum of the PMFs. Adding random variables is not the same as adding PMFs.

Why $P(X = x) + P(Y = y)$ cannot be the PMF of $X + Y$:

It can exceed 1. Adding two probabilities gives no guarantee the result stays $\le 1$.
It has the wrong arguments. It is a function of $x$ and $y$. The PMF of $X + Y$ must be a function of a single value $t$, namely $P(X + Y = t)$.

The same error appears with transformations: cubing a random variable $X$ to get $X^3$ does not mean cubing its distribution or its PMF. You cannot operate on the distribution as a stand-in for operating on the random variable.

There is legitimate machinery for sums of random variables, revisited in depth later in the course. For example, the sum of independent Binomials with the same $p$ is Binomial (provable in several ways), and $X + Y$ can be handled by conditioning on $X$. None of those methods adds PMFs.
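To make the contrast concrete, here is a minimal sketch using two fair dice. It assumes $X$ and $Y$ are independent (an assumption of this example, not a general requirement), and the helper name `pmf_of_sum` is made up for illustration. The correct PMF of $X + Y$ comes from summing joint probabilities over all pairs that give the same total; adding the two PMFs value by value does not even produce a PMF.

```python
from collections import defaultdict
from fractions import Fraction

# PMF of one fair six-sided die: value -> probability
die = {v: Fraction(1, 6) for v in range(1, 7)}

def pmf_of_sum(pmf_x, pmf_y):
    """PMF of X + Y, assuming X and Y are independent (discrete convolution)."""
    out = defaultdict(Fraction)
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            # Independence assumption: P(X = x, Y = y) = P(X = x) * P(Y = y)
            out[x + y] += px * py
    return dict(out)

total = pmf_of_sum(die, die)
print(total[7])             # P(X + Y = 7)
print(sum(total.values()))  # a valid PMF sums to 1

# The "sympathetic magic" version: add the two PMFs value by value.
wrong = {v: die[v] + die[v] for v in die}
print(sum(wrong.values()))  # sums to 2, so this is not a PMF
```

Exact fractions are used so that the sums can be compared without floating-point noise.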

Two analogies

The map is not the territory

A famous saying in semantics: the word is not the thing. Nobody lays a map on the floor and walks around on it expecting to explore the terrain. But because random variables and distributions are abstract and mathematical, people make exactly this mistake constantly.

House and blueprint

A random variable is a (random) house; its distribution is the blueprint. You do not live inside the blueprint. The analogy is especially apt because one blueprint can build many houses: many random variables can share the same distribution. They might be IID (independent, identically distributed) or dependent but identically distributed. The blueprint specifies all the probabilities for the random choices (e.g., a blue door with some probability, a red door otherwise); each built house is one realized random variable.

· · ·

2. The Poisson Distribution: PMF and Mean

The Poisson is the last of the famous discrete distributions needed for the semester, and arguably the single most important discrete distribution in all of statistics. It is named after Siméon Denis Poisson, a French mathematician who was among the first to work with it, in the 1830s.

Definition

Poisson PMF

A random variable $X$ is Poisson with parameter $\lambda$, written $X \sim \text{Pois}(\lambda)$, if it takes non-negative integer values with

$$P(X = k) = \frac{e^{-\lambda}\,\lambda^k}{k!}, \qquad k = 0, 1, 2, \ldots$$

and $0$ otherwise. The parameter $\lambda$ is any positive real number, called the rate parameter.

Unlike the Binomial, which is bounded between $0$ and $n$, the Poisson is unbounded: it can take any non-negative integer value.

Valid PMF

The terms are non-negative, so the only thing to check is that they sum to $1$. Pulling the constant $e^{-\lambda}$ out of the sum:

$$\sum_{k=0}^{\infty} \frac{e^{-\lambda}\,\lambda^k}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda}\, e^{\lambda} = 1$$

The remaining sum is exactly the Taylor series for $e^{\lambda}$. In effect, the construction is: take the Taylor series for $e^{\lambda}$ and divide by $e^{\lambda}$ so the terms add to $1$.
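As a quick numerical sanity check of the argument above, the partial sums of the PMF can be computed directly. This is only a sketch: the rate `lam = 3.5` and the cutoff of 60 terms are arbitrary choices for illustration, and the function name `pois_pmf` is not from the lecture.

```python
import math

def pois_pmf(k, lam):
    """P(X = k) for X ~ Pois(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3.5  # arbitrary positive rate, chosen for illustration
partial = sum(pois_pmf(k, lam) for k in range(60))
print(partial)  # should be extremely close to 1
```

The infinite sum is truncated, so the result is 1 only up to a negligible tail and floating-point rounding.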

Mean

$$E(X) = \lambda \quad \text{for } X \sim \text{Pois}(\lambda)$$

By definition $E(X)$ is the sum of value times probability:

$$E(X) = \sum_{k=0}^{\infty} k \cdot \frac{e^{-\lambda}\,\lambda^k}{k!}$$

The $k = 0$ term is $0$, so start at $k = 1$. Then $\dfrac{k}{k!} = \dfrac{1}{(k-1)!}$, leaving

$$E(X) = e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^k}{(k-1)!}.$$

Factor out one $\lambda$ so the exponent matches the factorial index (substitute $j = k - 1$):

$$E(X) = \lambda\, e^{-\lambda} \sum_{j=0}^{\infty} \frac{\lambda^j}{j!} = \lambda\, e^{-\lambda}\, e^{\lambda} = \lambda.$$
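The same result can be checked numerically by truncating the defining sum. The sketch below assumes a cutoff of 100 terms, which is ample for the small rates tried here; the function names and the particular rates are illustrative choices.

```python
import math

def pois_pmf(k, lam):
    """P(X = k) for X ~ Pois(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def truncated_mean(lam, terms=100):
    """Sum of k * P(X = k) over k = 0, ..., terms - 1."""
    return sum(k * pois_pmf(k, lam) for k in range(terms))

for lam in (0.5, 2.0, 7.25):
    print(lam, truncated_mean(lam))  # each value should be close to lam
```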

Worth remembering

Memorizing formulas is not emphasized in this course, but this one is useful and easy: the mean of a Poisson is its rate parameter $\lambda$.

· · ·

3. Why the Poisson Matters: The Poisson Paradigm

In practice the Poisson is the single most widely used distribution for modeling discrete (count) data in the real world. The general setting is counting the number of "successes" when:

there are a large number of trials (many things that could each succeed or fail), and
the probability of success for each one is small.

"Success" is in quotes because, as with the Binomial, it can be defined very generally. Example: $10{,}000$ trials, each with probability $1/10{,}000$ of success. By linearity of expectation (via indicator random variables), the expected number of successes is $1$.

Examples

Count	Why Poisson is a reasonable first model
Emails received in an hour	Many people could email you; any specific person is unlikely to in that hour, but there are many of them.
Chocolate chips in a cookie	Each tiny bit of dough is probably not a chip, but there are many locations where a chip could land.
Earthquakes in a region per year	On any given day an earthquake is unlikely, but there are many days, so a few may occur.

In each case Poisson is a starting-point model, not an exact law. Whether the count is actually Poisson is an empirical question, settled by collecting data, not by mathematics. Many real examples also have an obvious upper bound (whereas the Poisson runs to infinity), so they cannot be exactly Poisson — but the approximation is extremely useful.

The Poisson paradigm

Poisson paradigm (Poisson approximation)

Suppose we have events $A_1, A_2, \ldots, A_n$ with $P(A_j) = p_j$, where

$n$ is large (many things could happen),
the $p_j$ are all small (each one is unlikely),
the events are independent — or, more generally, only weakly dependent.

Then the number of events that occur is approximately Poisson with

$$\lambda = \sum_{j=1}^{n} p_j.$$

The value of $\lambda$ follows from the mean: since $\lambda$ is the expected value of a Poisson, and by linearity of expectation (which holds even under dependence) the expected number of events that occur is $\sum_j p_j$, we set $\lambda = \sum_j p_j$.
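A small simulation can illustrate the paradigm with unequal $p_j$. Everything specific here is an assumption made for the sketch: 200 independent events, probabilities drawn uniformly from $[0, 0.01]$, 5,000 simulated trials, and a fixed seed. It compares empirical frequencies of the count with the $\text{Pois}(\lambda)$ PMF for $\lambda = \sum_j p_j$.

```python
import math
import random

random.seed(110)  # fixed seed so the run is reproducible

# Hypothetical setup: n independent events with small, unequal probabilities.
n = 200
p = [random.uniform(0.0, 0.01) for _ in range(n)]
lam = sum(p)  # the paradigm's choice of lambda

def count_events():
    """Simulate all n events once and return how many occurred."""
    return sum(random.random() < pj for pj in p)

def pois_pmf(k, lam):
    """P(X = k) for X ~ Pois(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

trials = 5_000
counts = [count_events() for _ in range(trials)]

print("lambda =", round(lam, 4))
for k in range(6):
    empirical = counts.count(k) / trials
    print(k, round(empirical, 4), round(pois_pmf(k, lam), 4))
```

The two columns should be close but not identical: part of the gap is simulation noise and part is the approximation itself.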

On weak dependence

Independence has a precise mathematical definition, but "weakly dependent" is hard to define formally. Intuitively, independence means knowing whether $A_1, A_2, A_3$ occurred gives no information about $A_4, A_5, A_6, \ldots$. Weak dependence allows a little information, that is, small deviations from independence. For example, knowing $A_1$ occurred might tell us nothing about $A_3$, while knowing both $A_1$ and $A_2$ occurred makes $A_3$ slightly more or less likely.

Relation to the Binomial

The special case where all events are independent and all $p_j$ equal the same $p$ is just Bernoulli trials with common $p$, so the count is exactly $\text{Binomial}(n, p)$. The paradigm is more general in two ways: the $p_j$ may differ, and the events may be at least slightly dependent, and the approximation still works. As shown next, $\text{Binomial}(n, p)$ itself converges to a Poisson as $n$ grows large and $p$ grows small in a coordinated way.

· · ·

4. Binomial Converges to Poisson

The goal is to show the Binomial PMF converges to the Poisson PMF in a suitable limit. Start with $X \sim \text{Binomial}(n, p)$ and take:

$n \to \infty$ (capturing "a large number of events"),
$p \to 0$,
with the product $\lambda = np$ held constant.

Holding $np$ constant is the natural coupling because $E(\text{Binomial}(n,p)) = np$ and $E(\text{Pois}(\lambda)) = \lambda$, so setting $\lambda = np$ lines up the means. It forces $p = \lambda/n$, so $p \to 0$ at the same rate as $1/n$ when $n \to \infty$. We hold $k$ fixed and ask what happens to the PMF at that value.

Setup

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}.$$

Rewrite everything in terms of $n$ using $p = \lambda/n$, and expand the binomial coefficient by its story (choose $k$ people in order, then divide by $k!$ since order does not matter):

$$\binom{n}{k} = \frac{n(n-1)(n-2)\cdots(n-k+1)}{k!}$$

Substituting $p = \lambda/n$:

$$P(X = k) = \frac{n(n-1)\cdots(n-k+1)}{k!} \cdot \frac{\lambda^k}{n^k} \cdot \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-k}.$$

The $\left(1 - \frac{\lambda}{n}\right)^{-k}$ factor is separated out because $k$ is fixed, which makes the limit easier to read.

Taking the limit

Handle each piece as $n \to \infty$ with $k$ and $\lambda$ fixed:

$\dfrac{\lambda^k}{k!}$ is fixed — it stays, and this is exactly the $\lambda^k / k!$ wanted for the Poisson.
The ratio $\dfrac{n(n-1)\cdots(n-k+1)}{n^k}$: there are $k$ descending factors $n, n-1, \ldots, n-k+1$ on top and $n^k = \underbrace{n \cdots n}_{k}$ on the bottom. Matching term by term, each ratio $(n-i)/n$ with $i$ fixed goes to $1$, and there are only $k$ of them (a fixed number), so this whole block goes to $1$.
$\left(1 - \frac{\lambda}{n}\right)^{-k} \to 1^{-k} = 1$, since the base $\to 1$ and the exponent is fixed.
$\left(1 - \frac{\lambda}{n}\right)^{n} \to e^{-\lambda}$, by the limit $(1 + x/n)^n \to e^x$ with $x = -\lambda$.

Conclusion

Multiplying the surviving pieces:

$$P(X = k) \;\longrightarrow\; \frac{e^{-\lambda}\,\lambda^k}{k!},$$

exactly the $\text{Pois}(\lambda)$ PMF at $k$. So the Binomial converges to the Poisson in this regime.
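The convergence can also be watched numerically. The sketch below fixes $\lambda = 2$ and $k = 3$ (arbitrary illustrative values), sets $p = \lambda/n$, and records the gap between the two PMFs as $n$ grows.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def pois_pmf(k, lam):
    """P(X = k) for X ~ Pois(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam, k = 2.0, 3  # illustrative values; any fixed lam > 0 and k >= 0 would do
gaps = []
for n in (10, 100, 1_000, 10_000):
    p = lam / n  # keeps n * p = lam as n grows
    gap = abs(binom_pmf(k, n, p) - pois_pmf(k, lam))
    gaps.append(gap)
    print(n, gap)
```

The printed gaps should shrink steadily as $n$ increases, consistent with the limit derived above.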

The limit $(1 + x/n)^n \to e^x$

This is one of the most important limits for the exponential, sometimes taken as the definition of $e^x$. Blitzstein calls it the compound interest formula: money compounded more and more frequently per year approaches continuous compounding (exponential growth) in the limit. To verify it, take logs and apply L'Hôpital's rule.
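The log-and-L'Hôpital verification mentioned above can be written out as follows. The substitution $t = 1/n$ is introduced here only for convenience; it turns the limit into a $0/0$ form.

$$\ln\!\left[\left(1 + \frac{x}{n}\right)^{n}\right] = n \ln\!\left(1 + \frac{x}{n}\right) = \frac{\ln(1 + xt)}{t}, \qquad t = \frac{1}{n} \to 0^{+}.$$

Numerator and denominator both go to $0$, so differentiating each with respect to $t$ gives

$$\lim_{t \to 0^{+}} \frac{\ln(1 + xt)}{t} = \lim_{t \to 0^{+}} \frac{x/(1 + xt)}{1} = x.$$

Since the exponential function is continuous, exponentiating both sides yields $(1 + x/n)^n \to e^x$. Taking $x = -\lambda$ gives the factor $e^{-\lambda}$ used in the Binomial-to-Poisson limit.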

Raindrops: intuition for Binomial $\to$ Poisson

Count the raindrops hitting a sheet of paper in one minute. Break the paper into millions of tiny squares. If each square is small enough it is unlikely to be hit, but there are a huge number of squares, so some drops land. The intensity of the rain is captured by $\lambda$.

If we assumed every square is independent of the others and each has the same hit probability $p$, the count would be exactly Binomial. Two reasons it is not exactly Binomial:

Independence is only approximate (rain on nearby squares is plausibly correlated), though independence is a reasonable approximation.
Each square could receive two drops, not just zero or one, violating the Binomial's per-trial $0/1$ assumption.

Even if it were exactly Binomial, the parameters would be unwieldy, something like $\text{Binomial}(10^{12}, \text{tiny})$. That is hard by hand and numerically troublesome even for a computer: a factorial like $1000!$ already overflows standard double-precision floating point. The Poisson is far simpler to work with, so the Poisson approximation is the practical choice.
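The numerical difficulty is easy to reproduce. In the sketch below, the function names are hypothetical, and $n = 1000$, $\lambda = 2$, $k = 3$ are illustrative values. A textbook Binomial PMF written with floating-point factorials fails at $n = 1000$, while a log-space version (one common workaround, not something from the lecture) succeeds and lands close to the Poisson value.

```python
import math

def binom_pmf_naive(n, k, p):
    """Textbook formula with float factorials; breaks once n! exceeds float range."""
    coeff = float(math.factorial(n)) / (
        float(math.factorial(k)) * float(math.factorial(n - k))
    )
    return coeff * p**k * (1 - p) ** (n - k)

def binom_logpmf(n, k, p):
    """log P(X = k) for X ~ Binomial(n, p), using log-gamma to avoid overflow."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log1p(-p)

def pois_pmf(k, lam):
    """P(X = k) for X ~ Pois(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, lam, k = 1000, 2.0, 3
p = lam / n

try:
    binom_pmf_naive(n, k, p)
    naive_failed = False
except OverflowError:
    # 1000! is far larger than the biggest representable double
    naive_failed = True

print("naive formula overflowed:", naive_failed)
print("log-space Binomial:", math.exp(binom_logpmf(n, k, p)))
print("Poisson:           ", pois_pmf(k, lam))
```

Python's exact integers (for example `math.comb`) sidestep the overflow for moderate $n$, but the Poisson formula needs no such care at all.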

· · ·

5. Worked Example: Triple Birthday Matches

The standard birthday problem asks the probability that at least two people in a group share a birthday, and has an exact answer. Now ask a harder variant: with $n$ people, find the approximate probability that some group of three people all share the same birthday. (Normally every answer in the course must be exact unless approximation is explicitly requested; here it is.)

Solving this exactly, by analogy to the basic birthday problem, is very difficult. The Poisson approximation makes it easy.

Does the paradigm apply?

The relevant quantity is not $n$ itself but $\binom{n}{3}$, the number of triplets — just as the basic birthday problem hinges on $\binom{n}{2}$ (e.g. $\binom{23}{2} = 253$ is already large, which is why $23$ people suffice against $365$ days). So $n$ need not be huge; even $n \approx 10\text{–}20$ makes $\binom{n}{3}$ large. Check the paradigm:

Many trials: $\binom{n}{3}$ triplets, large even for modest $n$.
Small per-trial probability: any specific triplet sharing a birthday is very unlikely.
Weak dependence: overlapping triplets are not independent (e.g. $I_{123}$ and $I_{124}$ share two people), but the overlap gives only a small "head start," and disjoint triplets like $I_{123}$ and $I_{456}$ are independent.

Indicators and $\lambda$

Label the people $1$ through $n$. For each triplet $\{i, j, k\}$ with $i < j < k$, define an indicator $I_{ijk} = 1$ if all three share a birthday, $0$ otherwise. By symmetry each triplet has the same match probability: the first person can have any birthday, the second must match (probability $1/365$), and the third must match (another $1/365$), giving $1/365^2$.

$$E(X) = \binom{n}{3} \cdot \frac{1}{365^2}$$

Here $X = \sum_{i<j<k} I_{ijk}$ is the number of triple matches. By linearity of expectation and symmetry, this expected value is exact even though the indicators are dependent: there are $\binom{n}{3}$ indicators, each with mean $1/365^2$.

(Note this counts a group of four matching birthdays as $\binom{4}{3} = 4$ triple matches.) Set $\lambda$ equal to this expected value.

Approximation

Let $X$ be the number of triple matches. $X$ cannot be exactly Poisson — it is bounded above by the number of triplets, while the Poisson is unbounded — but it is approximately $\text{Pois}(\lambda)$ with the $\lambda$ above.

We want $P(\text{at least one triple match}) = P(X \ge 1)$. Use the complement and the Poisson PMF at $0$:

Result

$$P(X \ge 1) = 1 - P(X = 0) \;\approx\; 1 - \frac{e^{-\lambda}\lambda^0}{0!} = 1 - e^{-\lambda}, \qquad \lambda = \frac{\binom{n}{3}}{365^2}.$$

This is trivial to evaluate on a calculator — no large sums of binomial coefficients required — which is exactly why the Poisson approximation is so useful for quick, accurate estimates.
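The formula can be evaluated in a few lines and compared with a rough Monte Carlo estimate. This is a sketch under the usual birthday-problem assumptions (365 equally likely days, independent birthdays); the function names, the number of trials, and the seed are choices made for this illustration.

```python
import math
import random
from collections import Counter

def triple_match_poisson(n, days=365):
    """Poisson approximation 1 - exp(-lam) with lam = C(n, 3) / days^2."""
    lam = math.comb(n, 3) / days**2
    return 1 - math.exp(-lam)

def triple_match_simulated(n, trials=5_000, days=365, seed=110):
    """Monte Carlo estimate of P(some three people share a birthday)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Count how many of the n people land on each day.
        counts = Counter(rng.randrange(days) for _ in range(n))
        # X >= 1 exactly when some day has at least three people.
        if max(counts.values()) >= 3:
            hits += 1
    return hits / trials

for n in (23, 50, 88):
    print(n, round(triple_match_poisson(n), 4), round(triple_match_simulated(n), 4))
```

The simulation checks only whether some day holds three or more people, which is the event $X \ge 1$. The two columns are expected to be close for small $n$ and may drift apart somewhat as $n$ grows, since overlapping triplets become more numerous and the weak-dependence assumption is strained.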