Lecture 3: Birthday Problem, Properties of Probability

Harvard Statistics 110 (Joe Blitzstein)

1. The Birthday Problem

One of the most famous problems in probability, and one that surprises almost everyone the first time they see it.

Statement

You have a group of $K$ people (at a party, in a room). What is the probability that at least one pair shares the same birthday? Not a specific pair, and not necessarily exactly one pair — just that somewhere in the group two people match. A natural follow-up: how many people do you need for at least a 50/50 chance of a match?

Assumptions

365 days, no Feb 29. Leap years can be handled but add bookkeeping for little payoff; Feb 29 is genuinely rarer than other days, so treating the year as 366 equally likely days would be wrong, and 365 is close enough.
The 365 days are equally likely. An empirical claim, not a mathematical one. Real data show small seasonal effects (e.g., more births about 9 months after holidays), and curiously the pattern differs by country. The differences are small, so we assume uniformity.
Births are independent. Informally: one person's birthday tells you nothing about another's. (Twins would break this; the formal definition of independence comes later.)

Easy case: $K > 365$ (pigeonhole)

Picture 365 boxes, one per day, and drop one labeled dot per person into the box for their birthday. People are labeled $1$ through $K$ — distinguishable individuals, not interchangeable particles.

••• | • | ··· | ••

Jan 1 (3 people), Jan 2 (1), ..., Dec 31 (2)

Pigeonhole principle

If there are more people than boxes ($K > 365$), some box must hold at least two dots: more items than containers forces a shared container. So for $K > 365$ the probability of a match is exactly $1$.

This collision structure is central in computer science. When a data structure (e.g., a hash table) tries to store two items in the same slot, that is a "collision," and the probability of collisions is exactly a birthday-problem calculation.

Main case: $K \le 365$

Most people, asked how many are needed for a 50/50 chance, guess well over 100 — often 150 or 180 (roughly $365/2$). The actual answer is 23.

Strategy: use the complement

It is far easier to compute $P(\text{no match})$ and subtract from $1$. Choosing whether to attack an event or its complement is a recurring strategic decision.

Using the naive definition (justified because outcomes are equally likely):

Denominator — total ways to assign birthdays to $K$ labeled people, repetition allowed: $365^K$ (multiplication rule).
Numerator — ways with no repeat. People enter in ID order: person 1 has any of $365$ days, person 2 any of the remaining $364$, person 3 any of $363$, and so on.

$$P(\text{no match}) = \frac{365 \cdot 364 \cdot 363 \cdots (365 - K + 1)}{365^K}$$

Off-by-one warning

The numerator must have exactly $K$ terms; the last is $365 - K + 1$, not $365 - K$. Check $K = 1$: $P(\text{no match})$ should be $365/365 = 1$ (one person has no one to match), and the formula gives the single term $365 - 1 + 1 = 365$. Dropping the "$+1$" is the classic mistake.

$$P(\text{at least one match}) = 1 - P(\text{no match})$$
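The product formula is easy to evaluate exactly. The following is a minimal sketch, assuming the 365-day uniform model above; the function names (`p_no_match`, `p_match`, `smallest_k_for`) are illustrative and not from the lecture. Exact fractions are used so that rounding does not blur the comparison with 1/2.

```python
from fractions import Fraction

DAYS = 365  # assumption from the notes: 365 equally likely days, no Feb 29


def p_no_match(k: int, days: int = DAYS) -> Fraction:
    """Exact P(no shared birthday) for k people: a product of exactly k factors."""
    if k > days:
        return Fraction(0)  # pigeonhole: a match is forced
    p = Fraction(1)
    for i in range(k):  # factors days/days, (days-1)/days, ..., (days-k+1)/days
        p *= Fraction(days - i, days)
    return p


def p_match(k: int, days: int = DAYS) -> Fraction:
    """P(at least one match), computed through the complement."""
    return 1 - p_no_match(k, days)


def smallest_k_for(threshold: Fraction, days: int = DAYS) -> int:
    """Smallest group size whose match probability reaches the threshold."""
    k = 1
    while p_match(k, days) < threshold:
        k += 1
    return k


if __name__ == "__main__":
    for k in (1, 22, 23, 50, 100):
        print(k, float(p_match(k)))
    print("smallest K with P >= 1/2:", smallest_k_for(Fraction(1, 2)))
```

Because the loop runs over `range(k)`, the last factor is `days - (k - 1)`, which is the $365 - K + 1$ term from the off-by-one warning.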

Numerical results

$K$ (people)	$P(\text{at least one match})$
$23$	$50.7\%$ (first $K$ to exceed 50%)
$50$	$97\%$
$100$	$> 99.9999\%$

With only 50 people the chance is already 97%, and with 100 (still under a third of 365) it is essentially certain — wildly higher than the "over 100 needed" intuition.

Why 23? Think in pairs

The intuitive but misleading quantity is $K$ — "only 23 people." The relevant quantity is the number of pairs, $\binom{K}{2} = \frac{K(K-1)}{2}$, because any pair can be the matching pair.

$$\binom{23}{2} = \frac{23 \cdot 22}{2} = 23 \cdot 11 = 253$$

(Trick for multiplying by 11: add the two digits of 23 and place the sum in the middle, $2\_3 \to 2(5)3 = 253$.) So 23 people generate 253 pairs — on the same order of magnitude as 365. Seen this way, a match is no longer surprising: there are 253 separate chances for a coincidence.
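To see how quickly the number of pairs outgrows the number of people, here is a small sketch; `pair_count` is a hypothetical helper name, and the ratio to 365 is printed only as a rough guide, not as a probability.

```python
from math import comb


def pair_count(k: int) -> int:
    """Number of unordered pairs among k people: C(k, 2) = k(k-1)/2."""
    return comb(k, 2)


for k in (2, 10, 23, 50, 100):
    pairs = pair_count(k)
    print(f"K={k:3d}  pairs={pairs:5d}  pairs/365={pairs / 365:.2f}")
```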

Near-match variant

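If we relax "same birthday" to "same birthday or within one day," the threshold for a 50/50 chance drops from 23 to 14, which is again surprising. Proving the 14 is much harder. The naive instinct is to replace 365 with 362 (excluding the day, the day before, and the day after) and multiply as before. That fails because the number of days a newcomer must avoid is not fixed: the three-day windows around earlier birthdays can overlap (two birthdays two days apart share a forbidden day), so the count of remaining days depends on how the earlier birthdays are spaced and no clean product results. Approximation methods for such problems come later in the course.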
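Since the near-match case has no simple product formula, simulation is a natural sanity check. The sketch below is a Monte Carlo estimate, not a proof. It assumes that the calendar wraps around (Dec 31 and Jan 1 count as one day apart), which the notes do not specify. The names `has_match` and `estimate_match_probability`, the trial count, and the seed are all illustrative choices.

```python
import random


def has_match(birthdays: list[int], window: int, days: int = 365) -> bool:
    """True if some pair of birthdays is at most `window` days apart.

    Assumption (not from the notes): the year is circular, so the last
    and first days of the year are one day apart.
    """
    ordered = sorted(birthdays)
    # The closest pair is always adjacent in sorted order, so it is enough
    # to compare neighbours, plus the wrap-around gap from last to first.
    for a, b in zip(ordered, ordered[1:]):
        if b - a <= window:
            return True
    return len(ordered) > 1 and ordered[0] + days - ordered[-1] <= window


def estimate_match_probability(
    k: int, window: int = 0, trials: int = 20_000, seed: int = 0, days: int = 365
) -> float:
    """Fraction of simulated groups of k people that contain a (near-)match."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(k)]
        hits += has_match(birthdays, window, days)
    return hits / trials


if __name__ == "__main__":
    print("exact match, K=23:    ", estimate_match_probability(23, window=0))
    print("within one day, K=13: ", estimate_match_probability(13, window=1))
    print("within one day, K=14: ", estimate_match_probability(14, window=1))
```

With `window=0` this reduces to the ordinary birthday problem, so the estimate for $K = 23$ should land near the exact value of about 50.7%; with `window=1` the estimates for $K = 13$ and $K = 14$ should fall on opposite sides of 50%.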

"The biggest coincidence of all would be if there were no coincidences." Among the mind-boggling number of possible coincidences in the world, some are bound to occur. The birthday result is a clean instance: 23 people quietly create 253 opportunities for a match, so a match is expected rather than remarkable. Stanford's "Probability by Surprise" applets (and similar simulations online) help build this intuition by running the experiment repeatedly.

· · ·

2. The Axioms of Probability (Recap)

A probability space is a sample space $S$ together with a function $P$ (P for probability) that assigns numbers to events. Only two axioms are needed.

Axiom 1 — Extremes

$$P(\varnothing) = 0 \qquad P(S) = 1$$

The empty set is an event that can never occur, so by convention it gets probability $0$; the full sample space always occurs, so it gets probability $1$.

Axiom 2 — Countable Additivity

If $A_1, A_2, A_3, \ldots$ are disjoint (non-overlapping) events, then:

$$P\!\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$$

This holds for finite unions ($n = 1$ to $M$) and for countably infinite ones.

Disjointness is the essential condition. Thinking of probability as area inside the sample-space rectangle (total area $1$) makes it intuitive: the area of a union of non-overlapping blobs is just the sum of their areas.

Two breakthroughs underpin modern probability: framing it in terms of sets and events (unions, intersections), and writing down these axioms — Kolmogorov was central to the axiomatic formulation. Before the axioms it was hard to say what counted as a correct probabilistic argument, and centuries of (still unresolved) philosophical debate over the meaning of probability produced less mathematical progress than these two simple rules did. The power of the axioms: any $P$, under any interpretation, that maps events to $[0, 1]$ and satisfies the two axioms makes every theorem of probability applicable — without our having to settle what probability "really means."

· · ·

3. Properties Derived from the Axioms

Some simple, very useful consequences. (Non-negativity, $0 \le P \le 1$, can be treated as a baseline assumption — call it "Axiom 0" if you like.)

Property 1: Complement

$$P(A^c) = 1 - P(A)$$

Intuition from the naive definition: total $=$ favorable $+$ unfavorable. From the area picture: if $A$ has area $0.3$, everything outside has area $0.7$.

Proof

$1 = P(S)$ by Axiom 1. Write $S = A \cup A^c$ (everything inside $A$ together with everything outside). Since $A$ and $A^c$ are disjoint ($A \cap A^c = \varnothing$ by definition), Axiom 2 gives $P(S) = P(A) + P(A^c)$. Hence $1 = P(A) + P(A^c)$, i.e. $P(A^c) = 1 - P(A)$.

Property 2: Monotonicity

If $A \subseteq B$ (meaning: if $A$ occurs then $B$ occurs), then

$$P(A) \le P(B)$$

Obvious from the picture — a bigger oval has bigger area — but here is a proof from the axioms. Decompose $B$ into two disjoint pieces: the part inside $A$, and the ring of $B$ that lies outside $A$.

Proof

Write $B = A \cup (B \cap A^c)$. The two pieces are disjoint, so by Axiom 2, $$P(B) = P(A) + P(B \cap A^c).$$ Probabilities are non-negative, so the added term is $\ge 0$, giving $P(B) \ge P(A)$.

Property 3: Probability of a union (two events)

When $A$ and $B$ are disjoint, $P(A \cup B) = P(A) + P(B)$. When they overlap, adding the two probabilities double-counts the intersection, so we subtract it once. This is the two-event case of inclusion-exclusion.

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Proof by disjointification and wishful thinking

The strategy is to make things disjoint so Axiom 2 applies. Rewrite the union as $A$ together with the part of $B$ not already in $A$: $$A \cup B = A \cup (B \cap A^c).$$ These pieces are disjoint, so $P(A \cup B) = P(A) + P(B \cap A^c)$.

Now use "wishful thinking": we wish $P(A) + P(B \cap A^c) = P(A) + P(B) - P(A \cap B)$, marked with a question mark rather than asserted. Cancelling $P(A)$, the wish reduces to $$P(A \cap B) + P(A^c \cap B) = P(B).$$ This holds by Axiom 2: $A \cap B$ and $A^c \cap B$ are disjoint (an element in both would lie in $A$ and $A^c$, impossible), and their union is $B$ (we split $B$ into its part inside $A$ and its part outside $A$). Wish confirmed.
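The three properties in this section can be checked on a small concrete sample space. The sketch below uses two fair dice with the naive definition of probability; the sample space and the events `A`, `B`, `C`, `D` are illustrative choices, not examples from the lecture.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered rolls of two fair dice (36 equally likely outcomes).
S = frozenset(product(range(1, 7), repeat=2))


def P(event: frozenset) -> Fraction:
    """Naive definition: favorable outcomes over total outcomes."""
    return Fraction(len(event & S), len(S))


A = frozenset(s for s in S if s[0] % 2 == 0)      # first die is even
B = frozenset(s for s in S if s[0] + s[1] >= 10)  # sum is at least 10
C = frozenset(s for s in S if s[0] == 6)          # first die shows 6
D = frozenset(s for s in S if s[0] + s[1] >= 7)   # sum is at least 7

# Property 1 (complement): P(A^c) = 1 - P(A)
print(P(S - A) == 1 - P(A))

# Property 2 (monotonicity): C is a subset of D, so P(C) <= P(D)
print(C <= D, P(C) <= P(D))

# Property 3 (union): P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
print(P(A | B) == P(A) + P(B) - P(A & B))
```

A check on one example is no substitute for the proofs, but it makes the double-counting concrete: here `A & B` is nonempty, so `P(A) + P(B)` alone would overshoot `P(A | B)`.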

· · ·

4. General Inclusion-Exclusion

Three events

$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$$

Reasoning via the triple-overlap diagram:

Add the three singles — overlaps are now overcounted.
Subtract the three pairwise intersections.
Track the triple intersection: added 3 times, then subtracted 3 times, so counted 0 times. Everything outside it is now correct; add the triple intersection back once.

Why "inclusion-exclusion"

We keep including and excluding, alternating, until everything is counted exactly once.

$n$ events

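$$P\!\left(\bigcup_{i=1}^{n} A_i\right) = \sum_i P(A_i) - \sum_{i < j} P(A_i \cap A_j) + \sum_{i < j < k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} P(A_1 \cap A_2 \cap \cdots \cap A_n)$$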

The conditions $i < j$ (and $i < j < k$, etc.) avoid listing the same intersection twice. The signs alternate; the final, full-intersection term carries sign $(-1)^{n+1}$. (Check: $n = 2$ ends on a subtraction, $n = 3$ ends on an addition — consistent.) The last term has no summation because there is exactly one intersection of all $n$ events. The proof is by induction, structurally identical to the two-event case but more tedious.
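The general formula translates almost directly into code: `itertools.combinations` produces each index set with $i < j < \cdots$ exactly once, and the sign depends only on the size of the set. The following is a sketch on a finite sample space with the naive definition; `union_probability`, the sample space $\{1, \ldots, 30\}$, and the "multiples of 2, 3, 5" events are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(1, 31))  # illustrative sample space: integers 1..30


def P(event: frozenset) -> Fraction:
    """Naive definition on S: favorable outcomes over total outcomes."""
    return Fraction(len(event & S), len(S))


def union_probability(events, P) -> Fraction:
    """P(A_1 ∪ ... ∪ A_n) by inclusion-exclusion over all nonempty sub-collections."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)  # + for odd-sized intersections, - for even-sized
        for group in combinations(events, k):
            total += sign * P(frozenset.intersection(*group))
    return total


events = [frozenset(x for x in S if x % d == 0) for d in (2, 3, 5)]

via_formula = union_probability(events, P)
direct = P(frozenset.union(*events))
print(via_formula, direct, via_formula == direct)
```

The sum has $2^n - 1$ terms, so this brute-force version is only practical for small $n$; the formula becomes useful in practice when symmetry collapses the terms.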

· · ·

5. de Montmort's Matching Problem

A famous application of inclusion-exclusion, from de Montmort (1713), when probability was still in its infancy and the motivation was gambling. Also called the matching problem; it appears in many disguises.

The game

Take a deck of $n$ cards labeled $1$ through $n$ (one number per card). Shuffle. Flip cards one at a time while counting aloud: "one" on the first flip, "two" on the second, and so on. You win if at any point the number you say matches the number on the card flipped — i.e., the card in position $j$ happens to be card number $j$. Question: what is $P(\text{at least one match})$?

Setting up the events

Let $A_j$ be the event that card $j$ matches — the $j$-th card in the deck is the card numbered $j$. We want $P(A_1 \cup A_2 \cup \cdots \cup A_n)$. Direct approaches are hard; inclusion-exclusion is the clean route, and symmetry makes the sums collapse.

Intersection probabilities

Using the naive definition with $n!$ equally likely shuffles:

$$P(A_j) = \frac{1}{n}$$

Card $j$ is equally likely to occupy any of the $n$ positions. Equivalently, fix card $j$ in its slot and permute the other $n - 1$ cards: $(n-1)!/n! = 1/n$. Crucially this does not depend on $j$.

$$P(A_1 \cap \cdots \cap A_k) = \frac{(n-k)!}{n!}$$

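Here cards $1$ through $k$ are pinned to their positions and the remaining $n - k$ cards permute freely, giving $(n-k)!$ favorable shuffles out of $n!$. By symmetry the same value holds for the intersection of any $k$ of the events, not just the first $k$. For $k = 2$ this is $\frac{(n-2)!}{n!} = \frac{1}{n(n-1)}$.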

Applying inclusion-exclusion

There are $\binom{n}{k}$ terms at level $k$ (one for each choice of $k$ events), each equal to $\frac{(n-k)!}{n!}$. The factorials cancel beautifully:

$$\binom{n}{k}\cdot\frac{(n-k)!}{n!} = \frac{n!}{k!\,(n-k)!}\cdot\frac{(n-k)!}{n!} = \frac{1}{k!}$$

So level 1 contributes $n \cdot \tfrac{1}{n} = 1$, level 2 contributes $\tfrac{1}{2!}$, level 3 contributes $\tfrac{1}{3!}$, and so on. With the alternating signs:

$$P(\text{at least one match}) = 1 - \frac{1}{2!} + \frac{1}{3!} - \frac{1}{4!} + \cdots + \frac{(-1)^{n+1}}{n!}$$
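For small decks the result can be checked by brute force, enumerating all $n!$ shuffles and counting those with at least one card in its own position. This is a verification sketch only; the function names are illustrative, and enumeration is feasible only for small $n$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def p_match_formula(n: int) -> Fraction:
    """Inclusion-exclusion result: 1 - 1/2! + 1/3! - ... + (-1)^(n+1)/n!."""
    return sum(
        (Fraction((-1) ** (k + 1), factorial(k)) for k in range(1, n + 1)),
        Fraction(0),
    )


def p_match_bruteforce(n: int) -> Fraction:
    """Enumerate all n! shuffles; a win is any card sitting in its own position."""
    wins = sum(
        any(card == position for position, card in enumerate(deck, start=1))
        for deck in permutations(range(1, n + 1))
    )
    return Fraction(wins, factorial(n))


for n in range(1, 8):
    formula, brute = p_match_formula(n), p_match_bruteforce(n)
    print(n, formula, brute, formula == brute)
```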

The punchline: $1 - 1/e$

Conclusion

That alternating sum is $1$ minus a truncated Taylor series. Setting $x = -1$ in $e^x = \sum_{k=0}^{\infty} x^k / k!$ gives $$e^{-1} = 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots$$ so, for large $n$, $$P(\text{at least one match}) \approx 1 - \frac{1}{e} \approx 0.632.$$ The answer barely depends on $n$ once $n$ is moderately large, and the constant $e$ appears even though the problem mentions nothing exponential. The number $1/e$ recurs throughout the course.
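How fast the probability settles near $1 - 1/e$ can be seen numerically. The sketch below uses floating-point arithmetic and a hypothetical helper `p_match`; the deck sizes are arbitrary. Because the series alternates with shrinking terms, the gap to the limit is at most the first omitted term, $1/(n+1)!$.

```python
from math import e, factorial


def p_match(n: int) -> float:
    """Partial sum 1 - 1/2! + 1/3! - ... + (-1)^(n+1)/n! as a float."""
    return sum((-1) ** (k + 1) / factorial(k) for k in range(1, n + 1))


limit = 1 - 1 / e
for n in (1, 2, 3, 5, 10, 52):
    gap = abs(p_match(n) - limit)
    print(f"n={n:2d}  P={p_match(n):.10f}  |P - (1 - 1/e)|={gap:.2e}")
```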