The birthday problem is one of the most famous problems in probability, and one that surprises almost everyone the first time they see it.
You have a group of $K$ people (at a party, in a room). What is the probability that at least one pair shares the same birthday? Not a specific pair, and not necessarily exactly one pair — just that somewhere in the group two people match. A natural follow-up: how many people do you need for at least a 50/50 chance of a match?
Picture 365 boxes, one per day, and drop one labeled dot per person into the box for their birthday. People are labeled $1$ through $K$ — distinguishable individuals, not interchangeable particles.
If there are more people than boxes ($K > 365$), some box must hold at least two dots: more items than containers forces a shared container. So for $K > 365$ the probability of a match is exactly $1$.
This collision structure is central in computer science. When a data structure (e.g., a hash table) tries to store two items in the same slot, that is a "collision," and the probability of collisions is exactly a birthday-problem calculation.
Most people, asked how many are needed for a 50/50 chance, guess well over 100 — often 150 or 180 (roughly $365/2$). The actual answer is 23.
It is far easier to compute $P(\text{no match})$ and subtract from $1$. Choosing whether to attack an event or its complement is a recurring strategic decision.
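Using the naive definition (justified because outcomes are equally likely), for $K \le 365$: $$P(\text{no match}) = \frac{365 \cdot 364 \cdots (365 - K + 1)}{365^K}, \qquad P(\text{at least one match}) = 1 - P(\text{no match}).$$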
The numerator must have exactly $K$ terms; the last is $365 - K + 1$, not $365 - K$. Check $K = 1$: $P(\text{no match})$ should be $365/365 = 1$ (one person has no one to match), and the formula gives the single term $365 - 1 + 1 = 365$. Dropping the "$+1$" is the classic mistake.
| $K$ (people) | $P(\text{at least one match})$ |
|---|---|
| $23$ | $50.7\%$ (first $K$ to exceed 50%) |
| $50$ | $97\%$ |
| $100$ | $> 99.9999\%$ |
With only 50 people the chance is already 97%, and with 100 (still under a third of 365) it is essentially certain — wildly higher than the "over 100 needed" intuition.
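The table entries can be reproduced with a short computation. This is a minimal sketch of the complement calculation, assuming 365 equally likely birthdays; the function names `p_match` and `smallest_k_for` are made up for illustration.

```python
def p_match(k: int, days: int = 365) -> float:
    """Probability that at least two of k people share a birthday."""
    if k > days:
        return 1.0  # more people than days forces a shared day
    p_no_match = 1.0
    # k factors: days/days, (days-1)/days, ..., (days-k+1)/days
    for i in range(k):
        p_no_match *= (days - i) / days
    return 1.0 - p_no_match


def smallest_k_for(target: float = 0.5, days: int = 365) -> int:
    """Smallest group size whose match probability reaches `target`."""
    k = 1
    while p_match(k, days) < target:
        k += 1
    return k


if __name__ == "__main__":
    for k in (23, 50, 100):
        print(k, p_match(k))
    print("first K with P >= 0.5:", smallest_k_for(0.5))
```

The loop runs exactly $K$ times, so the last factor is $(365 - K + 1)/365$, matching the term count discussed above.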
The intuitive but misleading quantity is $K$ — "only 23 people." The relevant quantity is the number of pairs, $\binom{K}{2} = \frac{K(K-1)}{2}$, because any pair can be the matching pair.
For $K = 23$: $$\binom{23}{2} = \frac{23 \cdot 22}{2} = 23 \cdot 11 = 253.$$ (Trick for multiplying by 11: add the two digits of 23 and place the sum in the middle, $2\_3 \to 2(5)3 = 253$.) So 23 people generate 253 pairs, the same order of magnitude as 365. Seen this way, a match is no longer surprising: there are 253 separate chances for a coincidence.
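To see how quickly the number of pairs outgrows the number of people, here is a small sketch using the standard-library `math.comb`; the helper name `num_pairs` is only illustrative.

```python
from math import comb


def num_pairs(k: int) -> int:
    """Number of distinct pairs among k people: k choose 2."""
    return comb(k, 2)


if __name__ == "__main__":
    for k in (10, 23, 50, 100):
        print(f"{k:>3} people -> {num_pairs(k):>4} pairs")
```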
If we relax "same birthday" to "same birthday or within one day," the threshold for a 50/50 chance drops from 23 to 14 — again surprising. Proving the 14 is much harder: the naive instinct of replacing 365 with 362 (excluding the day, the day before, and the day after) and multiplying fails, because the forbidden days overlap and leave gaps that no clean product can handle. Approximation methods for such problems come later in the course.
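Since no clean product formula is available here, simulation is a practical way to check the claimed threshold. The sketch below is a Monte Carlo estimate, not a proof. It assumes 365 equally likely days and a circular calendar (December 31 counts as adjacent to January 1); the names `has_match` and `estimate_p_match`, the trial count, and the seed are all illustrative choices.

```python
import random
from itertools import combinations


def has_match(birthdays, window=0, days=365):
    """True if some pair of birthdays is at most `window` days apart.

    Assumption: the calendar is treated as circular, so day 0 and
    day days-1 are one day apart.
    """
    for a, b in combinations(birthdays, 2):
        d = abs(a - b)
        if min(d, days - d) <= window:
            return True
    return False


def estimate_p_match(k, window=0, days=365, trials=20_000, seed=0):
    """Monte Carlo estimate of P(at least one match) for k people."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(k)]
        hits += has_match(birthdays, window, days)
    return hits / trials


if __name__ == "__main__":
    print("exact match, K=23:   ", estimate_p_match(23, window=0))
    print("within one day, K=13:", estimate_p_match(13, window=1))
    print("within one day, K=14:", estimate_p_match(14, window=1))
```

With `window=0` the same function estimates the original birthday problem, so the two thresholds can be compared directly.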
"The biggest coincidence of all would be if there were no coincidences." Among the mind-boggling number of possible coincidences in the world, some are bound to occur. The birthday result is a clean instance: 23 people quietly create 253 opportunities for a match, so a match is expected rather than remarkable. Stanford's "Probability by Surprise" applets (and similar simulations online) help build this intuition by running the experiment repeatedly.
A probability space is a sample space $S$ together with a function $P$ (P for probability) that assigns numbers to events. Only two axioms are needed.
Axiom 1: $$P(\varnothing) = 0 \qquad P(S) = 1$$
The empty set is an event that can never occur, so by convention it gets probability $0$; the full sample space always occurs, so it gets probability $1$.
Axiom 2: If $A_1, A_2, A_3, \ldots$ are disjoint (non-overlapping) events, then:
$$P\!\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$$
This holds for finite unions ($n = 1$ to $M$) and for countably infinite ones.
Disjointness is the essential condition. Thinking of probability as area inside the sample-space rectangle (total area $1$) makes it intuitive: the area of a union of non-overlapping blobs is just the sum of their areas.
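As a concrete instance, the two axioms can be checked on a small finite probability space. The fair six-sided die and the naive-definition `P` below are assumptions chosen for illustration; exact fractions avoid rounding issues.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (illustrative choice).
S = frozenset(range(1, 7))


def P(event):
    """Naive-definition probability: favorable outcomes / total outcomes."""
    event = frozenset(event)
    if not event <= S:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(S))


# Axiom 1: the empty event has probability 0, the whole space probability 1.
assert P(set()) == 0
assert P(S) == 1

# Axiom 2 (finite case): disjoint events add.
A1, A2, A3 = {1}, {2, 3}, {6}
assert P(A1 | A2 | A3) == P(A1) + P(A2) + P(A3)

# Without disjointness, additivity fails: {1, 2} and {2, 3} share outcome 2.
assert P({1, 2} | {2, 3}) != P({1, 2}) + P({2, 3})
```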
Two breakthroughs underpin modern probability: framing it in terms of sets and events (unions, intersections), and writing down these axioms — Kolmogorov was central to the axiomatic formulation. Before the axioms it was hard to say what counted as a correct probabilistic argument, and centuries of (still unresolved) philosophical debate over the meaning of probability produced less mathematical progress than these two simple rules did. The power of the axioms: any $P$, under any interpretation, that maps events to $[0, 1]$ and satisfies the two axioms makes every theorem of probability applicable — without our having to settle what probability "really means."
Some simple, very useful consequences. (Non-negativity, $0 \le P \le 1$, can be treated as a baseline assumption — call it "Axiom 0" if you like.)
Complement: for any event $A$, $$P(A^c) = 1 - P(A).$$ Intuition from the naive definition: total $=$ favorable $+$ unfavorable. From the area picture: if $A$ has area $0.3$, everything outside has area $0.7$.
$1 = P(S)$ by Axiom 1. Write $S = A \cup A^c$ (everything inside $A$ together with everything outside). Since $A$ and $A^c$ are disjoint ($A \cap A^c = \varnothing$ by definition), Axiom 2 gives $P(S) = P(A) + P(A^c)$. Hence $1 = P(A) + P(A^c)$, i.e. $P(A^c) = 1 - P(A)$.
If $A \subseteq B$ (meaning: if $A$ occurs then $B$ occurs), then $$P(A) \le P(B).$$
Obvious from the picture — a bigger oval has bigger area — but here is a proof from the axioms. Decompose $B$ into two disjoint pieces: the part inside $A$, and the ring of $B$ that lies outside $A$.
Write $B = A \cup (B \cap A^c)$. The two pieces are disjoint, so by Axiom 2, $$P(B) = P(A) + P(B \cap A^c).$$ Probabilities are non-negative, so the added term is $\ge 0$, giving $P(B) \ge P(A)$.
For any events $A$ and $B$: $$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$ When $A$ and $B$ are disjoint, $P(A \cup B) = P(A) + P(B)$. When they overlap, adding the two probabilities double-counts the intersection, so we subtract it once. This is the two-event case of inclusion-exclusion.
The strategy is to make things disjoint so Axiom 2 applies. Rewrite the union as $A$ together with the part of $B$ not already in $A$: $$A \cup B = A \cup (B \cap A^c).$$ These pieces are disjoint, so $P(A \cup B) = P(A) + P(B \cap A^c)$.
Now use "wishful thinking": we wish $P(A) + P(B \cap A^c) = P(A) + P(B) - P(A \cap B)$, marked with a question mark rather than asserted. Cancelling $P(A)$, the wish reduces to $$P(A \cap B) + P(A^c \cap B) = P(B).$$ This holds by Axiom 2: $A \cap B$ and $A^c \cap B$ are disjoint (an element in both would lie in $A$ and $A^c$, impossible), and their union is $B$ (we split $B$ into its part inside $A$ and its part outside $A$). Wish confirmed.
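The three consequences (complement, monotonicity, and the two-event union formula) can be sanity-checked on a concrete sample space. The sketch below uses two fair dice with the naive definition; the events `A` and `B` are arbitrary illustrative choices, not part of the argument above.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of rolling two fair dice (illustrative space).
S = set(product(range(1, 7), repeat=2))


def P(event):
    """Naive-definition probability on S."""
    return Fraction(len(event), len(S))


A = {s for s in S if s[0] == 6}           # first die shows 6
B = {s for s in S if s[0] + s[1] >= 10}   # sum is at least 10

# Complement: P(A^c) = 1 - P(A)
assert P(S - A) == 1 - P(A)

# Monotonicity: A ∩ B is a subset of B, so its probability is no larger.
assert (A & B) <= B and P(A & B) <= P(B)

# The disjoint split used in the proof: B = (A ∩ B) ∪ (A^c ∩ B)
assert (A & B) | (B - A) == B and not ((A & B) & (B - A))
assert P(A & B) + P(B - A) == P(B)

# Two-event inclusion-exclusion
assert P(A | B) == P(A) + P(B) - P(A & B)
```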
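For three events, reasoning via the triple-overlap diagram: $$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C).$$ Adding the three single probabilities counts each pairwise overlap twice, so we subtract the pairwise intersections. That removes the triple overlap entirely (added three times, subtracted three times), so we add it back once.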
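We keep including and excluding, alternating, until everything is counted exactly once. For $n$ events: $$P\!\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} P(A_1 \cap A_2 \cap \cdots \cap A_n).$$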
The conditions $i < j$ (and $i < j < k$, etc.) avoid listing the same intersection twice. The signs alternate; the final, full-intersection term carries sign $(-1)^{n+1}$. (Check: $n = 2$ ends on a subtraction, $n = 3$ ends on an addition — consistent.) The last term has no summation because there is exactly one intersection of all $n$ events. The proof is by induction, structurally identical to the two-event case but more tedious.
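The general formula translates almost directly into code: `itertools.combinations` yields each index set exactly once, which plays the role of the conditions $i < j$, $i < j < k$, and so on. This is a minimal sketch on a finite sample space; the function name and the example events (multiples of 2, 3, and 5 among the integers 1 to 30) are assumptions for illustration.

```python
from fractions import Fraction
from itertools import combinations


def union_prob_inclusion_exclusion(events, P):
    """P(union of events) via the alternating inclusion-exclusion sum."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)  # +, -, +, ... by level
        # combinations() lists each k-subset once, so no intersection repeats
        for subset in combinations(events, k):
            common = set(subset[0]).intersection(*subset[1:])
            total += sign * P(common)
    return total


# Illustrative space: pick an integer from 1..30 uniformly at random.
S = set(range(1, 31))


def P(event):
    return Fraction(len(event), len(S))


A = {x for x in S if x % 2 == 0}
B = {x for x in S if x % 3 == 0}
C = {x for x in S if x % 5 == 0}

if __name__ == "__main__":
    print("inclusion-exclusion:", union_prob_inclusion_exclusion([A, B, C], P))
    print("direct union:       ", P(A | B | C))
```

The direct union is available here only because the space is tiny; inclusion-exclusion earns its keep when the intersections are easy to compute but the union is not.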
A famous application of inclusion-exclusion, from de Montmort (1713), when probability was still in its infancy and the motivation was gambling. Also called the matching problem; it appears in many disguises.
Take a deck of $n$ cards labeled $1$ through $n$ (one number per card). Shuffle. Flip cards one at a time while counting aloud: "one" on the first flip, "two" on the second, and so on. You win if at any point the number you say matches the number on the card flipped — i.e., the card in position $j$ happens to be card number $j$. Question: what is $P(\text{at least one match})$?
Let $A_j$ be the event that card $j$ matches — the $j$-th card in the deck is the card numbered $j$. We want $P(A_1 \cup A_2 \cup \cdots \cup A_n)$. Direct approaches are hard; inclusion-exclusion is the clean route, and symmetry makes the sums collapse.
Using the naive definition with $n!$ equally likely shuffles: $$P(A_j) = \frac{(n-1)!}{n!} = \frac{1}{n}.$$
Card $j$ is equally likely to occupy any of the $n$ positions. Equivalently, fix card $j$ in its slot and permute the other $n - 1$ cards: $(n-1)!/n! = 1/n$. Crucially this does not depend on $j$.
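By symmetry, the same count works for any specific $k$ events. For $A_1 \cap A_2 \cap \cdots \cap A_k$, the first $k$ cards are pinned to their positions and the remaining $n - k$ permute freely: $$P(A_1 \cap A_2 \cap \cdots \cap A_k) = \frac{(n-k)!}{n!}.$$ For $k = 2$ this is $\frac{(n-2)!}{n!} = \frac{1}{n(n-1)}$.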
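For a small deck these probabilities can be confirmed by enumerating every shuffle. The sketch below uses 0-based card labels and positions for convenience (the text uses 1-based); `prob_fixed` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def prob_fixed(n, positions):
    """P(card j sits in position j for every j in `positions`).

    Computed by brute force over all n! equally likely shuffles,
    with cards and positions labeled 0..n-1.
    """
    favorable = sum(
        all(perm[j] == j for j in positions)
        for perm in permutations(range(n))
    )
    return Fraction(favorable, factorial(n))


if __name__ == "__main__":
    n = 5
    # Single matches: the same value for every j.
    print([prob_fixed(n, [j]) for j in range(n)])
    # Two pinned cards.
    print(prob_fixed(n, [0, 1]))
    # Any three pinned cards, not only the first three.
    print(prob_fixed(n, [1, 3, 4]))
```

Brute force is only feasible for small $n$, since the number of shuffles grows as $n!$.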
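There are $\binom{n}{k}$ terms at level $k$, each equal to $\frac{(n-k)!}{n!}$. The factorials cancel beautifully: $$\binom{n}{k} \cdot \frac{(n-k)!}{n!} = \frac{n!}{k!\,(n-k)!} \cdot \frac{(n-k)!}{n!} = \frac{1}{k!}.$$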
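So level 1 contributes $n \cdot \tfrac{1}{n} = 1$, level 2 contributes $\tfrac{1}{2!}$, level 3 contributes $\tfrac{1}{3!}$, and so on. With the alternating signs: $$P(A_1 \cup A_2 \cup \cdots \cup A_n) = 1 - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n+1}\frac{1}{n!}.$$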
That alternating sum is $1$ minus the Taylor series for $e^x$ at $x = -1$, truncated after the $1/n!$ term: $$e^{-1} = 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots$$ so $$P(\text{at least one match}) \approx 1 - \frac{1}{e} \approx 0.632.$$ The answer barely depends on $n$ once $n$ is moderately large, and the constant $e$ appears even though the problem mentions nothing exponential. The number $1/e$ recurs throughout the course.
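A quick numerical check shows how fast the alternating sum settles near $1 - 1/e$. The sketch below compares the inclusion-exclusion sum with a brute-force count over all shuffles for small $n$; both function names are illustrative, and cards and positions are labeled from 0.

```python
from itertools import permutations
from math import e, factorial


def p_at_least_one_match(n):
    """Inclusion-exclusion result: sum over k of (-1)^(k+1) / k!."""
    return sum((-1) ** (k + 1) / factorial(k) for k in range(1, n + 1))


def p_at_least_one_match_brute_force(n):
    """Fraction of the n! shuffles with some card in its own position."""
    shuffles = list(permutations(range(n)))
    wins = sum(
        any(card == pos for pos, card in enumerate(perm))
        for perm in shuffles
    )
    return wins / len(shuffles)


if __name__ == "__main__":
    print("1 - 1/e =", 1 - 1 / e)
    for n in (1, 2, 3, 5, 8):
        print(n, p_at_least_one_match(n), p_at_least_one_match_brute_force(n))
    print(52, p_at_least_one_match(52))
```

Because the terms shrink like $1/k!$, the formula is already within about $1/(n+1)!$ of the limit, which is why the deck size hardly matters.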