Lecture 2: Story Proofs, Axioms of Probability

Harvard Statistics 110 (Joe Blitzstein)

1. General Problem-Solving Advice

Don't abandon common sense

The course features many counterintuitive results, but that doesn't mean common sense is useless. Use it to sanity-check answers and keep work reasonable.

Notation tip

Prefer self-annotating expressions. Writing $\binom{52}{5}$ is better than computing the raw number — it communicates what is being counted. Simplify only when it's trivially reducible (e.g., $\frac{4}{2 \cdot 1} = 2$).

Check answers using special cases

Re-reading your own work rarely catches errors, because you tend to repeat the same mistake. Instead:

Simple and extreme cases: plug in $n = 0$, $n = 1$, $k = 0$, etc. If the answer is obviously wrong for a trivial input, something is broken.
Alternative approaches: solve the problem a second way. If the answers agree, good. If not, you learn something.

Label everything

When a problem says "$n$ people" or "$k$ balls," immediately label them $1$ through $n$ (or $1$ through $k$). This is not stated in the problem but is extremely useful.

Key principle

As far as probability and nature are concerned, objects behave as if they are distinguishable and labeled, even if they look identical to you. Treating "identical" objects as truly indistinguishable leads to counting errors.

Example: 10 green balls in a jar all look the same, but mentally labeling them 1–10 gives the correct count. Same for "6 robberies in 6 districts" — number the robberies 1–6 and the districts 1–6.

· · ·

2. Team-Splitting Example

Different-sized teams

10 people split into a team of 4 and a team of 6:

$\binom{10}{4} = \binom{10}{6}$

Pick the team of 4; whoever's left is the team of 6. The same argument with general $n$ and $k$ proves $\binom{n}{k} = \binom{n}{n-k}$ by counting the same thing two ways.

Same-sized teams

10 people split into two teams of 5:

$\dfrac{\binom{10}{5}}{2}$

The division by 2 corrects for overcounting — picking $\{1,2,3,4,5\}$ first and $\{6,7,8,9,10\}$ second is the same split as the reverse. This correction is needed only when the two groups are interchangeable (same size, no distinguishing label like "Team A" vs. "Team B").
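Both counts can be checked by brute force. The following is a minimal sketch, assuming the 10 people are labeled 1 through 10; the variable names are illustrative and not from the lecture.

```python
from itertools import combinations
from math import comb

people = range(1, 11)  # label the 10 people 1..10

# Team of 4 and team of 6: choosing the 4 determines the whole split.
unequal_splits = {
    (team, tuple(p for p in people if p not in team))
    for team in combinations(people, 4)
}

# Two unlabeled teams of 5: store each split as a set of two teams,
# so picking {A, B} and picking {B, A} collapse into a single entry.
equal_splits = {
    frozenset([frozenset(team), frozenset(people) - frozenset(team)])
    for team in combinations(people, 5)
}

print(len(unequal_splits), comb(10, 4))     # 210 210
print(len(equal_splits), comb(10, 5) // 2)  # 126 126
```

Storing each equal-sized split as a set of sets is what removes the factor of 2; with teams of sizes 4 and 6 no such collapse happens, because the two teams cannot be swapped.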

· · ·

3. Sampling Table: The Fourth Entry

The 2×2 sampling table (with/without replacement, order matters/doesn't) has three entries that follow directly from the multiplication rule. The fourth — order doesn't matter, with replacement — is the tricky one.

Problem

Pick $k$ times from a set of $n$ objects, with replacement, where order doesn't matter. How many outcomes?

$$\binom{n + k - 1}{k}$$

Sanity checks

| Case | Formula gives | Why it's correct |
|---|---|---|
| $k = 0$ | $\binom{n-1}{0} = 1$ | Picking nothing: one way (do nothing) |
| $k = 1$ | $\binom{n}{1} = n$ | Picking once: replacement and order are irrelevant |
| $n = 2$ | $\binom{k+1}{k} = k+1$ | Two boxes, $k$ dots total: box 1 can have $0, 1, \ldots, k$ dots |

The $n = 2$ case is the simplest non-trivial example; checking the simplest non-trivial case is a generally useful research heuristic.
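Beyond these special cases, the formula can be compared against a direct listing for small $n$ and $k$. This is a sketch only: the helper name `count_unordered_with_replacement` is made up for illustration, and it relies on the standard library's `itertools.combinations_with_replacement`, which generates exactly the unordered selections with repetition.

```python
from itertools import combinations_with_replacement
from math import comb

def count_unordered_with_replacement(n, k):
    """Count selections of size k from n labeled objects, with replacement,
    order ignored, by listing every one of them."""
    return sum(1 for _ in combinations_with_replacement(range(n), k))

# Compare the brute-force count with the closed form on a small grid.
for n in range(1, 6):
    for k in range(0, 6):
        assert count_unordered_with_replacement(n, k) == comb(n + k - 1, k)
```

Agreement on a small grid is evidence, not a proof, but it would expose an off-by-one error such as $\binom{n+k}{k}$ immediately.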

Equivalence to particles-in-boxes

Choosing $k$ times from $n$ objects (with replacement, unordered) is equivalent to placing $k$ indistinguishable particles into $n$ distinguishable boxes.

Proof: Stars and Bars

Represent a configuration as a sequence of dots (particles) and vertical bars (separators between boxes).

Example: $n = 4$ boxes, $k = 6$ particles, configuration $(3,\; 0,\; 2,\; 1)$

• • • | | • • | •

Reading left to right: 3 dots, two adjacent separators (enclosing the empty second box), 2 dots, a separator, 1 dot.

Any valid configuration maps to exactly one such sequence and vice versa. The sequence always contains:

$k$ dots
$n - 1$ separators
Total positions: $n + k - 1$

Conclusion

To specify a configuration, choose which $k$ of the $n + k - 1$ positions are dots. That gives $\binom{n+k-1}{k}$ ways. Equivalently, choose where the $n - 1$ separators go: $\binom{n+k-1}{n-1}$ — same number.
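The correspondence between box counts and dot-and-separator sequences can be sketched in code. The function names below are hypothetical, and `*` stands in for the dots; the generator builds every sequence by choosing the $n - 1$ separator positions, as in the second form of the count.

```python
from itertools import combinations
from math import comb

def to_stars_and_bars(counts):
    """Encode box counts, e.g. (3, 0, 2, 1), as a string of '*' and '|'."""
    return "|".join("*" * c for c in counts)

def from_stars_and_bars(s):
    """Decode a '*' and '|' string back into a tuple of box counts."""
    return tuple(len(part) for part in s.split("|"))

def all_configurations(n, k):
    """Yield every way to place k indistinguishable particles in n boxes
    by choosing which n - 1 of the n + k - 1 positions hold separators."""
    total = n + k - 1
    for bars in combinations(range(total), n - 1):
        s = "".join("|" if i in bars else "*" for i in range(total))
        yield from_stars_and_bars(s)

print(to_stars_and_bars((3, 0, 2, 1)))  # ***||**|*

configs = list(all_configurations(4, 6))
assert len(configs) == len(set(configs)) == comb(4 + 6 - 1, 6)
assert all(sum(c) == 6 and len(c) == 4 for c in configs)
```

The two assertions check the bijection for $n = 4$, $k = 6$: every separator placement yields a distinct valid configuration, and there are $\binom{9}{6}$ of them.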

Aside: Bose-Einstein statistics

For two fair coins, the standard model has 4 equally likely outcomes: HH, HT, TH, TT. In 1924, Bose proposed (for particles, not coins) that indistinguishable objects yield only 3 outcomes (HH, TT, and "one of each"), all equally likely. Einstein supported and extended the idea, and the resulting theory predicted Bose-Einstein condensates, which were confirmed experimentally about 70 years later.

For probability problems using the naive definition: use the labeled (distinguishable) model. The Bose-Einstein model is relevant in quantum physics but not the default for counting or probability. In particular, the $\binom{n+k-1}{k}$ unordered outcomes are generally not equally likely under the labeled model, so the naive definition should not be applied to them directly.
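For the two-coin case, the difference between the two models can be made concrete by listing the labeled outcomes and then ignoring the labels. This is an illustrative sketch using exact fractions rather than simulation; the variable names are assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Labeled model: coin 1 and coin 2 are distinct, so there are
# 4 equally likely outcomes: HH, HT, TH, TT.
labeled = list(product("HT", repeat=2))

# Ignore the labels by sorting each outcome; HT and TH both become "HT".
unordered = Counter("".join(sorted(outcome)) for outcome in labeled)
probs = {result: Fraction(count, len(labeled))
         for result, count in unordered.items()}

for result, p in sorted(probs.items()):
    print(result, p)
```

Under the labeled model "one of each" has probability $1/2$ and the other two results have probability $1/4$ each, whereas the Bose-Einstein model would assign $1/3$ to all three.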

· · ·

4. Story Proofs

A story proof (proof by interpretation) establishes an identity by counting the same quantity two different ways, using a concrete combinatorial scenario. Contrast with algebraic proofs, which manipulate factorial expressions.

Advantages over algebra:

Provides intuition for why the identity holds
Often far simpler, especially for sum identities
Makes results memorable

Example 1: Symmetry of binomial coefficients

$$\binom{n}{k} = \binom{n}{n-k}$$

Story: Pick $k$ people from $n$ to include, or equivalently pick $n - k$ people to exclude. Same partition, same count.

Example 2: Absorption / extraction identity

$$n \binom{n-1}{k-1} = k \binom{n}{k}$$

Story: Choose a committee of $k$ from $n$ people, with one member designated president.

Left side (president first): Choose the president ($n$ choices), then fill the remaining $k - 1$ seats from the remaining $n - 1$ people: $n \cdot \binom{n-1}{k-1}$.
Right side (committee first): Choose the committee of $k$ from $n$, then elect one of the $k$ members as president: $\binom{n}{k} \cdot k$.

Both count the same thing, so they're equal.
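The story can be checked directly by listing every (committee, president) pair for small $n$ and $k$. A minimal sketch, with a hypothetical helper name and people labeled $1$ through $n$:

```python
from itertools import combinations
from math import comb

def count_committees_with_president(n, k):
    """Count (committee, president) pairs where the committee has k of the
    n people and the president is one of its members."""
    people = range(1, n + 1)
    return sum(1 for committee in combinations(people, k)
                 for president in committee)

for n in range(1, 8):
    for k in range(1, n + 1):
        direct = count_committees_with_president(n, k)
        # President first, then committee first: both must match the listing.
        assert direct == n * comb(n - 1, k - 1) == k * comb(n, k)
```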

Example 3: Vandermonde's identity

$$\binom{m+n}{k} = \sum_{j=0}^{k} \binom{m}{j}\binom{n}{k-j}$$

An algebraic proof is messy (factorials inside a sum), but the story proof is clean:

Story: Pick $k$ people from a population of $m + n$, where the population consists of two groups of sizes $m$ and $n$.

If $j$ people come from the first group, then $k - j$ must come from the second. The number of ways to pick $j$ from the first group and $k - j$ from the second is $\binom{m}{j}\binom{n}{k-j}$. The cases for different $j$ are disjoint, so summing over $j = 0, 1, \ldots, k$ gives the total. Impossible splits contribute nothing, since $\binom{a}{b} = 0$ when $b > a$.
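A quick numerical check of the identity for small group sizes, as a sketch with a hypothetical helper name. It relies on `math.comb(a, b)` returning 0 when `b > a`, which matches the convention that impossible splits contribute zero.

```python
from math import comb

def vandermonde_rhs(m, n, k):
    """Sum over j of (ways to pick j from the group of m) times
    (ways to pick k - j from the group of n)."""
    return sum(comb(m, j) * comb(n, k - j) for j in range(k + 1))

for m in range(6):
    for n in range(6):
        for k in range(m + n + 1):
            assert comb(m + n, k) == vandermonde_rhs(m, n, k)
```

For instance, with $m = 3$, $n = 4$, $k = 2$ the sum is $1 \cdot 6 + 3 \cdot 4 + 3 \cdot 1 = 21 = \binom{7}{2}$.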

· · ·

5. Non-Naive Definition of Probability

The naive definition requires (1) finitely many outcomes and (2) equally likely outcomes. The general definition removes both restrictions.

Probability space

A probability space consists of two ingredients, $S$ and $P$:

$S$ — sample space: the set of all possible outcomes (can be infinite).
$P$ — probability function: maps events (subsets of $S$) to real numbers in $[0, 1]$.

An event $A$ is a subset of $S$. We say $A$ occurs if the observed outcome $s_0 \in A$.
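To make the definition concrete, here is a minimal sketch of a finite probability space whose outcomes are not equally likely. The outcomes `"a"`, `"b"`, `"c"` and their weights are invented for illustration; nothing here comes from the lecture beyond the roles of $S$, $P$, and events.

```python
from fractions import Fraction

# Hypothetical sample space with unequal outcome probabilities.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
S = frozenset(weights)

def P(event):
    """Probability of an event, where an event is any subset of S."""
    event = frozenset(event)
    if not event <= S:
        raise ValueError("an event must be a subset of the sample space")
    return sum((weights[s] for s in event), Fraction(0))

A = {"a", "c"}
observed = "c"          # the outcome that actually happened
print(P(A))             # 2/3
print(observed in A)    # True: the event A occurred
```

The naive definition would assign $2/3$ to any two-outcome event here only by coincidence of the chosen weights; in general $P(A)$ depends on which outcomes $A$ contains, not just how many.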

Axioms

Only two rules are needed. Every theorem in probability theory follows from them.

Axiom 1 — Extremes

$$P(\varnothing) = 0 \qquad P(S) = 1$$

Impossible events have probability zero; the certain event has probability one.

Axiom 2 — Countable Additivity

If $A_1, A_2, A_3, \ldots$ are disjoint (non-overlapping) events, then:

$$P\!\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$

The axiom covers countably infinite unions, and finite unions follow as a special case by taking all but finitely many $A_i$ to be $\varnothing$.
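As an illustration of a sample space that is infinite and not equally likely, take $S = \{1, 2, 3, \ldots\}$ with $P(\{n\}) = 2^{-n}$ (for example, the toss on which a fair coin first lands heads). This choice of $P$ is an assumed example, not something defined in the notes. The singletons $A_n = \{n\}$ are disjoint with union $S$, so countable additivity requires their probabilities to sum to $P(S) = 1$; the sketch below checks the partial sums exactly.

```python
from fractions import Fraction

def p_singleton(n):
    """Assumed probability of the outcome n in S = {1, 2, 3, ...}."""
    return Fraction(1, 2 ** n)

def partial_sum(N):
    """P(A_1) + ... + P(A_N) for the disjoint events A_n = {n}."""
    return sum((p_singleton(n) for n in range(1, N + 1)), Fraction(0))

for N in (1, 5, 20):
    # Each partial sum falls short of 1 by exactly 2**-N.
    assert partial_sum(N) == 1 - Fraction(1, 2 ** N)
```

Since the shortfall $2^{-N}$ tends to $0$, the infinite sum equals $1$, consistent with both axioms.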