Lecture 1: Probability and Counting

Harvard Statistics 110 (Joe Blitzstein)

1. Why Probability, and a Bit of History

Probability and statistics show up almost everywhere: physics (quantum mechanics is entirely probabilistic), genetics, economics (econometrics, game theory), finance, and the social sciences. A few less obvious applications are worth highlighting.

History and the humanities

Probability is even used to study history. A famous example is the work of Mosteller and Wallace on The Federalist Papers — documents central to the ratification of the U.S. Constitution. The authorship of several papers was disputed, and they used probabilistic, Bayes-rule-style methods (the kind developed in this course) to address it. Frederick Mosteller founded Harvard's Statistics Department and was Blitzstein's "grand-advisor" (his advisor's advisor).

The reach into social science and government keeps growing; Harvard's Institute for Quantitative Social Science (IQSS) sits at the intersection of statistics, political science, history, and government.

Gambling: the historical root

Probability theory was born out of games of chance. The canonical origin is the mid-1650s correspondence between Pierre de Fermat (a lawyer who did mathematics on the side) and Blaise Pascal, who exchanged long letters analyzing gambling problems — computing, for the first time, the probabilities of outcomes in games no one had previously treated mathematically. Their surviving letters are available online.

Gambling remains useful pedagogically because it supplies clean, concrete examples — dice, cards, coins — that can be analyzed without heavy machinery. A practical prerequisite: be familiar with a standard 52-card deck (knowing how to play poker is not required).

Newton and the danger of intuition

Even Isaac Newton was consulted by gamblers, because at the time no one else could compute the odds. Newton got his calculation right, but his intuition about a dice problem turned out to be wrong. This previews a central theme.

Theme of the course

Many results in probability are deeply counterintuitive, even to brilliant people. That is part of what makes the subject fun — it is full of paradoxes and surprises — but it is also why we must be mathematically precise rather than trusting intuition.

The slogan

Math is the logic of certainty. Statistics is the logic of uncertainty. Everyone faces uncertainty; probability and statistics are how we quantify it and update our beliefs.

· · ·

2. Sample Spaces and Events

Before defining probability we need two set-theoretic objects.

Definition — Sample space

A sample space $S$ is the set of all possible outcomes of an experiment. The word "experiment" is read extremely broadly: any process with multiple possible outcomes, where you don't know in advance which will occur.

Definition — Event

An event $A$ is a subset of the sample space $S$.

Example: rolling two six-sided dice has $36$ possible outcomes (the source of the $36$ is shown later via the multiplication rule), and a set such as "the sum is $7$" is an event.
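As a small illustration of "events are subsets", the dice example can be written out with Python sets. This is only a sketch: the names `sample_space` and `event_sum_7` are chosen here for illustration and do not come from the lecture.

```python
from itertools import product

# Sample space S: every ordered pair (first die, second die).
sample_space = set(product(range(1, 7), repeat=2))

# Event A: the subset of S in which the two faces sum to 7.
event_sum_7 = {outcome for outcome in sample_space if sum(outcome) == 7}

print(len(sample_space))            # size of S
print(sorted(event_sum_7))          # the outcomes that make A occur
print(event_sum_7 <= sample_space)  # an event is a subset of S
```

Representing outcomes as ordered pairs treats the two dice as distinguishable, which is what makes the count come out to $36$.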

Key idea

The breakthrough that turned probability from something "more like astrology" into a genuine mathematical subject was modeling events as sets. Before that, people reasoned by intuition and analogy, and most of those heuristics turned out to be wrong. You must be comfortable with set operations — unions, intersections, and complements — to connect intuitive statements about events to precise mathematics.

· · ·

3. The Naive Definition of Probability

This is the historical starting point, and the "high school" definition. It applies only when its assumptions (equally likely outcomes and a finite sample space) are strongly justified. Throughout the course, $P$ denotes probability.

Naive definition

For an event $A$,

$$P(A) = \frac{\text{number of outcomes favorable to } A}{\text{number of possible outcomes}} = \frac{|A|}{|S|}.$$

The denominator is the size of the sample space; the numerator is how many of those outcomes make $A$ occur.

Worked example: two coin flips

Flip a coin twice. The four outcomes are $\text{HH}, \text{HT}, \text{TH}, \text{TT}$. The probability that both tosses are tails is

$$P(\text{TT}) = \frac{1}{4},$$

one favorable outcome out of four.
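The naive definition is simple enough to state as a function. The sketch below assumes the caller has already justified that the outcomes are equally likely; the helper name `naive_probability` is hypothetical, introduced only for this example.

```python
from fractions import Fraction
from itertools import product

def naive_probability(event, sample_space):
    """Return |A| / |S|; meaningful only for a finite S of equally likely outcomes."""
    event = set(event)
    sample_space = set(sample_space)
    if not sample_space:
        raise ValueError("sample space must be non-empty")
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

# Two coin flips: outcomes are the strings HH, HT, TH, TT.
flips = {"".join(pair) for pair in product("HT", repeat=2)}
both_tails = {"TT"}

p_tt = naive_probability(both_tails, flips)
print(p_tt)  # 1/4
```

Using `Fraction` keeps the answer exact, so it can be compared directly with the hand calculation.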

Two strong assumptions

The naive definition silently assumes:

  1. All outcomes are equally likely.
  2. The sample space is finite. If the outcome could be any real number or integer, the denominator is infinite and the definition is meaningless.

The equally-likely assumption is reasonable under symmetry (a fair, symmetric die gives each face probability $\tfrac{1}{6}$) but fails for a loaded die, or for a coin with "sticky" behavior where one toss influences the next. "Fair coin" itself just means heads and tails are equally likely on a single toss, so calling the coin fair restates the assumption rather than justifying it; be careful not to reason in a circle.

The "life on Neptune" critique

Pushed to an extreme, the naive definition becomes absurd. "Is there life on Neptune? Either there is or there isn't, so the probability is $\tfrac{1}{2}$." Most people reject this, yet such arguments appear regularly in the media.

It gets worse. "Is there intelligent life on Neptune? Either there is or there isn't, so $\tfrac{1}{2}$." But intelligent life should be strictly less likely than life of any kind, and the naive argument assigns $\tfrac{1}{2}$ to both, so it cannot capture that inequality.

Takeaway

The naive definition is fine when equally-likely outcomes are genuinely justified and the space is finite (and it remains important for many problems and for the subject's history). Otherwise you need the more general definition, developed later in the course.

· · ·

4. Counting: The Multiplication Rule

Even when the naive definition applies, listing every outcome is hopeless beyond trivial cases, so the first major topic is counting. (Calculus is a prerequisite; counting is not, so the course builds it from scratch.)

Multiplication rule

Suppose an experiment is performed in stages:

  • The first sub-experiment has $n_1$ possible outcomes.
  • For each outcome of the first, the second sub-experiment has $n_2$ possible outcomes.
  • $\ldots$
  • For each outcome of the previous stages, the $r$-th sub-experiment has $n_r$ possible outcomes.

Then the combined experiment has $\;n_1 \cdot n_2 \cdots n_r\;$ possible outcomes overall.

It can be proved formally by induction, but it is better understood by thinking about it directly with a tree diagram.

Tree-diagram intuition: ice cream

Suppose an ice cream order has $2$ cone types — cake (C) or waffle (W) — and $3$ flavors: chocolate, vanilla, strawberry. The first branch splits $2$ ways (cone), and each of those splits $3$ ways (flavor), giving $2 \cdot 3 = 6$ outcomes.

cake   --- chocolate
       --- vanilla
       --- strawberry
waffle --- chocolate
       --- vanilla
       --- strawberry

(Tree diagram: 2 cone branches, each splitting into 3 flavor branches, for 6 leaves.)
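The ice cream count can be checked by enumerating the leaves of the tree. This is a minimal sketch; the list names are illustrative, and `itertools.product` covers only the special case where every first-stage outcome leads to the same set of second-stage options.

```python
from itertools import product
from math import prod

cones = ["cake", "waffle"]
flavors = ["chocolate", "vanilla", "strawberry"]

# Each leaf of the tree is one (cone, flavor) pair.
orders = list(product(cones, flavors))

# Multiplication rule: multiply the number of choices at each stage.
predicted = prod([len(cones), len(flavors)])

for order in orders:
    print(order)
print(len(orders), predicted)
```

The multiplication rule itself is more general than this enumeration: it needs only the number of options at each stage to be fixed, not the options themselves.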

Two points the tree makes obvious:

· · ·

5. Binomial Coefficients

Definition — Binomial coefficient

$\binom{n}{k}$, read "$n$ choose $k$," counts the number of subsets of size $k$ from a group of $n$ objects, where order does not matter:

$$\binom{n}{k} = \frac{n!}{(n-k)!\,k!}.$$

By convention $\binom{n}{k} = 0$ when $k > n$ (you can't choose $11$ objects from $10$).

The self-annotating notation $\binom{n}{k}$ (sometimes written $n C k$) is preferred over computing the raw number, because it records what is being counted.
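For numerical work, the factorial formula and the $k > n$ convention can be sketched in a few lines. The function name `choose` is an assumption made for this example; Python's standard library already provides `math.comb`, which is used here only as a cross-check.

```python
from math import comb, factorial

def choose(n, k):
    """n choose k via n! / ((n-k)! k!), returning 0 when k > n by convention."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    if k > n:
        return 0
    return factorial(n) // (factorial(n - k) * factorial(k))

print(choose(5, 2))                  # 10
print(choose(10, 11))                # 0: cannot choose 11 objects from 10
print(choose(52, 5) == comb(52, 5))  # agrees with the standard library
```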

Derivation from the multiplication rule

Pick $k$ objects from $n$ in a specific order:

  • 1st pick: $n$ choices
  • 2nd pick: $n - 1$ choices (can't reuse the first)
  • $\ldots$
  • $k$-th pick: $n - k + 1$ choices

That product is $n (n-1) \cdots (n - k + 1)$. But this counts ordered selections, and the same $k$ objects can be arranged in $k!$ orders, so we have overcounted by a factor of $k!$. Dividing:

$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n!}{(n-k)!\,k!}$$

The two forms agree because the trailing factors cancel: $n! / (n-k)!$ leaves exactly the falling product $n(n-1)\cdots(n-k+1)$.
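The overcounting argument can be checked by brute force on a small case. The sketch below picks $n = 5$ and $k = 3$ arbitrarily: it counts ordered selections, compares them with the falling product, and then divides by $k!$.

```python
from itertools import combinations, permutations
from math import factorial

n, k = 5, 3
objects = range(n)

ordered = list(permutations(objects, k))    # ordered selections, no reuse
unordered = list(combinations(objects, k))  # subsets of size k

# Falling product n (n-1) ... (n-k+1) from the multiplication rule.
falling_product = 1
for i in range(k):
    falling_product *= n - i

print(len(ordered), falling_product)                 # 60 60
print(len(ordered) // factorial(k), len(unordered))  # 10 10
```

Each subset of size $3$ appears $3! = 6$ times among the ordered selections, which is exactly the factor removed by the division.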

· · ·

6. Worked Example: Probability of a Full House

Find the probability of a full house in a five-card poker hand from a well-shuffled standard 52-card deck. A full house is three cards of one rank and two of another (e.g., three 7s and two 10s). Because the deck is well shuffled, all five-card hands are equally likely, so the naive definition is justified.

Denominator

Number of possible hands $= \binom{52}{5}$ — choose $5$ cards from $52$, order irrelevant.

Numerator

Build the favorable hands stage by stage (multiplication rule), keeping "three 7s and two 10s" in mind:

| Stage | Choice | Count |
| --- | --- | --- |
| Rank for the triple | any of 13 ranks | $13$ |
| Which 3 of that rank's 4 cards | choose 3 of 4 | $\binom{4}{3}$ |
| Rank for the pair | any of the remaining 12 ranks | $12$ |
| Which 2 of that rank's 4 cards | choose 2 of 4 | $\binom{4}{2}$ |

$$P(\text{full house}) = \frac{13 \cdot \binom{4}{3} \cdot 12 \cdot \binom{4}{2}}{\binom{52}{5}}$$

Watch the count

The rank for the pair must differ from the rank of the triple, so it has only $12$ choices, not $13$. There are other valid ways to write the answer, but reasoning through the tree (one stage at a time) is more structured and less error-prone.
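To get a number out of the full house formula, the stage counts can be multiplied directly. This is a small sketch using exact arithmetic; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Numerator: rank of triple, its 3 cards, rank of pair, its 2 cards.
favorable = 13 * comb(4, 3) * 12 * comb(4, 2)

# Denominator: all five-card hands, order irrelevant.
total = comb(52, 5)

p_full_house = Fraction(favorable, total)

print(favorable, total)     # 3744 2598960
print(p_full_house)         # 6/4165
print(float(p_full_house))  # roughly 0.00144
```

So a full house turns up in roughly 1 of every 694 hands.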

· · ·

7. The Sampling Table

$\binom{n}{k}$ is one entry in a broader question: drawing a sample of $k$ objects from a population of $n$. Two independent yes/no choices generate four cases — with or without replacement, and order matters or not — giving a $2\times 2$ table.

| | Order matters | Order doesn't matter |
| --- | --- | --- |
| With replacement | $n^k$ | $\binom{n+k-1}{k}$ |
| Without replacement | $\dfrac{n!}{(n-k)!}$ | $\binom{n}{k}$ |

Three of the four entries follow immediately from the multiplication rule; they should be understood, not memorized.
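The three entries that follow from the multiplication rule can be confirmed by direct enumeration on a small case. The values $n = 4$, $k = 2$ below are arbitrary, and the variable names are made up for this sketch.

```python
from itertools import combinations, permutations, product
from math import comb, factorial

n, k = 4, 2
population = range(n)

# With replacement, order matters: n choices at each of k stages.
with_repl_ordered = len(list(product(population, repeat=k)))

# Without replacement, order matters: n (n-1) ... (n-k+1).
without_repl_ordered = len(list(permutations(population, k)))

# Without replacement, order doesn't matter: n choose k.
without_repl_unordered = len(list(combinations(population, k)))

print(with_repl_ordered, n ** k)                               # 16 16
print(without_repl_ordered, factorial(n) // factorial(n - k))  # 12 12
print(without_repl_unordered, comb(n, k))                      # 6 6
```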

The hard fourth entry

The case "with replacement, order doesn't matter" is much subtler, by far the hardest of the four. The answer is

$$\binom{n+k-1}{k}.$$

It is worth verifying on small values of $n$ and $k$ and trying to see why it holds. The proof (stars and bars) is given in the next lecture.
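One way to do that verification is to enumerate unordered samples with replacement and compare the count with the formula. The sketch below is a numerical check on small values, not a proof; `count_multisets` is a hypothetical helper name.

```python
from itertools import combinations_with_replacement
from math import comb

def count_multisets(n, k):
    """Count unordered samples of size k drawn with replacement from n objects."""
    return sum(1 for _ in combinations_with_replacement(range(n), k))

# Compare brute-force counts with the formula on a small grid of values.
for n in range(1, 6):
    for k in range(0, 5):
        assert count_multisets(n, k) == comb(n + k - 1, k)

# n = 2, k = 2: the samples are {0,0}, {0,1}, {1,1}.
print(list(combinations_with_replacement(range(2), 2)))
print(count_multisets(2, 2), comb(2 + 2 - 1, 2))  # 3 3
```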