Probability and statistics show up almost everywhere: physics (quantum mechanics is entirely probabilistic), genetics, economics (econometrics, game theory), finance, and the social sciences. A few less obvious applications are worth highlighting.
Probability is even used to study history. A famous example is the work of Mosteller and Wallace on The Federalist Papers — documents central to the ratification of the U.S. Constitution. The authorship of several papers was disputed, and they used probabilistic, Bayes-rule-style methods (the kind developed in this course) to address it. Frederick Mosteller founded Harvard's Statistics Department and was Blitzstein's "grand-advisor" (his advisor's advisor).
The reach into social science and government keeps growing; Harvard's Institute for Quantitative Social Science (IQSS) sits at the intersection of statistics, political science, history, and government.
Probability theory was born out of games of chance. The canonical origin is the mid-1650s correspondence between Pierre de Fermat (a lawyer who did mathematics on the side) and Blaise Pascal, who exchanged long letters analyzing gambling problems — computing, for the first time, the probabilities of outcomes in games no one had previously treated mathematically. Their surviving letters are available online.
Gambling remains useful pedagogically because it supplies clean, concrete examples — dice, cards, coins — that can be analyzed without heavy machinery. A practical prerequisite: be familiar with a standard 52-card deck (knowing how to play poker is not required).
Even Isaac Newton was consulted by gamblers, because at the time no one else could compute the odds. Newton got his calculation right, but his intuition about a dice problem turned out to be wrong. This previews a central theme.
Many results in probability are deeply counterintuitive, even to brilliant people. That is part of what makes the subject fun — it is full of paradoxes and surprises — but it is also why we must be mathematically precise rather than trusting intuition.
Math is the logic of certainty. Statistics is the logic of uncertainty. Everyone faces uncertainty; probability and statistics are how we quantify it and update our beliefs.
Before defining probability we need two set-theoretic objects.
A sample space $S$ is the set of all possible outcomes of an experiment. The word "experiment" is read extremely broadly: any process with multiple possible outcomes, where you don't know in advance which will occur.
An event $A$ is a subset of the sample space $S$.
Example: rolling two six-sided dice has $36$ possible outcomes (the source of the $36$ is shown later via the multiplication rule), and a set such as "the sum is $7$" is an event.
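As a small sketch of these two definitions, the sample space for two dice and the event "the sum is $7$" can be written as Python sets. The names `S` and `A` follow the text; representing each outcome as an ordered pair `(first die, second die)` is an assumption of this sketch.

```python
from itertools import product

# Sample space: ordered pairs (first die, second die).
S = set(product(range(1, 7), repeat=2))

# Event "the sum is 7": a subset of the sample space.
A = {outcome for outcome in S if sum(outcome) == 7}

print(len(S))     # 36
print(sorted(A))  # [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
print(A <= S)     # True: an event is a subset of S
```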
The breakthrough that turned probability from something "more like astrology" into a genuine mathematical subject was modeling events as sets. Before that, people reasoned by intuition and analogy, and most of those heuristics turned out to be wrong. You must be comfortable with set operations — unions, intersections, and complements — to connect intuitive statements about events to precise mathematics.
The naive definition of probability is the historical starting point, and the "high school" definition. It applies only when there is strong justification for treating all outcomes as equally likely. Throughout the course, $P$ denotes probability.
For an event $A$,
$$P(A) = \frac{\text{number of outcomes favorable to } A}{\text{number of possible outcomes}} = \frac{|A|}{|S|}.$$
The denominator is the size of the sample space; the numerator is how many of those outcomes make $A$ occur.
Flip a coin twice. The four outcomes are $\text{HH}, \text{HT}, \text{TH}, \text{TT}$. The probability that both tosses are tails is
$$P(\text{TT}) = \frac{1}{4},$$
one favorable outcome out of four.
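The counting in the naive definition can be sketched in a few lines of Python. The helper name `naive_probability` is hypothetical, and the result is only meaningful when the outcomes really are equally likely and the sample space is finite.

```python
from fractions import Fraction
from itertools import product

def naive_probability(event, sample_space):
    """Return |A| / |S| as an exact fraction (hypothetical helper)."""
    event = set(event)
    sample_space = set(sample_space)
    if not sample_space:
        raise ValueError("sample space must be non-empty")
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

# Two coin flips: each outcome is a string such as "HT".
S = {"".join(flips) for flips in product("HT", repeat=2)}
both_tails = {"TT"}

print(sorted(S))                         # ['HH', 'HT', 'TH', 'TT']
print(naive_probability(both_tails, S))  # 1/4
```

Using `Fraction` keeps the answer exact, which makes it easy to compare against a hand calculation.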
The naive definition silently assumes:
- all outcomes are equally likely;
- the sample space is finite.
The equally-likely assumption is reasonable under symmetry — a fair, symmetric die gives each face probability $\tfrac{1}{6}$ — but fails for a loaded die, or for a coin with "sticky" behavior where one toss influences the next. "Fair coin" itself just means heads and tails are equally likely on a single toss; be careful not to reason in a circle.
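To see how the naive count can mislead, here is a sketch with a hypothetical loaded die. The weights are invented for illustration, and summing per-outcome probabilities is used only to show the discrepancy, not as the general definition developed later in the course.

```python
from fractions import Fraction

# Hypothetical loaded die: face 6 is three times as likely as any other face.
weights = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 3}
total = sum(weights.values())
prob = {face: Fraction(w, total) for face, w in weights.items()}

even = {2, 4, 6}

# Naive definition: count outcomes, ignoring how likely each one is.
naive = Fraction(len(even), len(weights))

# Add up the actual (assumed) probabilities of the outcomes in the event.
weighted = sum(prob[face] for face in even)

print(naive)     # 1/2
print(weighted)  # 5/8
```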
Pushed to an extreme, the naive definition becomes absurd. "Is there life on Neptune? Either there is or there isn't, so the probability is $\tfrac{1}{2}$." Most people reject this, yet such arguments appear regularly in the media.
It gets worse. "Is there intelligent life on Neptune? Either there is or there isn't, so $\tfrac{1}{2}$." But there should be a strict inequality — intelligent life must be strictly less likely than life of any kind — and the naive definition fails to capture it.
The naive definition is fine when equally-likely outcomes are genuinely justified and the space is finite (and it remains important for many problems and for the subject's history). Otherwise you need the more general definition, developed later in the course.
Even when the naive definition applies, listing every outcome is hopeless beyond trivial cases, so the first major topic is counting. (Calculus is a prerequisite; counting is not, so the course builds it from scratch.)
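Suppose an experiment is performed in $r$ stages, where the first stage has $n_1$ possible outcomes and, for each outcome of the earlier stages, the $i$-th stage has $n_i$ possible outcomes ($i = 2, \dots, r$).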
Then the combined experiment has $\;n_1 \cdot n_2 \cdots n_r\;$ possible outcomes overall.
This multiplication rule can be proved formally by induction, but it is easier to understand by thinking about it directly with a tree diagram.
Suppose an ice cream order has $2$ cone types — cake (C) or waffle (W) — and $3$ flavors: chocolate, vanilla, strawberry. The first branch splits $2$ ways (cone), and each of those splits $3$ ways (flavor), giving $2 \cdot 3 = 6$ outcomes.
- cake
  - chocolate
  - vanilla
  - strawberry
- waffle
  - chocolate
  - vanilla
  - strawberry
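The leaves of the ice cream tree can also be enumerated directly. This is a minimal sketch; the variable names are illustrative.

```python
from itertools import product
from math import prod

cones = ["cake", "waffle"]
flavors = ["chocolate", "vanilla", "strawberry"]

# Each leaf of the tree is one (cone, flavor) pair.
orders = list(product(cones, flavors))
for cone, flavor in orders:
    print(cone, flavor)

# Multiplication rule: the number of leaves is the product of the stage sizes.
stage_sizes = [len(cones), len(flavors)]
print(len(orders), prod(stage_sizes))  # 6 6
```

Swapping the two lists in `product` lists the same six orders with the flavor chosen first, and the count is unchanged.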
Two points the tree makes obvious:
$\binom{n}{k}$, read "$n$ choose $k$," counts the number of subsets of size $k$ from a group of $n$ objects, where order does not matter:
$$\binom{n}{k} = \frac{n!}{(n-k)!\,k!}.$$
By convention $\binom{n}{k} = 0$ when $k > n$ (you can't choose $11$ objects from $10$).
The self-annotating notation $\binom{n}{k}$ (sometimes written $n C k$) is preferred over computing the raw number, because it records what is being counted.
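Pick $k$ objects from $n$ in a specific order: there are $n$ choices for the first object, $n-1$ for the second, and so on, down to $n-k+1$ for the $k$-th.
By the multiplication rule, the number of ordered selections is the product $n (n-1) \cdots (n - k + 1)$. But the same $k$ objects can be arranged in $k!$ orders, so each subset has been counted $k!$ times. Dividing:
$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n!}{(n-k)!\,k!}.$$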
The two forms agree because the trailing factors cancel: $n! / (n-k)!$ leaves exactly the falling product $n(n-1)\cdots(n-k+1)$.
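The derivation can be checked numerically. The sketch below computes the falling product, divides by $k!$, and compares the result with Python's built-in `math.comb`. The function name `choose` is hypothetical, and it assumes $k \ge 0$.

```python
from math import comb, factorial

def choose(n, k):
    """Falling product n(n-1)...(n-k+1) divided by k!; assumes k >= 0."""
    if k > n:
        return 0  # convention: cannot choose more objects than exist
    ordered = 1
    for i in range(k):
        ordered *= n - i            # factors n, n-1, ..., n-k+1
    return ordered // factorial(k)  # each subset was counted k! times

print(choose(5, 2))    # 10
print(choose(10, 11))  # 0

# Agreement with the factorial formula, as implemented by math.comb.
print(all(choose(n, k) == comb(n, k)
          for n in range(12) for k in range(14)))  # True
```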
Find the probability of a full house in a five-card poker hand from a well-shuffled standard 52-card deck. A full house is three cards of one rank and two of another (e.g., three 7s and two 10s). Because the deck is well shuffled, all five-card hands are equally likely, so the naive definition is justified.
Number of possible hands $= \binom{52}{5}$ — choose $5$ cards from $52$, order irrelevant.
Build the favorable hands stage by stage (multiplication rule), keeping "three 7s and two 10s" in mind:
| Stage | Choice | Count |
|---|---|---|
| Rank for the triple | any of 13 ranks | $13$ |
| Which 3 of that rank's 4 cards | choose 3 of 4 | $\binom{4}{3}$ |
| Rank for the pair | any of the remaining 12 ranks | $12$ |
| Which 2 of that rank's 4 cards | choose 2 of 4 | $\binom{4}{2}$ |
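Multiplying the stage counts and dividing by the number of possible hands gives
$$P(\text{full house}) = \frac{13\binom{4}{3}\cdot 12\binom{4}{2}}{\binom{52}{5}} = \frac{3744}{2598960} \approx 0.00144.$$
The rank for the pair must differ from the rank of the triple, so it has only $12$ choices, not $13$. There are other valid ways to write the answer, but reasoning through the tree (one stage at a time) is more structured and less error-prone.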
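The stage counts in the table can be multiplied out with a few lines of Python. This is only a sketch of the arithmetic, using `math.comb` for the binomial coefficients.

```python
from fractions import Fraction
from math import comb

# Stages: rank of the triple, its 3 suits, rank of the pair, its 2 suits.
favorable = 13 * comb(4, 3) * 12 * comb(4, 2)
possible = comb(52, 5)  # all five-card hands, order irrelevant

p = Fraction(favorable, possible)
print(favorable, possible)  # 3744 2598960
print(p)                    # 6/4165
print(float(p))             # roughly 0.00144
```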
$\binom{n}{k}$ is one entry in a broader question: drawing a sample of $k$ objects from a population of $n$. Two independent yes/no choices generate four cases — with or without replacement, and order matters or not — giving a $2\times 2$ table.
| | Order matters | Order doesn't matter |
|---|---|---|
| With replacement | $n^k$ | $\binom{n+k-1}{k}$ |
| Without replacement | $\dfrac{n!}{(n-k)!}$ | $\binom{n}{k}$ |
Three of the four entries follow immediately from the multiplication rule; they should be understood, not memorized.
The case of sampling with replacement where order doesn't matter is much subtler, by far the hardest of the four. The answer is
$$\binom{n+k-1}{k}.$$
It is worth verifying on small values of $n$ and $k$ and trying to see why it holds. The proof (stars and bars) is given in the next lecture.
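One way to do that check on small values is brute-force enumeration. The sketch below lists every sample with `itertools` and compares the counts with the four formulas in the table. The function names and the dictionary keys are illustrative, and it assumes $n \ge 1$.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from math import comb, perm

def enumerate_counts(n, k):
    """Count samples of size k from n objects by listing them all."""
    items = range(n)
    return {
        ("with", "ordered"): sum(1 for _ in product(items, repeat=k)),
        ("without", "ordered"): sum(1 for _ in permutations(items, k)),
        ("with", "unordered"): sum(
            1 for _ in combinations_with_replacement(items, k)),
        ("without", "unordered"): sum(1 for _ in combinations(items, k)),
    }

def formula_counts(n, k):
    """The four entries of the sampling table (assumes n >= 1, k >= 0)."""
    return {
        ("with", "ordered"): n ** k,
        ("without", "ordered"): perm(n, k),  # n! / (n-k)!, or 0 if k > n
        ("with", "unordered"): comb(n + k - 1, k),
        ("without", "unordered"): comb(n, k),
    }

print(enumerate_counts(3, 2)[("with", "unordered")])  # 6
print(formula_counts(3, 2)[("with", "unordered")])    # 6

all_match = all(enumerate_counts(n, k) == formula_counts(n, k)
                for n in range(1, 6) for k in range(0, 5))
print(all_match)  # True
```

Enumeration only confirms the formula on the cases tried; it does not replace the stars and bars argument.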