Lecture 6: Monty Hall, Simpson's Paradox

Harvard Statistics 110 (Joe Blitzstein)

1. Monty Hall: Setup and Assumptions

The problem is named after Monty Hall, host of the TV game show Let's Make a Deal. It is the kind of problem almost everyone gets wrong the first time, thinks they understand, then gets fooled by again in slight disguise — so the goal is a way of thinking, not just an answer.

The setup:

There are three doors: door 1, door 2, door 3.
One door hides a car; the other two hide goats.
You (the contestant) want the car, not a goat.
You pick a door. By symmetry we may assume you pick door 1 (renumber otherwise).
Monty then opens one of the other two doors, always revealing a goat, and offers you the chance to switch to the remaining unopened door.

The question: should you switch? The assumptions matter, and several are usually left implicit.

Assumptions

The car is equally likely to be behind any door: $P(D_j) = \tfrac{1}{3}$ for each $j$, before any door is opened.
Monty knows where the car is. This is essential — it is a different problem if he doesn't.
Monty always opens a goat door and always offers the switch. He never reveals the car (which is why he must know where it is — otherwise he might spoil the game by chance).
When Monty has a choice of goat door (only when your initial guess is correct), he picks between them with equal probability $\tfrac{1}{2}$.

The last assumption is the one most often omitted. A homework extension explores "lazy Monty Hall," in which Monty, whenever he has a choice, opens door 2 with probability $p$ (he doesn't want to walk to door 3). That breaks the symmetry between doors 2 and 3; the basic problem assumes $p = \tfrac{1}{2}$.

· · ·

2. The Answer, and Why "50/50" Is Wrong

The naive reasoning: Monty opens door 2, so you choose between door 1 and door 3 with no apparent distinguishing information — so it's 50/50. This is wrong. Under the stated assumptions:

$$P(\text{win by switching}) = \tfrac{2}{3}, \qquad P(\text{win by staying}) = \tfrac{1}{3}$$

So you should always switch.

Why "50/50" fails

The "50/50" answer abuses the naive definition of probability, which assumes equally likely outcomes. The doors were equally likely initially ($\tfrac{1}{3}$ each); that does not mean they stay equally likely conditionally, after observing what Monty does.

The crucial point: condition on all the evidence. The naive approach conditions only on "door 2 has a goat." The real evidence is richer — it is that Monty chose to open door 2. Seeing why that extra fact matters is the whole problem.
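
To make the contrast concrete, here is a short worked comparison with the contestant holding door 1. The symbols $G_2$ (door 2 hides a goat) and $M_2$ (Monty opens door 2) are labels introduced here for illustration. Conditioning only on $G_2$, which just rules out $D_2$, reproduces the naive answer:

$$P(D_3 \mid G_2) = \frac{P(D_3)}{P(D_1) + P(D_3)} = \frac{1/3}{2/3} = \tfrac{1}{2}$$

Conditioning on $M_2$ instead, Bayes' rule under the stated assumptions (the $D_2$ term vanishes because Monty never reveals the car) gives:

$$P(D_3 \mid M_2) = \frac{P(M_2 \mid D_3)\,P(D_3)}{P(M_2 \mid D_1)\,P(D_1) + P(M_2 \mid D_3)\,P(D_3)} = \frac{1 \cdot \tfrac{1}{3}}{\tfrac{1}{2} \cdot \tfrac{1}{3} + 1 \cdot \tfrac{1}{3}} = \tfrac{2}{3}$$

The two events differ because Monty is twice as likely to open door 2 when the car is behind door 3 as when it is behind door 1.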

· · ·

3. Solution 1: Tree Diagram

The cleanest picture is a two-stage probability tree, conditioned on the contestant having chosen door 1. The first branch is which door has the car; the second is which door Monty opens.

car 1 —— opens 2 (½) → 1/6
car 1 —— opens 3 (½) → 1/6
car 2 —— opens 3 (1) → 1/3
car 3 —— opens 2 (1) → 1/3

Car door (1/3 each) → door Monty opens → path probability. If the car is behind your door 1, Monty may open either goat door (1/2 each); otherwise he is forced.

Reading the branches:

Car behind door 1 (your choice): Monty may open door 2 or door 3, each with probability $\tfrac{1}{2}$.
Car behind door 2: Monty has no choice — he must open door 3 (probability $1$).
Car behind door 3: Monty has no choice — he must open door 2 (probability $1$).

Now condition on the observed event "Monty opens door 2." Only two paths are consistent with it:

Path	Car door	Monty opens	Path probability
A	door 1	door 2	$\tfrac{1}{3} \cdot \tfrac{1}{2} = \tfrac{1}{6}$
B	door 3	door 2	$\tfrac{1}{3} \cdot 1 = \tfrac{1}{3}$

The other two paths are deleted, just as in the pebble-world picture: conditioning removes everything inconsistent with what you observe. The survivors ($\tfrac{1}{6}$ and $\tfrac{1}{3}$) don't sum to $1$, so renormalize by dividing by their total $\tfrac{1}{2}$ (equivalently, multiply both by $2$):

$$P(D_1 \mid \text{opens } 2) = \frac{1/6}{1/2} = \tfrac{1}{3}, \qquad P(D_3 \mid \text{opens } 2) = \frac{1/3}{1/2} = \tfrac{2}{3}$$

Conclusion

Conditional on Monty opening door 2, the car is behind door 3 with probability $\tfrac{2}{3}$, so switching to door 3 succeeds with probability $\tfrac{2}{3}$. The same calculation applied to the other two paths shows that if Monty opens door 3, switching to door 2 again succeeds with probability $\tfrac{2}{3}$.
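
The delete-and-renormalize step can be checked with exact fractions. The following is a minimal sketch under the section's assumptions (uniform prior, Monty never opens your door or the car's door, fair choice when he has one); the function name `posterior_given_opened` is illustrative and not from the lecture.

```python
from fractions import Fraction

def posterior_given_opened(opened, chosen=1):
    """Exact P(car behind door j | Monty opens `opened`), contestant holds `chosen`."""
    doors = (1, 2, 3)
    paths = {}  # (car door, door Monty opens) -> path probability
    for car in doors:
        # Monty may open any door that is neither the contestant's nor the car's.
        options = [d for d in doors if d != chosen and d != car]
        for d in options:
            paths[(car, d)] = Fraction(1, 3) * Fraction(1, len(options))
    # Keep only the paths consistent with the observation, then renormalize.
    kept = {car: p for (car, d), p in paths.items() if d == opened}
    total = sum(kept.values())
    return {car: p / total for car, p in kept.items()}

print(posterior_given_opened(opened=2))  # {1: Fraction(1, 3), 3: Fraction(2, 3)}
```

Calling it with `opened=3` keeps the other two paths and puts the $\tfrac{2}{3}$ on door 2.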

· · ·

4. Solution 2: Law of Total Probability

The tree is a law-of-total-probability calculation in pictures. The key step in any such argument is deciding what to condition on. Use wishful thinking: ask what you wish you knew. Here you wish you knew where the car is, so condition on that.

Let $S$ be the event that you succeed using the switching strategy, and $D_j$ the event that door $j$ has the car. Conditioning on which door has the car, with prior weights $P(D_j) = \tfrac{1}{3}$:

$$P(S) = \sum_{j=1}^{3} P(S \mid D_j)\, P(D_j)$$

Given that you picked door 1 and will switch, the conditional probabilities are easy:

$P(S \mid D_1) = 0$ — the car is behind your chosen door, so switching always fails.
$P(S \mid D_2) = 1$ — Monty must open door 3; switching lands on door 2.
$P(S \mid D_3) = 1$ — Monty must open door 2; switching lands on door 3.

$$P(S) = 0 \cdot \tfrac{1}{3} + 1 \cdot \tfrac{1}{3} + 1 \cdot \tfrac{1}{3} = \tfrac{2}{3}$$

This is the unconditional probability that switching succeeds. Is the conditional probability given that Monty opened a specific door the same? Here, yes — by symmetry. Doors 2 and 3 are interchangeable until Monty opens one, so:

$$P(S \mid \text{Monty opens } 2) = P(S \mid \text{Monty opens } 3) = \tfrac{2}{3}$$

Both the conditional and unconditional probabilities equal $\tfrac{2}{3}$. In the "lazy Monty" extension the unconditional probability is still $\tfrac{2}{3}$, but the conditional probabilities change because the symmetry is broken.
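
A sketch of the lazy-Monty calculation, assuming $p$ is the probability that Monty opens door 2 when the car is behind your door 1 (so he opens door 3 with probability $1 - p$) and every other assumption is unchanged. Write $M_j$ for "Monty opens door $j$", a label introduced here for brevity. The two car-behind-door-1 paths of the tree now have probabilities $\tfrac{p}{3}$ and $\tfrac{1-p}{3}$, so:

$$P(S \mid M_2) = \frac{1/3}{p/3 + 1/3} = \frac{1}{1+p}, \qquad P(S \mid M_3) = \frac{1/3}{(1-p)/3 + 1/3} = \frac{1}{2-p}$$

$$P(S) = P(S \mid M_2)\,P(M_2) + P(S \mid M_3)\,P(M_3) = \frac{1}{1+p} \cdot \frac{1+p}{3} + \frac{1}{2-p} \cdot \frac{2-p}{3} = \tfrac{2}{3}$$

At $p = \tfrac{1}{2}$ both conditional values reduce to $\tfrac{2}{3}$. For every $p$ in $[0, 1]$ each is at least $\tfrac{1}{2}$, so under these assumptions switching never hurts.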

· · ·

5. Two Intuitions and Some History

Intuition 1: Case breakdown

The clean verbal summary

One-third of the time your initial guess is right — then switching loses. But that is only $\tfrac{1}{3}$ of the time.
Two-thirds of the time your initial guess is wrong — Monty opens the only goat door he can, and the car sits behind the one remaining door, so switching wins.

Switching wins exactly when your initial guess was wrong, which is $\tfrac{2}{3}$ of the time.

Intuition 2: The million-door version

Take the extreme case: replace 3 doors with $1{,}000{,}000$. You pick one; Monty opens $999{,}998$ goat doors, leaving just your door and one other. Almost nobody refuses to switch here — you are almost certain your initial guess (probability $\tfrac{1}{1{,}000{,}000}$) was wrong, and almost certain the single remaining door hides the car.

Conceptually there is no difference between this and the three-door problem; the "50/50" argument would apply identically and is just as wrong. The million-door version makes the absurdity visible; the three-door version hides it.

History and simulation

The controversy erupted when a reader posed the question to Marilyn vos Savant's column in Parade magazine. She gave the correct answer (switch), and thousands wrote in insisting she was wrong — including some with PhDs in mathematics, some quite rudely. The dispute partly reflects genuine ambiguity when assumptions are implicit, and partly how strongly the wrong intuition grips people.

A practical lesson: even without conditional-probability machinery, you can just simulate — with cups and props, or a short program. Run it a thousand times and switching wins about two-thirds of the time. When in doubt, simulate.
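
One way such a program might look is sketched below. The names `play`, `win_rate`, `n_doors`, and `seed` are illustrative choices. The sketch assumes the rules stated earlier and, for more than three doors, that Monty opens all but one of the other doors. It uses the shortcut that the one door Monty leaves closed is the car whenever your first pick is wrong.

```python
import random

def play(switch, n_doors=3, rng=random):
    """Play one game; return True if the contestant ends up with the car."""
    car = rng.randrange(n_doors)
    pick = 0  # by symmetry, always start with the first door
    if switch:
        # Monty opens every other door except one and never reveals the car.
        if pick != car:
            remaining = car  # the only door he can leave closed
        else:
            remaining = rng.randrange(1, n_doors)  # a random goat door
        pick = remaining
    return pick == car

def win_rate(switch, trials=100_000, n_doors=3, seed=0):
    """Fraction of simulated games won with the given strategy."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    wins = sum(play(switch, n_doors, rng) for _ in range(trials))
    return wins / trials

print("switch:", win_rate(True))
print("stay:  ", win_rate(False))
```

Passing a larger `n_doors` (for example `win_rate(True, n_doors=100)`) mimics the many-door version, where the switching rate climbs toward 1.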

· · ·

6. Simpson's Paradox

The second notorious problem: is it possible for one doctor to have a higher success rate than another at every single type of surgery, yet a lower overall success rate?

The phenomenon

It sounds impossible — surely if A beats B in every category, A beats B in the total. Simpson's paradox says no: the direction of an inequality can flip when you aggregate. One thing looks better in every individual case yet worse in the total.

An aside on "paradox": there is no such thing as a true paradox. A genuine contradiction would mean the universe explodes and we wouldn't be here. What we call a paradox is something deeply counterintuitive — it forces you to think harder, and once you do, it makes sense.

Worked example: Dr. Hibbert vs. Dr. Nick

Blitzstein uses two doctors from The Simpsons (a mnemonic for "Simpson's"). Dr. Hibbert is the respected, expensive town doctor; Dr. Nick is the cheap infomercial quack who offers any surgery for \$129.99. The numbers are invented to make the paradox stark. Each doctor performs 100 surgeries total — so neither does more volume — split between heart surgery (hard) and band-aid removal (easy).

Dr. Hibbert

Surgery	Successes	Failures	Total	Success rate
Heart	70	20	90	$70/90 \approx 78\%$
Band-aid	10	0	10	$10/10 = 100\%$
Overall	80	20	100	$80/100 = 80\%$

Dr. Nick

Surgery	Successes	Failures	Total	Success rate
Heart	2	8	10	$2/10 = 20\%$
Band-aid	81	9	90	$81/90 = 90\%$
Overall	83	17	100	$83/100 = 83\%$

Within each type, Dr. Hibbert wins: heart $78\%$ vs. $20\%$, band-aid $100\%$ vs. $90\%$. Yet aggregated, Dr. Nick wins, $83\%$ vs. $80\%$ — he can truthfully advertise the higher rate at a fraction of the price.
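
The reversal can be verified directly from the counts. This is a small sketch that copies the (successes, total) pairs from the two tables above; the dictionary layout and the helper names `rate` and `overall` are illustrative.

```python
from fractions import Fraction

# (successes, total) per doctor and surgery type, copied from the tables above.
data = {
    "Hibbert": {"heart": (70, 90), "band-aid": (10, 10)},
    "Nick": {"heart": (2, 10), "band-aid": (81, 90)},
}

def rate(successes, total):
    """Exact success rate as a fraction."""
    return Fraction(successes, total)

def overall(doctor):
    """Aggregate rate: add up successes and add up totals across surgery types."""
    successes = sum(s for s, _ in data[doctor].values())
    total = sum(t for _, t in data[doctor].values())
    return rate(successes, total)

for surgery in ("heart", "band-aid"):
    h = rate(*data["Hibbert"][surgery])
    n = rate(*data["Nick"][surgery])
    print(f"{surgery}: Hibbert {float(h):.0%} vs. Nick {float(n):.0%}")

print(f"overall: Hibbert {float(overall('Hibbert')):.0%} "
      f"vs. Nick {float(overall('Nick')):.0%}")
```

Changing the counts in `data` is a quick way to try out invented examples of your own.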

Why it happens: the lurking variable

The mechanism

The direction flipped between the conditional comparison (per surgery type, prefer Hibbert) and the unconditional one (aggregated, Nick looks better). The cause: $90\%$ of Dr. Nick's surgeries are easy band-aid removals, while Dr. Hibbert took mostly hard heart surgeries. The surgery mix differs, and that drives the aggregate.

This is realistic. The world's leading neurosurgeons may have lower headline success rates precisely because they get referred the hardest cases that no one else can handle.

Other examples

Baseball: player A can have a higher batting average than B in the first half of the season and the second half, yet a lower average over the whole season.
UC Berkeley admissions (a real lawsuit): aggregated graduate-admissions data appeared to show bias against women, but department by department there was generally no clear evidence of such bias. Women applied more often to the competitive departments (which had lower admit rates for everyone), and aggregating across departments produced the apparent paradox.
Jelly beans (the author's first encounter): two "better" jars each beat their paired "worse" jar; combine the two better jars and the two worse jars, and the aggregated "better" jar can end up with a lower fraction of your favorite flavor. Making up your own numbers is excellent practice.

The "adding fractions wrong" intuition

Memorable mechanism

To add fractions correctly you do not add numerators and denominators:

$$\frac{1}{3} + \frac{2}{5} \neq \frac{3}{8}$$

But adding numerators and denominators (the "wrong" way) is exactly how aggregation works: you add up successes and add up trials. True addition preserves order: if each of A's per-category fractions is smaller than the corresponding one of B's, then A's true sum is smaller too. The "wrong-way" sum carries no such guarantee, because its value also depends on how large each denominator is, and that is why the paradox is possible.
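
A one-line identity makes the mechanism explicit. The symbols $a, b, c, d$ are introduced here for illustration, with $a/b$ and $c/d$ the success fractions in two categories and $b, d > 0$:

$$\frac{a + c}{b + d} = \frac{b}{b + d} \cdot \frac{a}{b} + \frac{d}{b + d} \cdot \frac{c}{d}$$

So the aggregate is a weighted average of the per-category fractions, weighted by each category's share of the trials. For Dr. Hibbert this reads $\tfrac{70 + 10}{90 + 10} = \tfrac{80}{100}$, with weight $\tfrac{90}{100}$ on the heart-surgery rate.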

· · ·

7. Simpson's Paradox: Formal Statement

Map the example to events:

$A$ = the surgery is successful.
$B$ = treated by Dr. Nick (so $B^c$ = treated by Dr. Hibbert).
$C$ = the surgery is heart surgery (so $C^c$ = band-aid removal). $C$ is the confounder.

Simpson's Paradox (general form)

It is possible for all three of these to hold at once:

$$P(A \mid B, C) < P(A \mid B^c, C)$$

$$P(A \mid B, C^c) < P(A \mid B^c, C^c)$$

$$\text{yet} \qquad P(A \mid B) > P(A \mid B^c)$$

The within-category comparisons favor $B^c$ (Hibbert), but aggregation reverses the inequality. Essentially any instance of the paradox can be written this way.

The confounder, and why to control for it

Confounding

$C$ is the confounder (the letter also stands for "control") — a variable to control for. The more relevant comparison is the conditional one: surgery type clearly matters, and everyone agrees Dr. Hibbert is better. Failing to condition on $C$ gives a misleading answer, because knowing the doctor ($B$) gives information about the surgery type ($C$), which affects success. Choosing Dr. Nick signals an easy band-aid removal, inflating his apparent rate.

Why the inequality can flip: where the proof breaks

It is tempting to think the aggregate inequality must follow from the per-category ones via the law of total probability. Seeing where that fails is instructive. The conditional form (everything given $B$) is valid — conditional probabilities are genuine probabilities:

$$P(A \mid B) = P(A \mid B, C)\,P(C \mid B) + P(A \mid B, C^c)\,P(C^c \mid B)$$

The analogous expansion holds with $B$ replaced by $B^c$. We know each Nick term is smaller than the corresponding Hibbert term:

$$P(A \mid B, C) < P(A \mid B^c, C), \qquad P(A \mid B, C^c) < P(A \mid B^c, C^c)$$

Where it breaks

You cannot conclude $P(A \mid B) < P(A \mid B^c)$, because the weights differ. Nick's weights are $P(C \mid B)$ and $P(C^c \mid B)$; Hibbert's are $P(C \mid B^c)$ and $P(C^c \mid B^c)$, and there is no way to relate them. Concretely, $P(C \mid B) = \tfrac{10}{100} = 0.1$ (a Nick surgery being a heart surgery) is utterly different from $P(C \mid B^c) = \tfrac{90}{100} = 0.9$. Because the weights change between doctors, the aggregate can flip — that weight difference is exactly what enables Simpson's paradox.
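
Plugging the example's table values into the two expansions shows the flip directly:

$$P(A \mid B) = (0.2)(0.1) + (0.9)(0.9) = 0.02 + 0.81 = 0.83$$

$$P(A \mid B^c) = \tfrac{7}{9}(0.9) + (1)(0.1) = 0.70 + 0.10 = 0.80$$

Each of Nick's conditional rates is the smaller one, but his weight of $0.9$ sits on the easy category while Hibbert's sits on the hard one.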

· · ·

8. Key Takeaways

Conditioning means conditioning on all the evidence, not a convenient subset. In Monty Hall the evidence includes which door Monty opened, not merely that it had a goat.
The law of total probability is "wishful thinking": condition on the thing you wish you knew (where the car is), then proceed as if you knew it.
Extreme cases (a million doors) and simulation are both powerful for building and checking intuition.
Aggregating data can reverse the direction of a comparison (Simpson's paradox). The reversal comes from differing weights — differing category mixes — not from any contradiction.
Identify and control for confounders. Failing to condition on a relevant variable can produce a confidently wrong conclusion.