If the whole semester had to be compressed into two ideas, they would be (1) conditioning and (2) random variables and their distributions. Conditioning has already appeared (Blitzstein's slogan: "conditioning is the soul of statistics"); the gambler's ruin problem is one last big conditioning example before the course pivots to random variables, which then run through the rest of the semester.
Two gamblers, A and B, repeatedly bet \$1 on the same game. A starts with $i$ dollars and B with $N - i$, so the total wealth is $N$. Each round is independent: A wins the round with probability $p$, and B wins it with probability $q = 1 - p$.
After each round one dollar changes hands. The game (the whole process) ends when one gambler is bankrupt — that gambler is "ruined" and the other has all the money.
The goal: find the probability that A wins the entire game (B is ruined). By symmetry, the same solution also gives the probability that A is ruined.
Draw a number line with $0$ at one end and $N$ at the other, and the starting position $i$ somewhere in between (no assumption about whether $i$ is near $0$, near $N$, or in the middle). A particle at position $i$ moves one step right with probability $p$ or one step left with probability $q$ each round.
Track the random walk as A's current wealth (B's is then $N$ minus that). The endpoints $0$ and $N$ are absorbing states: think of them as black holes — once the walk lands there, it is stuck forever and the game is over. Reaching $N$ means A has all the money; reaching $0$ means A is bankrupt. This is the exact same problem, just visualized differently.
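The problem looks hard: the number of rounds is unknown, and one might worry the walk could oscillate forever. But it has a clean recursive structure. After one round there are two cases: with probability $p$, A wins the round and now holds $i + 1$ dollars; with probability $q$, A loses it and now holds $i - 1$ dollars.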
So after one step we face the same problem with a shifted starting point. This tells us what to condition on: the outcome of the first step. This technique is called first-step analysis.
A lesson in disciplined wishful thinking: we cannot condition on the entire future, so we condition on the one useful piece (who won the first round) that breaks the problem into manageable parts.
Counterintuitively, the general problem is easier than a numerically specific one. If the problem fixed $i = \$100$ and $N = \$212$, focusing on that single probability would hide the key idea — relating $p_i$ to $p_{i+1}$ and $p_{i-1}$. We must solve for the whole family of probabilities at once.
Let $p_i =$ probability A wins the entire game given that A currently has $i$ dollars. By the law of total probability, conditioning on the first round:
$$p_i = p\,p_{i+1} + q\,p_{i-1}, \qquad 1 \le i \le N - 1$$
with boundary conditions:
$$p_0 = 0 \quad (\text{A starts bankrupt}) \qquad p_N = 1 \quad (\text{B starts bankrupt})$$
This is a difference equation — the discrete analog of a differential equation, defining each term in relation to its neighbors. A computer could solve the recursion directly, but a closed form is better.
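The remark that a computer could solve the recursion directly can be sketched in a few lines of Python. This is a minimal illustration rather than anything from the lecture: the function name `solve_gamblers_ruin` and the helper sequence `u` are made up here. It leans on linearity: build an unnormalized solution with $u_0 = 0$, $u_1 = 1$, then rescale so that the right boundary equals $1$.

```python
def solve_gamblers_ruin(N, p):
    """Return [p_0, ..., p_N] with p_i = p*p_{i+1} + q*p_{i-1}, p_0 = 0, p_N = 1."""
    q = 1.0 - p
    # Unnormalized solution: u_0 = 0, u_1 = 1. Rearranging the recursion
    # gives u_{i+1} = (u_i - q*u_{i-1}) / p.
    u = [0.0, 1.0]
    for i in range(1, N):
        u.append((u[i] - q * u[i - 1]) / p)
    # The equation is linear, so any multiple of u also satisfies it.
    # Dividing by u_N enforces p_N = 1 while keeping p_0 = 0.
    return [value / u[N] for value in u]


probs = solve_gamblers_ruin(N=10, p=0.5)
print([round(x, 3) for x in probs])
```

When $q > p$ the unnormalized values grow geometrically in $i$, so this sketch is only intended for modest $N$.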
Blitzstein's aside: difference equations are at least as important as differential equations yet are rarely taught anywhere. Observations over time are inherently discrete — you can only observe in discrete time — so difference equations arise naturally, and it is better to solve them directly than to approximate with a differential equation.
Standard textbooks (Ross, DeGroot) grind through several pages of algebra. Instead, borrow a trick from differential equations: guess a solution of the form
$$p_i = x^i$$
This particular guess won't be the final answer, but it reveals the structure, and it is the right general strategy for any linear, constant-coefficient difference equation. Substitute into $p_i = p\,p_{i+1} + q\,p_{i-1}$:
$$x^i = p\,x^{i+1} + q\,x^{i-1}$$
Discard the trivial $x = 0$ (it cannot satisfy the boundary conditions), divide through by $x^{i-1}$, and rearrange:
$$p\,x^2 - x + q = 0$$
A plain quadratic. By the quadratic formula:
$$x = \frac{1 \pm \sqrt{1 - 4pq}}{2p}$$
Since $q = 1 - p$:
$$1 - 4pq = 1 - 4p(1 - p) = 4p^2 - 4p + 1 = (2p - 1)^2$$
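So $\sqrt{1 - 4pq} = |2p - 1|$, and since the $\pm$ covers both signs, the two roots are:
$$x = \frac{1 + (2p - 1)}{2p} = 1 \qquad \text{and} \qquad x = \frac{1 - (2p - 1)}{2p} = \frac{1 - p}{p} = \frac{q}{p}$$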
The roots are simply $1$ and $q/p$.
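For distinct roots (i.e., $p \ne q$), the general solution is a linear combination of each root raised to the $i$-th power:
$$p_i = A \cdot 1^i + B \left(\frac{q}{p}\right)^i = A + B \left(\frac{q}{p}\right)^i$$
(This linear-combination rule is general: with distinct roots, take a linear combination of the roots' $i$-th powers; repeated roots require a more elaborate form, handled below by a limit.) Apply the boundaries: $p_0 = 0$ gives $A + B = 0$, so $B = -A$; then $p_N = 1$ gives $A\left(1 - (q/p)^N\right) = 1$, so $A = 1 / \left(1 - (q/p)^N\right)$. Hence, for $p \ne q$: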
$$p_i = \frac{1 - (q/p)^i}{1 - (q/p)^N}$$
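As a quick sanity check (an added worked step, not part of the lecture's derivation), one can confirm that $(q/p)^i$ satisfies the difference equation $p_i = p\,p_{i+1} + q\,p_{i-1}$:

$$p\left(\frac{q}{p}\right)^{i+1} + q\left(\frac{q}{p}\right)^{i-1} = \left(\frac{q}{p}\right)^{i}\left(p \cdot \frac{q}{p} + q \cdot \frac{p}{q}\right) = \left(\frac{q}{p}\right)^{i}(q + p) = \left(\frac{q}{p}\right)^{i}$$

Any constant $c$ also satisfies it, since $p\,c + q\,c = c$. The closed form is a linear combination of a constant and $(q/p)^i$, so it satisfies the equation too, and setting $i = 0$ and $i = N$ gives $0$ and $1$ as required.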
When $p = q = \tfrac{1}{2}$, the two roots coincide (root $1$ with multiplicity two) and the difference-equation theory gets more delicate. Easier: take a limit. Let $x = q/p$ and let $x \to 1$ in the unfair-case solution. The expression $\dfrac{1 - x^i}{1 - x^N}$ is of the form $0/0$, so apply L'Hôpital's rule (differentiate numerator and denominator in $x$):
$$\lim_{x \to 1} \frac{1 - x^i}{1 - x^N} = \lim_{x \to 1} \frac{-i\,x^{i-1}}{-N\,x^{N-1}} = \frac{i}{N}$$
So in the fair case:
$$p_i = \frac{i}{N}$$
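One should verify this is genuinely the solution (not just a suspicious limit) by plugging $i/N$ back into the difference equation with $p = q = \tfrac{1}{2}$: $\tfrac{1}{2} \cdot \tfrac{i+1}{N} + \tfrac{1}{2} \cdot \tfrac{i-1}{N} = \tfrac{i}{N}$, and the boundary values $0/N = 0$ and $N/N = 1$ match. Intuitively, it must: letting the unfair game approach fairness should make its solution approach the fair one, and it does, with no discontinuity.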
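The two cases can be collected into one function. The sketch below is an illustration under stated assumptions: the name `win_probability` and the use of `math.isclose` to detect the fair case are choices made here, not something from the lecture. It also checks numerically that a nearly fair game gives nearly $i/N$.

```python
import math


def win_probability(i, N, p):
    """P(A wins the whole game | A holds i of the N dollars), per-round win prob p."""
    q = 1.0 - p
    if math.isclose(p, q):
        # Fair game: the repeated-root case, solved by the limit.
        return i / N
    r = q / p
    return (1.0 - r**i) / (1.0 - r**N)


fair = win_probability(3, 10, 0.5)
nearly_fair = win_probability(3, 10, 0.5 + 1e-6)
print(fair, nearly_fair)  # the two values should be very close
```

The difference between the two printed values shrinks as the per-round probability approaches $\tfrac{1}{2}$, which is the continuity claimed above.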
Conditional thinking and pattern recognition are the points to remember, not the formula. But the numbers carry a moral.
$p_i = i/N$ says A's chance of winning equals the fraction of the total wealth A holds. Start with $\tfrac{2}{3}$ of the money, win with probability $\tfrac{2}{3}$. Simple and memorable.
Take equal starting wealth ($i = N/2$) and a game only slightly tilted against A: $p = 0.49$ (just $0.01$ from fair).
| Total $N$ | Each starts with | $P(\text{A wins})$ |
|---|---|---|
| $20$ | \$10 | $\approx 40\%$ |
| $100$ | \$50 | $\approx 12\%$ |
| $200$ | \$100 | $\approx 2\%$ |
With \$100 each and a per-round win probability only one percentage point below fair, A loses everything about $98\%$ of the time.
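The table entries can be reproduced from the unfair-case formula. The snippet below is a small sketch; the function name `win_probability` is made up here, and the snippet only handles $p \ne \tfrac{1}{2}$.

```python
def win_probability(i, N, p):
    """Closed form for p != 1/2: (1 - (q/p)^i) / (1 - (q/p)^N)."""
    r = (1.0 - p) / p
    return (1.0 - r**i) / (1.0 - r**N)


for N in (20, 100, 200):
    i = N // 2  # equal starting wealth
    prob = win_probability(i, N, 0.49)
    print(f"N = {N:3d}, each starts with ${i:3d}: P(A wins) = {prob:.3f}")
```

The printed values come out at roughly $0.40$, $0.12$, and $0.02$, in line with the table.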
In a casino, the house almost always has far more money than you — so you are also at a fraction-of-wealth disadvantage on top of the per-round edge. Games range from slightly to very unfair. The conclusion: keep gambling against the house and, with high probability, you go broke.
Does the game ever wander forever without ending? We never computed that directly, but we don't need to. Ask the mirror-image question, the probability that B wins and A is ruined, by swapping the roles of A and B (so B starts with $N - i$): in the fair case it is $(N - i)/N$. Adding the two outcomes:
$$\frac{i}{N} + \frac{N - i}{N} = 1$$
The two ending probabilities sum to $1$, leaving zero probability that the game oscillates forever (the same holds in the unfair case). Mathematically nothing forbids an infinite game, but that event has probability zero: with probability $1$, the game ends.
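A simulation gives an empirical view of the same conclusion. This is a rough sketch, not from the lecture: the function `play`, the seed, and the choice $i = 3$, $N = 10$, $p = \tfrac{1}{2}$ are all arbitrary illustrative assumptions. Each simulated game runs until the walk hits $0$ or $N$, and the fraction of games A wins is compared with $i/N$.

```python
import random


def play(i, N, p, rng):
    """Play one game; return (A_won, rounds_played)."""
    wealth, rounds = i, 0
    while 0 < wealth < N:  # stop at an absorbing state
        wealth += 1 if rng.random() < p else -1
        rounds += 1
    return wealth == N, rounds


rng = random.Random(110)  # fixed, arbitrary seed for reproducibility
games = [play(3, 10, 0.5, rng) for _ in range(20_000)]
a_wins = sum(won for won, _ in games)
print("estimated P(A wins):", a_wins / len(games))  # theory says 3/10
print("longest game (rounds):", max(rounds for _, rounds in games))
```

A finite simulation cannot prove that the game ends with probability $1$, but it shows every sampled game reaching an absorbing state, with A's empirical win rate near $i/N$.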
The event-based notation becomes unwieldy fast. Describing "player A has $i$ dollars at time $7$," or constraints on totals and differences across players, would require endless ad-hoc event names. We want to work with numbers and variables, as in ordinary math. Enter the random variable — but its definition is subtle.
Bad textbook/Wikipedia attempts: "a variable that takes random values," or "a quantity whose values are random and to which a probability distribution is assigned." These give no real content — they lean on "distribution" without defining it, and even then don't say what the object is. The formally correct alternative ("a measurable function from one space to another") is accurate but opaque without measure theory and real analysis.
In "$x + 2 = 9$" we call $x$ a variable and solve to get $x = 7$ — but $7$ is a constant, and it never changes into anything else. Even an equation with two solutions gives $x = 5$ or $x = 7$, never genuinely both at once. The symbol $x$ stands for a constant; there is no such thing as a "variable number." To capture something genuinely changing, the right notion is a function (whose graph shows variation), not a lone symbol in an equation.
A random variable is not a variable at all — it is a function
$$X : S \to \mathbb{R}$$
from the sample space $S$ (the outcomes of a random experiment) to the real line.
The crucial question is where the randomness comes from, since functions are deterministic. The answer: the randomness comes from the experiment. We run the random experiment and observe one specific outcome $s \in S$; the random variable then maps that outcome to a real number $X(s)$. The probabilities live on the sample space (different outcomes have different probabilities); $X$ simply reports a numerical summary of an aspect of the experiment. Working with real numbers is far friendlier than working with abstract outcomes $s$.
Think of a random variable as a numerical summary of an aspect of the experiment ("summary" interpreted broadly — it need not capture the whole experiment).
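To make the "function on the sample space" idea concrete, here is a small sketch using an experiment chosen for illustration (rolling two fair dice; the names `total` and `largest` are made up here). Both random variables are ordinary deterministic functions; the only random step is drawing the outcome $s$.

```python
import random
from itertools import product

# Sample space for "roll two fair dice": 36 equally likely outcomes (a, b).
S = list(product(range(1, 7), repeat=2))


def total(s):
    """Random variable: the sum of the two dice."""
    return s[0] + s[1]


def largest(s):
    """Another random variable on the same sample space: the larger die."""
    return max(s)


rng = random.Random(0)  # arbitrary seed
s = rng.choice(S)       # run the experiment: this is where the randomness enters
print("outcome:", s, "total:", total(s), "largest:", largest(s))
```

Two different summaries of the same experiment are two different functions on the same $S$, each capturing only one aspect of the outcome.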
The simplest non-constant random variable has just two possible values; take them to be $0$ and $1$.
$X \sim \text{Bern}(p)$ if $X$ takes only the values $0$ and $1$ with:
$$P(X = 1) = p \qquad P(X = 0) = 1 - p$$
Whatever the underlying experiment, the outcome is mapped to either $1$ (with probability $p$) or $0$ (with probability $1 - p$).
A point of rigor: $\{X = 1\}$ is itself an event — otherwise $P(X = 1)$ would be meaningless, since we can only take $P$ of an event. Formally, $$\{X = 1\} = \{\, s \in S : X(s) = 1 \,\}.$$ Intuitively: before the experiment we don't know $X$; after it, $X$ turns out to equal $1$ (or $0$). That set of outcomes is the event.
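The set definition of $\{X = 1\}$ can be written out directly for a small experiment. The example below is an assumption made for illustration: roll a fair die and call a roll of $5$ or $6$ a "success," so $X \sim \text{Bern}(1/3)$. The names `event` and `P` are hypothetical helpers.

```python
from fractions import Fraction

# Sample space of a fair die roll, with the probability of each outcome.
S = [1, 2, 3, 4, 5, 6]
prob = {s: Fraction(1, 6) for s in S}


def X(s):
    """Bernoulli random variable: 1 if the roll is 5 or 6 ("success"), else 0."""
    return 1 if s >= 5 else 0


def event(value):
    """The event {X = value}: all outcomes s in S with X(s) == value."""
    return {s for s in S if X(s) == value}


def P(ev):
    """Probability of an event: add up the probabilities of its outcomes."""
    return sum(prob[s] for s in ev)


print(event(1), P(event(1)))  # the event {5, 6} has probability 1/3
print(event(0), P(event(0)))  # the complementary event has probability 2/3
```

Here $P$ is only ever applied to a subset of $S$, which is the point of the rigor check: $P(X = 1)$ is shorthand for the probability of that subset.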
A Bernoulli trial is one experiment yielding success ($1$) or failure ($0$); we get to define "success" however is convenient. Run $n$ independent $\text{Bern}(p)$ trials (the canonical example: flip a coin $n$ times). The number of successes has the Binomial$(n, p)$ distribution.
$X \sim \text{Bin}(n, p)$ means $X$ is the number of successes in $n$ independent $\text{Bern}(p)$ trials, so $X$ is an integer from $0$ to $n$.
To "specify the distribution" of $X$, give the probabilities of its possible values: here, $P(X = k)$ for each $k$. One specific way to get $k$ successes is $k$ successes followed by $n - k$ failures, e.g. with $n = 7$, $k = 3$ (writing S for success and F for failure):
$$\text{SSSFFFF}$$
That sequence has probability $p^3 (1 - p)^4$, and in general $p^k (1 - p)^{n-k}$. Every rearrangement of the sequence has the same probability, and the number of arrangements is the number of ways to choose which positions are successes: $\binom{n}{k}$. Therefore:
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, \ldots, n$$
This is the probability mass function (PMF) — the rule giving the probability of each value of $X$.
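The counting argument behind the PMF can be checked by brute force for small $n$. The sketch below (function names are made up for illustration) compares the formula with a direct sum over every length-$n$ sequence of successes and failures that has exactly $k$ successes.

```python
from itertools import product
from math import comb, isclose


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p), from the formula."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def brute_force_pmf(k, n, p):
    """Add up the probability of every 0/1 sequence of length n with k ones."""
    return sum(
        p**sum(seq) * (1 - p)**(n - sum(seq))
        for seq in product((0, 1), repeat=n)
        if sum(seq) == k
    )


n, p = 7, 0.3  # illustrative values
print(comb(n, 3))  # number of arrangements of 3 successes among 7 trials: 35
print(isclose(binomial_pmf(3, n, p), brute_force_pmf(3, n, p)))
print(isclose(sum(binomial_pmf(k, n, p) for k in range(n + 1)), 1.0))
```

The brute-force version enumerates $2^n$ sequences, so it is only practical for small $n$; the formula replaces that enumeration with a single binomial coefficient.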
Claim: if $X \sim \text{Bin}(n, p)$ and $Y \sim \text{Bin}(m, p)$ are independent, then $X + Y \sim \text{Bin}(n + m, p)$. Story proof, no algebra needed: consider $n$ $\text{Bern}(p)$ trials and then $m$ more, all independent. $X$ counts the successes in the first $n$, $Y$ counts the successes in the remaining $m$, so $X + Y$ counts the total successes in $n + m$ independent $\text{Bern}(p)$ trials, which is, by definition, $\text{Bin}(n + m, p)$.
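The story proof needs no algebra, but the conclusion can still be checked numerically. The sketch below is an added illustration with made-up function names: it computes $P(X + Y = k)$ by conditioning on the value of $X$ and using independence, then compares the result with the $\text{Bin}(n + m, p)$ PMF.

```python
from math import comb, isclose


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def pmf_of_sum(k, n, m, p):
    """P(X + Y = k) for independent X ~ Bin(n, p) and Y ~ Bin(m, p)."""
    # Condition on X = j; then Y must equal k - j, which needs 0 <= k - j <= m.
    return sum(
        binomial_pmf(j, n, p) * binomial_pmf(k - j, m, p)
        for j in range(max(0, k - m), min(n, k) + 1)
    )


n, m, p = 4, 6, 0.3  # illustrative values
matches = all(
    isclose(pmf_of_sum(k, n, m, p), binomial_pmf(k, n + m, p))
    for k in range(n + m + 1)
)
print(matches)
```

The sum inside `pmf_of_sum` is exactly the algebra that the story proof lets us skip.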