This review lecture revisits a handful of representative problems rather than introducing new theory. Each one drills a core technique — linearity, symmetry, universality of the uniform, LOTUS, going from a CDF to a PDF, and story proofs — applied to a self-contained example. The recurring meta-lesson: break a hard random variable into pieces, recognize the pattern, and let structure (not brute calculation) do the work.
1. Coupon Collector: Linearity and Geometrics
The problem
You collect toys (or coupons) that come in $n$ equally likely types. Each purchase gives one uniformly random type. On average, how many toys must you buy to collect a complete set of all $n$ types? "Time" is measured discretely as the number of toys purchased.
The equally-likely assumption matters: with unequal probabilities the calculation becomes extremely tedious (pages of work), so exam problems keep it equally likely.
Break the total into pieces
Let $T$ be the total number of toys needed. Decompose it by milestones:
$$T = T_1 + T_2 + \cdots + T_n,$$
where $T_j$ is the additional time spent waiting for the $j$-th new type, after you already hold $j - 1$ distinct types ("new" = a type you don't already have).
$T_1 = 1$ always: the very first toy is automatically new.
$T_2 = $ additional toys until the second new type, and so on up to $T_n$.
Each piece is geometric
Suppose you currently hold $j - 1$ distinct types. On each new purchase, the chance of getting something new is
$$p_j = \frac{n - (j - 1)}{n},$$
since $n - (j - 1)$ of the $n$ types are still missing; with the complementary probability you draw a duplicate and try again. So the number of duplicates drawn before the next new type, $T_j - 1$, is $\mathrm{Geom}(p_j)$. (Subtracting $1$ matches the convention that a Geometric counts failures before the first success and so starts at $0$; the purchase that finally succeeds is the $+1$ added back when converting to $T_j$.)
Add up with linearity
Why linearity wins
Linearity of expectation gives the answer directly — and would hold even if the $T_j$ were dependent. (Here they happen to be independent, but independence is not required.)
$$E(T) = E(T_1) + E(T_2) + \cdots + E(T_n).$$
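For a Geometric starting at $1$ the mean is $1/p$ (that is, $q/p + 1$, adding back the $+1$), so $E(T_j) = 1/p_j = \dfrac{n}{n - j + 1}$. Therefore:
$$E(T) = \frac{n}{n} + \frac{n}{n-1} + \cdots + \frac{n}{1} = n\left[1 + \tfrac{1}{2} + \cdots + \tfrac{1}{n}\right].$$
The bracketed sum is the $n$-th harmonic number $H_n$.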
Result and approximation
Answer
$$E(T) = n \, H_n = n\left(1 + \tfrac{1}{2} + \cdots + \tfrac{1}{n}\right) \approx n \log n \quad (\text{large } n).$$
Give the exact answer ($n\,H_n$) on an exam unless an approximation is explicitly requested; $n \log n$ is the handy large-$n$ estimate. What looked like a hard problem becomes easy once it is split into geometric pieces and reassembled with linearity.
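As a sanity check on $E(T) = n\,H_n$, here is a minimal simulation sketch. The function names, the choice $n = 10$, the number of trials, and the seed are illustrative assumptions, not part of the lecture.

```python
import random

def harmonic(n):
    """Return H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

def expected_purchases(n):
    """Exact answer from the derivation: E(T) = n * H_n."""
    return n * harmonic(n)

def simulate_purchases(n, rng):
    """Buy uniformly random types until all n have been seen; return the count."""
    seen = set()
    purchases = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))  # each type equally likely
        purchases += 1
    return purchases

rng = random.Random(0)  # fixed seed so the run is repeatable (assumption)
n = 10                  # illustrative number of toy types
trials = 20000
average = sum(simulate_purchases(n, rng) for _ in range(trials)) / trials

print(f"simulated mean of T: {average:.3f}")
print(f"n * H_n:             {expected_purchases(n):.3f}")
```

The simulated mean should land close to $n\,H_n$, with the gap shrinking as the number of trials grows.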
· · ·
2. Universality of the Uniform
This was the number-one review request. The key statement: if $X$ is a continuous random variable with a strictly increasing, continuous CDF $F$, then plugging $X$ into its own CDF yields a $\mathrm{Unif}(0,1)$:
$$F(X) \sim \mathrm{Unif}(0, 1).$$
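(This is one of the two halves of universality, and the only one argued for here. The other half, that $F^{-1}(U)$ has CDF $F$ when $U \sim \mathrm{Unif}(0,1)$, is used without proof in the simulation example.)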
Why it's true — a geometric argument
Draw a generic CDF $F$: increasing, continuous, leveling off at $1$ (either hitting $1$ and staying, or approaching $1$ asymptotically). The horizontal axis is the value $x$; the vertical axis runs $0$ to $1$.
Pick any target height on the vertical axis — say $\tfrac{1}{3}$ — and let $x_0$ be the point with $F(x_0) = \tfrac{1}{3}$ (the inverse-CDF value). Now ask for the probability that $F(X) \le \tfrac{1}{3}$, where $X$ is random with CDF $F$. To compute $F(X)$ you first draw $X$ from the distribution, then read off its height $F(X)$.
The event $F(X) \le \tfrac{1}{3}$ happens exactly when $X \le x_0$ (if $X$ were to the right of $x_0$, its height would exceed $\tfrac{1}{3}$), so $P\big(F(X) \le \tfrac{1}{3}\big) = P(X \le x_0) = F(x_0) = \tfrac{1}{3}$. Nothing was special about $\tfrac{1}{3}$: for any $y \in [0, 1]$, $P(F(X) \le y) = y$. That is precisely the $\mathrm{Unif}(0,1)$ CDF, under which probability is proportional to length on $[0, 1]$.
The mechanism just translates a random point on the $x$-axis into a uniformly random height between $0$ and $1$.
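The claim $P(F(X) \le y) = y$ can be checked empirically. The sketch below assumes a standard Normal for $X$ purely as a convenient example of a continuous, strictly increasing CDF; the lecture does not single out this distribution, and the helper names and seed are hypothetical.

```python
import math
import random

def normal_cdf(x):
    """CDF of the standard Normal, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

rng = random.Random(1)  # fixed seed for repeatability (assumption)
trials = 50000

# Draw X from the distribution, then read off its height F(X).
heights = [normal_cdf(rng.gauss(0.0, 1.0)) for _ in range(trials)]

def fraction_at_most(y):
    """Empirical estimate of P(F(X) <= y)."""
    return sum(h <= y for h in heights) / trials

for y in (0.1, 1 / 3, 0.5, 0.9):
    print(f"target y = {y:.3f}, estimated P(F(X) <= y) = {fraction_at_most(y):.3f}")
```

Each estimate should sit near its target $y$, which is what the $\mathrm{Unif}(0,1)$ CDF predicts.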
Using it to simulate: the logistic distribution
The flip side of universality lets you generate draws from any continuous distribution given a uniform random number, via the inverse CDF. The logistic distribution (the basis of logistic regression, widely used in economics and statistics) has CDF
$$F(x) = \frac{e^x}{1 + e^x}, \qquad x \in \mathbb{R}.$$
(Good practice: verify this is a valid CDF: continuous, increasing, with the right limits.) To simulate it, let $U \sim \mathrm{Unif}(0,1)$ and apply the inverse CDF. Setting $F(x) = u$ and solving for $x$ gives
$$u = \frac{e^x}{1 + e^x} \;\Longrightarrow\; e^x = \frac{u}{1 - u} \;\Longrightarrow\; x = \log\!\left(\frac{u}{1 - u}\right).$$
So $F^{-1}(U) = \log\!\big(U / (1 - U)\big)$ is a draw from the logistic distribution. The expression is the log-odds: if $u$ were a probability, $u/(1-u)$ would be the odds and its log the log-odds. You could confirm by computing the CDF of $F^{-1}(U)$ directly, but it is easier to recognize why it works through universality.
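The inverse-CDF recipe for the logistic can be sketched in a few lines. The function names, sample size, and seed are assumptions for illustration; the guard against $u = 0$ is there only because Python's `random()` can return exactly `0.0`.

```python
import math
import random

def logistic_cdf(x):
    """F(x) = e^x / (1 + e^x), written in the equivalent form 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def logistic_draw(rng):
    """Apply the inverse CDF, log(u / (1 - u)), to a Unif(0,1) draw."""
    u = rng.random()
    while u == 0.0:  # random() lies in [0, 1); skip 0 to avoid log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))

rng = random.Random(2)  # fixed seed for repeatability (assumption)
trials = 50000
draws = [logistic_draw(rng) for _ in range(trials)]

def empirical_cdf(x):
    """Fraction of simulated draws that are at most x."""
    return sum(d <= x for d in draws) / trials

for x in (-2.0, 0.0, 1.5):
    print(f"x = {x:+.1f}: empirical {empirical_cdf(x):.3f}, logistic CDF {logistic_cdf(x):.3f}")
```

If the recipe is right, the empirical CDF of the draws tracks $F$ at every point tested.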
· · ·
3. Symmetry and Linearity
The problem
Let $X, Y, Z$ be i.i.d. positive random variables. Find $E\!\left(\dfrac{X}{X + Y + Z}\right)$. No PDF, PMF, discrete/continuous assumption, or formula is given — only "i.i.d. positive." Positivity is solely to avoid dividing by zero. So the result must be completely general.
Pitfall
Linearity is for sums only. $E$ of a ratio is not the ratio of expectations; there is no rule turning $E\!\left(\frac{X}{X+Y+Z}\right)$ into $\frac{E(X)}{E(X+Y+Z)}$. The trick is symmetry, not a quotient formula.
Solve by symmetry, then linearity
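Because $X, Y, Z$ are i.i.d., relabeling them cannot change the answer. The denominator $X + Y + Z$ is order-independent, so the three expectations are identical:
$$E\!\left(\frac{X}{X+Y+Z}\right) = E\!\left(\frac{Y}{X+Y+Z}\right) = E\!\left(\frac{Z}{X+Y+Z}\right).$$
By linearity, the three add up to
$$E\!\left(\frac{X}{X+Y+Z}\right) + E\!\left(\frac{Y}{X+Y+Z}\right) + E\!\left(\frac{Z}{X+Y+Z}\right) = E\!\left(\frac{X+Y+Z}{X+Y+Z}\right) = 1,$$
so each one equals $\tfrac{1}{3}$.
The answer must lie strictly between $0$ and $1$, since the positive numerator is strictly smaller than the denominator; an answer like $4$ would be obviously wrong. And $\tfrac{1}{3}$ matches intuition: among three symmetric contributors, each accounts on average for one third of the total. Intuition is a good guess, but the symmetry-plus-linearity argument is the actual proof.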
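Because the result is distribution-free, a quick simulation with two unrelated positive distributions should give roughly $\tfrac{1}{3}$ both times. The two distributions below (a lognormal and a uniform on $(0, 1]$), along with the names and seed, are arbitrary choices made for this sketch.

```python
import random

def mean_share(draw, trials, rng):
    """Estimate E(X / (X + Y + Z)) for i.i.d. positive X, Y, Z produced by `draw`."""
    total = 0.0
    for _ in range(trials):
        x, y, z = draw(rng), draw(rng), draw(rng)
        total += x / (x + y + z)
    return total / trials

rng = random.Random(3)  # fixed seed for repeatability (assumption)
trials = 50000

# Two arbitrary positive distributions; 1 - random() lies in (0, 1], so it is never 0.
samplers = {
    "lognormal": lambda r: r.lognormvariate(0.0, 1.0),
    "uniform on (0, 1]": lambda r: 1.0 - r.random(),
}

estimates = {name: mean_share(draw, trials, rng) for name, draw in samplers.items()}
for name, estimate in estimates.items():
    print(f"{name}: estimated E(X / (X + Y + Z)) = {estimate:.3f}")
```

The simulation only illustrates the result; the proof is still the symmetry argument.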
· · ·
4. LOTUS: Pattern Over Variable Names
LOTUS (the Law of the Unconscious Statistician) looks simple but causes frequent mistakes. The fix is to focus on the pattern — integrate the function against the density of whatever variable you are expanding in — rather than on what the variables happen to be named.
Example
Let $U \sim \mathrm{Unif}(0,1)$, $X = U^2$, and $Y = e^X$. Find $E(Y)$, written as an integral.
Note on exams: if an integral is hard, the prompt will say to leave it as an integral; otherwise compute and fully simplify to a number.
Two correct LOTUS approaches
Approach 1 — expand in $X$
Treat $Y = e^X$ as a function of $X$ and apply LOTUS over $X$'s density:
$$E(Y) = \int_0^1 e^x \, f_X(x) \, dx,$$
where $f_X$ is the PDF of $X$ (the limits are $0$ to $1$ because squaring a number in $[0,1]$ stays in $[0,1]$). This is correct but incomplete until you actually supply $f_X$ — "the PDF of $X$" is not an answer by itself. Getting $f_X$ requires finding the CDF of $X$ and differentiating (see next section).
Approach 2 — expand in $U$
Since $Y = e^X = e^{U^2}$ is also a function of $U$, apply LOTUS over $U$'s density, which is just $1$ on $[0,1]$:
$$E(Y) = \int_0^1 e^{u^2} \, du.$$
This can be written down immediately, with no extra PDF computation. (Solving this integral is a different matter — it resembles the Gaussian integral without the minus sign — but the problem only asks for the integral form.)
Both correct
Both answers earn full credit, provided you actually write out the density you integrate against.
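Approach 2 is easy to check numerically. The sketch below compares a midpoint-rule evaluation of $\int_0^1 e^{u^2}\,du$ with a direct Monte Carlo average of $e^{U^2}$; the helper name, step count, sample size, and seed are assumptions for illustration.

```python
import math
import random

def midpoint_integral(f, a, b, steps):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# LOTUS over U: integrate e^(u^2) against the Unif(0,1) density, which is 1 on [0, 1].
lotus_value = midpoint_integral(lambda u: math.exp(u * u), 0.0, 1.0, 10000)

# Direct simulation of Y = e^(U^2), averaging the draws.
rng = random.Random(4)  # fixed seed for repeatability (assumption)
trials = 100000
monte_carlo = sum(math.exp(rng.random() ** 2) for _ in range(trials)) / trials

print(f"LOTUS integral (midpoint rule): {lotus_value:.4f}")
print(f"Monte Carlo average of Y:       {monte_carlo:.4f}")
```

The two numbers agree to within simulation noise, which is the content of LOTUS: no density for $Y$ or $X$ was ever needed.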
The common pitfall
Students mix the two approaches — for instance, letting stray $X$'s appear in a problem where $X$ was never defined. The mindset that "LOTUS is about $X$" because the rule was first stated with a variable named $X$ is the trap. LOTUS is a pattern (function times the appropriate density, integrated or summed), independent of variable names.
· · ·
5. From CDF to PDF
A recurring subskill: to get a PDF of a transformed variable, first find its CDF by reducing the event back to something understood (here, the uniform), then differentiate.
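Take $X = U^2$ with $U \sim \mathrm{Unif}(0,1)$, and find $f_X$. For $0 < x < 1$,
$$F_X(x) = P(X \le x) = P(U^2 \le x) = P(U \le \sqrt{x}) = \sqrt{x},$$
and differentiating gives
$$f_X(x) = \frac{1}{2\sqrt{x}}, \qquad 0 < x < 1.$$
Understand what a CDF is, reduce the event back to the uniform we already understand, then differentiate. Substituting this $f_X$ completes Approach 1 above:
$$E(Y) = \int_0^1 \frac{e^x}{2\sqrt{x}} \, dx.$$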
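A small check of the CDF-then-differentiate step for $X = U^2$, assuming the CDF $F_X(x) = P(U \le \sqrt{x}) = \sqrt{x}$ on $(0, 1)$ and hence the density $1/(2\sqrt{x})$. The function names, test points, and seed are illustrative assumptions.

```python
import math
import random

def cdf_x(x):
    """CDF of X = U^2 on (0, 1): P(U^2 <= x) = P(U <= sqrt(x)) = sqrt(x)."""
    return math.sqrt(x)

def pdf_x(x):
    """Derivative of the CDF: 1 / (2 * sqrt(x)) on (0, 1)."""
    return 1.0 / (2.0 * math.sqrt(x))

def numeric_slope(x, h=1e-5):
    """Central-difference approximation to the derivative of cdf_x at x."""
    return (cdf_x(x + h) - cdf_x(x - h)) / (2.0 * h)

rng = random.Random(5)  # fixed seed for repeatability (assumption)
trials = 50000
samples = [rng.random() ** 2 for _ in range(trials)]

def empirical_cdf(x):
    """Fraction of simulated values of U^2 that are at most x."""
    return sum(s <= x for s in samples) / trials

for x in (0.04, 0.25, 0.81):
    print(f"x = {x}: empirical CDF {empirical_cdf(x):.3f}, sqrt(x) {cdf_x(x):.3f}, "
          f"slope {numeric_slope(x):.4f}, pdf {pdf_x(x):.4f}")
```

The empirical CDF should match $\sqrt{x}$, and the numerical slope of the CDF should match the density.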
· · ·
6. Story Proof: Distribution of $n - X$
A quick review of a story (interpretation) proof. Let $X \sim \mathrm{Bin}(n, p)$. Find the distribution of $n - X$.
The PMF method (more work)
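$n - X$ is discrete, so compute its PMF. Write $q = 1 - p$. For each $k \in \{0, 1, \dots, n\}$:
$$P(n - X = k) = P(X = n - k) = \binom{n}{n-k} p^{n-k} q^{k}.$$
Using $\binom{n}{n-k} = \binom{n}{k}$, rewrite this as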
$$P(n - X = k) = \binom{n}{k} q^{k} p^{n-k},$$
which is exactly the $\mathrm{Bin}(n, q)$ PMF, so $n - X \sim \mathrm{Bin}(n, q)$.
The story proof (one line)
$$X \sim \mathrm{Bin}(n, p) \;\Longrightarrow\; n - X \sim \mathrm{Bin}(n, q), \quad q = 1 - p.$$
$X$ counts the number of successes in $n$ i.i.d. $\mathrm{Bern}(p)$ trials, so $n - X$ counts the number of failures. Since each trial is either a success or a failure (never both) and you may define which outcome is "success," simply swap the roles: relabel failure as success and success as failure. Immediately $n - X \sim \mathrm{Bin}(n, q)$ — no calculation, just swap success and failure.
Exam tip: don't waste time writing "it is immediately obvious"; one short sentence naming the swap is enough, and conserving time matters.
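The PMF identity behind the swap can also be confirmed exactly, term by term. This is a minimal sketch; the values $n = 8$ and $p = 0.3$ and the helper name are arbitrary choices for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)

n, p = 8, 0.3  # illustrative values (assumption)
q = 1.0 - p

# P(n - X = k) = P(X = n - k), with X ~ Bin(n, p).
pmf_of_n_minus_x = [binomial_pmf(n - k, n, p) for k in range(n + 1)]

# The PMF the story proof predicts: Bin(n, q).
pmf_bin_n_q = [binomial_pmf(k, n, q) for k in range(n + 1)]

max_gap = max(abs(a - b) for a, b in zip(pmf_of_n_minus_x, pmf_bin_n_q))
print(f"largest difference between the two PMFs: {max_gap:.2e}")
```

The largest difference is at the level of floating-point rounding, as expected when the two PMFs are the same function.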
· · ·
7. Poisson to Exponential: First-Arrival Time
A Poisson example that links a discrete count to a continuous waiting time.
Setup
Suppose the number of emails received in a time interval of length $t$ is
$$N_t \sim \mathrm{Pois}(\lambda t),$$
where $\lambda$ is a rate (e.g., $20$ emails/hour) and $\lambda t$ is the expected count over the interval. Recall $\lambda t$ is both the mean and the variance of a Poisson. Different intervals of the same length yield different counts, so $N_t$ is genuinely random.
Define $T = $ the time of the first email, with the clock starting at $0$. $T$ is continuous (an email can arrive at any real time), connecting the discrete count $N_t$ to a continuous waiting time.
Find the CDF via the complement
Strategy
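When finding a probability, ask whether the event or its complement is easier. Here the complement wins: "first email after time $t$" is exactly "zero emails in $[0, t]$."
$$P(T > t) = P(N_t = 0) = \frac{e^{-\lambda t} (\lambda t)^0}{0!} = e^{-\lambda t},$$
so the CDF is
$$F_T(t) = 1 - e^{-\lambda t}, \qquad t > 0.$$
Differentiating gives the PDF:
$$f_T(t) = \lambda e^{-\lambda t}, \qquad t > 0.$$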
This is the exponential distribution (to be studied formally later).
The example shows how the discrete Poisson count drives the continuous first-arrival PDF. It is the same find-the-CDF-then-differentiate move as the $U^2$ example, with the complement making the CDF easy to find, now bridging discrete and continuous.
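The chain from the Poisson count to the first-arrival density can be traced numerically: evaluate $P(N_t = 0)$ from the Poisson PMF, form $F_T(t) = 1 - P(N_t = 0)$, and compare a finite-difference slope of $F_T$ with $\lambda e^{-\lambda t}$. The function names, step size, and test times are assumptions; the rate of $20$ per hour is the one used in the example.

```python
import math

def poisson_pmf(k, mean):
    """P(N = k) for N ~ Pois(mean)."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def first_arrival_cdf(t, rate):
    """P(T <= t) = 1 - P(no emails in [0, t]) = 1 - P(N_t = 0)."""
    return 1.0 - poisson_pmf(0, rate * t)

def first_arrival_pdf(t, rate):
    """The claimed density: rate * e^(-rate * t) for t > 0."""
    return rate * math.exp(-rate * t)

def numeric_slope(t, rate, h=1e-5):
    """Central-difference approximation to the derivative of the CDF at t."""
    return (first_arrival_cdf(t + h, rate) - first_arrival_cdf(t - h, rate)) / (2.0 * h)

rate = 20.0  # emails per hour, as in the example
for t in (0.01, 0.05, 0.2):  # times in hours (illustrative)
    print(f"t = {t}: P(N_t = 0) = {poisson_pmf(0, rate * t):.4f}, "
          f"CDF slope = {numeric_slope(t, rate):.4f}, "
          f"rate * exp(-rate * t) = {first_arrival_pdf(t, rate):.4f}")
```

The slope of the CDF matches the density at each test time, which is the differentiation step carried out numerically.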
· · ·
8. Closing Reminder: Three Classes of Objects
Keep three things distinct; conflating them causes much of the trouble in this course.
Distribution
The blueprint for creating a random variable (e.g., a CDF). The "random house" blueprint.
Random variable
The random quantity itself. The "random house."
Constant
A single fixed value. A specific, fixed house.
A distribution is the plan, a random variable is the random quantity built from that plan, and a constant is one concrete number. Mixing them up is a frequent source of error.