The previous lecture proved the universality of the uniform: any distribution can be generated from a single $\mathrm{Unif}(0,1)$. This lecture revisits the theorem, works an example, and explores its flip side before moving to the normal distribution.
Let $F$ be a continuous, strictly increasing CDF. These assumptions are stronger than necessary — in general a CDF need only be right-continuous and nondecreasing (allowing flat regions) — but they make the inverse $F^{-1}$ well defined and the proof clean.
If $U \sim \mathrm{Unif}(0,1)$ and we define $X = F^{-1}(U)$, then $X$ has CDF $F$.
This is unusual in direction. Normally we start with a random variable and derive its CDF; here we start with the CDF and synthesize a random variable that has it. That is why it is called "universality" — starting from one uniform, we can in principle create a random variable with any desired distribution.
The result is the foundation of simulation. Uniforms are easy to generate on a computer; most other distributions are harder to generate directly. To simulate draws from $F$: generate $U \sim \mathrm{Unif}(0,1)$, then compute $X = F^{-1}(U)$.
In some cases $F^{-1}$ is easy to write down analytically; in many cases it is hard or impossible in closed form. But conceptually, the uniform gives you everything.
The theorem also justifies an earlier claim: the three properties of a CDF (right-continuous, nondecreasing, limits $0$ and $1$) are not just necessary but sufficient — any function with those properties really is the CDF of some random variable.
The theorem runs the other way too. Start with $X$ having CDF $F$ (no uniform yet). Then $F(X)$ is itself uniform. Heuristically, writing $X = F^{-1}(U)$ and applying $F$ to both sides gives
$$F(X) = U \sim \mathrm{Unif}(0,1).$$
This is self-referential and looks mysterious: we take a random variable and plug it into its own CDF. It is legitimate because $F$ is just a function, and a function of a random variable is a random variable. Since any CDF takes values in $[0,1]$, $F(X)$ automatically lands in $[0,1]$ — consistent with being uniform, though not yet a proof of it.
This identity is useful in statistical inference (Stat 111): $X$ may have a complicated or unknown distribution, but reducing it to a known, simple $\mathrm{Unif}(0,1)$ is convenient for model checking. If many instances of $F(X)$ do not look uniform, the model is suspect.
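A quick simulation can illustrate the claim that $F(X)$ looks uniform. The sketch below is only an illustration under assumptions not in the lecture: it takes $X \sim \mathrm{Expo}(1)$ with $F(x) = 1 - e^{-x}$, draws $X$ with the Python standard library's `random.expovariate`, and uses an arbitrary seed, sample size, and ten equal-width bins. If $F(X)$ is uniform, each bin should hold about a tenth of the values.

```python
import math
import random

def expo1_cdf(x):
    """CDF of Expo(1): F(x) = 1 - exp(-x) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

rng = random.Random(111)  # arbitrary seed, for reproducibility only
n = 100_000

# Draw X with the library's exponential sampler, then plug X into its own CDF.
u_values = [expo1_cdf(rng.expovariate(1.0)) for _ in range(n)]

# Bin the values of F(X) into ten equal-width bins on [0, 1].
num_bins = 10
counts = [0] * num_bins
for u in u_values:
    counts[min(int(u * num_bins), num_bins - 1)] += 1
bin_fractions = [count / n for count in counts]

print(bin_fractions)  # each entry should be close to 0.1
```

Replacing `expo1_cdf` with the CDF of a wrong model would make the bin fractions visibly uneven, which is the model-checking idea described above.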
The CDF is $F(x) = P(X \le x)$. It is tempting to plug in capital $X$ blindly: $F(X) = P(X \le X)$. But the event $\{X \le X\}$ always happens, so this would force $F(X) = 1$. That step is invalid. The correct reading: treat $F$ as a function written in terms of a placeholder $x$, then substitute the random variable $X$ for that placeholder.
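As an example, let $F(x) = 1 - e^{-x}$ for $x > 0$ (and $F(x) = 0$ for $x \le 0$). This is the CDF of the $\mathrm{Expo}(1)$ distribution: continuous everywhere and strictly increasing on the positive side.
To simulate $X \sim \mathrm{Expo}(1)$, invert $F$. Set $u = 1 - e^{-x}$ and solve for $x$ (ordinary algebra):
$$x = -\log(1 - u), \qquad \text{so } F^{-1}(u) = -\log(1 - u).$$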
By universality, $-\log(1-U)$ has CDF $F$. To draw $10$ i.i.d. exponentials, generate $10$ i.i.d. uniforms and apply this function to each.
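The recipe can be sketched in a few lines of Python. This is a minimal illustration, not a production sampler: the function names, seed, and sample size are arbitrary choices, and the check against a mean of $1$ relies on the standard fact (not derived in this section) that $\mathrm{Expo}(1)$ has mean $1$.

```python
import math
import random

def expo1_cdf(x):
    """F(x) = 1 - exp(-x) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def expo1_inverse_cdf(u):
    """Solve u = 1 - exp(-x) for x, giving x = -log(1 - u)."""
    return -math.log(1.0 - u)

rng = random.Random(110)  # arbitrary seed
n = 100_000

# rng.random() lies in [0, 1), so 1 - u is in (0, 1] and the log is defined.
draws = [expo1_inverse_cdf(rng.random()) for _ in range(n)]

sample_mean = sum(draws) / n
frac_below_1 = sum(d <= 1.0 for d in draws) / n

print(sample_mean)                    # should be near 1
print(frac_below_1, expo1_cdf(1.0))   # empirical vs. exact P(X <= 1)
```

Comparing the empirical fraction below a point with $F$ at that point is a direct check that the simulated values have CDF $F$.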
While inverting the exponential CDF we used $1 - U$. A useful fact: if $U \sim \mathrm{Unif}(0,1)$, then $1 - U \sim \mathrm{Unif}(0,1)$ as well. So we could equally have written $X = -\log(U)$.
Intuition: $U$ is a random point in $[0,1]$. Measuring its distance from the left end ($U$) versus the right end ($1 - U$) is just a relabeling of the same random point — the distribution is unchanged. (Worth verifying by computing the CDF directly, as good practice with CDFs and PDFs.)
More generally, $a + bU$ (with constants $a$ and $b \neq 0$) is again uniform: $\mathrm{Unif}(a, a+b)$ if $b > 0$, and $\mathrm{Unif}(a+b, a)$ if $b < 0$. For example, to go from $\mathrm{Unif}(0,1)$ to $\mathrm{Unif}(0,10)$, multiply by $10$.
Nonlinear usually means non-uniform. Do not assume any function of $U$ that stays in $[0,1]$ is uniform. For instance $U^2 \in [0,1]$ but is not uniform — compute its CDF and it does not match the uniform CDF. Always check rather than assume.
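Both claims can be checked directly. For $0 \le x \le 1$, $P(1 - U \le x) = P(U \ge 1 - x) = x$, which is the uniform CDF, while $P(U^2 \le x) = P(U \le \sqrt{x}) = \sqrt{x}$, which is not. The sketch below compares empirical CDFs at one test point; the point $0.25$, the seed, and the sample size are arbitrary illustrative choices.

```python
import random

def empirical_cdf(values, t):
    """Fraction of the values that are <= t."""
    return sum(v <= t for v in values) / len(values)

rng = random.Random(112)  # arbitrary seed
n = 100_000
us = [rng.random() for _ in range(n)]

t = 0.25
cdf_flipped = empirical_cdf([1.0 - u for u in us], t)  # uniform: expect about t = 0.25
cdf_squared = empirical_cdf([u * u for u in us], t)    # not uniform: expect about sqrt(t) = 0.5

print(cdf_flipped, cdf_squared)
```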
Independence of random variables is defined directly in terms of independence of events.
Random variables $X_1, \ldots, X_n$ are independent if, for all $x_1, \ldots, x_n$,
$$P(X_1 \le x_1, \ldots, X_n \le x_n) = \prod_{i=1}^{n} P(X_i \le x_i).$$
The left side is the joint CDF (all variables considered together); the right side is the product of the individual marginal CDFs. To find the probability of the intersection, just multiply.
This looks simpler than independence of events, where (for three events) the triple intersection is not enough — all pairwise statements are also required. The resolution: the condition here holds for all $x_1, \ldots, x_n$, so the single-looking equation is actually uncountably many equations, not one.
In the discrete case it is usually easier to use PMFs. Replace "$\le$" by "$=$":
$$P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i).$$
The left side is the joint PMF. In the discrete case the joint-CDF and joint-PMF conditions are equivalent (the proof is tedious but routine bookkeeping with sums).
Full independence means: knowing the values of any subcollection tells you nothing about the others. This is strictly stronger than pairwise independence, which only says no single variable carries information about any other single variable.
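Two independent fair coin flips, plus an indicator of whether they match (the "matching pennies" game). These three random variables are pairwise independent but not independent: any one of them alone says nothing about another, yet any two of them determine the third.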
Pairwise independence is therefore not enough to guarantee full independence.
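Because the sample space has only four equally likely outcomes, the example can be verified exactly by enumeration. The sketch below is one way to do it; the encoding (1 for heads, 0 for tails, third coordinate 1 when the flips match) and the helper names are illustrative assumptions. It tests the joint-PMF factorization for every pair and then for all three variables together.

```python
from fractions import Fraction
from itertools import product

# Each outcome is (flip 1, flip 2, match indicator); the four outcomes are equally likely.
outcomes = [(x1, x2, int(x1 == x2)) for x1, x2 in product((0, 1), repeat=2)]
weight = Fraction(1, len(outcomes))

def prob(event):
    """Exact probability of an event, given as a predicate on outcomes."""
    return sum((weight for w in outcomes if event(w)), Fraction(0))

def pairwise_independent():
    """Check P(Xi = a, Xj = b) = P(Xi = a) P(Xj = b) for every pair and all values."""
    for i, j in ((0, 1), (0, 2), (1, 2)):
        for a, b in product((0, 1), repeat=2):
            joint = prob(lambda w: w[i] == a and w[j] == b)
            if joint != prob(lambda w: w[i] == a) * prob(lambda w: w[j] == b):
                return False
    return True

def fully_independent():
    """Check that the joint PMF of all three factors into the three marginals."""
    for a, b, c in product((0, 1), repeat=3):
        joint = prob(lambda w: w == (a, b, c))
        marginals = (prob(lambda w: w[0] == a)
                     * prob(lambda w: w[1] == b)
                     * prob(lambda w: w[2] == c))
        if joint != marginals:
            return False
    return True

print(pairwise_independent(), fully_independent())
```

The factorization fails at, for instance, the triple $(1, 1, 1)$: its joint probability is $1/4$, while the product of the marginals is $1/8$.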
The normal distribution (also called the Gaussian) is by far the most important distribution in statistics.
Blitzstein prefers "normal" to "Gaussian": Gauss already has plenty named after him, and he was not the first to use this distribution — so it is not quite fair to give him the credit. The two terms refer to the same thing.
The headline reason is the Central Limit Theorem (CLT), possibly the most famous theorem in all of probability (proved later in the course). Stated informally:
If you add up a large number of i.i.d. random variables, the distribution of the sum looks approximately normal (after appropriate shifting and scaling).
What is shocking is the universality: the summands can be continuous or discrete, "beautiful or ugly" — almost anything. Their sum always approaches the same bell shape. Of the many curves that look bell-shaped, this one specific curve is the one that always arises. (There are further generalizations beyond the i.i.d. case, under technical assumptions.)
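A small simulation can make the informal statement concrete. The sketch below assumes $\mathrm{Unif}(0,1)$ summands, an arbitrary choice made only because their mean $1/2$ and variance $1/12$ are well known; the number of summands, number of trials, and seed are also arbitrary. It standardizes each sum and compares the fraction landing within one standard deviation of the mean with the corresponding standard normal probability, computed here with `math.erf`.

```python
import math
import random

def standardized_sum(rng, n_summands):
    """Sum n_summands i.i.d. Unif(0,1) draws, then subtract the mean and divide by the SD."""
    total = sum(rng.random() for _ in range(n_summands))
    mean = n_summands * 0.5            # each Unif(0,1) has mean 1/2
    sd = math.sqrt(n_summands / 12.0)  # each Unif(0,1) has variance 1/12
    return (total - mean) / sd

rng = random.Random(113)  # arbitrary seed
n_summands, trials = 30, 20_000
zs = [standardized_sum(rng, n_summands) for _ in range(trials)]

frac_within_one_sd = sum(abs(z) <= 1.0 for z in zs) / trials
normal_value = math.erf(1.0 / math.sqrt(2.0))  # P(|Z| <= 1) for a standard normal Z

print(frac_within_one_sd, normal_value)
```

Swapping in a different summand distribution (with its own mean and variance) should give a similar match, which is the point of the theorem.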
The PDF is a symmetric bell-shaped curve. Among the infinitely many curves of that general shape, the normal is one specific function. Start with the standard normal, written $\mathcal{N}(0,1)$ — mean $0$, variance $1$ (to be verified). The normal family has two parameters: mean and variance.
Its PDF (using $Z$, the traditional letter for a standard normal) is
$$f(z) = c\, e^{-z^2/2},$$
where $c$ is a normalizing constant chosen so the total area is $1$. Two properties are visible immediately:
To pin down $c$, we need the value of the Gaussian integral
$$I = \int_{-\infty}^{\infty} e^{-z^2/2}\, dz.$$
Standard tricks — $u$-substitution, integration by parts, other changes of variable — all fail. The reason is a theorem: the indefinite integral of $e^{-z^2/2}$ cannot be expressed in closed form using elementary functions ($\sin, \cos, \exp, \log$, polynomials, etc.). It is not that no one has found it; it is provably impossible.
(One could expand $e^{-z^2/2}$ as a Taylor series and integrate term by term — every term is an easy polynomial integral — but the result is an infinite series we cannot simplify.) Impossibility of the antiderivative does not rule out finding the definite integral by another route.
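Write the integral down a second time and multiply the two copies, using dummy variables $x$ and $y$:
$$I^2 = \int_{-\infty}^{\infty} e^{-x^2/2}\, dx \int_{-\infty}^{\infty} e^{-y^2/2}\, dy = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\, dx\, dy.$$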
The combination $x^2 + y^2$ is the cue to switch to polar coordinates: $x^2 + y^2 = r^2$, with radius $r \in [0,\infty)$ and angle $\theta \in [0, 2\pi)$. The one fact needed from multivariable calculus is the Jacobian: $dx\,dy$ is replaced by $r\,dr\,d\theta$, not just $dr\,d\theta$. That extra factor of $r$ is exactly what makes the problem solvable.
$$I^2 = \int_{0}^{2\pi}\!\int_{0}^{\infty} e^{-r^2/2}\, r\, dr\, d\theta.$$
Inner integral: substitute $u = r^2/2$, so $du = r\,dr$, giving $\int_0^\infty e^{-u}\,du = 1$. The outer integral is then $\int_0^{2\pi} 1 \, d\theta = 2\pi$.
So $I^2 = 2\pi$, and since the integrand is positive, $I = \sqrt{2\pi}$. Hence the normalizing constant is
$$c = \frac{1}{\sqrt{2\pi}}.$$
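The value can be sanity-checked numerically, even though no elementary antiderivative exists. The sketch below uses a plain midpoint rule; the truncation of the infinite range to $[-10, 10]$ and the step count are assumptions chosen for illustration (the integrand is negligible beyond that range). It also evaluates the radial integral from the polar-coordinate step.

```python
import math

def midpoint_rule(f, a, b, steps=200_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

# Direct approximation of the Gaussian integral, truncated to [-10, 10].
gaussian_I = midpoint_rule(lambda z: math.exp(-z * z / 2.0), -10.0, 10.0)

# Polar route: the radial integral of r * exp(-r^2 / 2), times the angle 2*pi.
radial = midpoint_rule(lambda r: r * math.exp(-r * r / 2.0), 0.0, 10.0)
I_squared_polar = 2.0 * math.pi * radial

print(gaussian_I, math.sqrt(2.0 * math.pi))  # the two values should agree closely
print(gaussian_I ** 2, I_squared_polar)      # both should be close to 2*pi
```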
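It is striking that integrating an exponential produces $\pi$. The $\pi$ entered through the polar angle, which sweeps a full circle. The standard normal PDF is therefore
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}.$$
Let $Z \sim \mathcal{N}(0,1)$. Then
$$E(Z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z\, e^{-z^2/2}\, dz = 0$$
by symmetry.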
If $g(x)$ is an odd function ($g(-x) = -g(x)$), then $\int_{-a}^{a} g(x)\,dx = 0$: the negative area cancels the positive area (as with $\sin$). Here $z\,e^{-z^2/2}$ is odd — replacing $z$ by $-z$ leaves $e^{-z^2/2}$ unchanged but flips the leading $z$ — so the integral is $0$ with no computation.
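Since $E(Z) = 0$,
$$\mathrm{Var}(Z) = E(Z^2) - (E Z)^2 = E(Z^2).$$
To get $E(Z^2)$, use LOTUS (Law of the Unconscious Statistician): no need for the PDF of $Z^2$; integrate $z^2$ against the PDF of $Z$ directly. The integrand is even, so integrate over $[0,\infty)$ and double:
$$E(Z^2) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z^2 e^{-z^2/2}\, dz = \frac{2}{\sqrt{2\pi}} \int_{0}^{\infty} z^2 e^{-z^2/2}\, dz.$$
Integrate by parts, splitting $z^2 = z \cdot z$ and taking $u = z$, $dv = z\, e^{-z^2/2}\, dz$ (so $du = dz$ and $v = -e^{-z^2/2}$):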
$$E(Z^2) = \frac{2}{\sqrt{2\pi}} \left\{ \Big[-z\, e^{-z^2/2}\Big]_{0}^{\infty} + \int_{0}^{\infty} e^{-z^2/2}\, dz \right\}.$$
The boundary term is $0$ at both ends ($0$ at $z = 0$; exponentially small as $z \to \infty$). The remaining integral is exactly half the Gaussian integral, $\tfrac{1}{2}\sqrt{2\pi}$. Multiplying:
$$E(Z^2) = \frac{2}{\sqrt{2\pi}} \cdot \frac{1}{2}\sqrt{2\pi} = 1, \qquad \text{so } \mathrm{Var}(Z) = 1.$$
This confirms the "$1$" in $\mathcal{N}(0,1)$.
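The two moments can also be checked numerically by applying LOTUS as a plain numerical integral. This is only a sanity check under stated assumptions: the range is truncated to $[-10, 10]$ and a midpoint rule with an arbitrary step count is used; the helper names are illustrative.

```python
import math

def standard_normal_pdf(z):
    """PDF of N(0, 1): exp(-z^2 / 2) / sqrt(2*pi)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def lotus_expectation(g, half_width=10.0, steps=200_000):
    """Approximate E[g(Z)] as the integral of g(z) * pdf(z) (midpoint rule, truncated range)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        z = -half_width + (k + 0.5) * h
        total += g(z) * standard_normal_pdf(z)
    return total * h

mean_z = lotus_expectation(lambda z: z)             # odd integrand, so about 0
second_moment = lotus_expectation(lambda z: z * z)  # even integrand, about 1
variance_z = second_moment - mean_z ** 2

print(mean_z, variance_z)
```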
Because the standard normal is so important, and its integral so hard, its CDF gets its own name, the Greek capital $\Phi$:
$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2}\, dt.$$
(The dummy variable is renamed $t$ to avoid clashing with the upper limit $z$.) Although the antiderivative has no elementary form, $\Phi(z)$ is easy to evaluate numerically and is widely tabulated. Treating $\Phi$ as a standard function sidesteps the impossibility of the closed-form integral.
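Because the PDF is symmetric about $0$, the area to the left of $-z$ equals the area to the right of $z$:
$$\Phi(-z) = 1 - \Phi(z).$$
Worth verifying by drawing a picture: good practice with symmetry and CDFs.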
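In practice $\Phi$ is evaluated through a library routine. One common route, sketched below, uses the error function from Python's `math` module via the standard identity $\Phi(z) = \tfrac{1}{2}\big(1 + \operatorname{erf}(z/\sqrt{2})\big)$; this identity is not derived in the lecture and is assumed here. The test points are arbitrary.

```python
import math

def Phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Check the symmetry property Phi(-z) = 1 - Phi(z) at a few points.
for z in (0.5, 1.0, 2.0):
    print(z, Phi(-z), 1.0 - Phi(z))

print(Phi(0.0))  # half the area lies to the left of 0
```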
Next time: the general (non-standard) normal, obtained by shifting and scaling the standard normal.