Lecture 13: Normal Distribution

Harvard Statistics 110 (Joe Blitzstein)

1. Universality of the Uniform (Review)

The previous lecture proved the universality of the uniform: any distribution can be generated from a single $\mathrm{Unif}(0,1)$. This lecture revisits the theorem, works an example, and explores its flip side before moving to the normal distribution.

Statement

Let $F$ be a continuous, strictly increasing CDF. These assumptions are stronger than necessary — in general a CDF need only be right-continuous and nondecreasing (allowing flat regions) — but they make the inverse $F^{-1}$ well defined and the proof clean.

Universality of the Uniform

If $U \sim \mathrm{Unif}(0,1)$ and we define $X = F^{-1}(U)$, then $X$ has CDF $F$.

This is unusual in direction. Normally we start with a random variable and derive its CDF; here we start with the CDF and synthesize a random variable that has it. That is why it is called "universality" — starting from one uniform, we can in principle create a random variable with any desired distribution.

Why it matters: simulation

The result is the foundation of simulation. Uniforms are easy to generate on a computer; other continuous distributions are not. To simulate draws from $F$:

Generate $U \sim \mathrm{Unif}(0,1)$.
Compute $X = F^{-1}(U)$.

In some cases $F^{-1}$ is easy to write down analytically; in many cases it is hard or impossible in closed form. But conceptually, the uniform gives you everything.
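The two steps can be sketched in Python. This is a minimal illustration, not something from the lecture: the helper name `sample_via_inverse_cdf` is hypothetical, and the logistic distribution (CDF $F(x) = 1/(1+e^{-x})$, inverse $F^{-1}(u) = \log\frac{u}{1-u}$) is an assumed example chosen because its inverse is easy to write down.

```python
import math
import random


def sample_via_inverse_cdf(inverse_cdf, n, seed=0):
    """Draw n values as inverse_cdf(U) with U ~ Unif(0,1)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        u = rng.random()      # random() returns a value in [0, 1)
        while u == 0.0:       # keep u strictly inside (0, 1)
            u = rng.random()
        draws.append(inverse_cdf(u))
    return draws


def logistic_cdf(x):
    """Assumed example CDF: F(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def logistic_inverse_cdf(u):
    """Inverse of logistic_cdf, valid for 0 < u < 1."""
    return math.log(u / (1.0 - u))


logistic_draws = sample_via_inverse_cdf(logistic_inverse_cdf, 100_000)

# Fraction of draws at or below 1, to compare with F(1)
frac_below_1 = sum(x <= 1.0 for x in logistic_draws) / len(logistic_draws)
print(frac_below_1, logistic_cdf(1.0))
```

If the construction works, the two printed numbers should agree up to simulation noise.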

Sufficiency of the CDF axioms

The theorem also justifies an earlier claim: the three properties of a CDF (right-continuous, nondecreasing, limits $0$ and $1$) are not just necessary but sufficient — any function with those properties really is the CDF of some random variable.

The flip side: plug a random variable into its own CDF

The theorem runs the other way too. Start with $X$ having CDF $F$ (no uniform yet). Applying $F$ to both sides of $X = F^{-1}(U)$ gives $F(X) = U$, which suggests the general fact:

$$F(X) \sim \mathrm{Unif}(0,1)$$

This is self-referential and looks mysterious: we take a random variable and plug it into its own CDF. It is legitimate because $F$ is just a function, and a function of a random variable is a random variable. Since any CDF takes values in $[0,1]$, $F(X)$ automatically lands in $[0,1]$ — consistent with being uniform, though not yet a proof of it.

This identity is useful in statistical inference (Stat 111): $X$ may have a complicated or unknown distribution, but reducing it to a known, simple $\mathrm{Unif}(0,1)$ is convenient for model checking. If many instances of $F(X)$ do not look uniform, the model is suspect.
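A simulation sketch of this kind of check, assuming the model is $\mathrm{Expo}(1)$ with CDF $F(x) = 1 - e^{-x}$ for $x > 0$ (the example worked later in this section). The seed, sample size, and the choice of ten equal-width bins are arbitrary assumptions. If $F(X)$ is uniform, each bin should receive about one tenth of the values.

```python
import math
import random

rng = random.Random(111)
n = 100_000

# Draws from the assumed model, using the standard library's exponential sampler
x_draws = [rng.expovariate(1.0) for _ in range(n)]

# Plug each draw into its own CDF
f_of_x = [1.0 - math.exp(-x) for x in x_draws]

# Count how many values of F(X) land in each of 10 equal-width bins on [0, 1]
bins = 10
counts = [0] * bins
for v in f_of_x:
    counts[min(int(v * bins), bins - 1)] += 1

bin_fractions = [c / n for c in counts]
print(bin_fractions)
```

Bin fractions far from $0.1$ would be a sign that the data did not come from the assumed $F$.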

Notational warning

The CDF is $F(x) = P(X \le x)$. It is tempting to plug in capital $X$ blindly: $F(X) = P(X \le X)$. But the event $\{X \le X\}$ always happens, so this would force $F(X) = 1$. That step is invalid. The correct reading: treat $F$ as a function written in terms of a placeholder $x$, then substitute the random variable $X$ for that placeholder.

Worked example: the exponential distribution

$$F(x) = 1 - e^{-x}, \quad x > 0 \qquad (F(x)=0 \text{ for } x \le 0)$$

This is the $\mathrm{Expo}(1)$ distribution: continuous everywhere and strictly increasing on the positive side.

To simulate $X \sim \mathrm{Expo}(1)$, invert $F$. Set $u = 1 - e^{-x}$ and solve for $x$ (ordinary algebra):

$$X = F^{-1}(U) = -\log(1 - U)$$

By universality, $-\log(1-U)$ has CDF $F$. To draw $10$ i.i.d. exponentials, generate $10$ i.i.d. uniforms and apply this function to each.
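A short Python sketch of this recipe. The seed and sample size are arbitrary assumptions, and the comparison uses the standard fact that $\mathrm{Expo}(1)$ has mean $1$.

```python
import math
import random

rng = random.Random(110)
n = 100_000

# random() returns U in [0, 1), so 1 - U lies in (0, 1] and the log is always defined
expo_draws = [-math.log(1.0 - rng.random()) for _ in range(n)]

sample_mean = sum(expo_draws) / n                     # should be near 1
frac_below_1 = sum(x <= 1.0 for x in expo_draws) / n  # should be near F(1)
target = 1.0 - math.exp(-1.0)                         # F(1) = 1 - e^{-1}

print(sample_mean, frac_below_1, target)
```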

· · ·

2. Symmetry and Transformations of the Uniform

$1 - U$ is also Uniform$(0,1)$

While inverting the exponential CDF we used $1 - U$. A useful fact: if $U \sim \mathrm{Unif}(0,1)$, then $1 - U \sim \mathrm{Unif}(0,1)$ as well. So we could equally have written $X = -\log(U)$.

Intuition: $U$ is a random point in $[0,1]$. Measuring its distance from the left end ($U$) versus the right end ($1 - U$) is just a relabeling of the same random point — the distribution is unchanged. (Worth verifying by computing the CDF directly, as good practice with CDFs and PDFs.)
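One way to carry out that verification, for $0 < x < 1$:

$$P(1 - U \le x) = P(U \ge 1 - x) = 1 - (1 - x) = x,$$

which is exactly the $\mathrm{Unif}(0,1)$ CDF. The middle step uses $P(U \ge t) = 1 - t$ for $t \in (0,1)$, which holds because $U$ is continuous with $P(U \le t) = t$.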

Linear transformations preserve uniformity

More generally, $a + bU$ (with constants $a$ and $b \neq 0$) is uniform on the interval with endpoints $a$ and $a + b$. For example, to go from $\mathrm{Unif}(0,1)$ to $\mathrm{Unif}(0,10)$, multiply by $10$.

Common mistake

Nonlinear usually means non-uniform. Do not assume any function of $U$ that stays in $[0,1]$ is uniform. For instance $U^2 \in [0,1]$ but is not uniform — compute its CDF and it does not match the uniform CDF. Always check rather than assume.
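A quick simulation of the $U^2$ example, as a sketch with an arbitrary seed and sample size. Since $P(U^2 \le 0.25) = P(U \le 0.5) = 0.5$, the simulated fraction should be near $0.5$, whereas a $\mathrm{Unif}(0,1)$ random variable would give $0.25$.

```python
import random

rng = random.Random(13)
n = 100_000

# Square each uniform draw; the results still lie in [0, 1)
squares = [rng.random() ** 2 for _ in range(n)]

# Empirical P(U^2 <= 0.25); a uniform on (0, 1) would give about 0.25
frac_below_quarter = sum(s <= 0.25 for s in squares) / n
print(frac_below_quarter)
```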

· · ·

3. Independence of Random Variables

Independence of random variables is defined directly in terms of independence of events.

Definition via the joint CDF

Independence (general)

Random variables $X_1, \ldots, X_n$ are independent if, for all $x_1, \ldots, x_n$,

$$P(X_1 \le x_1, \ldots, X_n \le x_n) = \prod_{i=1}^{n} P(X_i \le x_i).$$

The left side is the joint CDF (all variables considered together); the right side is the product of the individual marginal CDFs. To find the probability of the intersection, just multiply.

This looks simpler than independence of events, where (for three events) the triple intersection is not enough — all pairwise statements are also required. The resolution: the condition here holds for all $x_1, \ldots, x_n$, so the single-looking equation is actually uncountably many equations, not one.

Discrete case: the joint PMF

In the discrete case it is usually easier to use PMFs. Replace "$\le$" by "$=$":

$$P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i).$$

The left side is the joint PMF. In the discrete case the joint-CDF and joint-PMF conditions are equivalent (the proof is tedious but routine bookkeeping with sums).

Full vs. pairwise

Full independence means: knowing the values of any subcollection tells you nothing about the others. This is strictly stronger than pairwise independence, which only says no single variable carries information about any other single variable.

Example: pairwise independent but not independent (matching pennies)

$X_1, X_2 \overset{\text{iid}}{\sim} \mathrm{Bern}(1/2), \qquad X_3 = \mathbb{1}(X_1 = X_2)$

Two fair coin flips, plus an indicator of whether they match (the "matching pennies" game). These three are pairwise independent but not independent.

Not independent: $X_3$ is a function of $X_1$ and $X_2$, so knowing both gives total information about $X_3$.
$X_1, X_2$ independent: by assumption.
$X_1, X_3$ independent: knowing $X_1 = 1$ reduces "$X_3 = 1$" to "$X_2 = 1$", still $50/50$. So $X_1$ carries no information about $X_3$. By symmetry the same holds for $X_2$ and $X_3$.

Pairwise independence is therefore not enough to guarantee full independence.
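The example is small enough to check exactly by enumeration. The sketch below lists the four equally likely outcomes of $(X_1, X_2)$, attaches $X_3$, and tests the joint-PMF factorization with exact fractions. The helper names `prob` and `pair_is_independent` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes (x1, x2, x3), with x3 = 1 when the flips match
outcomes = [(x1, x2, int(x1 == x2)) for x1, x2 in product((0, 1), repeat=2)]
weight = Fraction(1, len(outcomes))


def prob(event):
    """Exact probability of the outcomes for which event(outcome) is true."""
    return sum(weight for o in outcomes if event(o))


def pair_is_independent(i, j):
    """Check P(Xi=a, Xj=b) = P(Xi=a) P(Xj=b) for all a, b in {0, 1}."""
    return all(
        prob(lambda o: o[i] == a and o[j] == b)
        == prob(lambda o: o[i] == a) * prob(lambda o: o[j] == b)
        for a, b in product((0, 1), repeat=2)
    )


# Pairs (X1, X2), (X1, X3), (X2, X3), using 0-based positions in each outcome
pairwise = [pair_is_independent(i, j) for i, j in ((0, 1), (0, 2), (1, 2))]

# Joint PMF at (1, 1, 1) versus the product of the three marginals
joint_111 = prob(lambda o: o == (1, 1, 1))
product_111 = (
    prob(lambda o: o[0] == 1) * prob(lambda o: o[1] == 1) * prob(lambda o: o[2] == 1)
)

print(pairwise, joint_111, product_111)
```

All three pairs pass the factorization check, but the joint probability $1/4$ differs from the product of marginals $1/8$, so the three variables are not independent.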

· · ·

4. The Normal Distribution

The normal distribution (also called the Gaussian) is by far the most important distribution in statistics.

Blitzstein prefers "normal" to "Gaussian": Gauss already has plenty named after him, and he was not the first to use this distribution — so it is not quite fair to give him the credit. The two terms refer to the same thing.

Why it is fundamental: the Central Limit Theorem

The headline reason is the Central Limit Theorem (CLT), possibly the most famous theorem in all of probability (proved later in the course). Stated informally:

CLT (informal)

If you add up a large number of i.i.d. random variables, the distribution of the sum looks approximately normal (after appropriate shifting and scaling).

What is shocking is the universality: the summands can be continuous or discrete, "beautiful or ugly" — almost anything. Their sum always approaches the same bell shape. Of the many curves that look bell-shaped, this one specific curve is the one that always arises. (There are further generalizations beyond the i.i.d. case, under technical assumptions.)
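A small simulation can make the informal statement concrete. This is a sketch under assumptions not taken from the lecture: the summands are $\mathrm{Expo}(1)$ (a skewed, decidedly non-normal distribution with mean $1$ and variance $1$), each sum has $50$ terms, and the comparison is made at the single point $1$. The sum of $n$ such terms has mean $n$ and standard deviation $\sqrt{n}$, which gives the shift and scale.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(2024)
n_terms = 50       # summands in each sum
n_sums = 10_000    # number of simulated sums


def standardized_sum():
    """Sum n_terms Expo(1) draws, subtract the mean n_terms, divide by the SD sqrt(n_terms)."""
    total = sum(rng.expovariate(1.0) for _ in range(n_terms))
    return (total - n_terms) / math.sqrt(n_terms)


z_values = [standardized_sum() for _ in range(n_sums)]

# Empirical P(standardized sum <= 1) versus the standard normal CDF at 1
frac_below_1 = sum(z <= 1.0 for z in z_values) / n_sums
normal_value = NormalDist().cdf(1.0)

print(frac_below_1, normal_value)
```

The two numbers should be close even though a single $\mathrm{Expo}(1)$ draw looks nothing like a bell curve.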

The standard normal PDF

The PDF is a symmetric bell-shaped curve. Among the infinitely many curves of that general shape, the normal is one specific function. Start with the standard normal, written $\mathcal{N}(0,1)$ — mean $0$, variance $1$ (to be verified). The normal family has two parameters: mean and variance.

Its PDF (using $Z$, the traditional letter for a standard normal) is

$$f(z) = c \, e^{-z^2 / 2},$$

where $c$ is a normalizing constant chosen so the total area is $1$. Two properties are visible immediately:

Symmetric: replacing $z$ by $-z$ leaves $f$ unchanged.
Fast decay: $e^{-z^2/2}$ decays to $0$ very quickly as $|z|$ grows (exponential decay of a squared argument).

· · ·

5. Finding the Normalizing Constant

To pin down $c$, we need the value of the Gaussian integral

$$I = \int_{-\infty}^{\infty} e^{-z^2 / 2} \, dz.$$

Why the usual methods fail

Standard tricks — $u$-substitution, integration by parts, other changes of variable — all fail. The reason is a theorem: the indefinite integral of $e^{-z^2/2}$ cannot be expressed in closed form using elementary functions ($\sin, \cos, \exp, \log$, polynomials, etc.). It is not that no one has found it; it is provably impossible.

(One could expand $e^{-z^2/2}$ as a Taylor series and integrate term by term — every term is an easy polynomial integral — but the result is an infinite series we cannot simplify.) Impossibility of the antiderivative does not rule out finding the definite integral by another route.

The trick: square the integral, then go polar

Write the integral down a second time and multiply the two copies, using dummy variables $x$ and $y$:

$$I^2 = \left(\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right)\!\left(\int_{-\infty}^{\infty} e^{-y^2/2}\,dy\right) = \iint e^{-(x^2 + y^2)/2}\, dx\, dy.$$

The combination $x^2 + y^2$ is the cue to switch to polar coordinates: $x^2 + y^2 = r^2$, with radius $r \in [0,\infty)$ and angle $\theta \in [0, 2\pi)$. The one fact needed from multivariable calculus is the Jacobian: $dx\,dy$ is replaced by $r\,dr\,d\theta$, not just $dr\,d\theta$. That extra factor of $r$ is exactly what makes the problem solvable.

$$I^2 = \int_{0}^{2\pi}\!\int_{0}^{\infty} e^{-r^2/2}\, r\, dr\, d\theta.$$

Inner integral: substitute $u = r^2/2$, so $du = r\,dr$, giving $\int_0^\infty e^{-u}\,du = 1$. The outer integral is then $\int_0^{2\pi} 1 \, d\theta = 2\pi$.

Conclusion

$I^2 = 2\pi$. Because the integrand is positive, $I > 0$, so a single copy of the integral is $I = \sqrt{2\pi}$. Hence the normalizing constant is

$$c = \frac{1}{\sqrt{2\pi}}.$$

It is striking that integrating an exponential produces $\pi$. The $\pi$ entered through the polar angle, which sweeps a full circle. The standard normal PDF is therefore

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2 / 2}.$$
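The value $\sqrt{2\pi}$ can be sanity-checked numerically. The sketch below uses a plain midpoint rule on $[-10, 10]$; the helper name `midpoint_integral`, the truncation at $\pm 10$, and the step count are assumptions for illustration (the tails beyond $\pm 10$ contribute on the order of $e^{-50}$, which is negligible).

```python
import math


def midpoint_integral(f, a, b, steps=20_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


# Integrate e^{-z^2/2} over a range wide enough that the tails are negligible
gaussian_integral = midpoint_integral(lambda z: math.exp(-z * z / 2.0), -10.0, 10.0)

print(gaussian_integral, math.sqrt(2.0 * math.pi))
```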

· · ·

6. Mean and Variance of the Standard Normal

Mean is $0$ (by symmetry)

Let $Z \sim \mathcal{N}(0,1)$. Then

$$E(Z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z\, e^{-z^2/2}\, dz = 0.$$

Odd-function symmetry

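If $g(x)$ is an odd function ($g(-x) = -g(x)$), then $\int_{-a}^{a} g(x)\,dx = 0$: the negative area cancels the positive area (as with $\sin$). The same holds over $(-\infty, \infty)$ whenever the integral converges, which it does here because of the fast decay of $e^{-z^2/2}$. The integrand $z\,e^{-z^2/2}$ is odd, since replacing $z$ by $-z$ leaves $e^{-z^2/2}$ unchanged but flips the sign of the leading $z$, so the integral is $0$ with no computation.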

Variance is $1$ (LOTUS + integration by parts)

Since $E(Z) = 0$,

$$\mathrm{Var}(Z) = E(Z^2) - [E(Z)]^2 = E(Z^2).$$

To get $E(Z^2)$, use LOTUS (Law of the Unconscious Statistician): no need for the PDF of $Z^2$; integrate $z^2$ against the PDF of $Z$ directly. The integrand is even, so integrate over $[0,\infty)$ and double:

$$E(Z^2) = \frac{2}{\sqrt{2\pi}} \int_{0}^{\infty} z^2\, e^{-z^2/2}\, dz.$$

Integrate by parts with $z^2 = z \cdot z$:

$u = z \;\Rightarrow\; du = dz$;
$dv = z\, e^{-z^2/2}\, dz \;\Rightarrow\; v = -e^{-z^2/2}$ (since $\frac{d}{dz}\!\left[-e^{-z^2/2}\right] = z\, e^{-z^2/2}$).

$$E(Z^2) = \frac{2}{\sqrt{2\pi}} \left\{ \Big[-z\, e^{-z^2/2}\Big]_{0}^{\infty} + \int_{0}^{\infty} e^{-z^2/2}\, dz \right\}.$$

The boundary term is $0$ at both ends ($0$ at $z = 0$; exponentially small as $z \to \infty$). The remaining integral is exactly half the Gaussian integral, $\tfrac{1}{2}\sqrt{2\pi}$. Multiplying:

Conclusion

$$E(Z^2) = \frac{2}{\sqrt{2\pi}} \cdot \frac{1}{2}\sqrt{2\pi} = 1, \qquad \text{so } \mathrm{Var}(Z) = 1.$$

This confirms the "$1$" in $\mathcal{N}(0,1)$.
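Both moments can be checked numerically with LOTUS-style integrals against the standard normal PDF. As before, this is a sketch: the midpoint rule, the helper names, and the truncation to $[-10, 10]$ are assumptions for illustration.

```python
import math


def midpoint_integral(f, a, b, steps=20_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


def standard_normal_pdf(z):
    """f(z) = e^{-z^2/2} / sqrt(2 pi)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


# E(Z): integrate z f(z); E(Z^2): integrate z^2 f(z)
mean = midpoint_integral(lambda z: z * standard_normal_pdf(z), -10.0, 10.0)
second_moment = midpoint_integral(lambda z: z * z * standard_normal_pdf(z), -10.0, 10.0)
variance = second_moment - mean ** 2

print(mean, variance)
```

The computed mean should be essentially $0$ and the variance essentially $1$, matching the symmetry argument and the integration by parts.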

· · ·

7. Standard Notation: $\Phi$

Because the standard normal is so important — and its integral so hard — its CDF gets its own name, the Greek capital $\Phi$:

$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2}\, dt.$$

(The dummy variable is renamed $t$ to avoid clashing with the upper limit $z$.) Although the antiderivative has no elementary form, $\Phi(z)$ is easy to evaluate numerically and is widely tabulated. Treating $\Phi$ as a standard function sidesteps the impossibility of the closed-form integral.

Symmetry of $\Phi$

$$\Phi(-z) = 1 - \Phi(z).$$

Worth verifying by drawing a picture: it is good practice with symmetry and CDFs.
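For numerical work, $\Phi$ can be written in terms of the error function through the standard identity $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}\!\left(z/\sqrt{2}\right)\right]$. The identity is not derived in these notes; it is used here as a known fact. The sketch below evaluates $\Phi$ with Python's `math.erf` and checks the symmetry relation at a few arbitrarily chosen points.

```python
import math


def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Compare Phi(-z) with 1 - Phi(z) at a few test points
test_points = [0.0, 0.5, 1.0, 1.96, 3.0]
symmetry_gaps = [abs(Phi(-z) - (1.0 - Phi(z))) for z in test_points]

print(Phi(0.0), Phi(1.96), max(symmetry_gaps))
```

The value at $0$ is $0.5$ by symmetry, and the gaps between $\Phi(-z)$ and $1 - \Phi(z)$ should be at the level of floating-point rounding.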

Next time: the general (non-standard) normal, obtained by shifting and scaling the standard normal.