We extend the definition of the Normal Distribution to higher dimensions. Indeed,
$$f(\mathbf{v}) = \frac{1}{(2\pi)^{n/2}\sqrt{\det \Sigma}} \exp\!\left(-\tfrac{1}{2}(\mathbf{v}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{v}-\boldsymbol{\mu})\right),$$
where $\Sigma$ is a symmetric positive definite matrix.
Symmetric Positive Definite
A matrix $A$ is symmetric if $A^T = A$, and positive definite if for every nonzero real column vector $\mathbf{x}$, $\mathbf{x}^T A \mathbf{x} > 0$.
https://en.wikipedia.org/wiki/Definite_matrix
- For any real, invertible matrix $A$, the product $AA^T$ is symmetric positive definite. Conversely, factoring a symmetric positive definite matrix as $AA^T$ (with $A$ lower triangular) is called the Cholesky Factorization.
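As a quick numerical check (a minimal NumPy sketch; the matrix `Sigma` below is an arbitrary example, not from these notes), we can recover $A$ with `np.linalg.cholesky` and test positive definiteness directly:

```python
import numpy as np

# Arbitrary example of a symmetric positive definite matrix.
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Cholesky factorization: Sigma = A @ A.T with A lower triangular.
A = np.linalg.cholesky(Sigma)
assert np.allclose(A @ A.T, Sigma)

# Positive definiteness: x^T Sigma x > 0 for random nonzero x.
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.standard_normal(2)
    assert x @ Sigma @ x > 0
```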
Bivariate Normal Distribution
Special case when $n = 2$. We say that $(X, Y)$ has bivariate normal distribution if the joint density is
$$f_{X,Y}(x, y) = \frac{1}{2\pi\sqrt{\det \Sigma}} \exp\!\left(-\tfrac{1}{2}(\mathbf{v}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{v}-\boldsymbol{\mu})\right), \qquad \mathbf{v} = \begin{pmatrix} x \\ y \end{pmatrix}.$$
Example 1
If $X \sim N(0,1)$ and $Y \sim N(0,1)$ are independent, then $(X, Y)$ has bivariate normal distribution:
$$f_{X,Y}(x, y) = \frac{1}{2\pi} e^{-(x^2 + y^2)/2}.$$
This is because we have the Joint Distribution $f_{X,Y}(x,y) = f_X(x)\,f_Y(y)$, in which the resulting exponent is a product of three matrices with $\Sigma = I$:
$$-\tfrac{1}{2}\begin{pmatrix} x & y \end{pmatrix} I^{-1} \begin{pmatrix} x \\ y \end{pmatrix} = -\frac{x^2 + y^2}{2}.$$
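As a sanity check (a sketch assuming SciPy is available; the test point is arbitrary), the product of two standard normal densities matches the bivariate density with $\boldsymbol{\mu} = 0$ and $\Sigma = I$:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

x, y = 0.3, -1.2  # arbitrary test point
product = norm.pdf(x) * norm.pdf(y)  # f_X(x) * f_Y(y)
joint = multivariate_normal(mean=[0, 0], cov=np.eye(2)).pdf([x, y])
assert np.isclose(product, joint)
```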
Properties
Normalization
- Let $Z_1$ and $Z_2$ be independent $N(0,1)$ random variables.
- Let $\Sigma$ be a $2 \times 2$ symmetric positive definite matrix where $\Sigma = AA^T$.
Then
$$\begin{pmatrix} X \\ Y \end{pmatrix} = A \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} + \boldsymbol{\mu}$$
has bivariate normal distribution with mean $\boldsymbol{\mu}$ and covariance $\Sigma$.
Conversely, if
$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N(\boldsymbol{\mu}, \Sigma),$$
then
$$\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} = A^{-1}\left(\begin{pmatrix} X \\ Y \end{pmatrix} - \boldsymbol{\mu}\right)$$
are independent $N(0,1)$, which is how we can “normalize it”.
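Both directions can be illustrated by sampling (a minimal sketch; `mu`, `Sigma`, and the sample size are arbitrary choices of mine): construct $(X, Y)$ from independent standard normals via $A$, then map back with $A^{-1}$ and check the result is approximately uncorrelated with unit variances.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])                  # example mean vector
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])              # example covariance
A = np.linalg.cholesky(Sigma)               # Sigma = A A^T

Z = rng.standard_normal((2, 100_000))       # rows of independent N(0,1) samples
XY = A @ Z + mu[:, None]                    # (X, Y) = A Z + mu

# "Normalize": A^{-1}((X, Y) - mu) should again look like independent N(0,1).
Z_back = np.linalg.solve(A, XY - mu[:, None])
print(np.cov(Z_back))                       # ~ identity matrix
```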
Proof:
We have that
$$f_{Z_1,Z_2}(z_1, z_2) = \frac{1}{2\pi} e^{-(z_1^2 + z_2^2)/2},$$
which is just
$$\frac{1}{2\pi} \exp\!\left(-\tfrac{1}{2}\mathbf{z}^T\mathbf{z}\right), \qquad \mathbf{z} = \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}.$$
Joint Conversion
What is the joint density of $(X, Y) = T(Z_1, Z_2)$ for an invertible map $T$? We can define it through the change-of-variables formula
$$f_{X,Y}(\mathbf{v}) = f_{Z_1,Z_2}\!\left(T^{-1}(\mathbf{v})\right)\,\left|\det DT^{-1}(\mathbf{v})\right|,$$
where $DT^{-1}$ is the Jacobian of $T^{-1}$, and where we multiply by the determinant because $T$ changes the volume or “space” in $\mathbb{R}^2$ since we are switching from one coordinate system to another.
- The determinant encodes the notion of area and volume
- Multiplying by it corrects for this change of scale so that probabilities match in both coordinate systems.
Back to the proof: we apply the change of variables with the inverse of the map. Let the map be $T(\mathbf{z}) = A\mathbf{z} + \boldsymbol{\mu}$.
- $T$ is an affine transformation (a linear transformation given by $A$, then a translation by $\boldsymbol{\mu}$).
- $DT = A$ is the Jacobian of $T$, so $DT^{-1} = A^{-1}$.
Upon changing variables and application of the density function,
$$f_{X,Y}(\mathbf{v}) = \frac{1}{2\pi\,\lvert\det A\rvert} \exp\!\left(-\tfrac{1}{2}(\mathbf{v}-\boldsymbol{\mu})^T (AA^T)^{-1} (\mathbf{v}-\boldsymbol{\mu})\right).$$
We read from here that $(X, Y)$ has bivariate normal distribution with mean $\boldsymbol{\mu}$ and covariance $\Sigma = AA^T$, since $\lvert\det A\rvert = \sqrt{\det \Sigma}$.
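The change-of-variables density can be checked numerically against SciPy's `multivariate_normal` (a sketch; the helper name and test values are mine):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
A = np.linalg.cholesky(Sigma)

def density_by_change_of_variables(v):
    """f(v) = f_Z(A^{-1}(v - mu)) * |det A^{-1}|."""
    z = np.linalg.solve(A, v - mu)
    f_z = np.exp(-0.5 * z @ z) / (2 * np.pi)
    return f_z / abs(np.linalg.det(A))

v = np.array([0.5, -1.0])                   # arbitrary test point
assert np.isclose(density_by_change_of_variables(v),
                  multivariate_normal(mean=mu, cov=Sigma).pdf(v))
```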
Variance and Covariance
Suppose
$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N(\boldsymbol{\mu}, \Sigma),$$
then
$$\boldsymbol{\mu} = \begin{pmatrix} \mathbb{E}[X] \\ \mathbb{E}[Y] \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \operatorname{Var}(X) & \operatorname{Cov}(X,Y) \\ \operatorname{Cov}(X,Y) & \operatorname{Var}(Y) \end{pmatrix},$$
which is the mean vector and the covariance matrix.
Proof
We use joint conversion and normalization $\Sigma = AA^T$. So, we have
$$\begin{pmatrix} X \\ Y \end{pmatrix} = A\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} + \boldsymbol{\mu},$$
where $Z_1, Z_2 \sim N(0,1)$ and $Z_1$ and $Z_2$ are independent. We denote $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Then
$$X = aZ_1 + bZ_2 + \mu_1, \qquad Y = cZ_1 + dZ_2 + \mu_2.$$
Then to find the variances:
$$\operatorname{Var}(X) = a^2\operatorname{Var}(Z_1) + b^2\operatorname{Var}(Z_2) = a^2 + b^2 = (AA^T)_{11},$$
and likewise $\operatorname{Var}(Y) = c^2 + d^2 = (AA^T)_{22}$. Next,
$$\operatorname{Cov}(X, Y) = \mathbb{E}\!\left[(aZ_1 + bZ_2)(cZ_1 + dZ_2)\right].$$
As $\mathbb{E}[Z_1 Z_2] = \mathbb{E}[Z_1]\,\mathbb{E}[Z_2] = 0$ and $\mathbb{E}[Z_i^2] = 1$, we get
$$\operatorname{Cov}(X, Y) = ac + bd = (AA^T)_{12},$$
which follows from what we calculated before. For the means, we have that
$$\mathbb{E}[X] = a\,\mathbb{E}[Z_1] + b\,\mathbb{E}[Z_2] + \mu_1 = \mu_1, \qquad \mathbb{E}[Y] = \mu_2,$$
which gives us:
$$\boldsymbol{\mu} = \begin{pmatrix} \mathbb{E}[X] \\ \mathbb{E}[Y] \end{pmatrix}.$$
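A Monte Carlo check of these identities (a sketch; the entries of `A` and `mu` are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, c, d = 1.0, 0.5, -0.3, 1.2            # arbitrary entries of A
A = np.array([[a, b], [c, d]])
mu = np.array([1.0, -2.0])

Z = rng.standard_normal((2, 200_000))
X, Y = A @ Z + mu[:, None]

print(np.var(X), a**2 + b**2)               # Var(X)   = a^2 + b^2
print(np.var(Y), c**2 + d**2)               # Var(Y)   = c^2 + d^2
print(np.cov(X, Y)[0, 1], a*c + b*d)        # Cov(X,Y) = ac + bd
print(X.mean(), Y.mean(), mu)               # sample means ≈ mu
```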
Equivalent Characterization of Bivariate Normal
$X$ and $Y$ have a bivariate normal distribution if and only if, for all $s, t \in \mathbb{R}$, $sX + tY$ is a normal random variable.
Proof:
Forward Direction. Upon conversion to matrices through joint conversion:
$$sX + tY = \begin{pmatrix} s & t \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} s & t \end{pmatrix}\left(A\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} + \boldsymbol{\mu}\right),$$
where $\Sigma = AA^T$. This gives us
$$sX + tY = (sa + tc)Z_1 + (sb + td)Z_2 + (s\mu_1 + t\mu_2).$$
But this is normal for any $s, t$ by convolutions, which is shown in Sum of 2 Independent RVs.
Reverse Direction. Better explained with the Fourier transform; not covered here.
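To illustrate the forward direction empirically (a sketch; `A`, `s`, `t` are arbitrary, and `scipy.stats.normaltest` is just one convenient normality test):

```python
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(3)
A = np.array([[1.0, 0.5],
              [-0.3, 1.2]])
Z = rng.standard_normal((2, 50_000))
X, Y = A @ Z                                # bivariate normal with mu = 0

s, t = 2.0, -1.5                            # arbitrary coefficients
stat, pvalue = normaltest(s * X + t * Y)    # D'Agostino-Pearson test
print(pvalue)                               # a large p-value is consistent with normality
```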
Conditional Distribution
Let $(X, Y)$ be bivariate normal with means $\mu_1, \mu_2$ and variances $\sigma_1^2, \sigma_2^2$. Then the conditional distribution of $Y$ given $X = x$ is
$$Y \mid X = x \;\sim\; N\!\left(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1),\; (1 - \rho^2)\sigma_2^2\right),$$
where $\rho = \operatorname{Corr}(X, Y)$ is the Correlation.
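This can be checked empirically by conditioning on a thin slice $X \approx x_0$ (a sketch; all parameter values and the slice width are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(4)
mu1, mu2 = 0.0, 1.0
s1, s2, rho = 1.0, 2.0, 0.7
Sigma = np.array([[s1**2,     rho*s1*s2],
                  [rho*s1*s2, s2**2]])
A = np.linalg.cholesky(Sigma)
Z = rng.standard_normal((2, 1_000_000))
X, Y = A @ Z + np.array([[mu1], [mu2]])

x0 = 0.5                                    # condition on X ≈ x0
mask = np.abs(X - x0) < 0.02
print(Y[mask].mean(), mu2 + rho*(s2/s1)*(x0 - mu1))  # conditional mean
print(Y[mask].var(),  (1 - rho**2) * s2**2)          # conditional variance
```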
Corollary: Conditional Expectation
$$\mathbb{E}[Y \mid X] = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(X - \mu_1).$$
Proof:
If $(X, Y)$ has bivariate normal distribution, then the Joint Distribution is determined by $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho$.
In particular, if $\rho = 0$ then $X$ and $Y$ are independent.
In general, think of $\operatorname{Cov}(\cdot,\cdot)$ as an inner product for the bivariate normal distribution. So,
we write $Y = \alpha X + (Y - \alpha X)$ such that $\operatorname{Cov}(X, Y - \alpha X) = 0$. Then
$$\operatorname{Cov}(X, Y - \alpha X) = \operatorname{Cov}(X, Y) - \alpha\operatorname{Var}(X) = 0,$$
then
$$\alpha = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)} = \rho\frac{\sigma_2}{\sigma_1}.$$
Since
$$(X,\; Y - \alpha X)$$
is bivariate normal, we deduce that $Y - \alpha X$ is independent of $X$ (their covariance is $0$). So,
$$Y \mid X = x \;\sim\; \alpha x + (Y - \alpha X),$$
and $Y - \alpha X$ has normal distribution and
$$\mathbb{E}[Y - \alpha X] = \mu_2 - \alpha\mu_1, \qquad \operatorname{Var}(Y - \alpha X) = \sigma_2^2 - \alpha^2\sigma_1^2 = (1 - \rho^2)\sigma_2^2,$$
which gives the stated conditional distribution.
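The projection step can also be seen numerically (a sketch with arbitrary parameters): estimate $\alpha$ from samples, then check that the residual $Y - \alpha X$ is uncorrelated with $X$ and has variance $(1 - \rho^2)\sigma_2^2$.

```python
import numpy as np

rng = np.random.default_rng(5)
Sigma = np.array([[1.0, 1.4],
                  [1.4, 4.0]])              # sigma1 = 1, sigma2 = 2, rho = 0.7
A = np.linalg.cholesky(Sigma)
Z = rng.standard_normal((2, 500_000))
X, Y = A @ Z

alpha = np.cov(X, Y)[0, 1] / np.var(X)      # ~ rho * sigma2 / sigma1 = 1.4
resid = Y - alpha * X
print(np.corrcoef(X, resid)[0, 1])          # ~ 0: uncorrelated, hence independent here
print(resid.var(), (1 - 0.7**2) * 4.0)      # ~ (1 - rho^2) * sigma2^2
```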