Definition (G-Test)

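Suppose $X$ is a random variable with finitely many values $x_1, \dots, x_n$ and let $p_i = P(X = x_i)$. Suppose we make $N$ observations of the values of $X$, and that $O_i$ is the number of observations of $x_i$ that we made.

Let $E_i = N p_i$ be the expected count of $x_i$, and define

$$G = 2 \sum_{i=1}^{n} O_i \ln\left(\frac{O_i}{E_i}\right).$$

If $O_i = 0$ for some $i$, then the summand is $0$. If there exists an $i$ such that $O_i \neq 0$ but $E_i = 0$, then $G = \infty$.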
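As a rough illustration, the statistic can be computed directly from the observed counts and the hypothesised probabilities. This is a minimal sketch: the function name `g_statistic` and its argument names are chosen here for illustration, and it assumes the probabilities sum to 1.

```python
import math

def g_statistic(observed, probs):
    """Return 2 * sum(O_i * ln(O_i / E_i)) with E_i = N * p_i.

    observed: list of non-negative counts O_i (hypothetical argument name)
    probs:    list of hypothesised probabilities p_i, assumed to sum to 1
    """
    n_obs = sum(observed)  # total number of observations N
    total = 0.0
    for o, p in zip(observed, probs):
        if o == 0:
            # Convention: a value that was never observed contributes 0.
            continue
        e = n_obs * p  # expected count E_i
        if e == 0:
            # Observed a value the hypothesis says is impossible.
            return math.inf
        total += o * math.log(o / e)
    return 2.0 * total

# Example: 40 coin flips, 30 heads and 10 tails, tested against a fair coin.
g = g_statistic([30, 10], [0.5, 0.5])
```

The two special cases in the loop mirror the conventions stated above: a zero observed count is skipped, and a positive count with zero expected count makes the statistic infinite.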

Gibbs’ Inequality

We have that

$$G \geq 0.$$

Moreover, $G = 0$ if and only if $O_i = E_i$ for all $i$.

Proof:

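If $x > 0$ then $\ln x \leq x - 1$, and equality holds only when $x = 1$. Applying this with $x = E_i / O_i$ for each $i$ with $O_i > 0$ gives $O_i \ln(O_i / E_i) \geq O_i - E_i$. The same bound holds when $O_i = 0$, since then the left side is $0$ and the right side is $-E_i \leq 0$. Because $\sum_i O_i = \sum_i E_i = N$, we can write

$$G = 2 \sum_{i=1}^{n} \left( O_i \ln\left(\frac{O_i}{E_i}\right) - O_i + E_i \right) \geq 0.$$

Each summand here is always non-negative, so if $G = 0$ then every summand must be equal to $0$, which forces $O_i = E_i$ for all $i$.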

What this tells us is that if $G$ is small, then the observation matches our expectations. Otherwise the observation is out of the ordinary.

Wilks’ Theorem

Suppose the $N$ observations that we make are in fact independent observations of the random variable $X$. For large values of $N$, the values of $G$ are well approximated by a chi-square distribution with $n - 1$ degrees of freedom.

That is, the probability that $G$ lands inside some interval $[a, b]$ is approximately

$$\int_a^b f_{n-1}(x)\, dx, \qquad f_k(x) = \frac{x^{k/2 - 1} e^{-x/2}}{2^{k/2}\, \Gamma(k/2)},$$

where $f_k$ is the density of the chi-square distribution with $k$ degrees of freedom.

So for our observed $G$, the probability of observing a value at least as large as $G$ is approximately

$$\int_G^\infty f_{n-1}(x)\, dx.$$

This is how we decide if $G$ is small or large.
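The tail probability above can be evaluated without numerical integration, because the chi-square upper tail has a closed form for integer degrees of freedom. The following is a minimal sketch using only the standard library; the function name `chi2_sf` is an assumption made here for illustration, and it relies on the recurrence $Q(s+1, h) = Q(s, h) + h^s e^{-h} / \Gamma(s+1)$ for the regularized upper incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """Upper tail P(chi-square with `df` degrees of freedom >= x).

    df is assumed to be a positive integer. Uses the identity
    P = Q(df/2, x/2), built up from Q(1, h) = exp(-h) (even df)
    or Q(1/2, h) = erfc(sqrt(h)) (odd df).
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-h)
    else:
        s, q = 0.5, math.erfc(math.sqrt(h))
    # Raise the shape parameter in steps of 1 until it reaches df / 2.
    while s < df / 2.0:
        q += math.exp(s * math.log(h) - h - math.lgamma(s + 1.0))
        s += 1.0
    return q

# Example: a statistic of about 10.46 with 2 outcomes, so 1 degree of freedom.
p_value = chi2_sf(10.4649628748, 1)
```

A small `p_value` indicates that a statistic this large would be unusual if the hypothesised probabilities were correct. Where SciPy is available, `scipy.stats.chi2.sf` computes the same tail probability.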

Heuristic Addendum

The chi-square approximation of $G$ is good enough as long as the vast majority of the expected counts $E_i$ are at least $5$.
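This heuristic can be checked mechanically before trusting the approximation. The sketch below is illustrative only: the function name `expected_counts_ok` is hypothetical, the threshold of 5 is the commonly quoted rule of thumb, and reading "vast majority" as 80% of the expected counts is an assumption rather than something the text fixes.

```python
def expected_counts_ok(n_obs, probs, min_count=5.0, min_fraction=0.8):
    """Return True if enough expected counts N * p_i reach `min_count`.

    min_count=5 and min_fraction=0.8 are assumed defaults, not values
    prescribed by the text.
    """
    expected = [n_obs * p for p in probs]
    large_enough = sum(1 for e in expected if e >= min_count)
    return large_enough / len(expected) >= min_fraction

# Example: 100 observations over four equally likely outcomes.
ok = expected_counts_ok(100, [0.25, 0.25, 0.25, 0.25])
```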