Assume PDF is our target PDF and that

  • outside
  • , for some 1 The goal is to sample the RV .
  • Here, is green and are the red vertical bars.2
Desmos Graph

Naive Algorithm

  1. Generate . For example, let and .
  2. Generate independent of .
  3. If , accept and set .2

Why does this work?

  • is uniformly distributed in the box . This means that the accepted pair is uniformly distributed on
  • The area of is , since the area of is .
  • This implies that the marginal PDF for accepted is then
  • This tells us the distribution of accepted is .
  • This integral works because we are integrating over , and the PDF is only defined between and . The joint PDF collapses to , because it is the value of the accepted points during sampling.

General Algorithm

We can write a more general A-R method. The goal now is to sample from a target PDF

For intuition sake, we can imagine .

Desmos Graph

The top part is and the purple part is .

We want to find a PDF and such that

  • We know how to sample from . is called the proposed PDF. It is close to and easy to sample from.
  • for all . is called a majorizing function of .
    • Note that is required because is a PDF which integrates to .
  • We want to be as small as possible.

Acceptance-Rejection Algorithm

Given target PDF , find proposal PDF and constant such that for all .

  1. Generate where .
  2. Generate where .
  3. If , then accept and set . Otherwise, we reject and go step 1.

We want to measure the following rates:

Efficiency

The probability of acceptance is: Where and correspond to the volumes under the curves:

Thus, the expected acceptance rate is .

Theorem (Uniformity on Region )

Let the relevant geometric regions be defined as:

The pair obtained from Steps 1 & 2 is uniformly distributed on .

Proof:

  • Let be the joint PDF of .
  • Let be the conditional PDF of given .

Recall from theorem that the joint PDF is equal to the conditional times marginal. Since (Step 1), we have:

From Step 2, we generate . The PDF of a uniform distribution on is , so:

Substituting this back into the joint PDF:

Since the PDF is constant on the support , is uniformly distributed on .

Theorem (From Uniformity on to Target )

We established that the candidate points are uniform on .

How does ?

Proof:

Let be the first accepted point. Because and we strictly accept points within , the accepted points are uniformly distributed on :

is the area under the PDF , so its total volume is . The joint PDF of the accepted point is therefore:

To find the distribution of just the -values, we take the marginal of :

Thus the accepted samples follow the target distribution .

Theorem (A-R Generates )

This theorem explains why this works via geometric intuition.

Consider the graph of enveloping .

Desmos Graph
  • Step 1 (): is more likely to fall where is big. This leads to a “mushing” (clustering) of points along the x-axis at the peaks.
  • Step 2 (): Choosing gives “more space” for to be uniform where is large.

This vertical expansion compensates for the horizontal “mushing” of . The high horizontal density (large ) low vertical density (large range ) gives us a near 2D constant density in . These effects cancel out perfectly, see from Theorem (Uniformity on Region A). This allows us to cut out the shape of or region . The remaining points (accepted points) form .

In general, this is easy to visualize in , but this method provides justification for .

Modified A-R

Let be our target PDF and be our proposal PDF. Let where for all .

  1. Generate
  2. Generate .
  3. If then accept . Set . Else, reject and go to Step 1.

Example 1 (Sampling Standard Normal via A-R)

Let suppose we wanted to sample , the standard normal. So,

and let

Visually,

Desmos Graph

Note that is the exponential distribution which has PDF for multiplied randomly .

To do Step 1 (generate ) we need to:

  1. Generate independently. Let be transformed by Inversion and be .
  2. Set
  3. Now we find such that for all .

For our case, we can set

This allows us to sample computationally.

Footnotes

  1. In practice, you want as small as possible so that sampling can converge faster.

  2. Ideally, is as tight as possible. The textbook denotes . 2