The following is a quick rant about importance sampling (see that post for notation).

I've heard the following **incorrect** statement one too many times:

> We chose \(q \approx p\) because \(q=p\) is the "optimal" proposal distribution.

While it is certainly a good idea to pick \(q\) to be as similar as possible to
\(p\), it is by no means *optimal* because it is oblivious to \(f\)!

With importance sampling, it is possible to achieve a variance reduction over
plain Monte Carlo estimation (i.e., sampling from \(p\) directly). The optimal proposal distribution, assuming \(f(x) \ge 0\)
for all \(x\), is \(q(x) \propto p(x) f(x)\). This choice of \(q\) gives us a *zero
variance* estimate *with a single sample*!
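
To see where the zero variance comes from, here is a short sketch. The symbols \(\mu\) and \(q^*\) are my own shorthand here and may not match the notation of the earlier post; I am also assuming the usual estimator that averages \(f(x)\,p(x)/q(x)\) over samples \(x \sim q\), and that \(\mu > 0\). Write \(\mu = \mathbb{E}_p[f] = \sum_x p(x) f(x)\) for the quantity we want, so that the normalized optimal proposal is \(q^*(x) = p(x) f(x) / \mu\). Then for any \(x\) that \(q^*\) can actually produce,

\[
\frac{f(x)\, p(x)}{q^*(x)} = \frac{f(x)\, p(x)}{p(x)\, f(x) / \mu} = \mu .
\]

Every sample returns exactly \(\mu\), no matter which \(x\) was drawn, so the estimator is a constant and its variance is zero.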

Of course, this is an unreasonable distribution to use because the normalizing
constant *is the thing you are trying to estimate*, but it is proof that *better
proposal distributions exist*.
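
Here is a minimal sketch of that claim on a toy discrete problem. The distribution `p`, the function `f`, and the helper names `is_estimate` and `single_sample_variance` are all made up for illustration, and the estimator is assumed to be the usual average of \(f(x)\,p(x)/q(x)\) over samples from \(q\). Note that building `q_opt` requires `mu`, which is exactly the circularity described above.

```python
import random

# Hypothetical toy problem: a discrete target p and a non-negative integrand f.
p = [0.5, 0.3, 0.15, 0.04, 0.01]
f = [0.0, 1.0, 2.0, 10.0, 100.0]

# The quantity we want, E_p[f]. Computable exactly only because the toy is tiny.
mu = sum(pi * fi for pi, fi in zip(p, f))

# Optimal proposal q*(x) = p(x) f(x) / mu. It needs mu, the very thing we
# are trying to estimate, so it is not usable in practice.
q_opt = [pi * fi / mu for pi, fi in zip(p, f)]


def is_estimate(q, n, rng):
    """Importance sampling estimate of E_p[f] from n draws x ~ q."""
    xs = rng.choices(range(len(q)), weights=q, k=n)
    return sum(f[x] * p[x] / q[x] for x in xs) / n


def single_sample_variance(q):
    """Exact variance of f(x) p(x) / q(x) under x ~ q.

    Assumes q(x) > 0 wherever p(x) f(x) > 0, so the estimator is unbiased.
    """
    second_moment = sum(
        (f[x] * p[x]) ** 2 / q[x] for x in range(len(q)) if q[x] > 0
    )
    return second_moment - mu ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    print("true value:          ", mu)
    print("q = p,  one sample:  ", is_estimate(p, 1, rng))
    print("q = q*, one sample:  ", is_estimate(q_opt, 1, rng))
    print("variance with q = p: ", single_sample_variance(p))
    print("variance with q = q*:", single_sample_variance(q_opt))
```

In this toy, most of the value of \(\mathbb{E}_p[f]\) comes from outcomes that \(p\) rarely produces, which is why \(q=p\) does so poorly: it spends most of its samples where \(f\) is small.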

The key to doing better than \(q=p\) is to take \(f\) into account. Look up "importance sampling for variance reduction" to learn more.