Laplace approximation

The Laplace technique can be used to approximate reasonably well-behaved functions that have most of their mass concentrated in a small region of their domain (Laplace approximation). Technically, it works for functions in the class \(\mathcal{L}^2\), the square-integrable functions, meaning \[ \int g(x)^2\,dx <\infty \] Such a function generally has very rapidly decreasing tails, so that in the far reaches of the domain we would not expect to see large spikes.

The Laplace approximation is a simple but widely used framework that aims to find a Gaussian approximation to a probability density defined over a set of continuous variables (Laplace approximation).

In Bayesian statistics, the Laplace approximation can refer either to approximating the posterior normalizing constant with Laplace's method or to approximating the posterior distribution with a Gaussian centered at the maximum a posteriori estimate (Amaral Turkman 2019).
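
As a rough one-dimensional sketch of the first usage, write the unnormalized posterior as \(q(\theta)\) with mode \(\hat{\theta}\); the symbols \(q\), \(Z\) and \(\sigma\) are introduced here only for illustration. Assuming \(\ln q\) is adequately described by its second-order Taylor expansion around \(\hat{\theta}\), Laplace's method estimates the normalizing constant as \[ Z=\int q(\theta)\,d\theta\approx q(\hat{\theta})\sqrt{2\pi}\,\sigma,\qquad \sigma^2=\left(-\left.\frac{d^2\ln q(\theta)}{d\theta^2}\right|_{\hat{\theta}}\right)^{-1} \] The quality of this estimate depends on how close \(q\) is to a Gaussian shape near its mode.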

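Suppose we want to approximate a pdf \(p(\theta)\) that does not belong to any known family of distributions. Assume the density curve is smooth and sharply peaked around its maximum \(\hat{\theta}\), so that \(\left.\frac{dp(\theta)}{d\theta}\right|_{\hat{\theta}}=0\) and \(\left.\frac{d^2p(\theta)}{d\theta^2}\right|_{\hat{\theta}}<0\). Since \(p(\hat{\theta})>0\), \(\frac{d\ln p(\theta)}{d\theta}=\frac{1}{p(\theta)}\frac{dp(\theta)}{d\theta}\) and \(\frac{d^2\ln p(\theta)}{d\theta^2}=\frac{1}{p(\theta)}\frac{d^2p(\theta)}{d\theta^2}-\left(\frac{1}{p(\theta)}\frac{dp(\theta)}{d\theta}\right)^2\), we can conclude that \(\left.\frac{d\ln p(\theta)}{d\theta}\right|_{\hat{\theta}}=0\) and \(\left.\frac{d^2\ln p(\theta)}{d\theta^2}\right|_{\hat{\theta}}<0\).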

Then we can approximate \(p(\theta)\) by a normal pdf. Denote \(h(\theta)=\ln p(\theta)\) and take the second-order Taylor expansion of \(h\) around \(\hat{\theta}\): \[ \begin{gather*} h(\theta)\approx \ln p(\hat{\theta})+(\theta-\hat{\theta})\left.\frac{d\ln p(\theta)}{d\theta}\right|_{\hat{\theta}}+\frac{1}{2}(\theta-\hat{\theta})^2\left.\frac{d^2\ln p(\theta)}{d\theta^2}\right|_{\hat{\theta}}\\ =\ln p(\hat{\theta})-\frac{1}{2}(\theta-\hat{\theta})^2\left(-\left.\frac{d^2\ln p(\theta)}{d\theta^2}\right|_{\hat{\theta}}\right)\quad \left(\text{since }\left.\frac{d\ln p(\theta)}{d\theta}\right|_{\hat{\theta}}=0\right)\\ =\ln p(\hat{\theta})-\frac{(\theta-\hat{\theta})^2}{2\sigma^2}\quad \left(\text{where }\sigma^2=\left(-\left.\frac{d^2\ln p(\theta)}{d\theta^2}\right|_{\hat{\theta}}\right)^{-1}>0\right)\\ \Rightarrow p(\theta)=e^{h(\theta)}\approx p(\hat{\theta})e^{-\frac{(\theta-\hat{\theta})^2}{2\sigma^2}} \end{gather*} \] That is, \(p(\theta)\) is approximately proportional to \(e^{-\frac{(\theta-\hat{\theta})^2}{2\sigma^2}}\).

Let \(f(\theta)=e^{-\frac{(\theta-\hat{\theta})^2}{2\sigma^2}}\). Because a pdf must integrate to one, we normalize \(f\) and approximate \(p(\theta)\) as \[ \begin{gather*} p(\theta)\approx \frac{1}{\int f(\theta)d\theta}f(\theta)\\ =\frac{1}{\sqrt{2\pi}\sigma\int \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(\theta-\hat{\theta})^2}{2\sigma^2}}d\theta}e^{-\frac{(\theta-\hat{\theta})^2}{2\sigma^2}}\\ =\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(\theta-\hat{\theta})^2}{2\sigma^2}} \end{gather*} \] where \(\int f(\theta)d\theta=\sqrt{2\pi}\sigma\) is the normalizing term; the last step uses the fact that a normal pdf integrates to one.

As a result, the Laplace method approximates the pdf of \(\theta\) by a normal distribution: \[ \begin{gather*} \theta\overset{\text{approx.}}{\sim} N(\hat{\theta},\sigma^2)\\ \sigma^2=\left(-\left.\frac{d^2\ln p(\theta)}{d\theta^2}\right|_{\hat{\theta}}\right)^{-1}>0 \end{gather*} \]
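
The recipe above (locate the mode of \(\ln p\), then invert the negative second derivative there) can be sketched numerically. The following is a minimal illustration, not a reference implementation: the function name `laplace_approximation`, the finite-difference step `h`, the Newton iteration and the Gamma-shaped target density are all assumptions chosen for the example.

```python
import math


def laplace_approximation(log_p, theta0, h=1e-4, tol=1e-8, max_iter=100):
    """Return (mode, variance) of a Gaussian approximation to exp(log_p).

    log_p may be unnormalized: an additive constant changes neither
    the mode nor the second derivative.
    """
    def d1(t):
        # Central-difference estimate of d log_p / d theta.
        return (log_p(t + h) - log_p(t - h)) / (2 * h)

    def d2(t):
        # Central-difference estimate of d^2 log_p / d theta^2.
        return (log_p(t + h) - 2 * log_p(t) + log_p(t - h)) / h ** 2

    theta = theta0
    for _ in range(max_iter):
        step = d1(theta) / d2(theta)  # Newton step towards d1(theta) = 0
        theta -= step
        if abs(step) < tol:
            break

    curvature = d2(theta)
    if curvature >= 0:
        raise ValueError("log_p is not concave at the point found")
    return theta, -1.0 / curvature  # sigma^2 = (-h''(theta_hat))^(-1)


# Hypothetical target: the kernel of a Gamma(shape=5, rate=2) density, theta > 0.
SHAPE, RATE = 5.0, 2.0


def log_gamma_kernel(theta):
    return (SHAPE - 1.0) * math.log(theta) - RATE * theta


mode, sigma2 = laplace_approximation(log_gamma_kernel, theta0=1.0)
print(f"mode = {mode:.4f}, sigma^2 = {sigma2:.4f}")
```

For this particular target the answer is available in closed form, \(\hat{\theta}=(a-1)/b=2\) and \(\sigma^2=(a-1)/b^2=1\) with shape \(a=5\) and rate \(b=2\), so the numerical result should be close to \(N(2,1)\). Newton's method needs a starting value reasonably near the mode; a more robust optimizer could be substituted without changing the idea.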

Reference