Monte Carlo Techniques

Monte Carlo methods and numerical approximations
Monte Carlo methods use random numbers, drawn from a statistical distribution, in an experiment that solves a deterministic problem. They are a very powerful tool because they let us approximate the solution numerically. Their main drawback is that they are computationally expensive, but they offer great flexibility when we work in a stochastic environment. Quasi-Monte Carlo and Sequential Monte Carlo are two successful attempts to reduce the computational cost.

Simple example
Let's introduce a simple example to explore how the Monte Carlo technique can be used to approximate an integral \(I\) of a function \(f(x)\in C^2[a,b]\) (i.e. \(f(x)\) is twice continuously differentiable on \([a,b]\)), with $$ I=\int_a^bf(x)\mathrm{d}x. $$ For comparison with the Monte Carlo technique, we use the composite trapezoidal rule to approximate the integral over the interval \([a,b]\). Let the increment be \(h=\frac{b-a}{n}\), so that \(x_j=a+jh\) for each \(j=0,1,\dots,n\). We then have $$ I\approx \frac{h}{2} \left[f(a)+2\sum_{j=1}^{n-1}f(x_j)+f(b)\right], $$ with an error term of \(\frac{(b-a)^3}{12n^2} f''(\mu)=O(\frac{1}{n^2})\) for some \(\mu \in (a,b)\). This generalizes to an \(s\)-dimensional problem with an error of \(O(n^{-\frac{2}{s}})\), so convergence becomes slower when we try to approximate a multi-dimensional integral over a region \(D\).

Let's take the example of a 1-dimensional integral with \(f(x)=x^2\), approximated over the interval \([0,1]\). The exact value is \(\frac{1}{3}\), and we know that the integral is the area under the curve given by the continuous function. Figure 1 shows how the trapezoidal rule approximates the integral \(I\) by the sum of the areas of the blue trapezoids; a short code sketch follows the figure.

Figure 1: Approximating \(\int_0^1 x^2\mathrm{d}x\) using \(h=\frac{1}{14}\)
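For comparison with the Monte Carlo experiments below, here is a minimal Python sketch of the composite trapezoidal rule applied to \(f(x)=x^2\) on \([0,1]\); the function name and the choice of \(n=14\) (matching the figure) are my own.

```python
import numpy as np

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n sub-intervals of width h = (b - a) / n."""
    h = (b - a) / n
    x = a + h * np.arange(1, n)              # interior nodes x_1, ..., x_{n-1}
    return h / 2 * (f(a) + 2 * f(x).sum() + f(b))

approx = trapezoid(lambda x: x**2, 0.0, 1.0, 14)
print(approx, abs(approx - 1/3))             # exact value is 1/3; error is O(1/n^2)
```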

Let's transform the problem of evaluating an exact integral into a probabilistic problem by introducing two random variables \(\mathrm{X}\sim U[0,1]\) and \(\mathrm{Y}\sim U[f(0),f(1)]=U[0,1]\). We generate, from these distributions, pairs of random numbers \(v_i=(x_i,y_i)\) and plot the points on the same graph as the function we would like to integrate. We count the number of points that lie in the region bounded by the curve of \(f(x)\) and the line \(y=0\), and divide that count by the total number of generated points:
$$ I=\text{Area under the curve} = \mathrm{P}(v_i \text{ is under the curve}) \times \underbrace{1 \times 1}_{\text{total area of the bounding box}}. $$
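Before looking at the numbers in Figure 2, here is a minimal sketch of this hit-or-miss estimator; the helper name hit_or_miss and the fixed seed are my own choices, not part of the original experiment.

```python
import numpy as np

rng = np.random.default_rng(0)               # fixed seed for reproducibility

def hit_or_miss(f, m):
    """Hit-or-miss estimate of the area under f over the unit square [0,1] x [0,1]."""
    x = rng.uniform(0.0, 1.0, m)             # X ~ U[0, 1]
    y = rng.uniform(0.0, 1.0, m)             # Y ~ U[f(0), f(1)] = U[0, 1]
    hit_fraction = np.mean(y <= f(x))        # estimates P(v_i is under the curve)
    return hit_fraction * 1.0                # times the total area of the 1 x 1 box

print(hit_or_miss(lambda x: x**2, 1000))     # close to 1/3 for large m
```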
An example is illustrated in Figure 2, in which 1000 pairs of random numbers were generated and 349 fell in the area of interest, giving an approximate value of 0.349 for the integral. In this example, we have solved a probabilistic problem in order to approximate a deterministic one. More generally, we can evaluate a definite integral \(I\) with the Monte Carlo technique by identifying a random variable \(\mathrm{X}\) defined over the region \(D\): we write \(f(x)=g(x)p(x)\), where \(p(x)\) is the density function of the distribution that \(\mathrm{X}\) follows and \(g(x)\) can be any continuous function.

Assume that $$ I=\int_D f(x)\mathrm{d}x; $$ we then have $$ \mathrm{E}(g(\mathrm{X}))=\int_D g(x)p(x)\mathrm{d}x=\int_D f(x)\mathrm{d}x=I. $$

Figure 2: Approximating \(\int_0^1 x^2\mathrm{d}x\) using \(n=1000\)


Example: Write a probabilistic representation for \(\int_0^2 x^2\mathrm{d}x\).

Take a sample of \(m\) uniformly distributed numbers over \([0,2]\); with density \(p(x)=\frac{1}{2}\) and \(g(x)=2f(x)=2x^2\), we then have $$ \hat{I}_m=\frac{1}{m}\sum_{i=1}^m g(x_i)=\frac{2}{m}\sum_{i=1}^m f(x_i), $$ and $$ \begin{array}{cl} \mathrm{E}(\hat{I}_m)&=\mathrm{E} \left(\frac{2}{m}\sum_{i=1}^m f(x_i) \right)\\&=\frac{2}{m}\sum_{i=1}^m\mathrm{E}\left(f(x_i) \right)=2\,\mathrm{E}\left(f(\mathrm{X})\right)\\&=\mathrm{E}\left(2 f(\mathrm{X}) \right)=\mathrm{E}\left(g(\mathrm{X})\right)\\&=\int_0^2 g(s) \frac{1}{2}\mathrm{d}s\\&=\int_0^2 f(s)\mathrm{d}s=I. \end{array} $$
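A minimal sketch of this estimator \(\hat{I}_m\) for \(\int_0^2 x^2\mathrm{d}x\); the sample size and seed are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 100_000
x = rng.uniform(0.0, 2.0, m)      # X ~ U[0, 2], with density p(x) = 1/2
I_hat = np.mean(2 * x**2)         # g(x) = f(x) / p(x) = 2 x^2, so I_hat = (2/m) * sum f(x_i)
print(I_hat)                      # exact value is 8/3 ≈ 2.667
```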

Some properties of the estimator
Let's take another example, in which the region \(D\) is the interval \([0,1]\) and \(\mathrm{X}\) is a random variable that follows \(U[0,1]\) (with probability density function \(p(x)=1\)). We then have \(f(x)=g(x) \times p(x)=g(x)\).

We get \(I=\mathrm{E}(f(\mathrm{X}))\), where \(\mathrm{X}\) is a random variable that is uniformly distributed over \([0,1]\). A sample of independent values \(x_1,x_2,\dots,x_m\) is taken from this distribution, with $$ \hat{I}_m=\frac{1}{m}\sum_{i=1}^m f(x_i), $$ and $$ \begin{array}{cl} \mathrm{E}(\hat{I}_m)&=\mathrm{E}\left(\frac{1}{m}\sum_{i=1}^m f(x_i)\right)\\&=\frac{1}{m}\sum_{i=1}^m\mathrm{E}(f(x_i))\\&=\mathrm{E}(f(\mathrm{X}))\\&=\int_0^1 f(s)\mathrm{d}s=I. \end{array} $$ \(\hat{I}_m\) converges almost surely to \(I\) by the strong law of large numbers, i.e. \(\mathrm{P}[\lim_{m \to +\infty} \hat{I}_m = I]=1\). Since the \(x_i\)'s are independent, \(\mathrm{Cov}(f(x_i),f(x_j))=0\) for \(i\neq j\), which gives the variance of \(\hat{I}_m\) as $$ \begin{array}{cl} \mathrm{Var}(\hat{I}_m)&=\frac{1}{m^2}\sum_{i=1}^m \mathrm{Var}\left(f(x_i)\right)\\ &=\frac{1}{m} \mathrm{Var}\left(f(\mathrm{X})\right)\\ &=\frac{1}{m} \int_0^1 \left(f(s)-I\right)^2\mathrm{d}s. \end{array} $$ When the sample size \(m\) is large, we can apply the Central Limit Theorem (CLT) to get, approximately, $$ \frac{\hat{I}_m-I}{\sqrt{\mathrm{Var}(\hat{I}_m)}} \sim N(0,1). $$ Therefore \(I\) lies, with confidence level \(1-\alpha\), in the interval $$ \left(\hat{I}_m-z_{\alpha}\sqrt{\frac{\mathrm{Var}(f(\mathrm{X}))}{m}},\;\hat{I}_m+z_{\alpha}\sqrt{\frac{\mathrm{Var}(f(\mathrm{X}))}{m}}\right), $$ where \(z_{\alpha}= \Phi^{-1}(1-\frac{\alpha}{2})\) and \(\Phi\) is the CDF of the standard normal distribution \(N(0,1)\). The error term is of order \(O(\frac{1}{\sqrt{m}})\), which does not depend on the dimensionality \(s\) of the problem. The error represented by the variance of the estimator is called the statistical Monte Carlo error; it can be reduced by increasing the number of generated samples or by using a variance reduction technique that lowers \(\mathrm{Var}(\hat{I}_m)\), for example by introducing negative correlations in the sample. In comparison with the \(O(m^{-\frac{2}{s}})\) error produced by the trapezoidal rule, the Monte Carlo error is of a smaller order of magnitude when the dimensionality of the problem is \(s \geq 5\).
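As an illustration of these properties, here is a minimal sketch that computes \(\hat{I}_m\), an estimate of its standard error, and the CLT-based confidence interval for \(\int_0^1 x^2\mathrm{d}x\); the function name and parameter choices are mine.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def mc_integral(f, m, alpha=0.05):
    """Plain Monte Carlo estimate of the integral of f over [0, 1] together with
    a (1 - alpha) confidence interval based on the Central Limit Theorem."""
    x = rng.uniform(0.0, 1.0, m)
    fx = f(x)
    I_hat = fx.mean()                        # estimator I_hat_m
    se = fx.std(ddof=1) / np.sqrt(m)         # estimate of sqrt(Var(I_hat_m)), O(1/sqrt(m))
    z = norm.ppf(1 - alpha / 2)              # z_alpha = Phi^{-1}(1 - alpha/2)
    return I_hat, (I_hat - z * se, I_hat + z * se)

estimate, ci = mc_integral(lambda x: x**2, 100_000)
print(estimate, ci)                          # the interval should cover 1/3 about 95% of the time
```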

Quasi-Monte Carlo
This method uses a deterministic sample \(x_1,x_2,\dots,x_m\), chosen so that the error in approximating the integral \(I\) is reduced. The Quasi-Monte Carlo method aims to beat the average error of the ordinary Monte Carlo method, which is of order \(O(\frac{1}{\sqrt{m}})\). In the Quasi-Monte Carlo technique, the approximation of the integral of \(f(x)\) over the domain \(A\) is replaced by $$ I=\int_A f(x)\mathrm{d}x \approx \frac{1}{m}\sum^{m}_{\substack{k=1\\ x_k \in A}} f(x_k), $$ for quasi-random numbers \(x_k\) belonging to the domain \(A\). Some Quasi-Monte Carlo sequences yield an error of order \(\displaystyle{O \left(\frac{(\log m)^{s-1}}{m} \right)}\), where \(s\) is the dimensionality of the integral. Figure 3 illustrates the difference between pseudo-random and quasi-random numbers; a short code sketch of a low-discrepancy sequence follows the figure.


Figure 3: in the left panel, an example of \(\alpha\)-sequences with \(p=3\) for \(x_i\) and \(q=5\) for \(y_i\)
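As an illustration of a quasi-random (low-discrepancy) sequence, here is a sketch using the van der Corput sequence in base 2, the one-dimensional building block of a Halton sequence. Note this is a different construction from the \(\alpha\)-sequences shown in Figure 3; it is used here only as a familiar example.

```python
import numpy as np

def van_der_corput(m, base=2):
    """First m points of the van der Corput sequence: reverse the base-b digits
    of 1, 2, ..., m around the radix point to obtain a low-discrepancy sequence."""
    seq = np.empty(m)
    for i in range(m):
        n, denom, value = i + 1, 1.0, 0.0
        while n > 0:
            n, digit = divmod(n, base)
            denom *= base
            value += digit / denom
        seq[i] = value
    return seq

# Quasi-Monte Carlo estimate of the same integral of x^2 over [0, 1].
x = van_der_corput(1000)
print(np.mean(x**2))              # error decays roughly like O((log m) / m) in one dimension
```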





