Generating Random Numbers and Statistical Distributions

Do computers think? Can logical computers generate random numbers? How do they produce numbers that appear random?

Computers do not generate truly random numbers. In fact, the numbers a computer supplies are produced by deterministic algorithms: knowing the algorithm, one can calculate the next "random" number in the sequence. A random variable is a function that maps events in the sample space to numerical values in $\mathbb{R}$. A statistical sample of a distribution is a sequence of generated random numbers belonging to the co-domain of the random variable that follows that distribution.

On this page, we focus on Monte Carlo methods, which are based on "random numbers". Many techniques build on the basic Monte Carlo idea; Quasi-Monte Carlo and Sequential Monte Carlo are two alternatives that produce better samples under certain conditions. They differ from ordinary Monte Carlo methods in the way they generate the sample, which reduces the error and improves the accuracy of the estimate. A very brief introduction to Quasi-Monte Carlo methods is given on this page.


Simulating random numbers from a uniform distribution

Pseudo-random numbers

Let's look at the uniform distribution and generate a sequence of numbers that follow a uniform distribution. The density function of a random variable that follows a uniform distribution with parameters $a$ and $b$ is given by $$f(x):=\begin{cases}\dfrac{1}{b-a} & a<x<b;\\[4pt] 0 & \text{otherwise.}\end{cases}$$

D. H. Lehmer (1948) proposed a simple linear congruential generator (LCG) as the base source for generating random numbers (J. E. Gentle, 2005).

Let's generate a sequence of random integers from the set $S=\{1,2,3,4,\dots,m\}$. We set the first element in the sequence, $x_0$, which is called the "seed" (primitive root). We then define $x_{i+1}=f(x_i)$ and $x_{i+2}=f(f(x_i))$, which are, in the general case, $f^{[m]}(x_i):=\underbrace{f(f(\dots f(x_i)))}_{m\text{ times}}$. We say that $x_0$ has order $k$ if and only if $k\ge 1$ is the smallest positive integer such that $f^{[k]}(x_0)=x_0$. When the order of $x_0$ is $k=m$, the sequence of $x_i$'s is uniformly distributed over the integers $[1,m]$. $S$ is called a cyclic set if the order of $x_0$ equals the number of integers in the set, $m$. We can then generate all elements of $S$ using $x_0$ as the starting number of the sequence.

Example
Given $S=\{1,2,3,4,5\}$ and $x_0=3$, find a mapping to generate a sequence of uniformly distributed integers in the set $S$.

Solution
Let $f(1)=3$, $f(2)=5$, $f(3)=4$, $f(4)=2$ and $f(5)=1$. The sequence $x_i$ takes the form $$\{3,4,2,5,1,\;3,4,2,5,1,\;3,4,2,5,1,\dots\}.$$
Every element in the sequence is repeated every 5 terms, so the terms are not independent. Note that $P(x_i=y)=\frac{1}{5}$ for any $y\in S$.


We can generate such a sequence using an LCG with $0\le x_i<m$, by defining $$x_{i+1}=f(x_i)=(a\,x_i+c) \bmod m.$$
For $c=0$ and a suitable $a\in\mathbb{N}\setminus\{0\}$, the sequence maps each integer to a distinct integer in $[1,m-1]$. In the case $c\neq 0$, we generate integers in $[0,m-1]$.
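As a minimal sketch of the recurrence above (Python is used here for illustration; the function name is ours, not part of the original notes):

```python
def lcg(a, c, m, seed, n):
    """Generate n terms of the LCG x_{i+1} = (a*x_i + c) mod m."""
    x = seed
    out = []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x)
    return out

# The multiplicative case (c = 0) from the example below: m = 11, a = 2,
# seed 5 cycles through all integers in [1, 10].
print(lcg(a=2, c=0, m=11, seed=5, n=10))  # [10, 9, 7, 3, 6, 1, 2, 4, 8, 5]
```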


Theorem (Hull-Dobell)

The sequence $x_i$ defined above is cyclic with a primitive root $x_0$ of order $m$ if
  • $c$ and $m$ are co-prime, i.e. the greatest common divisor $\gcd(c,m)=1$;
  • $a-1\equiv 0 \pmod{p}$ for every prime factor $p$ of $m$;
  • $a-1\equiv 0 \pmod{4}$ if $4$ is a factor of $m$.


Example Use the Hull-Dobell theorem to generate each of the following cyclic sequences:
  1. a cyclic sequence of integers between 1 and 10;
  2. a cyclic sequence of integers between 0 and 98;
  3. a cyclic sequence of integers between 1 and 100.

Solution.
  1. Set $m=11$, $c=0$ and $a=2$. The generated sequence is then $5,10,9,7,3,6,1,2,4,8,5,10,\dots$
  2. Set $m=99$, $c=5$ and $a=11\times 3+1=34$.
  3. Set $m=100$, $c=3$ and $a=2\times 5\times 2+1=21$. We then add 1 to each number in the sequence to transform the interval from $[0,99]$ to $[1,100]$ as required.
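The Hull-Dobell conditions in parts 2 and 3 can be checked mechanically. A small Python sketch of such a check (the function structure is ours, for illustration):

```python
from math import gcd

def hull_dobell(a, c, m):
    """Check the Hull-Dobell conditions for a full period of length m."""
    if gcd(c, m) != 1:
        return False
    # every prime factor p of m must divide a - 1
    n, p = m, 2
    while p * p <= n:
        if n % p == 0:
            if (a - 1) % p != 0:
                return False
            while n % p == 0:
                n //= p
        p += 1
    if n > 1 and (a - 1) % n != 0:
        return False
    # if 4 divides m, then 4 must divide a - 1
    return m % 4 != 0 or (a - 1) % 4 == 0

print(hull_dobell(34, 5, 99))   # True  (part 2)
print(hull_dobell(21, 3, 100))  # True  (part 3)
```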


Standard uniformly distributed numbers in [0,1]

We set $$u_i=\begin{cases}\dfrac{x_i-1}{m-2} & : c=0\\[6pt] \dfrac{x_i}{m-1} & : c\neq 0,\end{cases}$$
which is a sequence of uniformly distributed numbers over the unit interval $[0,1]$. We say that $u$ is a random variable that follows a standard uniform distribution (i.e. $u\sim U[0,1]$).

In order to have a sequence of random numbers generated by a random seed, one can choose the seed $x_0$ to be a number known to the computer that describes the time, or replace it by the last generated number in the sequence. We provide a new seed every time we generate a number $x_i$, and therefore $x_{i+1}$ does not depend directly on $x_i$. Note that the possible values of $x_i$ belong to the interval $[0,m-1]$, not to $[1,m-1]$ as in the previous example: $$x_i=[\text{time}](i) \bmod m,\qquad u_i=\frac{x_i}{m-1}.$$
Quasi-random numbers

In this section, one example is given of generating a low-discrepancy sequence of numbers. A low-discrepancy sequence is a sequence of random numbers that spreads evenly over the domain of integration as the size of the sample increases. The numbers in the sequence are called quasi-random numbers, and the Monte Carlo technique that uses quasi-random numbers is called the Quasi-Monte Carlo technique. It is a variance reduction technique, reducing the variance of the estimator by letting the random number $x_i$ depend on the previously generated numbers in the sequence.

Example α-sequences

Let $p$ be a prime number and define $$x_k=k\sqrt{p},\qquad y_k=x_k-\lfloor x_k\rfloor,\qquad k\in\mathbb{N},\ 0\le k\le m,$$

where $y_k$ is the fractional part of $x_k$, and $\lfloor x_k\rfloor$ denotes the floor of $x_k$, that is, its integer part. Then $\{y_0,y_1,\dots,y_n\}$ are uniformly distributed numbers over $[0,1)$. If $p$ and $q$ are two distinct prime numbers, then the generated pairs $(y_i,z_i)$ corresponding to $(p,q)$ are uniformly distributed over $I^2\equiv[0,1]^2$. Here $y_i$ and $z_i$ are independent random numbers (proof omitted). This is a way to generate $s$ independent uniform distributions over $I^s\equiv[0,1]^s$ by choosing $s$ different prime numbers. Note that the sequence here does not repeat itself (it is not cyclic).
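Assuming the reconstruction $x_k=k\sqrt{p}$ above, a short Python sketch of the α-sequence is:

```python
import math

def alpha_sequence(p, n):
    """First n terms of the alpha-sequence y_k = frac(k * sqrt(p))."""
    root = math.sqrt(p)
    return [(k * root) % 1.0 for k in range(1, n + 1)]

# Pairs built from two distinct primes spread evenly over the unit square.
ys = alpha_sequence(2, 5)
zs = alpha_sequence(3, 5)
print(list(zip(ys, zs)))
```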


Simulating random numbers from specific distributions
We have learnt how to generate numbers that follow a uniform distribution. In this section, we will learn how to use numbers that follow a uniform distribution to generate other statistical distributions.

Box-Muller algorithm
Let $U$ and $V$ be two independent random variables that are uniformly distributed over $[0,1]$. Then the two random variables $Y$ and $Z$ defined by $$Y=\sqrt{-2\log(U)}\,\cos(2\pi V),\qquad Z=\sqrt{-2\log(U)}\,\sin(2\pi V)$$
are independent and each follows a standard normal distribution $N(0,1)$ with a mean of 0 and a standard deviation of 1. Knowing that $Z\sim N(0,1)$, we can generate any normally distributed random variable $X$ with mean $\mu$ and variance $\sigma^2$ (i.e. $X\sim N(\mu,\sigma^2)$) by defining $X=\sigma Z+\mu$.
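A direct Python sketch of the transform (guarding against $\log(0)$, a detail the formulas above leave implicit):

```python
import math, random

def box_muller(mu=0.0, sigma=1.0):
    """One pair of independent N(mu, sigma^2) draws via Box-Muller."""
    u = 1.0 - random.random()   # in (0, 1], avoids log(0)
    v = random.random()
    r = math.sqrt(-2.0 * math.log(u))
    y = mu + sigma * r * math.cos(2.0 * math.pi * v)
    z = mu + sigma * r * math.sin(2.0 * math.pi * v)
    return y, z
```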

Central Limit Theorem (CLT)
Another method to generate a standard normal distribution is to apply the CLT to a sample of independent identically distributed (iid) random variables $Y_1,Y_2,\dots,Y_m$. When $m$ is large enough, and if each $Y_i$ has mean $\mu$ and standard deviation $\sigma$, we get, approximately, $$X=\sum_{i=1}^{m}Y_i \sim N\big(m\,E(Y_i),\ m\,\mathrm{Var}(Y_i)\big).$$
Take the example of standard uniform iid $U_1,U_2,\dots,U_m$, each with a mean of $\frac{1}{2}$ and a variance of $\frac{1}{12}$. Define the new random variable $$Z=\frac{\sum_{i=1}^{m}U_i-\frac{m}{2}}{\sqrt{\frac{m}{12}}}.$$
$Z$ is then normally distributed with $Z\sim N(0,1)$ as $m\to\infty$. Note that using the CLT method to generate normal distributions is time consuming and not very efficient.
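A sketch of the CLT generator; $m=12$ is a classic choice because the denominator $\sqrt{m/12}$ becomes 1:

```python
import random

def clt_normal(m=12):
    """Approximate one N(0, 1) draw by summing m standard uniforms."""
    s = sum(random.random() for _ in range(m))
    return (s - m / 2.0) / (m / 12.0) ** 0.5
```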


The inverse cumulative distribution function method
Theorem
Given a random variable $X$ that follows a distribution whose cumulative distribution function (CDF) is a monotonic non-decreasing function of $x$, given by $$P_X(x)=\Pr(X\le x).$$

Similarly, we define the CDF of a random variable $Y$ as $P_Y(y)=\Pr(Y\le y)$, such that both functions $P_Y(y)$ and $P_X(x)$ take probability values in the interval $[0,1]$. If $P_Y(y)$ is continuous, then for every value $x\in\mathbb{R}$ there exists a value $y\in\mathbb{R}$ such that $P_X(x)=P_Y(y)$. Now let $Y\sim U[0,1]$ such that $P_X(x)=P_Y(y)=y$, which gives $x=P_X^{-1}(y)$. This is accurate when $P_X$ and $P_Y$ are strictly increasing functions, while we need to define $P_X^{-1}$ as $$P_X^{-1}(y)=\inf\{x:P_X(x)\ge y\}$$
when $P_X$ is a non-decreasing function.

Example
If $X\sim \mathrm{Exp}(\lambda)$ (exponential distribution) with density function $f(x)=\lambda e^{-\lambda x}$ defined for $x\in[0,\infty)$, we integrate $f(s)$ over $[0,x]$ to get the cumulative distribution function of $X$, $P_X(x)=1-e^{-\lambda x}$. Now let $U$ be a random variable uniformly distributed over $[0,1]$. Then the random variable $$X=-\frac{1}{\lambda}\log(1-U)=-\frac{1}{\lambda}\log(U)$$
follows an exponential distribution with parameter $\lambda$ (the second equality in distribution holds because $1-U$ is itself uniform over $[0,1]$).
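A one-line Python sketch of this inverse-CDF sampler:

```python
import math, random

def exponential(lam):
    """Sample X ~ Exp(lam) via the inverse CDF: X = -log(1-U)/lam."""
    return -math.log(1.0 - random.random()) / lam  # 1 - U avoids log(0)
```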

Example
If $X\sim \mathrm{Poisson}(\lambda)$, a Poisson distribution with intensity $\lambda$ and density function $f(k)=\frac{\lambda^k e^{-\lambda}}{k!}$ for $k=0,1,\dots$, we calculate $P_X(k)=\Pr(X\le k)$ for $k=0,1,\dots$, then generate $Y\sim U[0,1]$ and take $X=k$ if $P_X(k-1)<Y\le P_X(k)$.
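The search for the smallest $k$ with $P_X(k)\ge Y$ can be coded by accumulating the CDF term by term, using the recursion $f(k)=f(k-1)\,\lambda/k$; a Python sketch:

```python
import math, random

def poisson(lam):
    """Sample X ~ Poisson(lam): smallest k with CDF(k) >= Y."""
    y = random.random()
    k, pmf = 0, math.exp(-lam)   # f(0) = e^{-lam}
    cdf = pmf
    while y > cdf:
        k += 1
        pmf *= lam / k           # f(k) = f(k-1) * lam / k
        cdf += pmf
    return k
```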

Exercise Given that $X\sim U[0,1]$ and that a sample of size 9 has been drawn from the distribution of $X$ with values $\{0.11,0.22,0.33,0.44,0.55,0.66,0.77,0.88,0.99\}$, generate a sample of a Poisson distribution from the previous sample, given that the intensity $\lambda=1$.


Accept-Reject method
There are some distributions that are difficult to generate by the inverse cumulative distribution function method. The Accept-Reject method can be an alternative; it is based on the Monte Carlo technique. We intend to generate random numbers that belong to the interval $[a,b]$ and follow a statistical distribution with density function $f(x)$. To do so, we take the rectangle bounded by the interval $[a,b]$ on the x-axis and $[0,M]$ on the y-axis, where $M$ is the maximum of the density function $f(x)$, so that $M\ge f(x)\ \forall x\in[a,b]$. We then plot the probability density function $f(x)$ of the random variable $X$ inside the rectangle. We generate a pair of random numbers from two independent distributions, $Z\sim U[0,1]$ and $Y\sim g$, where $g$ is a density function chosen such that we know how to generate a sample from its distribution. We then accept $X=Y$ if $Z\le \frac{1}{c}\frac{f(Y)}{g(Y)}$, with an acceptance rate of $\frac{1}{c}$, where $c$ satisfies $\frac{f(x)}{g(x)}\le c\ \forall x\in[a,b]$.

Why does the Accept-Reject method work this way? $$P\!\left(Y\le x \,\middle|\, Z\le \tfrac{1}{c}\tfrac{f(Y)}{g(Y)}\right) =\frac{P\!\left(Y\le x \,\cap\, Z\le \tfrac{1}{c}\tfrac{f(Y)}{g(Y)}\right)}{P\!\left(Z\le \tfrac{1}{c}\tfrac{f(Y)}{g(Y)}\right)} =\frac{\int_{-\infty}^{x}\int_{0}^{\frac{f(y)}{c\,g(y)}}dz\; g(y)\,dy}{\int_{-\infty}^{\infty}\int_{0}^{\frac{f(y)}{c\,g(y)}}dz\; g(y)\,dy} =\frac{\frac{1}{c}\int_{-\infty}^{x}\frac{f(y)}{g(y)}\,g(y)\,dy}{\frac{1}{c}\int_{-\infty}^{\infty}\frac{f(y)}{g(y)}\,g(y)\,dy} =\int_{-\infty}^{x}f(y)\,dy = \Pr(X\le x).$$
The simplest application is when we set $Y\sim U[a,b]$, so that $Y$ is uniformly distributed with density function $g(y)=\frac{1}{b-a}\ \forall y$. We can then set $c=\frac{M}{g(y)}=M(b-a)$.

Exercise
Given two sequences of independent random numbers, $u_i$ and $v_i$, each generated from a uniform distribution over $[0,1]$, write a pseudo-code that uses the Accept-Reject method to generate a sequence of normally distributed random numbers with zero mean and standard deviation of 1, for values between $-3$ and $3$, with an acceptance rate of $\frac{1}{3}$.
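One possible sketch of a solution (not the unique answer): bound the standard normal density by $M=\frac{1}{2}$, so that with the uniform proposal on $[-3,3]$ we get $c=M(b-a)=3$ and an acceptance rate of roughly $\frac{1}{3}$:

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def accept_reject_normal(us, vs):
    """Accept-Reject for N(0,1) on [-3, 3] from two uniform streams.
    Proposal: Y ~ U[-3, 3] with g(y) = 1/6. Bounding f(x) <= M = 1/2
    gives c = M * (b - a) = 3, hence an acceptance rate of about 1/3."""
    sample = []
    for u, v in zip(us, vs):
        y = -3.0 + 6.0 * u                            # Y ~ U[-3, 3]
        if v <= normal_pdf(y) / (3.0 * (1.0 / 6.0)):  # Z <= f(Y)/(c g(Y))
            sample.append(y)
    return sample
```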


Multivariate normal distribution
Given a random vector of $s$ random variables $X=[X_1,X_2,\dots,X_s]^T$ and a mean vector $\mu=[\mu_1,\mu_2,\dots,\mu_s]^T$ such that the random variable $X_i$ has mean $\mu_i$, introduce the symmetric covariance matrix for the $s$ random variables $$\Sigma=\begin{pmatrix}\sigma_1^2 & \cdots & \sigma_{1,s}\\ \vdots & \ddots & \vdots\\ \sigma_{s,1} & \cdots & \sigma_s^2\end{pmatrix},$$
where $\sigma_i$ is the standard deviation of $X_i$, and $\sigma_{i,j}=\sigma_i\sigma_j\rho_{i,j}$ is the covariance between $X_i$ and $X_j$ with correlation coefficient $\rho_{i,j}$. We then say that the random vector $X$ follows a normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$, and we use the notation $X\sim N(\mu,\Sigma)$. Assume that $Z=[Z_1,Z_2,\dots,Z_s]^T$ is a random vector consisting of $s$ independent normally distributed variables, each with a mean of 0 and a standard deviation of 1, such that $Z\sim N(0_s,I_s)$, where $0_s$ is a zero vector of size $s$ and $I_s$ is the $s\times s$ identity matrix. We can then generate $X$ from $Z$ by defining $X$ as $$X=\mu+AZ,$$
where $A$ is any $s\times s$ matrix such that $AA^T=\Sigma$. The matrix $A$ is not unique and there are many ways to find such a matrix. Cholesky factorization is one of the most commonly used methods to calculate $A$ as a lower triangular matrix.

Example Generate a random vector $Y=[Y_1,Y_2]$ such that $Y_1$ and $Y_2$ are correlated with a correlation coefficient $\rho$. Assume that $Y_1$ has a mean of $\mu_1$ and a standard deviation of $\sigma_1$, while $Y_2$ has a mean of $\mu_2$ and a standard deviation of $\sigma_2$. We then have $$\mu=\begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix},\qquad \Sigma=\begin{bmatrix}\sigma_1^2 & \sigma_1\sigma_2\rho\\ \sigma_1\sigma_2\rho & \sigma_2^2\end{bmatrix}.$$

Let $$A=\begin{bmatrix}a_1 & 0\\ a_3 & a_4\end{bmatrix},\qquad\text{such that}\qquad AA^T=\begin{bmatrix}a_1^2 & a_1 a_3\\ a_1 a_3 & a_3^2+a_4^2\end{bmatrix}.$$
From the comparison between $AA^T$ and $\Sigma$, we get $$A=\begin{bmatrix}\sigma_1 & 0\\ \rho\sigma_2 & \sqrt{1-\rho^2}\,\sigma_2\end{bmatrix}.$$

We can then write $$Y_1=\mu_1+\sigma_1 Z_1,\qquad Y_2=\mu_2+\rho\sigma_2 Z_1+\sqrt{1-\rho^2}\,\sigma_2 Z_2,$$
where $Z_1\sim N(0,1)$ and $Z_2\sim N(0,1)$ are independent random variables.
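A Python sketch of the bivariate construction, reusing the Box-Muller transform from earlier on this page for the independent $Z_1$, $Z_2$:

```python
import math, random

def bivariate_normal(mu1, mu2, s1, s2, rho):
    """One draw of (Y1, Y2) using the Cholesky factor derived above."""
    # Two independent N(0,1) draws via Box-Muller
    u, v = 1.0 - random.random(), random.random()
    r = math.sqrt(-2.0 * math.log(u))
    z1 = r * math.cos(2.0 * math.pi * v)
    z2 = r * math.sin(2.0 * math.pi * v)
    y1 = mu1 + s1 * z1
    y2 = mu2 + rho * s2 * z1 + math.sqrt(1.0 - rho * rho) * s2 * z2
    return y1, y2
```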



References
  1. Casella G. (2008), Monte Carlo Statistical Methods, Springer.
  2. Glasserman P. (2004), Monte Carlo Methods in Financial Engineering, Springer.
  3. Gentle J.E. (2005), Random Number Generation and Monte Carlo Methods, 2nd edition, Springer.
  4. Hull T.E. and Dobell A.R. (1962), Random Number Generators, SIAM Review, 4(3), 230-254.
  5. Jiang X. and Birge J.R. (2004), Comparisons of alternative quasi-Monte Carlo sequences for American option pricing, working paper.
  6. Niederreiter H. (1992), Random Number Generation and Quasi-Monte Carlo Methods, SIAM.
