By the definition of conditional probability, $$P(A\mid B)=\frac{P(A\cap B)}{P(B)}.$$
Bayes' Theorem.
For any events $A$ and $B$ with $P(B)>0$,
$$P(A\mid B)=\frac{P(B\mid A)\,P(A)}{P(B)},$$
where
- $P(A)$ and $P(B)$ are the prior probabilities.
- $P(A\mid B)$ is the posterior probability. Bayes' theorem is used to update a prior belief, for example about the parameters of a distribution, in light of observed data.
Example:
- 10 balls and 6 cubes are in a box.
- 5 balls are blue and the other 5 are red.
- Two cubes are blue and the other 4 are red.
- An item has been randomly chosen from the box.
- What is the probability that the chosen item is a cube, given that you have spotted a red item but were not able to identify its shape?
Solution.
One way to solve the problem is to use the fact that the chosen item is red and exclude all the information related to the blue items, as summarised in the table below.
A more general solution is to use Bayes' theorem, as follows.
|           | Red | Blue | Total |
|-----------|-----|------|-------|
| Ball      | 5   | 5    | 10    |
| Cube      | 4   | 2    | 6     |
| Total     | 9   | 7    | 16    |
$$P(A)=\frac{\#\text{ of times }A\text{ occurs}}{\#\text{ of replications}}.$$
Let event $A$: the chosen item is a cube. Let event $B$: the chosen item is red. Then
$$P(B)=\frac{9}{16},\qquad P(B\cap A)=\frac{4}{16}=\frac{1}{4}.$$
This gives $P(A\mid B)=\dfrac{1/4}{9/16}=\dfrac{4}{9}$, while $P(A)=\dfrac{6}{16}=\dfrac{3}{8}$.
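As a quick check, the same answer can be computed directly from the counts in the table; here is a minimal Python sketch (the variable names are illustrative only):

```python
from fractions import Fraction

# Counts taken from the table above: (red, blue)
balls = (5, 5)
cubes = (4, 2)

total = sum(balls) + sum(cubes)               # 16 items in the box
p_red = Fraction(balls[0] + cubes[0], total)  # P(B) = 9/16
p_red_cube = Fraction(cubes[0], total)        # P(A and B) = 4/16

print(p_red_cube / p_red)                     # P(A|B) = 4/9
```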
Independence: Two events $A,B\in\mathcal{F}$, where $\mathcal{F}$ is a $\sigma$-algebra on a sample space $\Omega$, are called independent if and only if $P(A\cap B)=P(A)\times P(B)$.
Example: Let $x$ and $y$ be two independent, exponentially distributed random variables with rate parameters $\lambda_1$ and $\lambda_2$, respectively. The probability that both random variables are greater than $M$ is
$$P(x>M\cap y>M)=P(x>M)\,P(y>M)=e^{-\lambda_1 M}\,e^{-\lambda_2 M}=e^{-(\lambda_1+\lambda_2)M}.$$
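This identity is easy to verify by simulation; below is a short Monte Carlo sketch assuming NumPy (note that NumPy parametrises the exponential by its scale $1/\lambda$; the values of $\lambda_1$, $\lambda_2$ and $M$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
lam1, lam2, M, n = 0.5, 1.5, 1.0, 1_000_000

# NumPy's exponential sampler takes the scale parameter 1/lambda.
x = rng.exponential(scale=1 / lam1, size=n)
y = rng.exponential(scale=1 / lam2, size=n)

empirical = np.mean((x > M) & (y > M))
theoretical = np.exp(-(lam1 + lam2) * M)
print(empirical, theoretical)  # both close to e^-2 ≈ 0.1353
```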
Example: Suppose that we are about to take an observation which we know has come from a Poisson distribution with mean $\lambda=3$, $4$ or $5$.
- Suppose that we think the mean $\lambda$ is either 3, 4 or 5, with corresponding prior probabilities 0.1, 0.5 and 0.4.
- We now perform the experiment and observe $x=7$.
- What are the posterior probabilities for $\lambda$, given that $x=7$?
The observation has most likely come from $\lambda=5$, since by the Poisson probability mass function $P(x\mid\lambda)=\dfrac{\lambda^x e^{-\lambda}}{x!}$,
$$P(x=7\mid\lambda=3)=0.0216,\quad P(x=7\mid\lambda=4)=0.0595,\quad P(x=7\mid\lambda=5)=0.1044.$$
We then have
$$P(\lambda=3\mid x=7)=\frac{P(x=7\mid\lambda=3)\,P(\lambda=3)}{P(x=7)}=0.0293,$$
where $P(x=7)=\sum_{i=3}^{5}P(x=7\mid\lambda=i)\,P(\lambda=i)$.
Similarly, $P(\lambda=4\mid x=7)=0.4038$ and $P(\lambda=5\mid x=7)=0.5669$.
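The posterior calculation above can be reproduced in a few lines of Python; a sketch using only the standard library:

```python
from math import exp, factorial

lams = [3, 4, 5]
priors = [0.1, 0.5, 0.4]
x = 7

# Poisson likelihoods P(x = 7 | lambda)
like = [lam**x * exp(-lam) / factorial(x) for lam in lams]

# Marginal P(x = 7), then the posteriors via Bayes' theorem
p_x = sum(l * p for l, p in zip(like, priors))
posteriors = [l * p / p_x for l, p in zip(like, priors)]
print(posteriors)  # approx. 0.0293, 0.4039, 0.5668 (the values above, up to rounding)
```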
Maximum Likelihood Estimation:
Introduction
- R. A. Fisher, early in the twentieth century, proposed the method of maximum likelihood estimation.
- The method finds, within a given family of distributions, the parameter values under which the observed data are most probable.
- It assumes that the family of distributions the data belong to is known, but that the parameters of the distribution are unknown.
- Let $\Theta=\{\theta_1,\theta_2,\dots,\theta_n\}$ be the set of parameters of a statistical distribution. Then for any event $x$ with $P(x)>0$,
$$P(x\mid\Theta=\hat\Theta)=\frac{P(\Theta=\hat\Theta\mid x)\,P(x)}{P(\Theta=\hat\Theta)}.$$
If the event $x=\bigcap_{i=1}^{m}x_i$, then
$$P(x_1,x_2,\dots,x_m\mid\Theta=\hat\Theta)=\frac{P(x_1,x_2,\dots,x_m;\Theta=\hat\Theta)}{P(\Theta=\hat\Theta)}.$$
- We have no prior information about the value of $\Theta$, so we can maximise $P(x_1,x_2,\dots,x_m\mid\Theta=\hat\Theta)$ in order to maximise $P(x_1,x_2,\dots,x_m;\Theta=\hat\Theta)$.
If the observations $x_1,x_2,\dots,x_m$ are independent and identically distributed (iid), we can write the joint density of the observed data as the product of their marginal densities:
$$L(\Theta)=\prod_{i=1}^{m}f(x_i\mid\Theta=\hat\Theta),$$
where $f(x\mid\Theta)$ is the probability density function (or probability mass function) of the continuous (discrete) random variable $x$. Since $\log$ is a monotonic function, we can find the MLE $\hat\Theta$ by maximising the log-likelihood function
$$l(\Theta)=\log(L(\Theta))=\sum_{i=1}^{m}\log\!\big(f(x_i\mid\Theta=\hat\Theta)\big).$$
Maximising the likelihood
We are looking for $\Theta=\hat\Theta$ that maximises $\log(L(\Theta))$. The optimal $\hat\Theta$ is found by solving the set of equations
$$\frac{\partial l(\Theta)}{\partial\theta_i}=0,\qquad i=1,\dots,n,$$
given that the square symmetric matrix
$$M=\left[-\frac{\partial^2 l(\Theta)}{\partial\theta_i\,\partial\theta_j}\right]_{i,j=1}^{n}$$
is positive definite at $\Theta=\hat\Theta$. Note that $\left[\frac{\partial^2 l(\Theta)}{\partial\theta_i\,\partial\theta_j}\right]_{i,j=1}^{n}$ is the Hessian matrix.
A square real matrix $M$ is positive definite if for all non-zero real vectors $y$ we have $y'My>0$, where $y'$ is the transpose of $y$. It is also sufficient to verify that all the eigenvalues of $M$ are positive.
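As a small illustration, the eigenvalue test can be carried out numerically; a sketch assuming NumPy, with an arbitrary symmetric matrix:

```python
import numpy as np

# An arbitrary symmetric matrix used purely for illustration.
M = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

# A symmetric matrix is positive definite iff all its eigenvalues are positive.
eigenvalues = np.linalg.eigvalsh(M)
print(eigenvalues, bool(np.all(eigenvalues > 0)))  # [1. 3.] True
```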
Example 1: Suppose that $x_1,x_2,\dots,x_n$ are observed from an Exponential distribution with parameter $\lambda$; find the MLE of the sample.
Solution.
The likelihood function is
$$L(\lambda)=\prod_{i=1}^{n}f(x_i\mid\lambda=\hat\lambda)=\prod_{i=1}^{n}\lambda e^{-\lambda x_i}.$$
The log-likelihood function is
$$l(\lambda)=\log(L(\lambda))=\sum_{i=1}^{n}\big(\log(\lambda)-\lambda x_i\big)=n\log(\lambda)-\lambda\sum_{i=1}^{n}x_i.$$
Differentiating with respect to $\lambda$,
$$\frac{\partial l}{\partial\lambda}=\frac{n}{\lambda}-\sum_{i=1}^{n}x_i.$$
Setting the derivative to zero and solving for $\lambda$ gives
$$\hat\lambda=\frac{n}{\sum_{i=1}^{n}x_i}=\frac{1}{\bar{x}},$$
which is indeed a maximum, since
$$\frac{\partial^2 l}{\partial\lambda^2}=-\frac{n}{\lambda^2}<0.$$
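A quick numerical illustration of $\hat\lambda=1/\bar{x}$; the sketch below assumes NumPy and uses an arbitrary true rate:

```python
import numpy as np

rng = np.random.default_rng(1)
lam_true = 2.0  # arbitrary true rate for the illustration

x = rng.exponential(scale=1 / lam_true, size=100_000)
lam_hat = 1 / x.mean()  # the MLE: reciprocal of the sample mean
print(lam_hat)          # close to 2.0
```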
Example 2: Suppose that $x_1,x_2,\dots,x_n$ are observed from a Normal distribution with mean $\mu$ and standard deviation $\sigma$; find the MLE of the sample.
Solution.
The likelihood function is
$$L(\mu,\sigma)=\prod_{i=1}^{n}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}=\frac{1}{(\sigma\sqrt{2\pi})^n}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right).$$
The log-likelihood function is
$$l(\mu,\sigma)=-n\log(\sigma\sqrt{2\pi})-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2.$$
Differentiating with respect to $\mu$ and with respect to $\sigma$ gives
$$\frac{\partial l}{\partial\mu}=\frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu);$$
$$\frac{\partial l}{\partial\sigma}=-\frac{n}{\sigma}+\frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i-\mu)^2.$$
Equating the two derivatives to zero gives
$$\hat\mu=\frac{1}{n}\sum_{i=1}^{n}x_i=\bar{x},\qquad\hat\sigma=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\hat\mu)^2}.$$
The likelihood function achieves its maximum at the estimated parameters.
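The same kind of numerical check for the Normal case; a NumPy sketch with arbitrary true parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=3.0, size=100_000)  # arbitrary true mu, sigma

mu_hat = x.mean()                                # MLE of mu: the sample mean
sigma_hat = np.sqrt(np.mean((x - mu_hat) ** 2))  # MLE of sigma: divides by n
print(mu_hat, sigma_hat)                         # close to 1.0 and 3.0
```

Note that the MLE of $\sigma$ divides by $n$ rather than $n-1$, so it is slightly biased downwards relative to the usual sample standard deviation.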
Example 3: Suppose that $x_1<K,\,x_2<K,\dots,x_n<K$ and $x_{n+1}=x_{n+2}=\dots=x_{n+m}=K$ are observed as the final marks of a re-sit exam, for which all pass marks are capped at $K$. Find the MLE for the original scores $y_1,y_2,\dots,y_{n+m}$ before the marks were capped, given that
$$x_i=\begin{cases}y_i & y_i<K,\\ K & y_i\ge K.\end{cases}$$
A mark of $K$ indicates a score of at least $K$.
The likelihood function is
$$L(\Theta)=\big(1-F(K)\big)^{m}\prod_{i=1}^{n}f(x_i\mid\Theta=\hat\Theta),$$
where $F(x)=P(X\le x)$ is the cumulative distribution function of the fitted distribution.
Numerical approximations. Sometimes it is not easy to evaluate the MLE analytically, and a numerical method such as the Newton-Raphson method can be used to approximate it. If $F(K)$ is unknown, it can be evaluated numerically by approximating the integral $F(K)=\int_{-\infty}^{K}f(x)\,dx$.
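As an illustration of maximising such a censored likelihood numerically, here is a sketch assuming SciPy; it fits a Normal model to capped marks, and the data and the cap $K$ are hypothetical:

```python
import numpy as np
from scipy import optimize, stats

K = 70.0  # hypothetical cap on pass marks
marks = np.array([35.0, 48.0, 52.0, 61.0, 66.0, 70.0, 70.0, 70.0])
uncapped = marks[marks < K]
m = int(np.sum(marks == K))  # number of capped observations

def neg_log_likelihood(params):
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    # log L = sum of log-densities of the uncapped marks
    #       + m * log(1 - F(K)) for the capped ones
    ll = stats.norm.logpdf(uncapped, mu, sigma).sum()
    ll += m * stats.norm.logsf(K, mu, sigma)
    return -ll

result = optimize.minimize(neg_log_likelihood, x0=[60.0, 10.0],
                           method="Nelder-Mead")
print(result.x)  # fitted (mu, sigma)
```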
Graphs
Figure 1: A fitted Normal distribution with parameters $\mu=1.65$ and $\sigma=1.17$, and a fitted Weibull distribution with parameters $\lambda=0.411$ and $\gamma=1.47$.
Figure 2: A fitted four-parameter model using design of experiments, based on only six points (shown in blue).
Figure 3: A fitted four-parameter model using design of experiments.