Problem 1
Part (1) (10 points)
Use the accept-reject algorithm to generate a truncated standard Normal distribution, with density
$$f(y) \propto e^{-y^2/2}, \qquad y \ge 2.$$
Consider the proposal distribution $g(y) = 2e^{-2(y-2)}$ for $y \ge 2$.
Part (a)
Sample from $g$ using the inverse-transform method.
Solution
First, we must find the cdf of $g$. One way to derive it is to notice that $g$ is the density of a shifted exponential; i.e., if $X \sim \text{Exponential}(\text{rate}=2)$, then $Y = X + 2$ has cdf
$$G(y) = 1 - e^{-2(y-2)}, \qquad y \ge 2.$$
As a quick proof that $G$ is the cdf of $g$, note that $G'(y) = 2e^{-2(y-2)} = g(y)$.
Alternatively, the cdf is defined as
$$G(y) = \int_2^y g(t)\,dt = \int_2^y 2e^{-2(t-2)}\,dt = 1 - e^{-2(y-2)},$$
which obtains the same result. Either way, we solve for $y$ in
$$u = G(y) = 1 - e^{-2(y-2)},$$
and thus
$$y = 2 - \tfrac{1}{2}\log(1-u),$$
where $u$ is drawn from $\text{Uniform}(0,1)$. That is, if $U \sim \text{Uniform}(0,1)$, then
$$Y = 2 - \tfrac{1}{2}\log(1-U)$$
is a random variable whose density is $g$.
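As a quick numerical check of this formula (an addition beyond what part (a) asks), we can compare the empirical cdf of draws produced this way against the closed-form cdf $G$:
# sanity check of the inverse-transform formula against the closed-form cdf G
set.seed(1)
u <- runif(10000)
y <- 2 - 0.5*log(1 - u)               # inverse-transform draws from g
G <- function(t) 1 - exp(-2*(t - 2))  # closed-form cdf of g
pts <- c(2.1, 2.5, 3, 4)
rbind(empirical=ecdf(y)(pts), exact=G(pts))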
Part (b)
Use the accept-reject algorithm to generate a sample of 10,000 values from $f$. Choose a $c$ so that $f(y) \le c\,g(y)$. Verify the generated sample via a plot of the true normalized density and a histogram of the generated values.
Solution
We wish to find a $c$ such that $f(y) \le c\,g(y)$ for all $y \ge 2$. However, we do not need $f$ itself; we only need the kernel of $f$, which has already been given. So, ideally, we find the smallest $c$ that satisfies the inequality
$$k(y) \le c\,g(y) \quad \text{for all } y \ge 2,$$
where $k$ is the kernel of $f$, which is given by $k(y) = e^{-y^2/2}$. We may solve this analytically by finding
$$y^* = \operatorname*{arg\,max}_{y \ge 2} \frac{k(y)}{g(y)}$$
and then letting $c = k(y^*)/g(y^*)$.
First, we let
$$h(y) = \frac{k(y)}{g(y)} = \frac{e^{-y^2/2}}{2e^{-2(y-2)}} = \tfrac{1}{2}e^{-y^2/2 + 2y - 4}.$$
We can find the value that maximizes $h$ by finding the value that maximizes $\log h$, which is given by
$$\log h(y) = -\tfrac{y^2}{2} + 2y - 4 - \log 2.$$
The value $y^*$ that maximizes $\log h$ satisfies $\tfrac{d}{dy}\log h(y^*) = -y^* + 2 = 0$, so $y^* = 2$, and thus
$$c = \frac{k(2)}{g(2)} = \frac{e^{-2}}{2}.$$
Thus, we sample $y$ from $g$ and $u$ from $\text{Uniform}(0,1)$, and accept the sample $y$ as being drawn from $f$ if
$$u \le \frac{k(y)}{c\,g(y)} = e^{-(y-2)^2/2}.$$
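As a numerical sanity check on $c$ (an addition beyond the problem's requirements), we can maximize $k(y)/g(y)$ with R's optimize and compare it with $e^{-2}/2 \approx 0.0677$:
# numerically maximize k(y)/g(y) over y >= 2 and compare with exp(-2)/2
ratio <- function(y) exp(-y^2/2) / (2*exp(-2*(y-2)))
optimize(ratio, interval=c(2,10), maximum=TRUE)$objective
exp(-2)/2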
Here is the code that implements our sampling method for using acceptance-rejection sampling:
# the proposal density g, which we can easily sample from
g <- function(y)
{
2*exp(-2*(y-2))
}
# inverse transform method for sampling from g
rg <- function(n)
{
-0.5*log(1-runif(n))+2
}
# accept-reject sampler: draws n values from f using the proposal g
rf <- function(n)
{
data <- vector(length=n)
for (i in 1:n)
{
repeat
{
y <- rg(1)
u <- runif(1)
if (u <= exp(-(y-2)^2/2))
{
data[i] <- y
break
}
}
}
data
}
# the (unnormalized) kernel of the target density f
k <- function(y)
{
exp(-y^2/2)
}
f <- function(y)
{
# normalizing constant (the reciprocal of the integral of k over [2, Inf))
Z <- 17.5358
Z*k(y)
}
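For reference, the constant Z = 17.5358 used above is just the reciprocal of the integral of the kernel over $[2, \infty)$, i.e., $1/\left[\sqrt{2\pi}\,(1-\Phi(2))\right]$. A quick check (our addition):
# Z is the reciprocal of the integral of the kernel k over [2, Inf)
1/integrate(k, lower=2, upper=Inf)$value
# equivalently, via the standard normal tail probability
1/(sqrt(2*pi)*pnorm(2, lower.tail=FALSE))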
To verify the sampling method, we draw 10,000 samples from $f$ and plot their histogram along with the normalized density function $f$.
data <- rf(10000)
ys <- seq(2,10,by=.1)
hist(data,freq=F,breaks=50,main="f")
lines(x=ys,f(ys),col="red")
The histogram appears compatible with the pdf $f$.
Problem 2 (5 points)
Implement your accept-reject algorithm to get a sample of 10000 from $\text{Gamma}(\alpha = 2.5,\ 1)$. Verify that your method works via a plot of the true normalized density and a histogram of the generated values.
Solution
This is not the approach you were looking for, but on the test I did poorly on this section.
Be that as it may, here is essentially my answer from the test, except that the solution was so slow that I decreased $c$ to $10$; as a consequence, at the extreme tail of the distribution, $f(y) > c\,g(y)$ and the envelope condition is violated there.
# target: Gamma(shape = alpha, scale = 1)
alpha <- 2.5
n <- 10000
# envelope constant (see the note above)
c <- 10
# proposal: Gamma(shape = a, scale = b), which R can sample directly
a <- floor(alpha)
b <- a / alpha
ys <- vector(length=n)
for (i in 1:n)
{
repeat
{
y <- rgamma(n=1,shape=a,scale=b)
u <- runif(1)
P <- dgamma(y,shape=alpha,scale=1) / (c*dgamma(y,shape=a,scale=b))
if (u < P)
{
ys[i] <- y
break
}
}
}
hist(ys,freq=F,breaks=50)
lines(x=seq(.01,10,by=.05),y=dgamma(seq(.01,10,by=.05),shape=alpha,scale=1),col="red")
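To make the caveat above concrete (this diagnostic is our addition, not part of the test answer), we can check on a grid where the ratio $f(y)/(c\,g(y))$ exceeds 1. It does so only far out in the right tail (roughly $y \gtrsim 8$), where the target has very little mass:
# diagnostic: where does f(y) / (c * g(y)) exceed 1?
grid <- seq(0.01, 15, by=0.01)
ratio <- dgamma(grid,shape=alpha,scale=1) / (c*dgamma(grid,shape=a,scale=b))
range(grid[ratio > 1])  # region where the envelope fails
# mass of the target beyond the point where the envelope first fails
pgamma(min(grid[ratio > 1]), shape=alpha, scale=1, lower.tail=FALSE)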
Problem 3 (30 points)
Consider $197$ animals randomly divided into four categories (four phenotypes) $y = (y_1, y_2, y_3, y_4)$ with cell probabilities $\left(\tfrac{1}{2} + \tfrac{\theta}{4},\ \tfrac{1-\theta}{4},\ \tfrac{1-\theta}{4},\ \tfrac{\theta}{4}\right)$. We observe $y = (125, 18, 20, 34)$.
Part (a)
Hand code your algorithm using Newton-Raphson to find the maximum likelihood estimator of $\theta$ directly from the observed likelihood. Compare your result with what you get using a built-in function in R.
Solution
In part 1 of the exam, the log-likelihood given the data was determined to be
$$\ell(\theta) = y_1\log(2+\theta) + (y_2+y_3)\log(1-\theta) + y_4\log\theta + \text{constant}.$$
We derived the updating equation for the MLE of $\theta$ to be
$$\theta^{(t+1)} = \theta^{(t)} - \frac{\ell'(\theta^{(t)})}{\ell''(\theta^{(t)})},$$
where
$$\ell'(\theta) = \frac{y_1}{\theta+2} - \frac{y_2+y_3}{1-\theta} + \frac{y_4}{\theta}$$
and
$$\ell''(\theta) = -\frac{y_1}{(\theta+2)^2} - \frac{y_2+y_3}{(1-\theta)^2} - \frac{y_4}{\theta^2}.$$
Instead of taking the time to simplify this expression, we will just substitute these derivations into the updating equation in the following R code:
# first derivative of the observed-data log-likelihood
l1 <- function(theta,x)
{
x[1]/(theta+2) + (x[2]+x[3])/(theta-1) + x[4]/theta
}
# second derivative of the observed-data log-likelihood
l2 <- function(theta,x)
{
-x[1]/(theta+2)^2 - (x[2]+x[3])/(theta-1)^2 - x[4]/theta^2
}
# Newton-Raphson iteration for the MLE of theta
theta_mle <- function(x, start = 0.5, eps = 1e-6)
{
i <- 0
theta0 <- start
theta1 <- NULL
repeat
{
theta1 <- theta0 - l1(theta0,x) / l2(theta0,x)
if (abs(theta1-theta0) < eps) { break }
theta0 <- theta1
i <- i + 1
}
list(mle=theta1,iterations=i)
}
We invoke the updating equation on the given data to estimate $\theta$ with the following R code:
# observed data
x <- c(125, 18, 20, 34)
sol <- theta_mle(x=x, start=0.5, eps=1e-6)
mle <- sol$mle
mle_iterations <- sol$iterations
We see that the MLE converges to a solution of approximately $\hat\theta \approx 0.6268$ after only a handful of iterations. We compare this result with the built-in procedure, optim:
# the function to maximize, the log-likelihood function
loglike <- function(theta)
{
x[1]*log(0.5+theta/4) + x[2]*log(0.25*(1-theta)) + x[3]*log(0.25*(1-theta)) + x[4]*log(0.25*theta)
}
# since optim finds the value that minimizes the function, we provide it with
# the negative of the log-likelihood.
optim(0.5,function(theta) { -loglike(theta) },lower=0.1,upper=0.9,method="L-BFGS-B")$par
## [1] 0.626821
The value found is approximately the same. However, optim was fussy: we had to explicitly provide lower and upper bounds (and the L-BFGS-B method) to get a sensible answer. This warrants further investigation, but for another time.
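As one more cross-check (our addition), the MLE is a root of the score function l1 defined above, so we can also locate it with uniroot:
# the MLE solves l1(theta, x) = 0; locate the root directly
uniroot(function(theta) l1(theta, x), interval=c(0.01, 0.99))$root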
Part (b)
Implement the E-M algorithm to find the maximum likelihood estimator of $\theta$ with the ``augmented'' data with missing information. Compare your result with part (a).
Solution
We split the first category into two cells with counts $z_1$ and $z_2$, where $z_1 + z_2 = y_1 = 125$,
with cell probabilities $\tfrac{1}{2}$ and $\tfrac{\theta}{4}$, respectively.
So, $z_1$ and $z_2$ are \emph{unobserved}. The pdf of the complete data $x = (z_1, z_2, y_2, y_3, y_4)$ is given by
$$p(x \mid \theta) \propto \left(\tfrac{1}{2}\right)^{z_1}\left(\tfrac{\theta}{4}\right)^{z_2}\left(\tfrac{1-\theta}{4}\right)^{y_2}\left(\tfrac{1-\theta}{4}\right)^{y_3}\left(\tfrac{\theta}{4}\right)^{y_4},$$
where the proportionality constant is the multinomial coefficient. Thus, the complete log-likelihood is given by
$$\ell_c(\theta) = z_1\log\tfrac{1}{2} + z_2\log\tfrac{\theta}{4} + (y_2+y_3)\log\tfrac{1-\theta}{4} + y_4\log\tfrac{\theta}{4} + \text{constant},$$
which may be written as
$$\ell_c(\theta) = (z_2 + y_4)\log\theta + (y_2 + y_3)\log(1-\theta) + \text{constant}.$$
E-step
The function we seek to maximize is thus given by
$$Q(\theta \mid \theta^{(t)}) = E\!\left[\ell_c(\theta) \mid y, \theta^{(t)}\right],$$
which is given by
$$Q(\theta \mid \theta^{(t)}) = E\!\left[(z_2 + y_4)\log\theta + (y_2 + y_3)\log(1-\theta) \,\middle|\, y, \theta^{(t)}\right],$$
which by the property of linearity of expectations is equivalent to
$$Q(\theta \mid \theta^{(t)}) = \left(E[z_2 \mid y, \theta^{(t)}] + y_4\right)\log\theta + (y_2 + y_3)\log(1-\theta) + \text{constant}.$$
Note that $y_2$, $y_3$, and $y_4$ are respectively observed to be $18$, $20$, and $34$, and thus, for instance, $y_2 + y_3 = 38$.
The missing data $z_2$ is distributed
$$z_2 \mid y, \theta^{(t)} \sim \text{Binomial}\!\left(y_1,\ \frac{\theta^{(t)}/4}{1/2 + \theta^{(t)}/4}\right) = \text{Binomial}\!\left(125,\ \frac{\theta^{(t)}}{\theta^{(t)}+2}\right),$$
and thus
$$E[z_2 \mid y, \theta^{(t)}] = \frac{125\,\theta^{(t)}}{\theta^{(t)}+2}.$$
Putting it all together, we may rewrite $Q$ as
$$Q(\theta \mid \theta^{(t)}) = \left(\frac{125\,\theta^{(t)}}{\theta^{(t)}+2} + 34\right)\log\theta + 38\log(1-\theta) + \text{constant}.$$
M-step
In the M-step, we seek
$$\theta^{(t+1)} = \operatorname*{arg\,max}_{\theta} Q(\theta \mid \theta^{(t)}),$$
which can be found by solving
$$\frac{\partial}{\partial\theta} Q(\theta \mid \theta^{(t)}) = 0.$$
Since the final solution is a little noisy when we leave the counts symbolic, we are going to finish the solution with the observed values of $y$. So,
$$\frac{\partial}{\partial\theta} Q(\theta \mid \theta^{(t)}) = \frac{1}{\theta}\left(\frac{125\,\theta^{(t)}}{\theta^{(t)}+2} + 34\right) - \frac{38}{1-\theta} = 0$$
has the solution, when we solve for $\theta$ and call it $\theta^{(t+1)}$,
$$\theta^{(t+1)} = \frac{159\,\theta^{(t)} + 68}{197\,\theta^{(t)} + 144}.$$
Iterating this update equation until some stopping condition signifies convergence may then be done. We denote the converged solution of the EM algorithm by $\hat\theta_{\text{EM}}$.
We implement the EM algorithm in the following R code:
# EM updating equation, iterated from theta = 0.5 until convergence
theta0 <- .5
theta1 <- NULL
i <- 1
repeat
{
theta1 <- (159*theta0 + 68)/(197*theta0 + 144)
cat("theta[",i,"] = ", theta1,"\n")
if (abs(theta1-theta0) < 1e-6) { break }
theta0 <- theta1
i <- i + 1
}
## theta[ 1 ] = 0.6082474
## theta[ 2 ] = 0.6243211
## theta[ 3 ] = 0.6264889
## theta[ 4 ] = 0.6267773
## theta[ 5 ] = 0.6268156
## theta[ 6 ] = 0.6268207
## theta[ 7 ] = 0.6268214
print(theta1)
## [1] 0.6268214
We see that the standard MLE and $\hat\theta_{\text{EM}}$ both obtain the same solution to roughly six decimal places when using the same starting value and stopping condition. However, the EM algorithm required more iterations before convergence, which was expected given that the EM algorithm converges at a linear rate while Newton-Raphson converges at a quadratic rate. That said, the EM algorithm does have the benefit of a less complex updating equation.
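To illustrate the linear (geometric) rate of the EM iterates, we can compute the ratios of successive errors relative to the converged value using the trace printed above; the roughly constant ratio of about 0.13 is what linear convergence predicts (this check is our addition):
# successive errors of the EM iterates (taken from the printed trace above)
# relative to the converged value; a roughly constant ratio indicates
# linear (geometric) convergence
em_trace <- c(0.6082474, 0.6243211, 0.6264889, 0.6267773, 0.6268156)
errs <- abs(em_trace - 0.6268214)
errs[-1] / errs[-length(errs)]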
Part (c)
Find the standard error of the MLE using either a numerical calculation of the inverse of the observed Fisher information, i.e., $\widehat{\operatorname{Var}}(\hat\theta) = \left[-\ell''(\hat\theta)\right]^{-1}$, or using Louis's method.
Solution
We already have the second derivative of the log-likelihood, so we choose to use the observed Fisher information evaluated at the MLE $\hat\theta$ from part (a).
obs_fisher <- -l2(mle,x)
var_mle <- 1 / obs_fisher
sqrt(var_mle)
## [1] 0.05146735
We see that an estimate of the standard error of the MLE is approximately $0.0515$.
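Since part (c) mentions a numerical calculation, we can also approximate $\ell''(\hat\theta)$ with a central finite difference of the log-likelihood (loglike, defined earlier) and compare; this cross-check is our addition:
# central finite-difference approximation of the second derivative of the
# log-likelihood at the MLE, as a numerical cross-check on l2
h <- 1e-4
d2 <- (loglike(mle + h) - 2*loglike(mle) + loglike(mle - h)) / h^2
sqrt(-1/d2)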
Problem 4 (10 points)
Consider the density $f(x) \propto 3e^{-(x+2)^2/2} + 7e^{-(x-2)^2/2}$ (Problem 6, Homework 1).
Part (a)
Compute the exact normalizing constant, both in closed form (using pencil and paper), and also to 5 decimal places (by evaluating the exact expression in R).
Solution
It is given that
$$f(x) \propto 3e^{-(x+2)^2/2} + 7e^{-(x-2)^2/2},$$
i.e., $f(x) = k(x)/C$, where
$$k(x) = 3e^{-(x+2)^2/2} + 7e^{-(x-2)^2/2}$$
and $C$ is the normalizing constant that satisfies the equation
$$\int_{-\infty}^{\infty} \frac{k(x)}{C}\,dx = 1.$$
Solving for $C$ yields the result
$$C = \int_{-\infty}^{\infty}\left(3e^{-(x+2)^2/2} + 7e^{-(x-2)^2/2}\right)dx.$$
Each mode of the bimodal distribution resembles a normal distribution with $\sigma = 1$ (centered at $-2$ and $2$, respectively). We may rewrite the above as
$$C = 3\sqrt{2\pi}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-(x+2)^2/2}\,dx + 7\sqrt{2\pi}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-(x-2)^2/2}\,dx,$$
which is equivalent to a weighted sum of two Normal integrals. Since each integrand is a Normal pdf, each integral evaluates to $1$, and thus
$$C = 3\sqrt{2\pi} + 7\sqrt{2\pi} = 10\sqrt{2\pi}$$
is the normalizing constant, which is $25.06628$ to 5 decimal places.
Using R’s numerical integrator, we get the result:
# exact normalizing constant in closed form
C <- 10*sqrt(2*pi)
ker <- function(x) { 3*exp(-0.5*(x+2)^2) + 7*exp(-0.5*(x-2)^2) }
res <- integrate(ker,lower = -Inf, upper = Inf)
print(res)
## 25.06628 with absolute error < 4e-05
Part (b)
Approximate this integral with Simpson's rule to three decimal places. What are the effective range and number of subintervals required?
Solution
Simpson’s rule is implemented by the following R code. Note that we did not bother to optimize it.
# simpson : numerical integrator applying simpson's rule to n subintervals
# over the range (a,b).
#
# arguments:
# f : the function to integrate
# a : the lower-bound
# b : the upper-bound. (a,b) together define the range.
# n : the number of subintervals to partition the range into (must be even)
#
# we evenly partition the range into n subintervals of size (b-a)/n
# and apply simpson's rule to each pair of subintervals. then, we accumulate
# these values and return the result.
simpson <- function(f, a, b, n)
{
h <- (b-a)/n
s <- 0
x <- a
for (i in 1:(n/2))
{
s <- s + f(x) + 4 * f(x+h) + f(x+2*h)
x <- x + 2*h
}
s*h/3
}
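As a quick usage check before searching (our addition), a generous range with a fine partition should reproduce the exact constant from part (a):
# sanity check: on a wide range with many subintervals, simpson() should
# agree closely with the exact constant C = 10*sqrt(2*pi)
simpson(ker, a=-10, b=10, n=100)
C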
To find the effective range and number of subintervals, we wrote a program that exhaustively searches for the minimum range and number of subintervals over a discrete set of candidates. In particular, we search over symmetric ranges of the form $(-r, r)$ and even numbers of subintervals $n$. It is not perfect, but it seems like a reasonable way to estimate these requirements.
Here is the R code:
R <- NULL
N <- NULL
found <- F
for (r in 1:200)
{
for (n in 1:100)
{
res <- simpson(ker,-r/10,r/10,2*n)
if (abs(res - C) < 0.001)
{
R <- r/10
N <- 2*n
cat("r = +-", R, ", n = ", N, ", C = ", res, "\n")
found <- T
break
}
}
if (found) { break }
}
## r = +- 5.9 , n = 16 , C = 25.06603
The minimum symmetric range $(-r, r)$ and number of subintervals $n$ for which the Simpson approximation agrees with the true value to three decimal places are $r = 5.9$ and $n = 16$.