Problem 1
Consider finding $\theta = E(X^2)$ when $X$ has the density that is proportional to
$$
q(x) = \frac{e^{x}}{e^{2x} + 1}, \qquad -\infty < x < \infty.
$$
Part (a)
Find the Monte Carlo estimate of $\theta = E(X^2)$ using acceptance-rejection sampling. Take the candidate distribution as the double exponential, i.e., $g(y) \propto e^{-|y|}$.
Solution
We are interested in sampling from the kernel $q(x) = \dfrac{e^{x}}{e^{2x}+1}$.
q <- function(x) { exp(x)/(exp(2*x)+1) }
Suppose $Y$ is the double exponential random variable with the kernel $g(y) = e^{-|y|}$.
g <- function(x) { exp(-abs(x)) }
Since the double exponential is symmetric about its mean $0$, we can simply sample $W \sim \text{Exp}(1)$ and let $Y = -W$ with probability $1/2$ and $Y = W$ with probability $1/2$.
ry <- function(n)
{
ys <- numeric(n)
ws <- rexp(n=n,rate=1)
us <- runif(n)
for (i in 1:n) { ys[i] <- ifelse(us[i] < 0.5,-ws[i],ws[i]) }
ys
}
To find the constant $c$ such that $q(x) \le c\, g(x)$ for all $x$, we examine the ratio
$$
\frac{q(x)}{g(x)} = \frac{e^{|x|} e^{x}}{e^{2x} + 1}.
$$
We forgo a formal proof and point out that the denominator is always larger than the numerator, i.e., $q(x)/g(x) < 1$ for all $x$. However, as $|x|$ goes to $\infty$, $q(x)/g(x)$ goes to $1$, and thus we take $c = 1$. We implement the acceptance-rejection sampler for $X$ with kernel $q$ with:
rx <- function(n)
{
xs <- numeric(n)
for (i in 1:n)
{
repeat
{
y <- ry(1)
if (runif(1) < q(y)/g(y))
{
xs[i] <- y
break
}
}
}
xs
}
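Before using the sampler, as a quick numerical sanity check on the choice $c = 1$ (a spot check, not a proof), we can confirm that the envelope $g$ dominates the kernel $q$ over a wide grid:
xs.grid <- seq(-10, 10, by=0.01)
all(q(xs.grid) <= g(xs.grid))  # TRUE: g with c = 1 dominates q on the grid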
We estimate $\theta = E(X^2)$ by taking the mean of the squares of a sample:
x <- rx(10000)
print(mean(x^2))
## [1] 2.43841
Part (b)
Find the normalizing constant of the pdf by integrating the kernel $q(x)$ over the support. Then derive the CDF of $X$.
Solution
The normalizing constant $c$ is given by solving for $c$ in
$$
c \int_{-\infty}^{\infty} \frac{e^{x}}{e^{2x} + 1}\, dx = 1.
$$
Let $u = e^{x}$; then $du = e^{x}\, dx$, $u = 1$ when $x = 0$, and, by symmetry of the kernel about $0$,
$$
\int_{-\infty}^{\infty} \frac{e^{x}}{e^{2x} + 1}\, dx
= 2 \int_{0}^{\infty} \frac{e^{x}}{e^{2x} + 1}\, dx
= 2 \int_{1}^{\infty} \frac{1}{u^{2} + 1}\, du.
$$
The antiderivative is just $\arctan(u)$, and thus
$$
2 \int_{1}^{\infty} \frac{1}{u^{2} + 1}\, du = 2 \left[\arctan(u)\right]_{1}^{\infty}.
$$
Since $u = e^{x}$, as $x \to \infty$, $u \to \infty$, which means that $\arctan(u) \to \pi/2$. Furthermore, $\arctan(u) = \pi/4$ when $u = 1$, thus
$$
\int_{-\infty}^{\infty} \frac{e^{x}}{e^{2x} + 1}\, dx = 2\left(\frac{\pi}{2} - \frac{\pi}{4}\right) = \frac{\pi}{2}
\qquad\text{and}\qquad
c = \frac{2}{\pi}.
$$
We verify with a numerical integrator:
right_riemann_sum <- function(f, a, b, n)
{
h <- (b-a)/n
h*sum(f(a + (1:n)*h))
}
When we apply the numerical integrator to the kernel $q$ and subtract $\pi/2$, we obtain a result that is approximately $0$, confirming our earlier calculation:
right_riemann_sum(q,-10,10,1000) - pi/2
## [1] -9.080289e-05
Thus, the density of $X$ is given by
$$
f(x) = \frac{2}{\pi} \cdot \frac{e^{x}}{e^{2x} + 1}.
$$
The cdf of $X$ is thus given by
$$
F(x) = \int_{-\infty}^{x} \frac{2}{\pi} \cdot \frac{e^{t}}{e^{2t} + 1}\, dt,
$$
which simplifies to
$$
F(x) = \frac{2}{\pi} \arctan\!\left(e^{x}\right).
$$
pdf <- function(x) { 2/pi * q(x) }
cdf <- function(x) { 2/pi * atan(exp(x)) }
Part (c)
Generate a sample of $X$ using the inverse transform method and find the Monte Carlo estimate of $\theta = E(X^2)$.
Solution
To apply the inverse transform method, we solve for $x$ in $F(x) = u$,
$$
u = \frac{2}{\pi} \arctan\!\left(e^{x}\right).
$$
Thus, a sampler for $X$ is given by
$$
X = \log\!\left(\tan\!\left(\frac{\pi U}{2}\right)\right),
$$
where $U \sim \text{Uniform}(0, 1)$.
rx.transform <- function(n)
{
us <- runif(n)
log(tan(pi/2*us))
}
We estimate $\theta = E(X^2)$ with:
x <- rx.transform(10000)
print(mean(x^2))
## [1] 2.508885
Part (d)
Repeat the estimation using importance sampling with standardized weights.
Solution
#' importance sampling
#'
#' estimates E{h(X)} by taking n sample points from g and then taking a
#' weighted mean using the standardized weights.
rx.importance <- function(n,h)
{
w = function(x) {
out <- q(x)/g(x)
out/sum(out)
}
ys <- ry(n)
sum(w(ys)*h(ys))
}
We apply the procedure with $h(x) = x^2$ to estimate $\theta = E(X^2)$:
print(rx.importance(10000,function(x) { x^2} ))
## [1] 2.437009
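For reference, the exact second moment of this density is $E(X^2) = \pi^2/4 \approx 2.4674$, so all three Monte Carlo estimates (acceptance-rejection, inverse transform, and importance sampling) land close to the true value:
pi^2/4  # = 2.467401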
Problem 2
Consider the inverse Gaussian distribution with density
$$
f(x) \propto x^{-3/2} \exp\!\left(-\theta_1 x - \frac{\theta_2}{x}\right), \qquad x > 0,
$$
where $\theta_1 > 0$ and $\theta_2 > 0$. Estimate $E(X)$ using MCMC. You may take the proposal distribution as a $\text{Gamma}\!\left(\sqrt{\theta_2/\theta_1},\, 1\right)$.
Solution
Here is our implementation of the Metropolis-Hastings algorithm:
# A sampling procedure from the pdf f using Metropolis-Hastings algorithm
rf <- function(n, theta1, theta2, burn=0)
{
g <- function(x) { dgamma(x,shape=sqrt(theta2/theta1),rate=1) }
rg <- function(n) { rgamma(n,shape=sqrt(theta2/theta1),rate=1) }
ker <- function(x) { x^(-1.5)*exp(-theta1*x-theta2/x) }
m <- n + burn
xs <- vector(length = m)
xs[1] <- rg(1)
for (i in 2:m) {
v <- rg(1)
u <- xs[i-1]
R <- ker(v) * g(u) / (ker(u) * g(v))
if (runif(1) <= R) { xs[i] <- v }
else { xs[i] <- u }
}
xs[(burn+1):m]
}
We apply the algorithm to $\theta_1 = 3$ and $\theta_2 = 5$:
n <- 100000
theta1 <- 3
theta2 <- 5
xs <- rf(n,theta1,theta2,burn=20000)
Now, we estimate $E(X)$ with:
print(mean(xs))
## [1] 1.28597
We have deduced that the true mean is given by $E(X) = \sqrt{\theta_2/\theta_1} = \sqrt{5/3} \approx 1.291$, and so we see the algorithm provides a reasonable estimate.
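As a quick check against this closed form:
sqrt(theta2/theta1)  # = 1.290994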
We would like to plot the histogram with the density superimposed on top of it. So, we implement the density function with:
f.make <- function(theta1,theta2)
{
k <- function(x) { x^(-1.5) * exp(-theta1*x - theta2/x) }
Z <- right_riemann_sum(k,0,20,10000)
function(x) { k(x) / Z }
}
f <- f.make(theta1,theta2)
hist(xs,breaks=200,freq=F)
ps <- seq(0,10,by=.01)
lines(x=ps,y=f(ps))

Problem 3
Consider the data on coal-mining disasters from 1851 to 1962 (coal.txt data on blackboard). The rate of accidents per year appears to decrease around 1900, so we consider a change-point model for these data. Let $X_j$ be the number of accidents in year $j$, with $X_j \sim \text{Poisson}(\lambda_1)$ for $j = 1, \dots, \theta$ and $X_j \sim \text{Poisson}(\lambda_2)$ for $j = \theta + 1, \dots, 112$. The change-point $\theta$ occurs after the $\theta$-th year in the series. This model has parameters $\theta$, $\lambda_1$, and $\lambda_2$. For the Bayesian analysis of this model, assume a $\text{Gamma}(3, 1)$ prior for $\lambda_1$ and $\lambda_2$, and assume $\theta$ follows a discrete uniform distribution over $\{1, 2, \dots, 111\}$.
Part (a)
Derive the posterior distribution of $(\theta, \lambda_1, \lambda_2)$.
Solution
First, we derive the distribution of a single observation $X_j$ given $(\theta, \lambda_1, \lambda_2)$,
$$
P(X_j = x_j \mid \theta, \lambda_1, \lambda_2) = \frac{\lambda^{x_j} e^{-\lambda}}{x_j!},
$$
where $\lambda = \lambda_1$ for $j \le \theta$ and $\lambda = \lambda_2$ for $j > \theta$. Thus, the likelihood is
$$
L(\theta, \lambda_1, \lambda_2 \mid x) = \prod_{j=1}^{\theta} \frac{\lambda_1^{x_j} e^{-\lambda_1}}{x_j!} \prod_{j=\theta+1}^{112} \frac{\lambda_2^{x_j} e^{-\lambda_2}}{x_j!},
$$
which, up to proportionality in the parameters, may be written as
$$
L(\theta, \lambda_1, \lambda_2 \mid x) \propto \lambda_1^{t(\theta)} e^{-\theta \lambda_1}\, \lambda_2^{t(112) - t(\theta)} e^{-(112 - \theta)\lambda_2},
$$
where $t(\theta) = \sum_{j=1}^{\theta} x_j$.
The posterior distribution is given by
$$
\pi(\theta, \lambda_1, \lambda_2 \mid x) \propto L(\theta, \lambda_1, \lambda_2 \mid x)\, \pi(\lambda_1)\, \pi(\lambda_2)\, \pi(\theta),
$$
which up to proportionality may be rewritten as
$$
\pi(\theta, \lambda_1, \lambda_2 \mid x) \propto \lambda_1^{t(\theta)} e^{-\theta \lambda_1}\, \lambda_2^{t(112) - t(\theta)} e^{-(112 - \theta)\lambda_2}\, \pi(\lambda_1)\, \pi(\lambda_2)\, \pi(\theta),
$$
where $\pi(\lambda_1)$ and $\pi(\lambda_2)$ are the prior densities of $\lambda_1$ and $\lambda_2$, and $\pi(\theta)$ is the prior pmf of $\theta$.
We are given the priors $\lambda_1, \lambda_2 \sim \text{Gamma}(3, 1)$ (independently) and $\theta \sim \text{DiscreteUniform}\{1, \dots, 111\}$, and thus
$$
\pi(\lambda_1)\, \pi(\lambda_2)\, \pi(\theta) \propto \lambda_1^{2} e^{-\lambda_1}\, \lambda_2^{2} e^{-\lambda_2}\, \mathbf{1}\{1 \le \theta \le 111\}.
$$
Putting it all together, the posterior distribution is given by
$$
\pi(\theta, \lambda_1, \lambda_2 \mid x) \propto \lambda_1^{a} e^{-(\theta + 1)\lambda_1}\, \lambda_2^{b} e^{-(112 - \theta + 1)\lambda_2}\, \mathbf{1}\{1 \le \theta \le 111\},
$$
where $a = t(\theta) + 2$, $b = t(112) - t(\theta) + 2$, and $\mathbf{1}\{\cdot\}$ denotes the indicator function.
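The following is a minimal sketch (a hypothetical helper, not used by the Gibbs sampler below) of the resulting unnormalized log-posterior, assuming the coal.txt counts are loaded into a vector x as in Part (c):
log.posterior <- function(theta, lambda1, lambda2, x)
{
  N <- length(x)
  if (theta < 1 || theta > N-1) { return(-Inf) }  # outside the prior support
  t.theta <- sum(x[1:theta])
  # log-likelihood (up to an additive constant) plus the Gamma(3,1) log-prior kernels
  (t.theta+2)*log(lambda1) - (theta+1)*lambda1 +
    (sum(x)-t.theta+2)*log(lambda2) - (N-theta+1)*lambda2
}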
Part (b)
Derive the conditional posterior distributions necessary to carry out Gibbs sampling for this change-point model.
Solution
First, we find the conditional posterior distribution for $\lambda_1$ by simply dropping those factors in the posterior that are not a function of $\lambda_1$,
$$
\pi(\lambda_1 \mid \theta, \lambda_2, x) \propto \lambda_1^{t(\theta) + 2} e^{-(\theta + 1)\lambda_1},
$$
which is the kernel for a Gamma distribution,
$$
\lambda_1 \mid \theta, \lambda_2, x \sim \text{Gamma}\!\left(t(\theta) + 3,\ \theta + 1\right).
$$
Next, we find the conditional posterior distribution for $\lambda_2$ using the same argument,
$$
\pi(\lambda_2 \mid \theta, \lambda_1, x) \propto \lambda_2^{t(112) - t(\theta) + 2} e^{-(112 - \theta + 1)\lambda_2},
$$
which is the kernel for
$$
\lambda_2 \mid \theta, \lambda_1, x \sim \text{Gamma}\!\left(t(112) - t(\theta) + 3,\ 112 - \theta + 1\right).
$$
Finally, we find the conditional posterior distribution for $\theta$. First, we drop factors that are not a function of $\theta$,
$$
\pi(\theta \mid \lambda_1, \lambda_2, x) \propto \lambda_1^{t(\theta)} e^{-\theta \lambda_1}\, \lambda_2^{-t(\theta)} e^{\theta \lambda_2}\, \mathbf{1}\{1 \le \theta \le 111\}.
$$
Simplifying,
$$
\pi(\theta \mid \lambda_1, \lambda_2, x) \propto \left(\frac{\lambda_1}{\lambda_2}\right)^{t(\theta)} e^{\theta(\lambda_2 - \lambda_1)}\, \mathbf{1}\{1 \le \theta \le 111\},
$$
which is the kernel of a probability mass function over $\{1, \dots, 111\}$ parameterized by $\lambda_1$ and $\lambda_2$.
To generate random variates from the conditional posterior of $\theta$, we sample from $\{1, \dots, 111\}$, where $\theta = k$ is realized with probability
$$
p_k = \frac{(\lambda_1/\lambda_2)^{t(k)}\, e^{k(\lambda_2 - \lambda_1)}}{\sum_{j=1}^{111} (\lambda_1/\lambda_2)^{t(j)}\, e^{j(\lambda_2 - \lambda_1)}}.
$$
Algorithmically, let $U \sim \text{Uniform}(0, 1)$; then $\theta$ realizes the value $k$ if $\sum_{j=1}^{k-1} p_j < U \le \sum_{j=1}^{k} p_j$.
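A minimal sketch of this discrete inverse transform (a hypothetical helper; Part (c) below uses R's built-in sample() instead), given a vector p containing $p_1, \dots, p_{111}$:
# draw one theta from the pmf p: the smallest k whose cumulative
# probability reaches a Uniform(0,1) draw
rtheta.inv <- function(p)
{
  u <- runif(1)
  min(which(cumsum(p) >= u))
}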
Additional analysis
Note that $\theta$ has a prior that only assigns non-zero probability to values of $\theta$ with $1 \le \theta \le 111$ (there are $112$ data rows in the ``coal.txt'' data file). Thus, given a prior that assigns probability $0$ to the outcome $\theta = 112$, the conditional posterior distribution for $\theta$ also assigns zero probability to $\theta = 112$. Intuitively, it makes sense that no amount of evidence is sufficient to overcome a prior of zero probability.
Furthermore, recall that $\theta$ is the change-point. If the change-point occurred at $\theta = 112$, then in fact there would be no change-point, and $X_j \sim \text{Poisson}(\lambda_1)$ for $j = 1, \dots, 112$. In other words, assigning prior probability $0$ to $\theta = 112$ is equivalent to claiming that there is a change-point.
We may generalize this: if we are given prior information that $\theta \ne k$ for some $k$, we may assign zero probability to that event in the prior, i.e., $\pi(\theta = k) = 0$. Of course, we may also assign a non-uniform prior over the support. The posterior distribution from a previous experiment is a likely candidate for such a prior.
Part (c)
Implement the Gibbs sampler. Use a suite of convergence diagnostics to evaluate the convergence and mixing of your sampler.
Solution
We load the data with:
x <- read.table("coal.txt",header=T)[,2]
N <- length(x)
We implement the conditional samplers with:
t <- function(theta) { sum(x[1:theta]) }
# Gibbs conditionals from Part (b): Gamma(t(theta)+3, theta+1) and
# Gamma(t(N)-t(theta)+3, N-theta+1) under the Gamma(3,1) priors
rlambda1 <- function(n,theta) { rgamma(n,t(theta)+3,theta+1) }
rlambda2 <- function(n,theta) { rgamma(n,t(N)-t(theta)+3,N-theta+1) }
ktheta <- Vectorize(function(theta,lambda1,lambda2)
{
if (theta < 1 || theta >= N) { return(0) }
(lambda1/lambda2)^t(theta)*exp(theta*(lambda2-lambda1))
},"theta")
ptheta <- function(theta,lambda1,lambda2)
{
Z <- sum(ktheta(1:(N-1),lambda1,lambda2))
ktheta(theta,lambda1,lambda2)/Z
}
rtheta = function(n,lambda1,lambda2)
{
sample(x=1:(N-1),n,replace=T,prob=ptheta(1:(N-1),lambda1,lambda2))
}
rlambda1.prior <- function() { rgamma(1,shape=3,rate=1) }
rlambda2.prior <- function() { rgamma(1,shape=3,rate=1) }
rtheta.prior <- function() { sample(1:(N-1),1,replace=T) }
We implement the Gibbs sampler with:
gibbs <- function(n,burn=1000)
{
nn <- n+burn
thetas <- matrix(nrow=nn,ncol=3)
thetas[1,] <- c(rtheta.prior(),rlambda1.prior(),rlambda2.prior())
for (i in 1:(nn-1))
{
theta.new <- rtheta(1,thetas[i,2],thetas[i,3])
lambda1.new <- rlambda1(1,theta.new)
lambda2.new <- rlambda2(1,theta.new)
thetas[i+1,] <- c(theta.new,lambda1.new,lambda2.new)
}
thetas <- thetas[(burn+1):nn,]
theta.est <- mean(thetas[,1])
lambda1.est <- mean(thetas[,2])
lambda2.est <- mean(thetas[,3])
list(theta.dist=thetas,
theta.est=theta.est,
lambda1.est=lambda1.est,
lambda2.est=lambda2.est)
}
n <- 5000
burn <- 0
res <- gibbs(n,burn)
param.est <- c(res$theta.est,res$lambda1.est,res$lambda2.est)
names(param.est) <- c("theta","lambda1","lambda2")
knitr::kable(data.frame(param.est))
| | param.est |
|---|---|
| theta | 40.0884000 |
| lambda1 | 3.0650183 |
| lambda2 | 0.9216624 |
Now, we plot the marginal posterior distributions of the three parameters.
par(mfrow=c(1,3))
hist(res$theta.dist[,1],freq=F,main="theta",xlab="",breaks=50)
hist(res$theta.dist[,2],freq=F,main="lambda1",xlab="",breaks=50)
hist(res$theta.dist[,3],freq=F,main="lambda2",xlab="",breaks=50)

plot(res$theta.dist[1:100,1],type="l",main="",ylab="theta")
plot(res$theta.dist[1:100,2],type="l",main="initial sample path",ylab="lambda1")
plot(res$theta.dist[1:100,3],type="l",main="",ylab="lambda2")

plot(res$theta.dist[2000:2100,1],type="l",main="",ylab="theta")
plot(res$theta.dist[2000:2100,2],type="l",main="later sample path",ylab="lambda1")
plot(res$theta.dist[2000:2100,3],type="l",main="",ylab="lambda2")

acf(res$theta.dist[,1],main="theta")
acf(res$theta.dist[,2],main="lambda1")
acf(res$theta.dist[,3],main="lambda2")

All three chains look healthy: the marginal histograms are unimodal and concentrated around their means, the trace plots settle immediately and look the same early and late in the run (suggesting rapid convergence), the ACF plots decay quickly (low autocorrelation), and the chains mix well, moving freely around their means rather than getting stuck.
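As a further check, assuming the coda package is available, we can compute effective sample sizes and Geweke diagnostics for the three chains:
library(coda)
chains <- mcmc(res$theta.dist)
effectiveSize(chains)  # effective sample size per parameter (larger is better)
geweke.diag(chains)    # |z| < 2 suggests no evidence against convergence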
Problem 4
Compare bootstrapped CIs for the population 90th percentile to the large sample estimate as in the notes for (a) $\text{Exp}(1)$ data, (b) $N(0,1)$ data, (c) $\text{Uniform}(0,1)$ data, and (d) $\chi^2(1)$ data. Use a sample size of $n = 100$ and $B = 500$ bootstrap replicates.
Solution
Suppose we have a sample $X_1, \dots, X_n \overset{\text{iid}}{\sim} F$, where $F$ is the cdf. An estimate of $F$ is given by the empirical cdf
$$
F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}.
$$
Assuming $F$ is continuous, the $p$-th quantile of the distribution is given by
$$
q = F^{-1}(p) = \inf\{x : F(x) \ge p\},
$$
which may be estimated with $\hat{q}_n = F_n^{-1}(p) = x_{([np])}$, the $[np]$-th order statistic.
Delta method
Asymptotically,
$$
\hat{q}_n \;\overset{\cdot}{\sim}\; N\!\left(q,\ \frac{p(1-p)}{n f(q)^2}\right),
$$
where $f$ is the pdf of the distribution. Since the pdf $f$ is not known, we must estimate it with a kernel density estimate $\hat{f}$ (note that we cannot simply take the derivative of $F_n$, since it is a step function). We use the built-in R function density to estimate $f$ at $\hat{q}_n$.
Using the asymptotic normality, we see that the variance of our estimator may be estimated as
$$
\widehat{\operatorname{Var}}(\hat{q}_n) = \frac{p(1-p)}{n \hat{f}(\hat{q}_n)^2},
$$
and thus an approximate $100(1-\alpha)\%$ confidence interval is given by
$$
\hat{q}_n \pm z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n \hat{f}(\hat{q}_n)^2}}.
$$
We implement this confidence interval estimator for the $p$-th quantile with the following:
q.ci.delta <- function(samp,p,alpha,bw="bcv")
{
z <- qnorm(1-alpha/2)
q.est <- quantile(samp,p)
f <- density(samp,bw=bw,n=1,from=q.est,to=q.est)$y
se <- sqrt(p*(1-p)/(length(samp)*f^2))
list(estimate=q.est,p=p,alpha=alpha,ci=c(q.est-z*se,q.est+z*se))
}
Bootstrap method
Next, we consider the Bootstrap method. The statistic for the Bootstrap method, the $p$-th sample quantile, is given by:
# p-quantile statistic that we provide as input to the
# bootstrap function
q.stat <- function(p=0.9)
{
function(x,indices)
{
f <- quantile(x[indices],p)
names(f) <- NULL
f
}
}
We implement the Bootstrap confidence interval estimator for the $p$-th quantile with the following:
q.ci.bs <- function(samp,p,alpha,B)
{
library(boot)
b <- boot(samp, q.stat(p), B)
ci <- boot.ci(b,conf=1-alpha,type="perc")$percent
list(estimate=quantile(samp,p),p=p,alpha=alpha,ci=c(ci[4],ci[5]))
}
We simply sample from the empirical distribution (resample the observed sample) and apply the quantile statistic to each resample, getting a bootstrap sampling distribution
$$
\hat{q}^{*(1)}, \dots, \hat{q}^{*(B)},
$$
and then take its $\alpha/2$ and $1 - \alpha/2$ sample percentiles as the endpoints of the percentile confidence interval.
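For intuition, here is a minimal sketch of the percentile interval without the boot package (a hypothetical helper; q.ci.bs above is what we actually use):
q.ci.percentile <- function(samp, p, alpha, B)
{
  # resample the observed sample B times and compute the p-th quantile of each
  qs <- replicate(B, quantile(sample(samp, replace=TRUE), p))
  # the alpha/2 and 1-alpha/2 percentiles of the bootstrap distribution
  list(estimate=quantile(samp,p), ci=unname(quantile(qs, c(alpha/2, 1-alpha/2))))
}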
Part (a)
Compute coverage probabilities of the two intervals and average interval length. (You need to run the intervals $M = 500$ times.)
Solution
To capture the data for each random variable, we encapsulate the process into a procedure that may be invoked for any function that models a random variable and whose first parameter denotes the sample size.
#' Retrieve CI stats for the specified method and random variable
#'
#' @param q.star the true value for the p-th quantile
#' @param rv the random variable generator, of type n -> R^n
#' @param p the quantile level (e.g., 0.9 for the 90th percentile)
#' @param n size of sample to generate
#' @param M number of trials
#' @param type "bootstrap" or "delta"
#' @param B number of bootstrap replicates (only relevant if type == "bootstrap")
#' @returns a vector of the relevant results
ci.stats <- function(q.star, rv, p, n, M=250, type="bootstrap", B=500)
{
coverage.prop <- 0
avg.length <- 0
for (i in 1:M)
{
samp <- rv(n)
res <- NULL
if (type=="bootstrap") { res <- q.ci.bs(samp,p,alpha,B) }
else { res <- q.ci.delta(samp,p,alpha) }
avg.length <- avg.length + (res$ci[2] - res$ci[1])
if (q.star >= res$ci[1] && q.star <= res$ci[2]) {
coverage.prop <- coverage.prop + 1
}
}
avg.length <- avg.length / M
coverage.prop <- coverage.prop / M
c(q.star,res$estimate,coverage.prop,avg.length)
}
Now, we use ci.stats to generate and compute the data:
p <- 0.9 # p-th quantile
alpha <- 0.05 # CI alpha level
n <- 100 # sample size
B <- 500 # bootstrap replicates
M <- 500
# simulate and compute the data
tab <- matrix(nrow=8,ncol=4)
rownames(tab) <- c("exp(1): delta",
"exp(1): bootstrap",
"N(0,1): delta",
"N(0,1): bootstrap",
"U(0,1): delta",
"U(0,1): bootstrap",
"X^2(1): delta",
"X^2(1): bootstrap")
colnames(tab) <- c("q.star", "q.est", "coverage", "average length")
tab[1,] <- ci.stats(qexp(p),rexp,p,n,M,"delta")
tab[2,] <- ci.stats(qexp(p),rexp,p,n,M,"bootstrap",B)
tab[3,] <- ci.stats(qnorm(p),rnorm,p,n,M,"delta")
tab[4,] <- ci.stats(qnorm(p),rnorm,p,n,M,"bootstrap",B)
tab[5,] <- ci.stats(qunif(p),runif,p,n,M,"delta")
tab[6,] <- ci.stats(qunif(p),runif,p,n,M,"bootstrap",B)
tab[7,] <- ci.stats(qchisq(p,df=1),function(n) { rchisq(n,df=1) },p,n,M,"delta")
tab[8,] <- ci.stats(qchisq(p,df=1),function(n) { rchisq(n,df=1) },p,n,M,"bootstrap",B)
Part (b)
Summarizing your results in a table. Comment on your findings. Which is better?
Solution
knitr::kable(data.frame(tab))
| | q.star | q.est | coverage | average.length |
|---|---|---|---|---|
| exp(1): delta | 2.302585 | 2.3482875 | 0.876 | 1.0330318 |
| exp(1): bootstrap | 2.302585 | 1.8352320 | 0.938 | 1.1304459 |
| N(0,1): delta | 1.281552 | 1.3145702 | 0.924 | 0.6211391 |
| N(0,1): bootstrap | 1.281552 | 1.1113686 | 0.944 | 0.6633263 |
| U(0,1): delta | 0.900000 | 0.8950180 | 0.970 | 0.1482522 |
| U(0,1): bootstrap | 0.900000 | 0.8892106 | 0.938 | 0.1217511 |
| X^2(1): delta | 2.705544 | 2.7834884 | 0.858 | 1.5984722 |
| X^2(1): bootstrap | 2.705544 | 2.0984550 | 0.944 | 1.8483615 |
Overall, the Bootstrap percentile interval is more compatible with the claimed $95\%$ confidence level: its coverage stays near $0.94$ for all four distributions, whereas the delta-method interval noticeably undercovers for the skewed $\text{Exp}(1)$ and $\chi^2(1)$ distributions.
Since both estimators are asymptotically normal, if we were to increase the sample size, both should converge to the nominal $1 - \alpha$ coverage with an expected interval length given by
$$
2\, z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n f(q)^2}},
$$
where $f$ is the true density and $q$ is the true $p$-th quantile.
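For example, for the $N(0,1)$ case with $n = 100$ and $p = 0.9$, this asymptotic length (evaluated with the true density, as a quick check) is about $0.67$, in the same ballpark as the simulated average lengths above:
q.true <- qnorm(0.9)
2*qnorm(0.975)*sqrt(0.9*0.1/(100*dnorm(q.true)^2))  # approximately 0.67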
Supplementary Material
Problem 1 supplementary material: inverse transform method
In Problem 1, $Y$ is the double exponential random variable, and we sampled from it by recognizing that we can simply sample from the $\text{Exp}(1)$ distribution and take the negative of the draw with probability $1/2$.
Now, we consider the inverse transform method. We are given the kernel $g(y) = e^{-|y|}$, but for the inverse transform method we need the cdf. The normalizing constant $c$ is given by
$$
c \int_{-\infty}^{\infty} e^{-|y|}\, dy = 1,
$$
which may be rewritten as
$$
2 c \int_{0}^{\infty} e^{-y}\, dy = 2c = 1,
$$
which has the solution $c = 1/2$. Thus, the random variable $Y$ that has kernel $e^{-|y|}$ has the density
$$
g(y) = \tfrac{1}{2} e^{-|y|}.
$$
g.density <- function(y) { 0.5*exp(-abs(y)) }
The cdf is defined as
$$
G(y) = \int_{-\infty}^{y} g(t)\, dt.
$$
If $y < 0$, then
$$
G(y) = \int_{-\infty}^{y} \tfrac{1}{2} e^{t}\, dt = \tfrac{1}{2} e^{y},
$$
and if $y \ge 0$, then
$$
G(y) = \tfrac{1}{2} + \int_{0}^{y} \tfrac{1}{2} e^{-t}\, dt = 1 - \tfrac{1}{2} e^{-y},
$$
which may be written as the piecewise function
$$
G(y) =
\begin{cases}
\tfrac{1}{2} e^{y}, & y < 0, \\
1 - \tfrac{1}{2} e^{-y}, & y \ge 0.
\end{cases}
$$
Now, we find the inverse of $G$. Over $y < 0$, the inverse of $G$ is given by
$$
u = \tfrac{1}{2} e^{y}.
$$
Solving for $y$, we obtain $y = \log(2u)$. Note that given $y < 0$, $u < 1/2$. Over $y \ge 0$, the inverse of $G$ is given by
$$
u = 1 - \tfrac{1}{2} e^{-y}.
$$
Solving for $y$, we obtain
$$
y = -\log(2 - 2u),
$$
where $u \ge 1/2$. We thus have the piecewise inverse
$$
G^{-1}(u) =
\begin{cases}
\log(2u), & u < \tfrac{1}{2}, \\
-\log(2 - 2u), & u \ge \tfrac{1}{2}.
\end{cases}
$$
Thus, to sample from $Y$, we observe $U$ from $\text{Uniform}(0, 1)$ and, if $U \le 1/2$, we let $Y = \log(2U)$ and otherwise let $Y = -\log(2 - 2U)$.
We implement this sampler with:
ry.itm <- function(n)
{
ys <- numeric(n)
for (i in 1:n)
{
u <- runif(1)
if (u <= 0.5) { ys[i] <- log(2*u) }
else { ys[i] <- -log(2-2*u)}
}
ys
}