
Computational Statistics - SIUe - STAT 575 - Exam 1

Problem 1

Consider finding $\sigma^2 = E(X^2)$ when $X$ has a density proportional to

$$ q(x) = \frac{e^x}{e^{2x}+1}, \quad x \in (-\infty,\infty). $$

Part (a)

Find the Monte Carlo estimate of $\sigma^2$ using acceptance-rejection sampling. Take the candidate distribution to be the double exponential, i.e., $g(x) = \exp(-|x|)$.

Solution

We are interested in sampling $X$ from the kernel $q$.

q <- function(x) { exp(x)/(exp(2*x)+1) }

Suppose $Y$ is a double exponential random variable with kernel $g$.

g <- function(x) { exp(-abs(x)) }

Since the double exponential is symmetric about its mean $0$, we can simply sample $W \sim \mathrm{EXP}(\lambda=1)$ and let $Y = -W$ with probability $0.5$ and $Y = W$ with probability $0.5$.

ry <- function(n)
{
  ys <- numeric(n)
  ws <- rexp(n=n,rate=1)
  us <- runif(n)
  for (i in 1:n) { ys[i] <- ifelse(us[i] < 0.5,-ws[i],ws[i]) }
  ys
}
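
As a quick sanity check (not part of the original solution), the double exponential with rate $1$ has mean $0$ and variance $2$, so the sample moments from ry should be close to those values:

y.check <- ry(100000)
print(c(mean(y.check), var(y.check)))  # expect roughly 0 and 2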

To find the constant $c$ such that $q(x)/(c\,g(x)) \leq 1$, we solve

$$ \begin{aligned} c &= \max_x \left\{\frac{q(x)}{g(x)}\right\}\\ &= \max_x \left\{\frac{e^{x+|x|}}{e^{2x}+1}\right\}. \end{aligned} $$

We forgo a formal proof and simply note that the denominator always exceeds the numerator, i.e., $q(x)/g(x) < 1$, while $q(x)/g(x) \to 1$ as $x \to \pm\infty$; the supremum is therefore $c = 1$. We implement the acceptance-rejection sampler for $X$ with kernel $q$ with:

rx <- function(n)
{
  xs <- numeric(n)
  for (i in 1:n)
  {
    repeat
    {
      y <- ry(1)
      if (runif(1) < q(y)/g(y))
      {
        xs[i] <- y
        break
      }
    }
  }
  xs
}
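
As a quick numerical check on the claim above that $c = 1$ (an optional check, not part of the original solution), we can evaluate $q(x)/g(x)$ on a grid and confirm that the ratio never exceeds $1$ while approaching $1$ in both tails:

ratio <- function(x) { q(x)/g(x) }
xs.grid <- seq(-20, 20, by=0.01)
print(max(ratio(xs.grid)))   # should be just below 1
print(ratio(c(-20, 20)))     # close to 1 in both tails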

We estimate $E(X^2)$ by averaging the squared sample values:

x <- rx(10000)
print(mean(x^2))
## [1] 2.43841
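
As an additional cross-check (not part of the original solution), $E(X^2)$ can be approximated directly as a ratio of integrals of the kernel, which avoids the normalizing constant; we truncate at $\pm 40$, where $q$ is numerically negligible:

num <- integrate(function(x) { x^2*q(x) }, -40, 40)$value
den <- integrate(q, -40, 40)$value
print(num/den)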

Part (b)

Find the normalizing constant of the pdf by integrating $q(x)$ over the support. Then derive the CDF of $X$.

Solution

The normalizing constant is given by solving for $Z$ in

$$ \frac{1}{Z} \int_{-\infty}^{\infty} q(x)\, dx = 1. $$

Let $u = e^x$, so that $du = e^x dx$ and $e^{2x} = (e^x)^2 = u^2$. Since $q(x) = 1/(e^x + e^{-x})$ is symmetric about $0$,

$$ Z = 2 \int_{1}^{\infty} \frac{1}{1+u^2}\, du. $$

The antiderivative is just $\arctan(u)$, and thus

$$ Z = 2 \left(\arctan(\infty) - \arctan(1)\right). $$

Since $\tan y = \sin y / \cos y \to \infty$ as $y \to \pi/2$, we have $\arctan(x) \to \pi/2$ as $x \to \infty$. Furthermore, $\tan(\pi/4) = 1$, so $\arctan(1) = \pi/4$, and thus

$$ Z = 2 \left(\frac{\pi}{2} - \frac{\pi}{4}\right) = 2 \cdot \frac{\pi}{4} = \frac{\pi}{2}. $$

We verify with a numerical integrator:

right_riemann_sum <- function(f, a, b, n)
{
  h <- (b-a)/n
  h*sum(f(a + (1:n)*h))
}

When we apply the numerical integrator to the kernel $q$ and subtract $\pi/2$, we obtain a result that is approximately $0$, confirming the calculation:

right_riemann_sum(q,-10,10,1000) - pi/2
## [1] -9.080289e-05

Thus, the density of $X$ is given by

$$ f(x) = \frac{2}{\pi} \frac{e^x}{e^{2x}+1}. $$

The cdf of $X$ is thus given by

$$ F(x) = \int_{-\infty}^{x} f(s)\, ds = \frac{2}{\pi} \left(\arctan(e^x) - \arctan(0)\right), $$

which simplifies to

$$ F(x) = \frac{2}{\pi} \arctan(e^x). $$

pdf <- function(x) { 2/pi * q(x) }
cdf <- function(x) { 2/pi * atan(exp(x)) }  # R's arctangent function is atan
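
As an optional check (not part of the original solution), the derived cdf should closely track the empirical cdf of the acceptance-rejection sample from part (a); the differences below should all be small:

x.check <- rx(5000)
grid <- seq(-4, 4, by=1)
print(round(cdf(grid) - ecdf(x.check)(grid), 3))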

Part (c)

Generate a sample of $X$ using the inverse transform method and find the Monte Carlo estimate of $\sigma^2$.

Solution

To apply the inverse transform method, we solve for $x$ in $p = F(x)$:

$$ \begin{aligned} p &= \frac{2}{\pi} \arctan(e^x)\\ \frac{\pi}{2} p &= \arctan(e^x)\\ \tan\left(\frac{\pi}{2} p\right) &= e^x\\ x &= \log\left(\tan\left(\frac{\pi}{2} p\right)\right). \end{aligned} $$

Thus, a sampler for $X$ is given by

$$ X = \log\left(\tan\left(\frac{\pi}{2} U\right)\right) $$

where $U \sim \mathrm{UNIF}(0,1)$.

rx.transform <- function(n)
{
  us <- runif(n)
  log(tan(pi/2*us))
}

We estimate $\sigma^2$ with:

x <- rx.transform(10000)
print(mean(x^2))
## [1] 2.508885

Part (d)

Repeat the estimation using importance sampling with standardized weights.

Solution

#' importance sampling
#'
#' estimates E{h(X)} by taking n sample points from g and then taking a
#' weighted mean using the standardized weights.
rx.importance <- function(n,h)
{
  w = function(x) {
     out <- q(x)/g(x)
     out/sum(out)
  }

  ys <- ry(n)
  sum(w(ys)*h(ys))
}

We apply the procedure to $h(x) = x^2$ to estimate $E(h(X))$:

print(rx.importance(10000,function(x) { x^2} ))
## [1] 2.437009
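
The standardized weights are needed because $q$ and $g$ are only kernels; standardizing makes their unknown normalizing constants cancel. As an optional alternative (not part of the original solution), since the normalized target density is $(2/\pi)\,q(x)$ by part (b) and the double exponential proposal density is $g(x)/2$, we could also use plain, unstandardized weights with a simple average:

rx.importance.raw <- function(n,h)
{
  ys <- ry(n)
  # exact importance weights f(y) / f_Y(y) using the normalizing constants
  w <- (2/pi*q(ys)) / (0.5*g(ys))
  mean(w*h(ys))
}
print(rx.importance.raw(10000,function(x) { x^2 }))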

Problem 2

Consider the inverse Gaussian distribution with density

$$ f(x|\theta_1,\theta_2) \propto x^{-1.5} \exp\left\{-\theta_1 x - \frac{\theta_2}{x} + \psi(\theta_1,\theta_2)\right\} $$

where $\psi(\theta_1,\theta_2) = 2\sqrt{\theta_1\theta_2} + \log(2\theta_2)$. Estimate $E(X)$ using MCMC. You may take the proposal distribution to be a $\mathrm{Gamma}(\sqrt{\theta_2/\theta_1},1)$.

Solution

Here is our implementation of the Metropolis-Hastings algorithm:

# A sampling procedure from the pdf f using Metropolis-Hastings algorithm
rf <- function(n, theta1, theta2, burn=0)
{
    g <- function(x) { dgamma(x,shape=sqrt(theta2/theta1),rate=1) }
    rg <- function(n) { rgamma(n,shape=sqrt(theta2/theta1),rate=1) }
    ker <- function(x) { x^(-1.5)*exp(-theta1*x-theta2/x) }
  
    m <- n + burn
    xs <- vector(length = m)
    xs[1] <- rg(1)
    
    for (i in 2:m) {
        v <- rg(1)
        u <- xs[i-1]
        R <- ker(v) * g(u) / (ker(u) * g(v))
        if (runif(1) <= R) { xs[i] <- v }
        else { xs[i] <- u }
    }
    xs[(burn+1):m]
}

We apply the algorithm to $\theta_1 = 3$ and $\theta_2 = 5$:

n <- 100000
theta1 <- 3
theta2 <- 5
xs <- rf(n,theta1,theta2,burn=20000)

Now, we estimate $E(X)$ with:

print(mean(xs))
## [1] 1.28597

For this parameterization of the inverse Gaussian, the true mean is $\mu = \sqrt{\theta_2/\theta_1}$, which equals $\sqrt{5/3} \approx 1.29$ here, so the algorithm provides a reasonable estimate.
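
One simple additional diagnostic (not requested in the problem, but cheap to compute) is the empirical acceptance rate; since a rejected proposal repeats the previous state, it can be approximated from the chain itself:

# proportion of iterations in which the chain moved
print(mean(diff(xs) != 0))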

We would like to plot the histogram with the density superimposed on top of it, so we implement the (numerically normalized) density function with:

f.make <- function(theta1,theta2)
{
  k <- function(x) { x^(-1.5) * exp(-theta1*x - theta2/x) }
  Z <- right_riemann_sum(k,0,20,10000)
  function(x) { k(x) / Z }
}
f <- f.make(theta1,theta2)
hist(xs,breaks=200,freq=F)
ps <- seq(0,10,by=.01)
lines(x=ps,y=f(ps))

[Figure: histogram of the MCMC sample xs with the estimated density f superimposed]

Problem 3

Consider the data on coal-mining disasters from 1851 to 1962 (the coal.txt data on Blackboard). The rate of accidents per year appears to decrease around 1900, so we consider a change-point model for these data. Let $X_j$ be the number of accidents in year $j$, with $X_j \sim \mathrm{Poisson}(\lambda_1)$ for $j = 1, \ldots, \theta$ and $X_j \sim \mathrm{Poisson}(\lambda_2)$ for $j = \theta + 1, \ldots, 112$. The change-point occurs after the $\theta$-th year in the series, so the model has parameters $\theta, \lambda_1, \lambda_2$. For the Bayesian analysis of this model, assume the priors $\lambda_i \sim \mathrm{Gamma}(3, 1)$ for $i = 1, 2$, and assume $\theta$ follows a discrete uniform distribution over $\{1, \ldots, 111\}$.

Part (a)

Derive the posterior distribution of $(\theta, \lambda_1, \lambda_2)$.

Solution

First, we write down the distribution of $X_i | (\lambda_1,\lambda_2,\theta)$,

$$ X_i \sim f(x_i | \lambda_1,\lambda_2,\theta) $$

where

$$ f(x_i | \lambda_1,\lambda_2,\theta) = \frac{\lambda_1^{x_i} e^{-\lambda_1}}{x_i!} I(i \leq \theta) + \frac{\lambda_2^{x_i} e^{-\lambda_2}}{x_i!} I(i > \theta). $$

Thus, the likelihood is

$$ L(\lambda_1,\lambda_2,\theta|\vec{x}) = \prod_{i=1}^{\theta} \frac{\lambda_1^{x_i} e^{-\lambda_1}}{x_i!} \prod_{i=\theta+1}^{n} \frac{\lambda_2^{x_i} e^{-\lambda_2}}{x_i!}, $$

which, up to a factor not involving the parameters, may be written as

$$ L(\lambda_1,\lambda_2,\theta|\vec{x}) \propto \lambda_1^{t(\theta)} e^{-\theta \lambda_1} \lambda_2^{t(n)-t(\theta)} e^{-(n-\theta) \lambda_2}, $$

where $t(\theta) = \sum_{i=1}^{\theta} x_i$.

The posterior distribution is given by

$$ f(\theta,\lambda_1,\lambda_2|\vec{x}) = f(\vec{x}|\theta,\lambda_1,\lambda_2)\, f(\theta,\lambda_1,\lambda_2) / f(\vec{x}), $$

which up to proportionality may be rewritten as

$$ f(\theta,\lambda_1,\lambda_2|\vec{x}) \propto L(\theta,\lambda_1,\lambda_2|\vec{x})\, f(\theta)\, f(\lambda_1)\, f(\lambda_2), $$

where $f(\theta) \propto I(\theta \in \{1,\ldots,n-1\})$ with $n = 112$.

We are given the priors $\lambda_j \sim \mathrm{GAM}(3,1)$, and thus

$$ f(\lambda_j) \propto \lambda_j^2 e^{-\lambda_j}. $$

Putting it all together, the posterior distribution is given by

$$ f(\theta,\lambda_1,\lambda_2|\vec{x}) \propto \lambda_1^{t(\theta)} e^{-\theta \lambda_1}\, \lambda_2^{t(n)-t(\theta)} e^{-(n-\theta) \lambda_2}\, \lambda_1^2 e^{-\lambda_1}\, \lambda_2^2 e^{-\lambda_2}, $$

where $t(\theta) = \sum_{i=1}^{\theta} x_i$, $1 \leq \theta < n$, and $\lambda_1,\lambda_2 > 0$.
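
For reference, here is a minimal sketch (not part of the original solution) of the unnormalized log-posterior derived above, assuming the accident counts have been loaded into a vector x as in part (c):

log.post <- function(theta, lambda1, lambda2, x)
{
  n <- length(x)
  t.theta <- sum(x[1:theta])
  # log of the posterior kernel, dropping the constant x_i! terms
  (t.theta + 2)*log(lambda1) - (theta + 1)*lambda1 +
    (sum(x) - t.theta + 2)*log(lambda2) - (n - theta + 1)*lambda2
}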

Part (b)

Derive the conditional posterior distributions necessary to carry out Gibbs sampling for this change-point model.

Solution

First, we find the conditional posterior distribution for $\lambda_1$ by simply dropping those factors in the posterior that are not a function of $\lambda_1$,

$$ f(\lambda_1|\theta,\lambda_2,\vec{x}) \propto \lambda_1^{t(\theta) + 2} e^{-(\theta+1) \lambda_1}, $$

which is the kernel of a $\mathrm{GAM}(t(\theta)+3,\, \theta+1)$ distribution,

$$ \lambda_1 | \theta,\lambda_2,\vec{x} \sim \mathrm{GAM}(t(\theta)+3,\, \theta+1). $$

Next, we find the conditional posterior distribution for $\lambda_2$ using the same argument,

$$ f(\lambda_2|\theta,\lambda_1,\vec{x}) \propto \lambda_2^{t(n)-t(\theta)+2} e^{-(n-\theta+1) \lambda_2}, $$

which is the kernel of a $\mathrm{GAM}(t(n)-t(\theta)+3,\, n-\theta+1)$ distribution,

$$ \lambda_2 | \theta,\lambda_1,\vec{x} \sim \mathrm{GAM}(t(n) - t(\theta) + 3,\, n-\theta+1). $$

Finally, we find the conditional posterior distribution for $\theta$. First, we drop factors that are not a function of $\theta$,

$$ f(\theta|\lambda_1,\lambda_2,\vec{x}) \propto \lambda_1^{t(\theta)} e^{-\theta \lambda_1} \lambda_2^{-t(\theta)} e^{\theta \lambda_2}. $$

Simplifying,

$$ f(\theta|\lambda_1,\lambda_2,\vec{x}) \propto \left(\frac{\lambda_1}{\lambda_2}\right)^{t(\theta)} e^{(\lambda_2 - \lambda_1) \theta} I(\theta \in \{1,\ldots,n-1\}), $$

which is the kernel of a probability mass function on $\{1,\ldots,n-1\}$ parameterized by $\lambda_1$ and $\lambda_2$.

To generate random variates from the conditional posterior of $\theta$, we sample from $\{1,\ldots,n-1\}$, where $\theta = k$ is realized with probability $f(k|\lambda_1,\lambda_2,\vec{x})$. Algorithmically, let $U \sim \mathrm{UNIF}(0,1)$; then $\theta$ realizes the value $k$ if

$$ \sum_{\theta=1}^{k-1} f(\theta|\cdot) < U \leq \sum_{\theta=1}^{k} f(\theta|\cdot). $$
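
A minimal sketch of this inverse-CDF draw (assuming a vector probs holding the $n-1$ normalized conditional probabilities $f(1|\cdot),\ldots,f(n-1|\cdot)$; the name probs is hypothetical):

rtheta.draw <- function(probs)
{
  u <- runif(1)
  # smallest k whose cumulative probability reaches u
  min(which(cumsum(probs) >= u))
}

In part (c) we instead rely on R's built-in sample() with the prob argument, which performs an equivalent categorical draw.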

Additional analysis

Note that the prior on $\theta$ assigns non-zero probability only to $\theta \in \{1,\ldots,n-1\}$, where $n = 112$ (there are $112$ data rows in the coal.txt data file). Thus, given a prior that assigns probability $0$ to the outcome $\theta = n$, the conditional posterior distribution for $\theta$ also assigns zero probability to $\theta = n$. Intuitively, this makes sense: no amount of evidence can overcome a prior probability of zero.

Furthermore, recall that $\theta$ is the change-point. If the change-point occurs at $\theta = n$, then in fact there was no change-point and $X_i \sim \mathrm{POI}(\lambda_1)$ for $i=1,\ldots,n$. In other words, assigning $\theta = n$ a prior probability of $0$ is equivalent to asserting that a change-point exists.

We may generalize this: if we have prior information that $\theta \notin \mathbb{K}$, we may assign zero prior probability to that set, i.e., $\sum_{k \in \mathbb{K}} f_{\theta}(k) = 0$. Of course, we may also assign a non-uniform prior over the support; the posterior distribution $f_{\theta|\cdot}$ from a previous experiment is a natural candidate for such a prior.

Part (c)

Implement the Gibbs sampler. Use a suite of convergence diagnostics to evaluate the convergence and mixing of your sampler.

Solution

We load the data with:

x <- read.table("coal.txt",header=T)[,2]
N <- length(x)

We implement the conditional samplers with:

t <- function(theta) { sum(x[1:theta]) }
rlambda1 <- function(n,theta) { rgamma(n,t(theta)+3,theta+1) }
rlambda2 <- function(n,theta) { rgamma(n,t(N)-t(theta)+3,N-theta+1) }

ktheta <- Vectorize(function(theta,lambda1,lambda2)
{
  if (theta < 1 || theta >= N) { return(0) }  
  (lambda1/lambda2)^t(theta)*exp(theta*(lambda2-lambda1))
},"theta")

ptheta <- function(theta,lambda1,lambda2)
{
  Z <- sum(ktheta(1:(N-1),lambda1,lambda2))
  ktheta(theta,lambda1,lambda2)/Z
}

rtheta = function(n,lambda1,lambda2)
{ 
  sample(x=1:(N-1),n,replace=T,prob=ptheta(1:(N-1),lambda1,lambda2)) 
}

rlambda1.prior <- function() { rgamma(1,shape=3,rate=1) }
rlambda2.prior <- function() { rgamma(1,shape=3,rate=1) }
rtheta.prior <- function() { sample(1:(N-1),1,replace=T) }

We implement the Gibbs sampler with:

gibbs <- function(n,burn=1000)
{
  nn <- n+burn
  thetas <- matrix(nrow=nn,ncol=3)
  thetas[1,] <- c(rtheta.prior(),rlambda1.prior(),rlambda2.prior())
  
  for (i in 1:(nn-1))
  {
    theta.new <- rtheta(1,thetas[i,2],thetas[i,3])
    lambda1.new <- rlambda1(1,theta.new)
    lambda2.new <- rlambda2(1,theta.new)
    thetas[i+1,] <- c(theta.new,lambda1.new,lambda2.new)
  }

  thetas <- thetas[(burn+1):nn,]
  theta.est <- mean(thetas[,1])
  lambda1.est <- mean(thetas[,2])
  lambda2.est <- mean(thetas[,3])
  list(theta.dist=thetas,
       theta.est=theta.est,
       lambda1.est=lambda1.est,
       lambda2.est=lambda2.est)
}
n <- 5000
burn <- 0
res <- gibbs(n,burn)
param.est <- c(res$theta.est,res$lambda1.est,res$lambda2.est)
names(param.est) <- c("theta","lambda1","lambda2")
knitr::kable(data.frame(param.est))
            param.est
theta      40.0884000
lambda1     3.0650183
lambda2     0.9216624

Now, we plot the marginal posterior distributions of the three parameters.

par(mfrow=c(1,3))
hist(res$theta.dist[,1],freq=F,main="theta",xlab="",breaks=50)
hist(res$theta.dist[,2],freq=F,main="lambda1",xlab="",breaks=50)
hist(res$theta.dist[,3],freq=F,main="lambda2",xlab="",breaks=50)

[Figure: marginal posterior histograms of theta, lambda1, and lambda2]

plot(res$theta.dist[1:100,1],type="l",main="",ylab="theta")
plot(res$theta.dist[1:100,2],type="l",main="initial sample path",ylab="lambda1")
plot(res$theta.dist[1:100,3],type="l",main="",ylab="lambda2")

[Figure: initial sample paths (first 100 iterations) for theta, lambda1, and lambda2]

plot(res$theta.dist[2000:2100,1],type="l",main="",ylab="theta")
plot(res$theta.dist[2000:2100,2],type="l",main="later sample path",ylab="lambda1")
plot(res$theta.dist[2000:2100,3],type="l",main="",ylab="lambda2")

[Figure: later sample paths (iterations 2000 to 2100) for theta, lambda1, and lambda2]

acf(res$theta.dist[,1],main="theta")
acf(res$theta.dist[,2],main="lambda1")
acf(res$theta.dist[,3],main="lambda2")

[Figure: autocorrelation plots for theta, lambda1, and lambda2]

The marginal histograms are unimodal and roughly symmetric about their means, the chains appear to converge quickly, the autocorrelation functions decay rapidly, and the trace plots mix well, moving freely about the posterior means rather than getting stuck.
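
If the coda package is available, these diagnostics can be supplemented with effective sample sizes (an optional addition, assuming coda is installed):

library(coda)
print(effectiveSize(as.mcmc(res$theta.dist)))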

Problem 4

Compare bootstrapped CIs for the population 90th percentile to the large-sample estimate, as in the notes, for (a) $\mathrm{EXP}(1)$ data, (b) $N(0,1)$ data, (c) $U(0,1)$ data, and (d) $\chi^2(1)$ data. Use a sample size of $n = 100$ and $B = 500$ bootstrap replicates.

Solution

Suppose we have an i.i.d. sample $X_1, \ldots, X_n$ from a distribution with cdf $F$. An estimate of $F$ is given by

$$ F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(x_i \leq x). $$

Assuming $F$ is continuous, the $p$-th quantile of $F$ is given by

$$ q = F^{-1}(p), $$

which may be estimated with $q_n = F_n^{-1}(p) = x_{[np]}$, the $[np]$-th order statistic.

Delta method

Asymptotically,

$$ q_n \sim \mathrm{AN}\left(q, \frac{p(1-p)}{n f^2(q)}\right). $$

Since the pdf $f$ is not known, we must estimate it with $f_n$ (note that we cannot simply differentiate $F_n$, since it is a step function). We use the built-in R function density to estimate $f$.

Using the asymptotic normality, the variance of our estimator $q_n$ may be estimated as

$$ \widehat{\mathrm{Var}}(q_n) = \frac{p(1-p)}{n f_n^2(q_n)}, $$

and thus a $(1-\alpha) \times 100\%$ confidence interval is given by

$$ q_n \pm z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n f_n^2(q_n)}}. $$

We implement the confidence interval estimator for $q_n$ with the following:

q.ci.delta <- function(samp,p,alpha,bw="bcv")
{
  n <- length(samp)
  z <- qnorm(1-alpha/2)
  q.est <- quantile(samp,p)
  # kernel density estimate of f evaluated at the quantile estimate
  f <- density(samp,bw=bw,n=1,from=q.est,to=q.est)$y
  se <- sqrt(p*(1-p)/(n*f^2))
  list(estimate=q.est,p=p,alpha=alpha,ci=c(q.est-z*se,q.est+z*se))
}
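
For example, a hypothetical single use of the interval (not part of the original solution) on an $\mathrm{EXP}(1)$ sample of size $100$:

print(q.ci.delta(rexp(100), p=0.9, alpha=0.05))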

Bootstrap method

Next, we consider the bootstrap method. The statistic for the bootstrap, the $p$-th sample quantile, is given by:

# p-quantile statistic that we provide as input to the
# bootstrap function
q.stat <- function(p=0.9)
{
  function(x,indices)
  {
    f <- quantile(x[indices],p)
    names(f) <- NULL
    f
  }
}

We implement the bootstrap confidence interval estimator for $q_n$ with the following:

q.ci.bs <- function(samp,p,alpha,B)
{
  library(boot)
  b <- boot(samp, q.stat(p), B)
  ci <- boot.ci(b,conf=1-alpha,type="perc")$percent
  list(estimate=quantile(samp,p),p=p,alpha=alpha,ci=c(ci[4],ci[5]))
}

We simply sample from the empirical distribution (i.e., resample the observed sample with replacement) and apply the quantile statistic to each resample, getting a bootstrap sampling distribution

$$ q_n^{(1)},\ldots,q_n^{(B)}, $$

and then take the $\alpha/2$ and $1-\alpha/2$ empirical quantiles of this distribution as the endpoints of the percentile interval.
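
Equivalently, the percentile interval can be computed by hand without the boot package; here is a minimal sketch under the same assumptions (samp, p, alpha, B as above):

q.ci.bs.manual <- function(samp, p, alpha, B)
{
  # bootstrap sampling distribution of the p-th sample quantile
  qs <- replicate(B, quantile(sample(samp, replace=TRUE), p))
  # percentile CI endpoints
  quantile(qs, c(alpha/2, 1 - alpha/2))
}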

Part (a)

Compute the coverage probabilities of the two intervals and the average interval length. (You need to construct the intervals $M$ times.)

Solution

To capture the results for each distribution, we encapsulate the simulation in a procedure that may be invoked with any random variate generator whose first parameter is the sample size.

#' Retrieve CI stats for the specified method and random variable
#'
#' @param q.star the true value of the p-th quantile
#' @param rv the random variate generator, of type n -> R^n
#' @param p the quantile level (e.g., 0.9)
#' @param n size of each generated sample
#' @param M number of simulation trials
#' @param type "bootstrap" or "delta"
#' @param B number of bootstrap replicates (only relevant if type == "bootstrap")
#' @returns a vector: true quantile, last estimate, coverage proportion, average length
#' @note the confidence level alpha is taken from the global environment
ci.stats <- function(q.star, rv, p, n, M=250, type="bootstrap", B=500)
{
  coverage.prop <- 0
  avg.length <- 0

  for (i in 1:M)
  {
    samp <- rv(n)
    res <- NULL
    if (type=="bootstrap") { res <- q.ci.bs(samp,p,alpha,B) }
    else                   { res <- q.ci.delta(samp,p,alpha) }
    
    avg.length <- avg.length + (res$ci[2] - res$ci[1])
    if (q.star >= res$ci[1] && q.star <= res$ci[2]) {
      coverage.prop <- coverage.prop + 1
    }
  }
  avg.length <- avg.length / M
  coverage.prop <- coverage.prop / M

  c(q.star,res$estimate,coverage.prop,avg.length)
}

Now, we use ci.stats to generate and summarize the simulation results:

p <- 0.9 # p-th quantile
alpha <- 0.05 # CI alpha level
n <- 100 # sample size
B <- 500 # bootstrap replicates
M <- 500

# simulate and compute the data
tab <- matrix(nrow=8,ncol=4)
rownames(tab) <- c("exp(1): delta",
                   "exp(1): bootstrap",
                   "N(0,1): delta",
                   "N(0,1): bootstrap",
                   "U(0,1): delta",
                   "U(0,1): bootstrap",
                   "X^2(1): delta",
                   "X^2(1): bootstrap")
colnames(tab) <- c("q.star", "q.est", "coverage", "average length")
tab[1,] <- ci.stats(qexp(p),rexp,p,n,M,"delta")
tab[2,] <- ci.stats(qexp(p),rexp,p,n,M,"bootstrap",B)
tab[3,] <- ci.stats(qnorm(p),rnorm,p,n,M,"delta")
tab[4,] <- ci.stats(qnorm(p),rnorm,p,n,M,"bootstrap",B)
tab[5,] <- ci.stats(qunif(p),runif,p,n,M,"delta")
tab[6,] <- ci.stats(qunif(p),runif,p,n,M,"bootstrap",B)
tab[7,] <- ci.stats(qchisq(p,df=1),function(n) { rchisq(n,df=1) },p,n,M,"delta")
tab[8,] <- ci.stats(qchisq(p,df=1),function(n) { rchisq(n,df=1) },p,n,M,"bootstrap",B)

Part (b)

Summarize your results in a table. Comment on your findings. Which method is better?

Solution

knitr::kable(data.frame(tab))
                     q.star      q.est  coverage  average length
exp(1): delta      2.302585  2.3482875     0.876       1.0330318
exp(1): bootstrap  2.302585  1.8352320     0.938       1.1304459
N(0,1): delta      1.281552  1.3145702     0.924       0.6211391
N(0,1): bootstrap  1.281552  1.1113686     0.944       0.6633263
U(0,1): delta      0.900000  0.8950180     0.970       0.1482522
U(0,1): bootstrap  0.900000  0.8892106     0.938       0.1217511
X^2(1): delta      2.705544  2.7834884     0.858       1.5984722
X^2(1): bootstrap  2.705544  2.0984550     0.944       1.8483615

Overall, the bootstrap percentile intervals attain coverage much closer to the nominal 95% level; the delta-method intervals under-cover noticeably for the skewed $\mathrm{EXP}(1)$ and $\chi^2(1)$ distributions.

Since the sample quantile is asymptotically normal, if we were to increase the sample size, both intervals should approach the nominal coverage $(1-\alpha) = 0.95$, with an expected length of approximately

$$ 2 z_{1-\alpha/2}\, \sigma_q, $$

where

$$ \sigma_q = \sqrt{\frac{p(1-p)}{n f^2(q^*)}}. $$
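
For example (an optional check, not part of the original solution), for the $N(0,1)$ case with $n = 100$ the asymptotic length can be computed directly, and it comes out to roughly $0.67$, in line with the table above:

# asymptotic 95% interval length for the N(0,1) 90th percentile, n = 100
print(2*qnorm(0.975)*sqrt(0.9*0.1/(100*dnorm(qnorm(0.9))^2)))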

Supplementary Material

Problem 1 supplementary material: inverse transform method

In Problem 1, $Y$ is the double exponential random variable, and we sampled from it by recognizing that we can simply sample from the exponential distribution and negate the result with probability $0.5$.

Now, we consider the inverse transform method. We are given the kernel $g$, but for the inverse transform method we need the cdf. The normalizing constant $Z$ is given by

$$ \frac{1}{Z} \int_{-\infty}^{\infty} g(x)\, dx = 1, $$

which may be rewritten as

$$ \int_{-\infty}^{0} e^{x}\, dx + \int_{0}^{\infty} e^{-x}\, dx = Z, $$

which has the solution $Z = 2$. Thus, the random variable $Y$ with kernel $g$ has density

$$ f_Y(y) = \frac{1}{2} \exp(-|y|). $$

g.density <- function(y) { 0.5*exp(-abs(y)) }

The cdf $F_Y$ is defined as

$$ F_Y(y) = \frac{1}{2}\int_{-\infty}^{y} \exp(-|s|)\, ds. $$

If $y \leq 0$, then

$$ F_Y(y) = \frac{1}{2} \int_{-\infty}^{y} \exp(s)\, ds = \frac{1}{2}\exp(y), $$

and if $y > 0$, then

$$ F_Y(y) = F_Y(0) + \frac{1}{2} \int_{0}^{y} \exp(-s)\, ds = \frac{1}{2} + \frac{1}{2}(1 - \exp(-y)), $$

which may be written as the piecewise function

$$ F_Y(y) = \begin{cases} \frac{1}{2} \exp(y) & y \leq 0\\ 1 - \frac{1}{2} \exp(-y) & y > 0. \end{cases} $$

Now, we find the inverse of $F_Y$. For $y \leq 0$, we solve for $y$ in

$$ p = \frac{1}{2} \exp(y). $$

Solving for $y$, we obtain $y = \log(2p)$. Note that for $y \leq 0$, $p \in (0,0.5]$. For $y > 0$, we solve for $y$ in

$$ p = 1 - \frac{1}{2} \exp(-y). $$

Solving for $y$, we obtain

$$ y = -\log(2-2p), $$

where $p \in (0.5,1)$. We thus have the piecewise inverse

$$ F_Y^{-1}(p) = \begin{cases} \log(2p) & p \in (0,0.5]\\ -\log(2-2p) & p \in (0.5,1). \end{cases} $$

Thus, to sample from $Y$, we draw $u$ from $U(0,1)$; if $u \leq 0.5$ we let $y = \log(2u)$, and otherwise we let $y = -\log(2-2u)$.

We implement this sampler with:

ry.itm <- function(n)
{
  ys <- numeric(n)
  for (i in 1:n)
  {
    u <- runif(1)
    if (u <= 0.5) { ys[i] <- log(2*u) }
    else { ys[i] <- -log(2-2*u)}
  }
  ys
}
