Problem 1
Consider the following integral:
$$\int_{-\infty}^{\infty} e^{-x^2} \, dx.$$
Part (a)
Evaluate the integral in closed form using the kernel of a normal density.
Solution
The integrand is the kernel of a normal density, and so we evaluate the integral by transforming it into a problem recognizable as an expectation where the density integrates to $1$, i.e.,
$$\int_{-\infty}^{\infty} e^{-x^2} \, dx = c \int_{-\infty}^{\infty} \frac{1}{c} e^{-x^2} \, dx$$
for the normalizing constant $c$. By basic pattern matching, we see that the integrand is the kernel of $N(0, 1/2)$: matching $e^{-x^2}$ to $e^{-x^2/(2\sigma^2)}$ gives $\sigma^2 = 1/2$, so the corresponding density is $\frac{1}{\sqrt{2\pi \cdot 1/2}} e^{-x^2} = \frac{1}{\sqrt{\pi}} e^{-x^2}$. Substituting $c = \sqrt{\pi}$, we obtain
$$\int_{-\infty}^{\infty} e^{-x^2} \, dx = \sqrt{\pi} \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi}} e^{-x^2} \, dx = \sqrt{\pi} \cdot 1,$$
which is equivalent to $\sqrt{\pi} \approx 1.7724539$.
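As a quick numerical sanity check (a minimal sketch using base R's integrate(), not part of the original solution), adaptive quadrature agrees with the closed form:
integrate(function(x) exp(-x^2), -Inf, Inf)$value  # ~ 1.772454
sqrt(pi)  # 1.772454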
Part (b)
Estimate the above integral using Riemann's rule. Give an estimate of $\pi$. Does it at least provide a couple digits' worth of accuracy?
Solution
The right-handed Riemann sum is given by
$$\int_a^b f(x) \, dx \approx h \sum_{i=1}^{n} f(a + ih),$$
where $h = (b - a)/n$, which we straightforwardly implement with:
right_riemann_sum <- function(f, a, b, n) {
    h <- (b - a)/n  # width of each block in the partition
    h * sum(f(a + (1:n) * h))  # evaluate f at the right endpoints
}
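Before using it, we can sanity-check the integrator on a known integral (an illustrative aside, not part of the original solution): for an increasing integrand such as $\int_0^1 x \, dx = 1/2$, the right-hand rule should slightly overestimate.
right_riemann_sum(function(x) x, 0, 1, 1000)  # exactly (n + 1)/(2n) = 0.5005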
We use this numerical integrator to approximate the integration problem whose solution is $\sqrt{\pi}$ and then take its square to estimate $\pi$, i.e.,
$$\hat{\pi}_{\mathrm{riemann}}(n, r) = \left( h \sum_{i=1}^{n} e^{-(a + ih)^2} \right)^2,$$
where $a = -r$, $b = r$, $h = (b - a)/n$, $[-r, r]$ is the domain to numerically integrate over, and $n$ is the number of blocks in the partition. We implement this estimator with the following code:
riemann_pi <- Vectorize(function(n, r) {
    # square the right-hand Riemann estimate of the integral over [-r, r]
    right_riemann_sum(function(x) {
        exp(-x^2)
    }, -r, r, n)^2
})
We provide an estimate of $\pi$ with:
riemann_pi(10, 4)
## [1] 3.141595
The estimator provides about five decimals of accuracy based on Riemann's rule with $n = 10$ over $[-4, 4]$. This provides unusually good accuracy for the right-handed Riemann rule due to the fact that the integrand is symmetric about the origin, and thus the overestimate of the integral over $[-4, 0]$, where the integrand is increasing, is canceled by the underestimate over $[0, 4]$, where it is decreasing.
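We can verify this cancellation directly (a small sketch assuming the helpers above; integrate() supplies the reference value for each half):
half <- integrate(function(x) exp(-x^2), 0, 4)$value  # ~ sqrt(pi)/2
right_riemann_sum(function(x) exp(-x^2), -4, 0, 5) - half  # ~ +0.40 (overestimate)
right_riemann_sum(function(x) exp(-x^2), 0, 4, 5) - half  # ~ -0.40 (underestimate)
The two signed errors sum to roughly $10^{-6}$, which explains the accuracy of $\hat{\pi}_{\mathrm{riemann}}(10, 4)$.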
There are more efficient and more accurate ways to estimate $\pi$, but in theory, if we ignore numerical errors on the computer,
$$\lim_{r \to \infty} \lim_{n \to \infty} \hat{\pi}_{\mathrm{riemann}}(n, r) = \pi.$$
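An illustrative (not original) check of this limit: with $r$ fixed, increasing $n$ drives the estimate toward the truncated integral squared, and increasing $r$ then removes the remaining truncation bias.
riemann_pi(c(10, 100, 1000), 6)  # errors should shrink as n grows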
One may have the insight that, since $e^{-x^2}$ is symmetric about the origin, we can just sum over $[0, r]$ instead, which would obtain $4 \left( h \sum_{i=1}^{n} e^{-(ih)^2} \right)^2$ with $h = r/n$. Let us try:
4 * right_riemann_sum(function(x) {
    exp(-x^2)
}, 0, 4, 10)^2
## [1] 1.88363
This is a very poor estimate of $\pi$. Let us try with $n = 10^5$:
4 * right_riemann_sum(function(x) {
    exp(-x^2)
}, 0, 4, 1e+05)^2
## [1] 3.141451
Remarkably, it still performs relatively poorly compared to $\hat{\pi}_{\mathrm{riemann}}(10, 4)$. Summing over only half the domain forfeits the error cancellation: on $[0, r]$ the integrand is strictly decreasing, so the right-hand rule underestimates the integral by roughly $h/2$, with nothing to offset it.
Part (c)
Redo part (b) to estimate $\pi$ using Gauss-Hermite quadrature. You may use the fastGHQuad package in R.
Solution
library(fastGHQuad)
hermite_pi <- Vectorize(function(n) {
    # adaptive Gauss-Hermite quadrature of exp(-x^2) using n nodes,
    # centered at muHat = 0 with scale sigmaHat = 1.1; square to estimate pi
    aghQuad(function(x) {
        exp(-x^2)
    }, 0, 1.1, gaussHermiteData(n))^2
})
pi.hat.herm <- hermite_pi(10)
pi.hat.herm
## [1] 3.14025
We see that the estimator $\hat{\pi}_{\mathrm{hermite}} = 3.14025$ at $n = 10$ is less accurate than $\hat{\pi}_{\mathrm{riemann}}$ at $n = 10$. Let us find
$$n^{*}_{\mathrm{hermite}} = \operatorname*{arg\,min}_{n} \left| \hat{\pi}_{\mathrm{hermite}}(n) - \pi \right|$$
and
$$n^{*}_{\mathrm{riemann}} = \operatorname*{arg\,min}_{n} \left| \hat{\pi}_{\mathrm{riemann}}(n, r) - \pi \right|.$$
Here is the code that finds the argument minimizing each estimator's absolute error, along with the corresponding minimum errors:
arg.min <- function(xs, f) {
    # linear scan for the argmin of f over the candidate values xs
    y.min <- Inf
    x.min <- NULL
    for (x in xs) {
        y <- f(x)
        if (y < y.min) {
            x.min <- x
            y.min <- y
        }
    }
    list(x.min = x.min, y.min = y.min)
}
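As a quick sanity check of the helper (illustrative, not in the original), it recovers the minimizer of a simple quadratic:
arg.min(1:5, function(x) (x - 3)^2)  # x.min = 3, y.min = 0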
Now, we try it:
hermite.min <- arg.min(10:100, function(n) {
    abs(hermite_pi(n) - pi)
})
riemann4.min <- arg.min(10:100, function(n) {
    abs(riemann_pi(n, 4) - pi)
})
riemann5.min <- arg.min(10:100, function(n) {
    abs(riemann_pi(n, 5) - pi)
})
riemann6.min <- arg.min(10:100, function(n) {
    abs(riemann_pi(n, 6) - pi)
})
print(hermite.min)
## $x.min
## [1] 46
##
## $y.min
## [1] 4.440892e-16
print(riemann4.min)
## $x.min
## [1] 100
##
## $y.min
## [1] 1.002528e-07
print(riemann5.min)
## $x.min
## [1] 100
##
## $y.min
## [1] 1.046851e-11
print(riemann6.min)
## $x.min
## [1] 24
##
## $y.min
## [1] 4.440892e-16
We see that, likely due to numerical errors in R's default double-precision floating-point numbers, $\hat{\pi}_{\mathrm{hermite}}$ bottoms out at an absolute error of $4.440892 \times 10^{-16}$ at $n = 46$, while $\hat{\pi}_{\mathrm{riemann}}$ reaches the same floor at only $n = 24$ with $r = 6$. Moreover, this common floor is about $2\epsilon$, where $\epsilon \approx 2.22 \times 10^{-16}$ is the machine epsilon, so both estimators are effectively exact at double precision. Interesting.
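As an aside (an observation we add here, not part of the original solution): for this particular integrand, plain non-adaptive Gauss-Hermite quadrature is exact up to floating-point error, since $e^{-x^2}$ is itself the Gauss-Hermite weight function, leaving only $g(x) = 1$ to integrate; the quadrature weights sum to exactly $\sqrt{\pi}$. A minimal sketch with ghQuad() from the same package:
ghQuad(function(x) rep(1, length(x)), gaussHermiteData(10))^2  # ~ 3.141593 at machine precision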
Problem 2
Use Monte Carlo simulation to evaluate the confidence (coverage) level of the $95\%$ CI for the regression slope in the model
$$y = \beta_0 + \beta_1 x + \epsilon, \qquad \epsilon \sim N(0, 1).$$
In each Monte Carlo sample, first generate a vector of $x$ values (you may pick from any distribution, say a normal or a uniform). Then generate $\epsilon$ from $N(0, 1)$ and then $y$ according to the regression formula.
Use lm() to fit the regression model, and confint() to get the $95\%$ confidence interval for the slope parameter. Run the MC iterations for $N = 10000$ times and get the proportion of CIs that cover the true slope $\beta_1 = 3$. Verify the proportion is close to $0.95$.
Solution
N <- 10000  # number of Monte Carlo replications
covers <- 0
n <- 1000  # sample size per replication
for (i in 1:N) {
    x <- runif(n, -100, 100)  # predictor values
    e <- rnorm(n)  # N(0, 1) errors
    y <- 3 * x + e  # true slope is 3 (zero intercept)
    fit <- lm(y ~ x)
    ci <- confint(fit)  # 2 x 2 matrix; ci[2], ci[4] are the slope bounds
    if (ci[2] <= 3 && 3 <= ci[4])
        covers <- covers + 1
}
prop <- covers/N
prop
## [1] 0.9513
As we can see, the proportion of confidence intervals that cover the true parameter value is approximately $0.95$.
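As a further sanity check (an illustrative addition, not part of the original solution), the Monte Carlo standard error of the coverage estimate is $\sqrt{0.95 \times 0.05 / N} \approx 0.0022$, so the observed proportion $0.9513$ lies well within sampling noise of the nominal $0.95$:
sqrt(0.95 * 0.05/N)  # Monte Carlo standard error, ~ 0.00218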
Problem 3
Let $Y \sim \mathrm{Bernoulli}(p)$ and the conditional distribution of $X$ given $Y = y$ be $N(\mu_y, 1)$, where $p = 0.7$, $\mu_0 = -2$, and $\mu_1 = 2$.
Part (a)
Derive the marginal pdf of $X$.
Solution
First, we compute the joint pdf of $X$ and $Y$, which is just
$$f_{X,Y}(x, y) = f_{X \mid Y}(x \mid y) \, p_Y(y),$$
which is
$$f_{X,Y}(x, y) = \phi(x - \mu_y) \, p^y (1 - p)^{1 - y}, \qquad y \in \{0, 1\},$$
where $\phi$ denotes the standard normal pdf. The marginal pdf is computed by summing over $y$,
$$f_X(x) = \sum_{y=0}^{1} f_{X,Y}(x, y),$$
which is just
$$f_X(x) = (1 - p) \, \phi(x - \mu_0) + p \, \phi(x - \mu_1).$$
Clearly, this is a simple Gaussian mixture model.
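As a quick numerical check (a sketch we add here; dmix is a hypothetical name, not from the original), the mixture density integrates to $1$:
dmix <- function(x) 0.3 * dnorm(x, -2) + 0.7 * dnorm(x, 2)  # marginal pdf of X
integrate(dmix, -Inf, Inf)$value  # ~ 1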
Part (b)
Use iterated expectation and variance to find $E[X]$ and $\operatorname{Var}(X)$ exactly.
Solution
Using iterated expectation, we obtain
$$E[X] = E\big[E[X \mid Y]\big] = E[\mu_Y] = (1 - p)\mu_0 + p\,\mu_1 = 0.3(-2) + 0.7(2) = 0.8.$$
Using iterated variance, we obtain
$$\operatorname{Var}(X) = E\big[\operatorname{Var}(X \mid Y)\big] + \operatorname{Var}\big(E[X \mid Y]\big) = 1 + \operatorname{Var}(\mu_Y),$$
where $\operatorname{Var}(\mu_Y) = (1 - p)\mu_0^2 + p\,\mu_1^2 - (E[\mu_Y])^2 = 4 - 0.64 = 3.36$, so $\operatorname{Var}(X) = 4.36$.
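These values are easy to double-check in R (an illustrative snippet, not part of the original solution):
p <- 0.7; mu0 <- -2; mu1 <- 2
(1 - p) * mu0 + p * mu1  # E[X] = 0.8
1 + (1 - p) * mu0^2 + p * mu1^2 - 0.8^2  # Var(X) = 4.36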
Part (c)
Obtain a Monte Carlo sample of size $m = 10000$. Use this sample to compute (i) $E[X]$, (ii) $\operatorname{Var}(X)$, (iii) the $90$-th percentile of $X$.
Solution
First, we perform a MC simulation to obtain a sample from $X$:
m <- 10000
p <- 0.7
mu_0 <- -2
mu_1 <- 2
xs <- numeric(m)
ys <- rbinom(m, 1, p)  # draw Y ~ Bernoulli(p)
for (i in 1:m) {
    # draw X | Y = y from N(mu_y, 1)
    if (ys[i] == 0) {
        xs[i] <- rnorm(1, mu_0)
    } else {
        xs[i] <- rnorm(1, mu_1)
    }
}
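Equivalently, the loop can be vectorized (an alternative sketch, not the original approach), since rnorm() accepts a vector of means:
xs <- rnorm(m, mean = ifelse(ys == 1, mu_1, mu_0))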
Now, we just apply the necessary statistics to the sample to estimate the parameters.
Part (i)
The parameter $E[X]$ is estimated to be:
mean(xs)
## [1] 0.7971801
Part (ii)
The parameter $\operatorname{Var}(X)$ is estimated to be:
var(xs)
## [1] 4.353838
Part (iii)
The $90$-th percentile is estimated to be:
quantile(xs, c(0.9))
## 90%
## 3.07488
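For comparison (an illustrative check we add here; pmix is a hypothetical name), the exact $90$-th percentile can be obtained by inverting the mixture cdf with uniroot(), and it lands near the sample estimate:
pmix <- function(x) 0.3 * pnorm(x, -2) + 0.7 * pnorm(x, 2)  # cdf of X
uniroot(function(x) pmix(x) - 0.9, c(-10, 10))$root  # ~ 3.07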