Problem 1
Derive the E-M algorithm for right-censored normal data with known variance, say . Consider ’s that are i.i.d. from a , . We observe and , where , and . Let be the total number of censored (incomplete) observations. We denote the missing data as .
Part (a)
Derive the complete log-likelihood, .
Solution
The unobserved random variates are i.i.d. normally distributed,
where
The likelihood function is therefore
Taking the logarithm of ,
Anticipating that we will be maximizing the complete log-likelihood with respect to , we put any terms that are not a function of into , obtaining the result
Part (b)
Show the conditional expectation
and
where and are pdf and cdf of standard normal.
Solution
The distribution of given , is uncensored and therefore it is given that realized the value . Since the expectation of a constant is , that means .
If , is censored, i.e., . To take its expectation, we first need to derive the conditional conditional distribution of given and .
The probability is given by
which may be rewritten as
where is the cdf of the normal distribution with and .
We may rewrite in terms of the standard normal,
and thus we may rewrite the conditional distribution of as
and thus after further simplifying, we obtain the cdf of ,
which has a density given by
The expectation of is given by
Analytically, this is a tricky integration problem. Certainly, it would be trivial to numerically integrate this to obtain a solution, but we seek a closed-form solution.
I searched online, and discovered an interesting way to tackle this integration problem.
Let and respectively denote the pdf and cdf of the normally distributed . Then,
and
Then,
We may rewrite the last line as
Part (c)
Derive the -step and -step using parts (a) and (b). Give the updating equation.
Solution
E-step
The -step entails taking the conditional expectation of the complete log-likelihood function given the observed data and .
We have already solved the expectation of given and . We rewrite by substituting with its previously found solution,
Letting , , and separating out all terms that are independent of ,
M-step
We wish to solve
by solving
which may be written as
Solving for obtains the updating equation
where
and
Part (d)
Use your algorithm on the V.A. data to find the MLE of . Take the log of the event times first and standardize by sample standard deviation. You may simply use the censored data sample mean as your starting value.
Solution
In the following R code, we implement the updating equation derived in the previous step. We encapulsate the procedure into a function that takes its arguments in the form of a censored set, uncensorted set, starting value (), and an value to control stopping condition.
# assuming the uncensored and censored data are distributed normally,
# we use the EM algorithm to derive an estimator given censored and uncensored
# data.
mean_normal_censored_estimator_em <- function(uncensored,censored,theta,eps=1e-6,debug=T)
{
dev <- sd(log(c(uncensored,censored)))
censored <- log(censored) / dev
uncensored <- log(uncensored) / dev
theta <- log(theta) / dev
n <- length(censored) + length(uncensored)
C <- length(censored)
R <- sum(uncensored)
s <- function(theta)
{
sum <- 0
for (i in 1:C)
{
num <- dnorm(censored[i],mean=theta,sd=1)
denom <- 1-pnorm(censored[i],mean=theta,sd=1)
sum <- sum + (num / denom)
}
sum
}
i <- 1
repeat
{
theta.new <- R/n + C/n * theta + (1/n)*s(theta)
if (debug==T) { cat("theta[", i, "] =",theta,", theta[", i+1, "] =",theta.new,"\n") }
if (abs(theta.new - theta) < eps)
{
theta <- theta.new * dev
theta <- exp(theta)
return(theta)
}
i <- i + 1
theta <- theta.new
}
}
We apply this procedure to the indicated data set.
library(MASS) # has VA data
VAs <- subset(VA,prior==0)
censored <- VAs$status == 0
censored_xs <- VAs[censored,c("stime")]
uncensored_xs <- VAs[!censored,c("stime")]
mu <- mean(uncensored_xs)
cat("mean of the uncensored sample is ", mu, ".")
## mean of the uncensored sample is 112.1648 .
sol <- mean_normal_censored_estimator_em(uncensored_xs,censored_xs,mu)
## theta[ 1 ] = 3.857928 , theta[ 2 ] = 3.424258
## theta[ 2 ] = 3.424258 , theta[ 3 ] = 3.415443
## theta[ 3 ] = 3.415443 , theta[ 4 ] = 3.415286
## theta[ 4 ] = 3.415286 , theta[ 5 ] = 3.415283
## theta[ 5 ] = 3.415283 , theta[ 6 ] = 3.415283
sol
## [1] 65.2625
We see that our estimate of is . (The before transforming it to the appropriate scale was .)
This mean is somewhat lower than anticipated, which makes me suspect something is wrong with my updating equation. If I have the time, I will revisit it.
Problem 2

Solution
Part (a)
There are gay men in the survey sample where denotes the -th persons response to the number of risky sexual encounters he had in the previous days. Thus, we observe a sample .
We assume there are groups in the population, denoted by , , and . Group members report risky sexual encounters regardless of the truth where the probability of being a member of group is denoted by ,
Group members accurately report risky sexual encounters and represent typical behavior where the probability of being a member of group is denoted by . We assume this group’s number of sexual encounters follows a poisson with mean .
Group members accurately report risky sexual encounters and represent high-risk behavior where the probability of being a member of group is . We assume this group’s number of sexual encounters follows a poisson with mean .
This represents a finite mixture model with a pdf
with a parameter vector
Let the uncertain group that the -th person belongs to be denoted by . If we observe group membership data, , then
where
and thus
The complete likelihood function is thus given by
which may be rewritten as
We wish to rewrite this so that the data is explicitly represented. First, we do the transformation
We let denote the (unobserved) cardinality of , thus
is the complete likelihood. The complete log-likelihood is thus
which simplies to
Anticipating taking to solve for the maximum of the log-likelihood, we remove any terms that are not a function of , resulting in the kernel
E-step
The conditional expectation to solve in the EM algorithm is given by
where are random and and are given. We rewrite this as
Using the linearity of expectations, we rewrite the above to
given and .
Consider . To solve this expectation, we must first derive the distribution of .
Suppose , then probability that the -th person belongs to group is given by
We note that is equivalent to , is the definition of , and is .
Making the substitutions yields the result
Assuming are i.i.d., observe that , the distribution of given , is binomial distributed with a probability of success . Thus,
The same logic holds for and , and thus
and
which means
M-step
We wish to solve
by solving
We use the Lagrangian to impose the restriction , thus we seek to perform the constrained maximization of
Thus, when we solve for ,
we get the result
Similar results hold for and , obtaining
and
This does not look too promising until we realize that
Thus, , which means since . Making this substitution obtains the result
Solving an estimator for at iteration ,
The same derivation essentially follows for , and thus
Part (b)
Estimate the parameters of the model, using the observed data.
Solution
# we observe n = (n0,n1,...,n16)
ns <- c(379,299,222,145,109,95,73,59,45,30,24,12,4,2,0,1,1)
N <- sum(ns)
# theta := (alpha, beta, mu, lambda)'
# note that there is an implicit parameter gamma s.t.
# alpha + beta + gamma = 1
# the initial value assumes each category z, t, or p
# is equally probable, and so we let
# (alpha^(0),beta^(0)) = (1/3,1/3)
# and mu^(0) and lambda^(0) are just arbitrarily chosen to be 2 and 3,
# with the insight that group 3 is more risky than group 2.
theta <- c(1/3,1/3,2,3)
# theta := (alpha, beta, mu, lambda)
Pi <- function(i,theta)
{
res <- 0
if (i == 0)
res <- theta[1]
res <- res + theta[2] * theta[3]^i * exp(-theta[3])
res <- res + (1 - theta[1] - theta[2]) * theta[4]^i * exp(-theta[4])
res
}
z0 <- function(theta)
{
theta[1] / Pi(0,theta)
}
t <- function(i,theta)
{
theta[2] * theta[3]^i * exp(-theta[3]) / Pi(i,theta)
}
p <- function(i,theta)
{
(1-theta[1] - theta[2]) * theta[4]^i * exp(-theta[4]) / Pi(i,theta)
}
# update algorithm, based on EM algorithm
update <- function(theta,ns)
{
# note: n0 := ns[1] instead of ns[0] since R does not use zero-based indexes
alpha <- ns[1] * z0(theta) / N
beta <- 0
mu_num <- 0
mu_denom <- 0
lam_num <- 0
lam_denom <- 0
for (i in 0:16)
{
ti <- t(i,theta)
pi <- p(i,theta)
beta <- beta + ns[i+1] * ti
mu_num <- mu_num + i * ns[i+1] * ti
mu_denom <- mu_denom + ns[i+1] * ti
lam_num <- lam_num + i * ns[i+1] * pi
lam_denom <- lam_denom + ns[i+1] * pi
}
beta <- beta / N
mu <- mu_num / mu_denom
lam <- lam_num / lam_denom
c(alpha,beta,mu,lam)
}
em <- function(theta,ns,steps=10000,debug=T)
{
for(i in 1:steps)
{
theta = update(theta,ns)
if (debug==T)
{
if (i %% 1000 == 0) { cat("iteration =",i," theta = (",theta,")'\n") }
}
}
theta
}
# solution theta = (alpha, beta, mu, lambda)
sol <- em(theta,ns,10000,T)
## iteration = 1000 theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 2000 theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 3000 theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 4000 theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 5000 theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 6000 theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 7000 theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 8000 theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 9000 theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 10000 theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
We see that the solution is .
Part (c)
Estimate the standard errors and pairwise correlations of your parameters, using any available method.
Solution
We have chosen to use the Bootstrap method.
# ns = (379,299,222,145,109,95,73,59,45,30,24,12,4,2,0,1,1)
# 379 responded 0 encounters
# 299 responded 1 encounters
# 222 responded 2 encounters
# ...
# 1 responded 16 encounters
#
# to resample, we resample from the data set that includes each
# persons response, as determined by ns.
data <- NULL
for (i in 1:length(ns))
{
data <- append(data,rep((i-1),ns[i]))
}
make_into_counts <- function(data)
{
ns <- NULL
for (i in 0:16)
{
ni <- data[data == i]
l <-length(ni)
ns <- append(ns,l)
}
ns
}
m <- 1000 # bootstrap replicates
em_steps <- 100
theta.bs <- em(theta,ns,em_steps,F)
thetas <- rbind(theta.bs)
for (i in 2:m)
{
indices <- sample(N,N,replace=T)
resampled <- make_into_counts(data[indices])
theta.bs <- em(theta,resampled,em_steps,F)
thetas <- rbind(thetas,theta.bs)
if (i %% 100 == 0)
{
cat("iteration ", i, "\n")
print(cov(thetas))
}
}
## iteration 100
## [,1] [,2] [,3] [,4]
## [1,] 0.0004197254 -0.0001718468 0.0017266699 0.001591205
## [2,] -0.0001718468 0.0005295261 0.0002895402 0.001980314
## [3,] 0.0017266699 0.0002895402 0.0131179548 0.015108087
## [4,] 0.0015912053 0.0019803141 0.0151080872 0.047009875
## iteration 200
## [,1] [,2] [,3] [,4]
## [1,] 0.0003864557 -1.678292e-04 1.590662e-03 0.00154358
## [2,] -0.0001678292 4.671077e-04 9.764038e-05 0.00133111
## [3,] 0.0015906622 9.764038e-05 1.228480e-02 0.01297749
## [4,] 0.0015435804 1.331110e-03 1.297749e-02 0.03894848
## iteration 300
## [,1] [,2] [,3] [,4]
## [1,] 0.0003735625 -1.785107e-04 1.479671e-03 0.001351610
## [2,] -0.0001785107 4.678372e-04 6.137843e-06 0.001259075
## [3,] 0.0014796708 6.137843e-06 1.123608e-02 0.011748534
## [4,] 0.0013516098 1.259075e-03 1.174853e-02 0.036388361
## iteration 400
## [,1] [,2] [,3] [,4]
## [1,] 0.0003625489 -0.0001755464 0.0014276201 0.001221715
## [2,] -0.0001755464 0.0004540625 0.0000184334 0.001288138
## [3,] 0.0014276201 0.0000184334 0.0110764497 0.010780774
## [4,] 0.0012217155 0.0012881380 0.0107807741 0.034067537
## iteration 500
## [,1] [,2] [,3] [,4]
## [1,] 0.0003718052 -0.0001842823 0.0014502414 0.001310802
## [2,] -0.0001842823 0.0004847405 0.0001080019 0.001398247
## [3,] 0.0014502414 0.0001080019 0.0114792580 0.011828494
## [4,] 0.0013108018 0.0013982467 0.0118284941 0.036489243
## iteration 600
## [,1] [,2] [,3] [,4]
## [1,] 0.0003791435 -1.814512e-04 1.497719e-03 0.001364054
## [2,] -0.0001814512 4.674232e-04 7.293112e-05 0.001410218
## [3,] 0.0014977193 7.293112e-05 1.157800e-02 0.011903853
## [4,] 0.0013640538 1.410218e-03 1.190385e-02 0.037292937
## iteration 700
## [,1] [,2] [,3] [,4]
## [1,] 0.0003819013 -1.782221e-04 1.521678e-03 0.001410964
## [2,] -0.0001782221 4.546723e-04 8.545081e-05 0.001405131
## [3,] 0.0015216781 8.545081e-05 1.178236e-02 0.012322621
## [4,] 0.0014109640 1.405131e-03 1.232262e-02 0.037738946
## iteration 800
## [,1] [,2] [,3] [,4]
## [1,] 0.0003730646 -0.0001676221 0.0015068905 0.001443218
## [2,] -0.0001676221 0.0004405397 0.0001354924 0.001470624
## [3,] 0.0015068905 0.0001354924 0.0118808619 0.012889971
## [4,] 0.0014432180 0.0014706239 0.0128899710 0.038858338
## iteration 900
## [,1] [,2] [,3] [,4]
## [1,] 0.0003676784 -0.0001699529 0.0014571801 0.001358486
## [2,] -0.0001699529 0.0004554077 0.0001823193 0.001590474
## [3,] 0.0014571801 0.0001823193 0.0117524614 0.012790632
## [4,] 0.0013584862 0.0015904740 0.0127906324 0.039055305
## iteration 1000
## [,1] [,2] [,3] [,4]
## [1,] 0.0003697739 -0.0001652985 0.0014912066 0.001367237
## [2,] -0.0001652985 0.0004523873 0.0002111503 0.001638727
## [3,] 0.0014912066 0.0002111503 0.0120934727 0.013311533
## [4,] 0.0013672372 0.0016387267 0.0133115328 0.039376614
cov.bs <- cov(thetas)
cor.bs <- cor(thetas)
The Bootstrap estimator of the covariance matrix is given by
## [,1] [,2] [,3] [,4]
## [1,] 0.0003697739 -0.0001652985 0.0014912066 0.001367237
## [2,] -0.0001652985 0.0004523873 0.0002111503 0.001638727
## [3,] 0.0014912066 0.0002111503 0.0120934727 0.013311533
## [4,] 0.0013672372 0.0016387267 0.0133115328 0.039376614
and the correlation matrix is given by
## [,1] [,2] [,3] [,4]
## [1,] 1.0000000 -0.40415269 0.70517057 0.3583080
## [2,] -0.4041527 1.00000000 0.09027364 0.3882685
## [3,] 0.7051706 0.09027364 1.00000000 0.6100050
## [4,] 0.3583080 0.38826847 0.61000496 1.0000000
Let’s try using the Hessian of the observed information matrix.
library(numDeriv)
loglike <- function(theta)
{
s <- 0
for (x in data)
{
s <- s + log(theta[1]*as.numeric(x==0) +
theta[2]*dpois(x,theta[3]) +
(1-theta[1]-theta[2])*dpois(x,theta[4]))
}
s
}
mle <- c(0.1221661,0.5625419,1.4674746,5.9388889)
solve(-hessian(loglike,mle))
## [,1] [,2] [,3] [,4]
## [1,] 0.0003799048 -1.909698e-04 1.438556e-03 0.001184057
## [2,] -0.0001909698 4.657702e-04 7.132638e-05 0.001417409
## [3,] 0.0014385560 7.132638e-05 1.111722e-02 0.011376017
## [4,] 0.0011840568 1.417409e-03 1.137602e-02 0.034664940