SIUe - Computational Statistics (STAT 575) - Problem Set 2

Problem 1

Derive the E-M algorithm for right-censored normal data with known variance, say 2=1. Consider Yi’s that are i.i.d. from a N(θ,1), i=1,2,n. We observe (x1,,xn) and (δ1,,δn), where xi=min(yi,c), and δi=I(yi<c). Let C be the total number of censored (incomplete) observations. We denote the missing data as {Zi:δi=0}.

Part (a)

Derive the complete log-likelihood, (θ|Y).

The unobserved random variates {Yi} are i.i.d. normally distributed, YifYi(y|θ) where fYi(y|θ)=(2π)12exp(12(yθ)2).

The likelihood function is therefore L(θ|{yi})=i=1n(2π)12exp(12(yiθ)2)=(2π)n2exp(i=1n12(yiθ)2).

Taking the logarithm of L, (θ|{yi})=logL(θ|{yi})=n2log(2π)12i=1n(yiθ)2=n2log(2π)12i=1nyi2+θi=1nyin2θ2.

Anticipating that we will be maximizing the complete log-likelihood with respect to θ, we put any terms that are not a function of θ into k, obtaining the result (θ|{yi})=k+θi=1nyin2θ2.

Part (b)

Show the conditional expectation E(Y|x,δ=1,θ(t))=x and E(Y|x,δ=0,θ(t))=E(Y|Y>x)=θ(t)+ϕ(xμ)1Φ(xμ) where ϕ and Φ are pdf and cdf of standard normal.

The distribution of Y given δ=1, is uncensored and therefore it is given that Y realized the value x. Since the expectation of a constant x is x, that means E(Y|Y=x)=x.

If δ=0, Y is censored, i.e., Y>x. To take its expectation, we first need to derive the conditional conditional distribution of Y given Y>x and θ(t).

The probability Pr(Yy|Y>x) is given by Pr(Yy|Y>x)=Pr(x<Yy)/Pr(Y>x) which may be rewritten as Pr(Yy|Y>x)=FY(y|θ(t))FY(x|θ(t))1FY(y|θ(t)). where FY|θ(t) is the cdf of the normal distribution with σ=1 and μ=θ(t).

We may rewrite FY|θ(t) in terms of the standard normal, FY(y|θ(t))=Φ(yθ(t)), and thus we may rewrite the conditional distribution of Y|Y>x as Pr(Yy|Y>x)=Φ(yθ(t))Φ(xθ(t))1Φ(xθ(t)) and thus after further simplifying, we obtain the cdf of Y|x, FY|x(y|θ(t))=11Φ(yθ(t))1Φ(xθ(t)) which has a density given by fY(y|x,θ(t))=ϕ(yθ(t))1Φ(xθ(t))I(y>x).

The expectation of Y|(x,θ(t)) is given by E(Y|x,θ(t))=xyfY(y|x,θ(t))dy=xy(ϕ(yθ(t))1Φ(xθ(t)))dy=11Φ(xθ(t))xyϕ(yθ(t))dy.

Analytically, this is a tricky integration problem. Certainly, it would be trivial to numerically integrate this to obtain a solution, but we seek a closed-form solution.

I searched online, and discovered an interesting way to tackle this integration problem.

Let f and F respectively denote the pdf and cdf of the normally distributed Y. Then, dfdy=(yθ)f(y) and abdfdydy=f(b)f(a).

Then, E(Y|x,θ(t))=11F(x)xyf(y)dy=11F(x)x(yθ(t))f(y)dy+θ(t)1F(x)xf(y)dy=11F(x)xdfdydy+θ(t)1F(x)(1F(x))=11F(x)(f()f(x))+θ(t)=f(x)1F(x)+θ(t).

We may rewrite the last line as E(Y|x,θ(t))=θ(t)+ϕ(xθ(t))1Φ(xθ(t)).

Part (c)

Derive the E-step and M-step using parts (a) and (b). Give the updating equation.

E-step

The E-step entails taking the conditional expectation of the complete log-likelihood function (θ|{Yi}) given the observed data {xi} and {δi}.

Q(θ|θ(t))=EYi|xi,δi((θ|{Yi})=EYi|xi,δi(k+θi=1nYin2θ2)=kn2θ2+θi=1nEYi|xi,δi(Yi).

We have already solved the expectation of Yi given xi and δi. We rewrite Q by substituting E(Yi|xi,δi) with its previously found solution, Q(θ|θ(t))=kn2θ2+θi=1nδixi+(1δi)(θ(t)+ϕ(xiθ(t))1Φ(xiθ(t))).

Letting C=i=1n(1δi), R=i=1nδixi, and separating out all terms that are independent of θ(t), Q(θ|θ(t))=kn2θ2+Cθθ(t)+Rθ+θi=1n(1δi)ϕ(xiθ(t))1Φ(xiθ(t)).

M-step

We wish to solve θ(t+1)=argmaxθQ(θ|θ(t)). by solving dQ(θ|θ(t))dθ|θ=θ(t+1)=0, which may be written as nθ(t+1)+Cθ(t)+R+i=1n(1δi)ϕ(xiθ(t))1Φ(xiθ(t))=0.

Solving for θ(t+1) obtains the updating equation θ(t+1)=Rn+Cnθ(t)+1ni=1n(1δi)ϕ(xiθ(t))1Φ(xiθ(t)). where R=i=1nδixi and C=i=1n(1δi).

Part (d)

Use your algorithm on the V.A. data to find the MLE of μ. Take the log of the event times first and standardize by sample standard deviation. You may simply use the censored data sample mean as your starting value.

In the following R code, we implement the updating equation derived in the previous step. We encapulsate the procedure into a function that takes its arguments in the form of a censored set, uncensorted set, starting value (θ(1)), and an ϵ value to control stopping condition.

# assuming the uncensored and censored data are distributed normally,
# we use the EM algorithm to derive an estimator given censored and uncensored
# data.
mean_normal_censored_estimator_em <- function(uncensored,censored,theta,eps=1e-6,debug=T)
{
  dev <- sd(log(c(uncensored,censored)))
  censored <- log(censored) / dev
  uncensored <- log(uncensored) / dev
  theta <- log(theta) / dev
  
  n <- length(censored) + length(uncensored)
  C <- length(censored)
  R <- sum(uncensored)
  
  s <- function(theta)
  {
    sum <- 0
    for (i in 1:C)
    {
      num <- dnorm(censored[i],mean=theta,sd=1)
      denom <- 1-pnorm(censored[i],mean=theta,sd=1)
      sum <- sum + (num / denom)
    }
    sum
  }
  
  i <- 1
  repeat
  {
    theta.new <- R/n + C/n * theta + (1/n)*s(theta)
    if (debug==T) { cat("theta[", i, "] =",theta,", theta[", i+1, "] =",theta.new,"\n") }
    if (abs(theta.new - theta) < eps)
    {
      theta <- theta.new * dev
      theta <- exp(theta)
      return(theta)
    }
    i <- i + 1
    theta <- theta.new
  }
}

We apply this procedure to the indicated data set.

library(MASS) # has VA data
VAs <- subset(VA,prior==0)
censored <- VAs$status == 0
censored_xs <- VAs[censored,c("stime")]
uncensored_xs <- VAs[!censored,c("stime")]

mu <- mean(uncensored_xs)
cat("mean of the uncensored sample is ", mu, ".")
## mean of the uncensored sample is  112.1648 .
sol <- mean_normal_censored_estimator_em(uncensored_xs,censored_xs,mu)
## theta[ 1 ] = 3.857928 , theta[ 2 ] = 3.424258 
## theta[ 2 ] = 3.424258 , theta[ 3 ] = 3.415443 
## theta[ 3 ] = 3.415443 , theta[ 4 ] = 3.415286 
## theta[ 4 ] = 3.415286 , theta[ 5 ] = 3.415283 
## theta[ 5 ] = 3.415283 , theta[ 6 ] = 3.415283
sol
## [1] 65.2625

We see that our estimate of θ is θ^=65.2624985. (The θ before transforming it to the appropriate scale was 3.415283.)

This mean is somewhat lower than anticipated, which makes me suspect something is wrong with my updating equation. If I have the time, I will revisit it.

Problem 2


Problem 4.2


Part (a)

There are N=1500 gay men in the survey sample where Xi denotes the i-th persons response to the number of risky sexual encounters he had in the previous 30 days. Thus, we observe a sample X=(X1,X2,,XN).

We assume there are 3 groups in the population, denoted by z=1, t=2, and p=3. Group 1 members report 0 risky sexual encounters regardless of the truth where the probability of being a member of group 1 is denoted by α,

Group 2 members accurately report risky sexual encounters and represent typical behavior where the probability of being a member of group 2 is denoted by β. We assume this group’s number of sexual encounters follows a poisson with mean μ.

Group 3 members accurately report risky sexual encounters and represent high-risk behavior where the probability of being a member of group 3 is γ=1αβ. We assume this group’s number of sexual encounters follows a poisson with mean λ.

This represents a finite mixture model with a pdf Xif(x|θ)=αI(x=0)+βPOI(x|μ)+(1αβ)POI(x|λ) with a parameter vector θ=(α,β,μ,λ).

Let the uncertain group that the i-th person belongs to be denoted by Zi. If we observe group membership data, Xi|Zi=zi, then Xi|Zi=1I(x=0),Xi|Zi=2POI(μ),Xi|Zi=3POI(λ), where ZifZi(zi|θ)=Pr(Zi=zi)={αzi=1,βzi=2,γ=1αβzi=3, and thus fXi,Zi(xi,zi|θ)=αI(zi=1)+βPOI(μ)I(zi=2)+(1αβ)POI(λ)I(zi=3).

The likelihood function is thus given by L(θ|X,Z)=i=1NfXi,Zi(xi,zi|θ), which may be rewritten as L(θ|X,Z)=({i|zi=1}αI(xi=0))({i|zi=2}βμxieμxi!)({i|zi=3}γλxieλxi!).

We wish to rewrite this so that the data is explicitly represented. First, we do the transformation L(θ|X,Z)=({i|zi=1,xi=0}α)k=016({i|zi=2,xi=k}βμkeμk!)k=016({i|zi=3,xi=k}γλkeλk!).

We let na,b denote the (unobserved) cardinality of {i|zi=a,xi=b}, thus L(θ|{nj,k})=αn1,0k=016βn2,kμkn2,keμn2,k(k!)n2,kk=016γn3,kλkn3,keλn3,k(k!)n3,k is the complete likelihood. The complete log-likelihood is thus (θ|{nj,k})=n1,0logα+k=016log(βn2,kμkn2,keμn2,k(k!)n2,k)+k=016log(γn3,kλkn3,keλn3,k(k!)n3,k) which simplies to (θ|{nj,k})=n1,0logα+k=016n2,k(logβ+klogμμlogk!)+n3,k(logγ+klogλλlogk!).

Anticipating taking ddθ to solve for the maximum of the log-likelihood, we remove any terms that are not a function of θ, resulting in the kernel (θ|{nj,k})=n1,0logα+k=016{n2,k(logβ+klogμμ)+n3,k(logγ+klogλλ)}.

E-step

The conditional expectation to solve in the EM algorithm is given by Q(θ|θ(t))=E((θ)) where {nk,j} are random and {nj} and θ(t) are given. We rewrite this as Q(θ|θ(t))=E(n1,0logα+k=016{n2,k(logβ+klogμμ)+n3,k(logγ+klogλλ)}).

Using the linearity of expectations, we rewrite the above to Q(θ|θ(t))=E(n1,0)logα+k=016{E(n2,k)(logβ+klogμμ)+E(n3,k)(logγ+klogλλ)} given {nj} and θ(t).

Consider E(n2,k|{nj},θ(t)). To solve this expectation, we must first derive the distribution of n2,k.

Suppose xj=k, then probability that the j-th person belongs to group 2 is given by Pr(Zj=2|xj=k)=Pr(Zj=2)Pr(xj=k|Zj=2)/Pr(xj=k). We note that Pr(xj=k) is equivalent to πk(θ), Pr(Zj=2) is the definition of β, and Pr(xj=k|Zj=2) is fXj|Zj(k|Zj=2)=POI(k|μ).

Making the substitutions yields the result tk(θ)=Pr(Zj=2|xj=k)=βPOI(k|μ)/πk(θ).

Assuming {Xi} are i.i.d., observe that k0, the distribution of n2,k given nk, θ(t) is binomial distributed with a probability of success tk(θ(t)). Thus, E(n2,k)=nktk(θ(t)). The same logic holds for n3,k and n1,0, and thus E(n3,k)=nkpk(θ(t)) and E(n1,0)=n0z0(θ(t)), which means Q(θ|θ(t))=n0z0(θ(t))logα+k=016{nktk(θ(t))(logβ+klogμμ)+nkpk(θ(t))(logγ+klogλλ)}

M-step

We wish to solve θ(t+1)=argmaxθQ(θ|θ(t)). by solving Q(θ|θ(t))|θ=θ(t+1)=0.

We use the Lagrangian to impose the restriction α+β+γ=1, thus we seek to perform the constrained maximization of Ql(θ,c|θ(t))=Q(θ|θ(t))+c(1αβγ).

Thus, when we solve for α, Qlα=n0z0(θ(t))αc=0, we get the result α(t+1)=1cn0z0(θ(t)).

Similar results hold for β and γ, obtaining β(t+1)=1ck=016nktk(θ(t)). and γ(t+1)=1ck=016nkpk(θ(t)).

This does not look too promising until we realize that n0z0(θ(t))+k=016nktk(θ(t))+k=016nkpk(θ(t))=N.

Thus, c(α(t)+β(t)+γ(t))=N, which means c=N since α(t)+β(t)+γ(t)=1. Making this substitution obtains the result α(t+1)=1Nn0z0(θ(t))β(t+1)=1Nk=016nktk(θ(t))γ(t+1)=1Nk=016nkpk(θ(t)).

Solving an estimator for μ at iteration (t+1), Qlμ|μ=μ(t+1=0k=016nktk(θ(t))(k/μ(t+1)1)=01μ(t+1)k=016nktk(θ(t))k=k=016nktk(θ(t))μ(t+1)=k=016knktk(θ(t))k=016nktk(θ(t)).

The same derivation essentially follows for λ, and thus λ(t+1)=k=016knkpk(θ(t))k=016nkpk(θ(t)).

Part (b)

Estimate the parameters of the model, using the observed data.

# we observe n = (n0,n1,...,n16)
ns <- c(379,299,222,145,109,95,73,59,45,30,24,12,4,2,0,1,1)
N <- sum(ns)

# theta := (alpha, beta, mu, lambda)'
# note that there is an implicit parameter gamma s.t.
# alpha + beta + gamma = 1
# the initial value assumes each category z, t, or p
# is equally probable, and so we let
#     (alpha^(0),beta^(0)) = (1/3,1/3)
# and mu^(0) and lambda^(0) are just arbitrarily chosen to be 2 and 3,
# with the insight that group 3 is more risky than group 2.
theta <- c(1/3,1/3,2,3)

# theta := (alpha, beta, mu, lambda)
Pi <- function(i,theta)
{
  res <- 0
  if (i == 0)
    res <- theta[1]
  
  res <- res + theta[2] * theta[3]^i * exp(-theta[3])
  res <- res + (1 - theta[1] - theta[2]) * theta[4]^i * exp(-theta[4])
  res
}

z0 <- function(theta)
{
  theta[1] / Pi(0,theta)
}

t <- function(i,theta)
{
  theta[2] * theta[3]^i * exp(-theta[3]) / Pi(i,theta)
}

p <- function(i,theta)
{
  (1-theta[1] - theta[2]) * theta[4]^i * exp(-theta[4]) / Pi(i,theta)
}

# update algorithm, based on EM algorithm
update <- function(theta,ns)
{
  # note: n0 := ns[1] instead of ns[0] since R does not use zero-based indexes
  alpha <- ns[1] * z0(theta) / N
  beta <- 0
  mu_num <- 0
  mu_denom <- 0
  
  lam_num <- 0
  lam_denom <- 0

  for (i in 0:16)
  {
    ti <- t(i,theta)
    pi <- p(i,theta)
    
    beta <- beta + ns[i+1] * ti
    
    mu_num <- mu_num + i * ns[i+1] * ti
    mu_denom <- mu_denom + ns[i+1] * ti
    
    lam_num <- lam_num + i * ns[i+1] * pi
    lam_denom <- lam_denom + ns[i+1] * pi
  }
  
  beta <- beta / N
  mu <- mu_num / mu_denom
  lam <- lam_num / lam_denom
  
  c(alpha,beta,mu,lam)
}

em <- function(theta,ns,steps=10000,debug=T)
{
  for(i in 1:steps)
  {
    theta = update(theta,ns)
    if (debug==T)
    {
      if (i %% 1000 == 0) { cat("iteration =",i," theta = (",theta,")'\n") }
    }
  }
  theta
}

# solution theta = (alpha, beta, mu, lambda)
sol <- em(theta,ns,10000,T)
## iteration = 1000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 2000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 3000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 4000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 5000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 6000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 7000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 8000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 9000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'
## iteration = 10000  theta = ( 0.1221661 0.5625419 1.467475 5.938889 )'

We see that the solution is 0.1221661,0.5625419,1.4674746,5.9388889.

Part (c)

Estimate the standard errors and pairwise correlations of your parameters, using any available method.

We have chosen to use the Bootstrap method.

# ns = (379,299,222,145,109,95,73,59,45,30,24,12,4,2,0,1,1)
# 379 responded 0 encounters
# 299 responded 1 encounters
# 222 responded 2 encounters
# ...
# 1 responded 16 encounters
#
# to resample, we resample from the data set that includes each
# persons response, as determined by ns.
data <- NULL
for (i in 1:length(ns))
{
  data <- append(data,rep((i-1),ns[i]))
}

make_into_counts <- function(data)
{
  ns <- NULL
  for (i in 0:16)
  {
    ni <- data[data == i]
    l <-length(ni)
    ns <- append(ns,l)
  }
  ns
}

m <- 1000 # bootstrap replicates
em_steps <- 100
theta.bs <- em(theta,ns,em_steps,F)
thetas <- rbind(theta.bs)
for (i in 2:m)
{
  indices <- sample(N,N,replace=T)
  resampled <- make_into_counts(data[indices])
  theta.bs <- em(theta,resampled,em_steps,F)
  thetas <- rbind(thetas,theta.bs)
  if (i %% 100 == 0)
  {
    cat("iteration ", i, "\n")
    print(cov(thetas))
  }
}
## iteration  100 
##               [,1]          [,2]         [,3]        [,4]
## [1,]  0.0003903744 -0.0001438786 0.0017263560 0.001571458
## [2,] -0.0001438786  0.0004087653 0.0004993799 0.001813153
## [3,]  0.0017263560  0.0004993799 0.0152443668 0.015694549
## [4,]  0.0015714585  0.0018131530 0.0156945490 0.040863019
## iteration  200 
##               [,1]          [,2]        [,3]        [,4]
## [1,]  0.0004023631 -0.0001703146 0.001697344 0.001484011
## [2,] -0.0001703146  0.0004447321 0.000419080 0.001903868
## [3,]  0.0016973442  0.0004190800 0.014398069 0.015896531
## [4,]  0.0014840108  0.0019038683 0.015896531 0.044389192
## iteration  300 
##               [,1]          [,2]         [,3]       [,4]
## [1,]  0.0003956313 -0.0001727811 0.0016888749 0.00159774
## [2,] -0.0001727811  0.0004628827 0.0003008066 0.00181497
## [3,]  0.0016888749  0.0003008066 0.0137001437 0.01583219
## [4,]  0.0015977404  0.0018149703 0.0158321853 0.04443171
## iteration  400 
##               [,1]          [,2]         [,3]        [,4]
## [1,]  0.0003966960 -0.0001719888 0.0016451275 0.001548858
## [2,] -0.0001719888  0.0004684167 0.0003622689 0.002003458
## [3,]  0.0016451275  0.0003622689 0.0137218437 0.016201510
## [4,]  0.0015488580  0.0020034577 0.0162015099 0.046175397
## iteration  500 
##               [,1]          [,2]         [,3]        [,4]
## [1,]  0.0004135674 -0.0001873768 0.0016694824 0.001624097
## [2,] -0.0001873768  0.0004681143 0.0003021341 0.001786532
## [3,]  0.0016694824  0.0003021341 0.0136412731 0.015939754
## [4,]  0.0016240973  0.0017865318 0.0159397544 0.043573619
## iteration  600 
##               [,1]          [,2]        [,3]        [,4]
## [1,]  0.0004111393 -0.0001937508 0.001653108 0.001534535
## [2,] -0.0001937508  0.0004822347 0.000260584 0.001829480
## [3,]  0.0016531076  0.0002605840 0.013451057 0.015368992
## [4,]  0.0015345352  0.0018294799 0.015368992 0.042992734
## iteration  700 
##               [,1]          [,2]         [,3]        [,4]
## [1,]  0.0004259534 -0.0001994680 0.0017338084 0.001631373
## [2,] -0.0001994680  0.0004943474 0.0002033533 0.001795465
## [3,]  0.0017338084  0.0002033533 0.0136916465 0.015681755
## [4,]  0.0016313734  0.0017954653 0.0156817555 0.043016583
## iteration  800 
##               [,1]          [,2]         [,3]        [,4]
## [1,]  0.0004134958 -0.0001932351 0.0016624262 0.001607135
## [2,] -0.0001932351  0.0004925245 0.0002242567 0.001774913
## [3,]  0.0016624262  0.0002242567 0.0131371751 0.015149211
## [4,]  0.0016071351  0.0017749127 0.0151492114 0.042293737
## iteration  900 
##               [,1]          [,2]         [,3]        [,4]
## [1,]  0.0004139774 -0.0002014407 0.0016479052 0.001583003
## [2,] -0.0002014407  0.0004968611 0.0001793993 0.001649937
## [3,]  0.0016479052  0.0001793993 0.0129764348 0.014710759
## [4,]  0.0015830026  0.0016499365 0.0147107588 0.041445055
## iteration  1000 
##               [,1]          [,2]         [,3]        [,4]
## [1,]  0.0004196823 -0.0001981646 0.0016697031 0.001606301
## [2,] -0.0001981646  0.0004857090 0.0001668595 0.001625460
## [3,]  0.0016697031  0.0001668595 0.0129729349 0.014645101
## [4,]  0.0016063014  0.0016254599 0.0146451012 0.041083606
cov.bs <- cov(thetas)
cor.bs <- cor(thetas)

The Bootstrap estimator of the covariance matrix is given by

##               [,1]          [,2]         [,3]        [,4]
## [1,]  0.0004196823 -0.0001981646 0.0016697031 0.001606301
## [2,] -0.0001981646  0.0004857090 0.0001668595 0.001625460
## [3,]  0.0016697031  0.0001668595 0.0129729349 0.014645101
## [4,]  0.0016063014  0.0016254599 0.0146451012 0.041083606

and the correlation matrix is given by

##            [,1]        [,2]       [,3]      [,4]
## [1,]  1.0000000 -0.43891212 0.71558269 0.3868409
## [2,] -0.4389121  1.00000000 0.06647277 0.3638764
## [3,]  0.7155827  0.06647277 1.00000000 0.6343647
## [4,]  0.3868409  0.36387642 0.63436466 1.0000000

Let’s try using the Hessian of the observed information matrix.

library(numDeriv)
loglike <- function(theta)
{
  s <- 0
  for (x in data)
  {
    s <- s + log(theta[1]*as.numeric(x==0) +
                 theta[2]*dpois(x,theta[3]) +
                 (1-theta[1]-theta[2])*dpois(x,theta[4]))
  }
  s
}
mle <- c(0.1221661,0.5625419,1.4674746,5.9388889)
solve(-hessian(loglike,mle))
##               [,1]          [,2]         [,3]        [,4]
## [1,]  0.0003799048 -1.909698e-04 1.438556e-03 0.001184057
## [2,] -0.0001909698  4.657702e-04 7.132638e-05 0.001417409
## [3,]  0.0014385560  7.132638e-05 1.111722e-02 0.011376017
## [4,]  0.0011840568  1.417409e-03 1.137602e-02 0.034664940
Alex Towell
Alex Towell

Alex Towell has a masters in computer science and a masters in mathematics (statistics) from SIUe.