Discrete Multivariate Analysis - STAT 579 - Problem Set 6

# Problem 1

If we have a non-linear function of a random variable, the delta method uses a Taylor approximation of the function centered at the random variable's expected value, so that the approximate variance of the transformed random variable is easy to compute.

Let $g(\hat\pi) = \log(\hat\pi/(1-\hat\pi))$. A linear approximation of $g$ is given by
$$\hat g(\hat\pi) = g(\pi) + g'(\pi)(\hat\pi - \pi),$$
the derivative of $g$ is given by
$$g'(\pi) = \frac{1}{\pi(1-\pi)},$$
and, since $\operatorname{Var}(\hat\pi) = \pi(1-\pi)/n$, the variance of $\hat g(\hat\pi)$ is given by
$$\begin{aligned}
\operatorname{Var}(\hat g(\hat\pi)) &= \left(g'(\pi)\right)^2 \operatorname{Var}(\hat\pi)\\
&= \frac{1}{\pi^2(1-\pi)^2}\,\frac{\pi(1-\pi)}{n}\\
&= \frac{1}{\pi(1-\pi)}\,\frac{1}{n}.
\end{aligned}$$
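As a quick sanity check (not part of the problem set), the delta-method variance above can be compared against a Monte Carlo estimate. The values $n = 200$ and $\pi = 0.3$ below are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo check of the delta-method variance for the logit.
# n and pi are assumed values for illustration, not from the problem.
rng = np.random.default_rng(0)
n, pi = 200, 0.3

# Simulate many sample proportions pi_hat = X/n with X ~ Binomial(n, pi).
pi_hat = rng.binomial(n, pi, size=200_000) / n
logit = np.log(pi_hat / (1 - pi_hat))

empirical = logit.var()              # Monte Carlo variance of g(pi_hat)
delta = 1 / (pi * (1 - pi) * n)      # delta-method: 1/(pi(1-pi)) * 1/n
print(empirical, delta)
```

The two numbers agree to within a few percent, as expected for a first-order approximation at this sample size.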

Since we do not know $\pi$, we approximate it with $\hat\pi$; thus
$$\sigma^2(\log(\hat\pi/(1-\hat\pi))) = \frac{1}{\hat\pi(1-\hat\pi)}\,\frac{1}{n}.$$

# Problem 2

Consider data from a retrospective study on the relationship between daily alcohol consumption and the onset of esophageal cancer.

|            | cancer | no cancer |
|------------|-------:|----------:|
| $> 80$ g   |     71 |        82 |
| $< 80$ g   |     60 |       441 |
| total      |    131 |       523 |

## Part (a)

In a retrospective study, the table draws samples from the conditional probabilities $\operatorname{P}(X = i \mid Y = j)$, and we denote $\operatorname{P}(X = 1 \mid Y = j)$ by $p_j$.

The ML estimator of $p_j$ from the data in a retrospective study is given by
$$\hat p_j = \frac{n_{1j}}{n_{+j}}.$$

In a retrospective study, an estimator of $\sigma(\log\hat\theta)$ is given by
$$\hat\sigma(\log\hat\theta) = \left[\left(\frac{1}{\hat p_1} + \frac{1}{1-\hat p_1}\right)\frac{1}{n_{+1}} + \left(\frac{1}{\hat p_2} + \frac{1}{1-\hat p_2}\right)\frac{1}{n_{+2}}\right]^{1/2}.$$

## Part (b)

The estimator does not depend on the sampling scheme: when asymptotic normality holds, replacing the unknown parameters with their estimates is justified under any of the sampling schemes.

When we substitute the MLE $\hat p_j$ into the expression for $\hat\sigma(\log\hat\theta)$, we get
$$\hat\sigma(\log\hat\theta) = \left(\frac{1}{n_{11}} + \frac{1}{n_{21}} + \frac{1}{n_{12}} + \frac{1}{n_{22}}\right)^{1/2},$$
which is the same as for prospective studies and cross-sectional studies.
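The collapse of the plug-in formula to the cell counts can be verified numerically; this sketch uses the Problem 2 table and NumPy rather than any course code.

```python
import numpy as np

# Verify that plugging p_hat_j = n_1j / n_+j into the retrospective-study
# formula for sigma_hat(log theta_hat) collapses to
# sqrt(1/n11 + 1/n21 + 1/n12 + 1/n22). Counts are from the Problem 2 table.
n11, n21 = 71, 60      # cancer column
n12, n22 = 82, 441     # no-cancer column
np1, np2 = n11 + n21, n12 + n22   # column totals n_+1, n_+2

p1, p2 = n11 / np1, n12 / np2     # MLEs p_hat_1, p_hat_2
se_plugin = np.sqrt((1/p1 + 1/(1 - p1)) / np1 + (1/p2 + 1/(1 - p2)) / np2)
se_counts = np.sqrt(1/n11 + 1/n21 + 1/n12 + 1/n22)
print(se_plugin, se_counts)
```

The two expressions agree exactly, since $1/\hat p_1 \cdot 1/n_{+1} = 1/n_{11}$ and similarly for the other terms.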

## Part (c)

Recall $\theta = \frac{p_1/(1-p_1)}{p_2/(1-p_2)}$. If we replace $p_j$ by its ML estimator $\hat p_j$, then
$$\hat p_1 = \frac{n_{11}}{n_{+1}} = \frac{71}{131} \approx 0.542$$
and
$$\hat p_2 = \frac{n_{12}}{n_{+2}} = \frac{82}{523} \approx 0.157.$$
Thus, by the invariance property of MLEs,
$$\hat\theta = \frac{\hat p_1/(1-\hat p_1)}{\hat p_2/(1-\hat p_2)} \approx \frac{0.542/0.458}{0.157/0.843} \approx 6.354$$
and therefore $\log\hat\theta \approx \log 6.354 \approx 1.850$. Next, we need the standard deviation of $\log\hat\theta$:
$$\hat\sigma(\log\hat\theta) = \left[\left(\frac{1}{0.542} + \frac{1}{0.458}\right)\frac{1}{131} + \left(\frac{1}{0.157} + \frac{1}{0.843}\right)\frac{1}{523}\right]^{1/2} \approx 0.213.$$

Thus, a $95\%$ confidence interval for $\log\theta$ is
$$\log\hat\theta \pm 1.96\,\hat\sigma(\log\hat\theta).$$
Plugging in the computed values, we get the $95\%$ confidence interval
$$[1.850 - 0.417,\ 1.850 + 0.417] = [1.433,\ 2.267].$$
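The Part (c) arithmetic can be reproduced directly from the cell counts. This is a Python sketch, not the original course code; working with exact counts gives $\hat\theta \approx 6.364$ rather than the $6.354$ obtained from the rounded proportions, which leaves the interval essentially unchanged.

```python
import numpy as np

# Odds ratio, its log, standard error, and 95% Wald interval for log(theta),
# computed from the Problem 2 table counts.
n11, n21, n12, n22 = 71, 60, 82, 441

theta_hat = (n11 / n21) / (n12 / n22)            # sample odds ratio
log_theta = np.log(theta_hat)
se = np.sqrt(1/n11 + 1/n21 + 1/n12 + 1/n22)      # sigma_hat(log theta_hat)
ci = (log_theta - 1.96 * se, log_theta + 1.96 * se)
print(theta_hat, log_theta, se, ci)
```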

## Part (d)

Observe that
$$\hat\gamma = \frac{\hat\theta - 1}{\hat\theta + 1}.$$
The MLE $\hat\theta \approx 6.354$, so
$$\hat\gamma \approx \frac{6.354 - 1}{6.354 + 1} \approx 0.728.$$
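The Part (d) computation as a sketch ($\hat\gamma$ here is Yule's $Q$ for a $2\times 2$ table, computed from the exact counts rather than the rounded $\hat\theta$):

```python
# Gamma (Yule's Q) from the odds ratio, using the Problem 2 table counts.
n11, n21, n12, n22 = 71, 60, 82, 441

theta_hat = (n11 * n22) / (n21 * n12)          # sample odds ratio
gamma_hat = (theta_hat - 1) / (theta_hat + 1)  # gamma_hat = (theta-1)/(theta+1)
print(gamma_hat)
```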

We estimate a strong positive association between daily alcohol consumption and the onset of esophageal cancer.

# Problem 3

We populated the counts array with the values $(7,7,2,3,2,8,3,7,1,5,4,9,2,8,9,14)$ and ran the code.

```
##       2.5%        25%        50%        75%      97.5% 
## 0.09879317 0.24392244 0.31425372 0.38217439 0.50166928
```

Thus, a $95\%$ interval estimate is approximately $[0.1,\ 0.5]$.