
Discrete Multivariate Analysis - STAT 579 - Problem Set 10

Supplemental material

We refactored the code for independence testing into a function that returns all relevant calculations given a matrix of cross-sectional data.

```r
# csdata is cross-sectional data (observed counts) entered as an IxJ matrix.
# Tests for independence P(X,Y) = P(X)P(Y) using the X^2 statistic.
independence_test_X2 <- function(csdata) {
  # define the dimensions of the table
  I = dim(csdata)[1]
  J = dim(csdata)[2]

  # compute the overall sample size, the row sample sizes, and the column sample sizes
  total = sum(csdata)
  row.sum = apply(csdata, 1, sum)
  col.sum = apply(csdata, 2, sum)

  # use matrix algebra to compute the table of expected cell counts
  expected = matrix(row.sum) %*% t(matrix(col.sum)) / total
  dimnames(expected) = dimnames(csdata)

  # compute the X^2 statistic, degrees of freedom, and p-value
  X2 = sum((csdata - expected)^2 / expected)
  df = (I - 1) * (J - 1)
  p.value.X2 = pchisq(X2, df, lower.tail = FALSE)

  # return computed values as a list
  list(expected_counts = expected,
       estimate = expected / total,
       X2 = X2, df = df, p.value = p.value.X2)
}
```

We also refactored the code for independence testing under the assumption of a monotonic association (the $\gamma$ correlation).

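A sketch of such a function is given below, consistent with how `independence_test_gamma` is used in Problem 2 (returning $\hat\gamma$, its standard error, $Z^*$, and a two-sided $p$-value). The standard error uses the standard Goodman and Kruskal large-sample variance estimate; the exact original implementation may differ.

```r
# Tests for independence against a monotonic association using the
# Goodman-Kruskal gamma statistic. csdata is an IxJ matrix of observed
# counts with ordered row and column categories.
independence_test_gamma <- function(csdata) {
  I <- nrow(csdata)
  J <- ncol(csdata)
  con <- dis <- matrix(0, I, J)
  for (i in 1:I) {
    for (j in 1:J) {
      # counts concordant / discordant with an observation in cell (i, j)
      con[i, j] <- sum(csdata[row(csdata) < i & col(csdata) < j]) +
                   sum(csdata[row(csdata) > i & col(csdata) > j])
      dis[i, j] <- sum(csdata[row(csdata) < i & col(csdata) > j]) +
                   sum(csdata[row(csdata) > i & col(csdata) < j])
    }
  }
  P <- sum(csdata * con)   # twice the number of concordant pairs
  Q <- sum(csdata * dis)   # twice the number of discordant pairs
  gamma <- (P - Q) / (P + Q)

  # Goodman-Kruskal large-sample standard error of gamma-hat
  se <- sqrt(16 * sum(csdata * (Q * con - P * dis)^2) / (P + Q)^4)

  z <- gamma / se
  list(gamma = gamma, se = se, z = z,
       p.value = 2 * pnorm(abs(z), lower.tail = FALSE))
}
```

Applied to the Problem 1 table, this returns $\hat\gamma \approx 0.36$; for a $2 \times 2$ table it reduces to the familiar $\hat\gamma = (n_{11}n_{22} - n_{12}n_{21})/(n_{11}n_{22} + n_{12}n_{21})$.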

Problem 1

The observed data is cross-sectional with a general model given by $$\{n_{ij}\} \sim \operatorname{MULT}(n, \{\pi_{ij}\}).$$

We consider a simpler model where $X$ and $Y$ are independent, i.e., $\Pr(X,Y) = \Pr(X)\Pr(Y)$. Our task is then to measure how compatible the observed data is with the null hypothesis $$H_0 : \pi_{ij} = \pi_{i+}\pi_{+j}.$$ We prefer $H_0$ to a more general model, which requires estimating (and interpreting) more parameters, whose estimates will therefore have greater variance.

We perform a chi-square test for independence. The test statistic is given by $$X^2 = \sum_{i=1}^{4} \sum_{j=1}^{4} \frac{(n_{ij} - \hat{m}_{ij})^2}{\hat{m}_{ij}},$$ where $\hat{m}_{ij} = n \hat{\pi}_{ij0}$ and $\hat{\pi}_{ij0} = \frac{n_{i+}n_{+j}}{n^2}$.

Under the null model, $X^2$ is asymptotically distributed $\chi^2$ with $\mathrm{df} = (4-1)(4-1) = 9$ degrees of freedom.

We compute the $X^2$ statistic and $p$-value with the following R code:

```r
obs = matrix(c(7, 7, 2, 3,
               2, 8, 3, 7,
               1, 5, 4, 9,
               2, 8, 9, 14), nrow = 4, byrow = TRUE)
colnames(obs) <- c("never/occasionally", "fairly often",
                   "very often", "almost always")
rownames(obs) <- colnames(obs)

print(obs)
```

```
##                    never/occasionally fairly often very often almost always
## never/occasionally                  7            7          2             3
## fairly often                        2            8          3             7
## very often                          1            5          4             9
## almost always                       2            8          9            14
```
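As a quick check of the expected-count formula $\hat{m}_{ij} = n_{i+}n_{+j}/n$, the margins and the expected table can also be computed directly with `outer()` (the table is repeated here so the snippet runs on its own):

```r
obs <- matrix(c(7, 7, 2, 3,
                2, 8, 3, 7,
                1, 5, 4, 9,
                2, 8, 9, 14), nrow = 4, byrow = TRUE)

n <- sum(obs)              # overall sample size: 91
row.sum <- rowSums(obs)    # n_{i+}: 19 20 19 33
col.sum <- colSums(obs)    # n_{+j}: 12 28 18 33

# expected counts under independence: m-hat_{ij} = n_{i+} * n_{+j} / n
expected <- outer(row.sum, col.sum) / n
expected[1, 1]             # 19 * 12 / 91, approximately 2.51
```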

```r
x2_test <- independence_test_X2(obs)
```

The observed $X_0^2$ is 16.955 and the $p$-value is 0.049. We consider this $p$-value to be moderate evidence against the null model. Stated differently, the observed data is moderately incompatible with the independence model.
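As a sanity check, base R's built-in `chisq.test` gives the same statistic, degrees of freedom, and $p$-value (it emits a warning here because some expected counts are below 5; the continuity correction only applies to $2 \times 2$ tables, so none is used):

```r
obs <- matrix(c(7, 7, 2, 3,
                2, 8, 3, 7,
                1, 5, 4, 9,
                2, 8, 9, 14), nrow = 4, byrow = TRUE)

# Pearson chi-square test of independence
out <- chisq.test(obs)
out$statistic   # X-squared, approximately 16.955
out$parameter   # df = 9
out$p.value     # approximately 0.049
```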

For completeness, the estimates of $\pi_{ij}$ under the independence model (with no additional assumptions) for the given cross-sectional data are:


```
##                    never/occasionally fairly often very often almost always
## never/occasionally              0.028        0.064      0.041         0.076
## fairly often                    0.029        0.068      0.043         0.080
## very often                      0.028        0.064      0.041         0.076
## almost always                   0.048        0.112      0.072         0.132
```

Problem 2

The observed data is cross-sectional with a general model given by $$\{n_{ij}\} \sim \operatorname{MULT}(n, \{\pi_{ij}\}).$$

We consider a simpler model using $\gamma$, $$H_0 : \gamma = 0,$$ i.e., $\gamma$ is zero if $X$ and $Y$ are independent.

The test statistic is given by $$Z^* = \frac{\hat\gamma - 0}{\hat\sigma(\hat\gamma)},$$ which is distributed $\mathcal{N}(0,1)$ under the null model.

We compute $Z^*$ and the $p$-value with the following R code:

```r
gamma_test <- independence_test_gamma(obs)
```

The observed $Z^*$ is 3.207 and the $p$-value is 0.001. We consider this $p$-value to be very strong evidence against the null model. That is, the observed data provides very strong evidence against the independence model in favor of a monotonic association.

For completeness, the estimate for $\gamma$ is $\hat\gamma = 0.36$.

Problem 3

Estimates from simpler models have smaller variances than estimates from more complicated models (at the cost of bias if the simpler model does not hold).
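To illustrate with a small hypothetical simulation (not part of the assigned data): when independence actually holds, the independence-model estimate of a cell probability, $\hat\pi_{ij} = n_{i+}n_{+j}/n^2$, pools information across the whole table and so has smaller sampling variance than the saturated estimate $n_{ij}/n$.

```r
set.seed(1)
# True cell probabilities satisfy independence, so both models are correct.
p.row <- c(0.3, 0.7)
p.col <- c(0.4, 0.6)
pi.true <- outer(p.row, p.col)   # pi_11 = 0.12
n <- 200
sims <- 2000
sat <- ind <- numeric(sims)
for (s in 1:sims) {
  counts <- matrix(rmultinom(1, n, as.vector(pi.true)), nrow = 2)
  sat[s] <- counts[1, 1] / n                           # saturated: n_11 / n
  ind[s] <- sum(counts[1, ]) * sum(counts[, 1]) / n^2  # independence: n_1+ n_+1 / n^2
}
var(sat)  # larger sampling variance
var(ind)  # smaller sampling variance
```

Both estimators are centered near the true $\pi_{11} = 0.12$, but the independence-model estimate varies less across simulated tables.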