Problem 1
We are interested in modeling the relationship among the predictor variables for the body fat example. Specifically, we wish to model midarm circumference (W) as a function of triceps skinfold thickness (X1) and thigh circumference (X2). Refer to the data from Table 7.1. The data for X1 is listed in the first column, X2 in the second column, and W in the third column. We are not interested in the body fat measurements, listed in the fourth column, for this problem.
Part (a)
Compute the correlation matrix for X1, X2, W.
# drop the last column of data (original response variable in the experiment)
exp8_1.data = read.csv('TABLE0701.csv')[,1:3]
exp8_1.data.cor = cor(exp8_1.data)
print(exp8_1.data.cor)
## triceps thigh midarm
## triceps 1.0000000 0.9238425 0.4577772
## thigh 0.9238425 1.0000000 0.0846675
## midarm 0.4577772 0.0846675 1.0000000
Part (b)
Test for a marginal effect of X2 on W against a model which includes no other input variables. (Compute the test statistic and p-value.) Provide an interpretation of the result, stated in the context of the problem.
We test for a marginal effect of X2 on W by comparing the effects model M2 with the no-effects model M0, where

M0: W = β0 + ε

and

M2: W = β0 + β2 X2 + ε
names(exp8_1.data) = c("x1","x2","w")
m0 = lm(w~1, data=exp8_1.data)
m2 = lm(w~x2, data=exp8_1.data)
print(anova(m0,m2))
## Analysis of Variance Table
##
## Model 1: w ~ 1
## Model 2: w ~ x2
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 19 252.73
## 2 18 250.92 1 1.8117 0.13 0.7227
We see that F = 0.13 with p-value 0.7227.
This is a very large p-value, so X2 (thigh) adds little explanatory power compared to the model with no explanatory inputs. In other words, X2 provides very little marginal predictive power for W (midarm).
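As a sanity check, the F statistic and p-value can be reproduced by hand from the residual sums of squares in the ANOVA table (the RSS values below are copied from that output, so small rounding differences are expected):

```r
# Marginal F test of x2 by hand, using RSS values from the ANOVA table:
# F = [ (SSE(reduced) - SSE(full)) / df_diff ] / [ SSE(full) / df_full ]
sse_reduced <- 252.73   # RSS of the intercept-only model m0 (19 df)
sse_full    <- 250.92   # RSS of the model w ~ x2           (18 df)
f_stat <- ((sse_reduced - sse_full) / 1) / (sse_full / 18)
p_val  <- pf(f_stat, df1 = 1, df2 = 18, lower.tail = FALSE)
round(c(F = f_stat, p = p_val), 4)   # close to F = 0.13, p = 0.7227 above
```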
Interpretation
The observed data are compatible with the reduced (no-effects) model M0. It is not necessary to add thigh circumference (X2) to the no-effects model for predicting midarm circumference (W).
Part (c)
Test for a partial effect of X2 on W against a model which includes X1. (Compute the test statistic and p-value.) Provide an interpretation of the result, stated in the context of the problem.
We test for a partial effect of X2 on W, given that X1 is already in the model, by comparing models M1 and M12, where

M1: W = β0 + β1 X1 + ε

and

M12: W = β0 + β1 X1 + β2 X2 + ε
m1 = lm(w~x1, data=exp8_1.data)
m12 = lm(w~x1+x2, data=exp8_1.data)
print(anova(m1,m12))
## Analysis of Variance Table
##
## Model 1: w ~ x1
## Model 2: w ~ x1 + x2
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 18 199.769
## 2 17 2.416 1 197.35 1388.6 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We see that F = 1388.6 with p-value < 2.2e-16.
Interpretation
Once triceps skinfold thickness (X1) is in the model, adding thigh circumference (X2) reduces the residual sum of squares dramatically (from 199.77 to 2.42). There is a strong partial effect of X2 on W given X1: thigh circumference is highly useful for predicting midarm circumference after accounting for triceps skinfold thickness.
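As in part (b), the partial F statistic can be reproduced by hand from the two residual sums of squares printed in the ANOVA table:

```r
# Partial F test of x2 given x1, by hand from the ANOVA table:
sse_x1   <- 199.769   # RSS of w ~ x1        (18 df)
sse_x1x2 <- 2.416     # RSS of w ~ x1 + x2   (17 df)
f_partial <- ((sse_x1 - sse_x1x2) / 1) / (sse_x1x2 / 17)
f_partial   # approximately 1388.6, matching the ANOVA table up to rounding
```

Equivalently, for a single added predictor, the partial F statistic equals the square of the t statistic for that coefficient in the full model: in part (d), the t value for x2 is -37.26, and (-37.26)^2 ≈ 1388.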
Part (d)
Fit the regression model for W which includes both X1 and X2.
summary(m12)
##
## Call:
## lm(formula = w ~ x1 + x2, data = exp8_1.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.58200 -0.30625 0.02592 0.29526 0.56102
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 62.33083 1.23934 50.29 <2e-16 ***
## x1 1.88089 0.04498 41.82 <2e-16 ***
## x2 -1.60850 0.04316 -37.26 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.377 on 17 degrees of freedom
## Multiple R-squared: 0.9904, Adjusted R-squared: 0.9893
## F-statistic: 880.7 on 2 and 17 DF, p-value: < 2.2e-16
We estimate

Ŵ = 62.331 + 1.881 X1 - 1.609 X2
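To illustrate how the fitted equation is used, we can hand-compute a prediction at a hypothetical pair of inputs (the values x1 = 25 and x2 = 50 below are made up for demonstration, not taken from the data):

```r
# Fitted value at illustrative (hypothetical) predictor values, using
# the coefficient estimates from summary(m12) above.
b0 <- 62.33083; b1 <- 1.88089; b2 <- -1.60850
x1_new <- 25.0   # hypothetical triceps skinfold thickness
x2_new <- 50.0   # hypothetical thigh circumference
w_hat <- b0 + b1 * x1_new + b2 * x2_new
round(w_hat, 2)  # predicted midarm circumference, about 28.93
```

The same point prediction (along with an interval) can be obtained from the fitted model with predict(m12, newdata = data.frame(x1 = 25, x2 = 50), interval = "prediction").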
# scatterplot of the two predictors; note the strong linear association
plot(exp8_1.data$x1, exp8_1.data$x2)
Part (e)
What feature of multidimensional modeling is illustrated in this problem?
Answer: Multicollinearity.
Specifically, observe that X1 and X2 are strongly positively correlated (r = 0.924), but X1 and X2 have, respectively, a positive and a negative partial effect on W. The combination of these opposing partial effects and the strong correlation between X1 and X2 largely cancels out their marginal effects on W.
If we look at the scatterplots of W versus X1 and W versus X2, the pairwise associations appear weak (r = 0.458 and r = 0.085, respectively). However, X1 and X2 jointly are highly explanatory of W (R² = 0.990). Investigating relationships in higher dimensions requires multivariate statistical methods, such as multiple regression analysis, rather than two-dimensional methods and graphs.
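One common numeric diagnostic for this situation is the variance inflation factor (VIF). With exactly two predictors, both VIFs equal 1 / (1 - r^2), where r is the correlation between X1 and X2, so we can compute it directly from the correlation found in part (a):

```r
# Variance inflation factor for x1 and x2; with two predictors both
# VIFs equal 1 / (1 - r^2), where r = cor(x1, x2).
r12 <- 0.9238425    # cor(triceps, thigh) from part (a)
vif <- 1 / (1 - r12^2)
round(vif, 2)       # about 6.83
```

A VIF near 6.8 indicates substantial inflation of the coefficient standard errors due to the X1-X2 correlation. For models with more predictors, the same quantity can be computed from a fitted model, e.g. with vif(m12) from the car package if it is installed.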