Regression Analysis - SIUe - STAT 482 - Problem Set 8

Problem 1

We are interested in modeling the relationship among the predictor variables for the body fat example. Specifically, we wish to model midarm circumference ($w$) as a function of triceps skinfold thickness ($x_1$) and thigh circumference ($x_2$). Refer to the data from Table 7.1. The data for $x_1$ are listed in the first column, $x_2$ in the second column, and $w$ in the third column. We are not interested in the body fat measurements, listed in the fourth column, for this problem.

Part (a)

Compute the correlation matrix for $w$, $x_1$, $x_2$.

# drop the last column of data (original response variable in the experiment)
exp8_1.data = read.csv('TABLE0701.csv')[,1:3]
exp8_1.data.cor = cor(exp8_1.data)
print(exp8_1.data.cor)
##           triceps     thigh    midarm
## triceps 1.0000000 0.9238425 0.4577772
## thigh   0.9238425 1.0000000 0.0846675
## midarm  0.4577772 0.0846675 1.0000000

Part (b)

Test for a marginal effect of $x_2$ on $w$ against a model which includes no other input variables. (Compute the test statistic and $p$-value.) Provide an interpretation of the result, stated in the context of the problem.

We test for a marginal effect of $x_2$ on $w$ by comparing the effects model $m_2$ with the no-effects model $m_0$, where

$$ m_0 : w_i = \beta_0 + \epsilon_i $$

and

$$ m_2 : w_i = \beta_0 + \beta_1 x_{i2} + \epsilon_i. $$
names(exp8_1.data) = c("x1","x2","w")
m0 = lm(w~1,  data=exp8_1.data)
m2 = lm(w~x2, data=exp8_1.data)
print(anova(m0,m2))
## Analysis of Variance Table
## 
## Model 1: w ~ 1
## Model 2: w ~ x2
##   Res.Df    RSS Df Sum of Sq    F Pr(>F)
## 1     19 252.73                         
## 2     18 250.92  1    1.8117 0.13 0.7227

We see that $F_2 = 0.130$ with $p$-value $0.723$.
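As a quick check on the software, the test statistic and $p$-value can be recomputed by hand from the two residual sums of squares (values copied from the ANOVA table above):

```r
# RSS values copied from the ANOVA table above
rss0 <- 252.73   # residual sum of squares for m0 (w ~ 1),  19 df
rss2 <- 250.92   # residual sum of squares for m2 (w ~ x2), 18 df

# F = [(RSS0 - RSS2) / 1] / [RSS2 / 18], compared to an F(1, 18) distribution
F2 <- ((rss0 - rss2) / 1) / (rss2 / 18)
p2 <- pf(F2, df1 = 1, df2 = 18, lower.tail = FALSE)
c(F2 = F2, p2 = p2)  # approximately 0.13 and 0.72
```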

This is a very large $p$-value, so $x_2$ (thigh) adds little explanatory power compared to the model $m_0$ with no explanatory inputs. In other words, $x_2$ alone provides very little predictive power for $w$ (midarm).

Interpretation

The observed data are compatible with the reduced (no-effects) model $m_0$. It is not necessary to add the thigh measurement to the no-effects model for predicting the midarm measurement.

Part (c)

Test for a partial effect of $x_2$ on $w$ against a model which includes $x_1$. (Compute the test statistic and $p$-value.) Provide an interpretation of the result, stated in the context of the problem.

We test for a partial effect of $x_2$ on $w$, given that $x_1$ is already in the model, by comparing models $m_1$ and $m_{12}$, where

$$ m_1 : w_i = \beta_0 + \beta_1 x_{i1} + \epsilon_i $$

and

$$ m_{12} : w_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i. $$
m1 = lm(w~x1, data=exp8_1.data)
m12 = lm(w~x1+x2, data=exp8_1.data)
print(anova(m1,m12))
## Analysis of Variance Table
## 
## Model 1: w ~ x1
## Model 2: w ~ x1 + x2
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1     18 199.769                                  
## 2     17   2.416  1    197.35 1388.6 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We see that $F_{12} = 1388.6$ with $p$-value $< 2.2 \times 10^{-16}$.

Interpretation

The observed data are not compatible with the reduced model $m_1$. Given that triceps skinfold thickness is already in the model, thigh circumference has a strong partial effect on midarm circumference: adding $x_2$ to the model containing $x_1$ substantially improves the prediction of the midarm measurement.
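As in part (b), the test statistic can be recomputed directly from the residual sums of squares in the table (a quick check; values copied from the output above):

```r
# RSS values copied from the ANOVA table above
rss1  <- 199.769  # residual sum of squares for m1  (w ~ x1),      18 df
rss12 <- 2.416    # residual sum of squares for m12 (w ~ x1 + x2), 17 df

# F = [(RSS1 - RSS12) / 1] / [RSS12 / 17], compared to an F(1, 17) distribution
F12 <- ((rss1 - rss12) / 1) / (rss12 / 17)
p12 <- pf(F12, df1 = 1, df2 = 17, lower.tail = FALSE)
c(F12 = F12, p12 = p12)  # approximately 1388.6 and < 2.2e-16
```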

Part (d)

Fit the regression model for ww which includes both x1x_1 and x2x_2.

summary(m12)
## 
## Call:
## lm(formula = w ~ x1 + x2, data = exp8_1.data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.58200 -0.30625  0.02592  0.29526  0.56102 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 62.33083    1.23934   50.29   <2e-16 ***
## x1           1.88089    0.04498   41.82   <2e-16 ***
## x2          -1.60850    0.04316  -37.26   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.377 on 17 degrees of freedom
## Multiple R-squared:  0.9904,	Adjusted R-squared:  0.9893 
## F-statistic: 880.7 on 2 and 17 DF,  p-value: < 2.2e-16

We estimate

$$ \hat{w} = 62.331 + 1.881 x_1 - 1.608 x_2. $$
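As a usage sketch, the fitted equation predicts midarm circumference from the two coefficients reported above; the subject's measurements here are hypothetical, chosen only for illustration:

```r
# Coefficients copied from the summary output above
b0 <- 62.33083; b1 <- 1.88089; b2 <- -1.60850

# Hypothetical subject: triceps skinfold 25.0, thigh circumference 50.0
w_hat <- b0 + b1 * 25.0 + b2 * 50.0
w_hat  # about 28.9
```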
plot(exp8_1.data$x1,exp8_1.data$x2)

Figure: scatterplot of $x_1$ (triceps) versus $x_2$ (thigh).

Part (e)

What feature of multidimensional modeling is illustrated in this problem?

Answer: Multicollinearity.

Specifically, observe that $x_1$ and $x_2$ are strongly positively correlated, $r_{12} = 0.924$, while their partial effects on $w$ are positive and negative, respectively. Because the two predictors move together but pull $w$ in opposite directions, their effects largely cancel marginally, so neither predictor shows a strong correlation with $w$ on its own.

If we look at the scatterplots of $x_1$ versus $w$ and $x_2$ versus $w$, the variables seem nearly uncorrelated. However, $x_1$ and $x_2$ jointly are highly explanatory of $w$. Investigating relationships in higher dimensions requires multivariate statistical methods, such as regression analysis, rather than two-dimensional methods and graphs.
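This can be seen numerically from the correlation matrix in part (a) alone. For two predictors, the multiple $R^2$ has a closed form in terms of the pairwise correlations, $R^2 = (r_{w1}^2 + r_{w2}^2 - 2\, r_{w1} r_{w2} r_{12}) / (1 - r_{12}^2)$; a quick sketch using the values computed above:

```r
# Pairwise correlations from part (a)
r_w1 <- 0.4577772   # cor(w, x1)
r_w2 <- 0.0846675   # cor(w, x2)
r_12 <- 0.9238425   # cor(x1, x2)

# Multiple R^2 of w ~ x1 + x2, from the pairwise correlations alone
R2 <- (r_w1^2 + r_w2^2 - 2 * r_w1 * r_w2 * r_12) / (1 - r_12^2)
R2  # about 0.99, matching summary(m12), despite the weak marginal correlations
```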