
Statistical Methods - STAT 581 - Problem Set 2

A product developer is investigating the tensile strength of a new synthetic fiber that will be used to make cloth for men’s shirts. Strength may be affected by the percentage of cotton used in the blend of material for the fiber. A completely randomized experiment with five levels of cotton content is performed.

Part 1

\fbox{State the statistical hypothesis of interest.}

The hypothesis of interest is whether the percentage of cotton, at five different levels, affects the tensile strength of a new synthetic fiber.

We may formulate this as a hypothesis test of the form \begin{align*} H_0 &: \mu_1 = \cdots = \mu_5\\ H_A &: \text{$\mu_i \neq \mu_j$ for at least one pair $(i,j)$, $i \neq j$,} \end{align*} where $\mu_k$ is the expected tensile strength at the $k$-th level of cotton.

If $H_0$ is true, the percentage of cotton has no effect on tensile strength.

Part 2

\fbox{Briefly explain how the form of the alternative hypothesis requires a need for further investigation.}

The alternative hypothesis states only that at least one pair of cotton level means differs; it does not identify which pairs. If we reject $H_0$, further investigation is required to determine where the differences occur.
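To give a sense of the scope of that follow-up, the number of distinct pairs among the five cotton levels can be counted directly (a small worked count, not part of the original solution): \begin{align*} \binom{5}{2} = \frac{5 \cdot 4}{2} = 10, \end{align*} so rejecting the null hypothesis leaves up to ten pairwise comparisons of means to examine.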

Part 3

\fbox{Create a Boxplot as a graphical display of the data.}

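library(printr)
library(readxl)

h2.data = read_excel("./handout2data.xlsx")
# Drop incomplete rows jointly so that strength and percent stay aligned;
# calling na.omit() on each column separately could pair up the wrong rows.
h2.data = na.omit(h2.data[, c("strength", "percent")])
strength = h2.data$strength
percent = as.factor(h2.data$percent)

# Side-by-side boxplots of tensile strength for each cotton percentage.
boxplot(strength~percent)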

Part 4

\fbox{Compute the sample mean and sample variance of tensile strength for each level of cotton percentage.}

means = by(strength,percent,mean)
variances = by(strength,percent,var)
cbind(means,variances)
percent   means   variances
15          9.8        11.2
20         15.4         9.8
25         17.6         4.3
30         21.6         6.8
35         10.8         8.2

Part 5

\fbox{State the ANOVA model using treatment level effects. Compute estimates of the model parameters.}

In this CRD experiment, we observe $n=5$ responses at each of $a=5$ levels of cotton. We treat the cotton percentage as categorical; if cotton level turns out to have a practical effect on tensile strength, it could instead be treated as a quantitative input in, say, a regression model.

The data are given by \begin{align*} Y_{1 1},\ldots,Y_{1 5} &\sim \mathcal{N}(\mu+\tau_1,\sigma^2)\\ &\vdots\\ Y_{5 1},\ldots,Y_{5 5} &\sim \mathcal{N}(\mu+\tau_5,\sigma^2). \end{align*}

The model is given by

$$ Y_{i j} = \mu + \tau_i + \epsilon_{i j} \begin{cases} i = 1,\ldots,5\\ j = 1,\ldots,5, \end{cases} $$

where \begin{align*} Y_{i j} &\qquad\text{is the $j$-th response for tensile strength for the $i$-th cotton level},\\ \mu &\qquad\text{is the overall mean of the tensile strength},\\ \tau_i &\qquad\text{is the effect that the $i$-th cotton level has on the tensile strength},\\ \epsilon_{i j} &\qquad\overset{\rm{iid}}{\sim} \mathcal{N}(0,\sigma^2),\\ \sum_{i=1}^{5} \tau_i &\qquad= 0. \end{align*}
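As an illustration of what this model says about how the data arise, the following sketch simulates one data set of the same shape. The values of `mu`, `tau`, and `sigma` below are hypothetical choices made for illustration only; they are not estimates from the experiment, and the seed is arbitrary.

```r
set.seed(581)  # arbitrary seed, used only to make the sketch reproducible

a = 5   # number of cotton levels
n = 5   # responses per level

# Hypothetical parameter values (assumptions, not taken from the data).
mu = 15                     # overall mean
tau = c(-5, 0, 3, 6, -4)    # treatment effects, chosen to sum to zero
sigma = 3                   # error standard deviation

# One factor entry per observation: 5 replicates at each cotton percentage.
level = factor(rep(c(15, 20, 25, 30, 35), each = n))

# Y_ij = mu + tau_i + eps_ij, with eps_ij iid N(0, sigma^2).
y = mu + tau[as.integer(level)] + rnorm(a * n, mean = 0, sd = sigma)

sim = data.frame(level, y)
head(sim)
```

Each simulated response is the overall mean, shifted by its level's effect, plus independent normal noise with a common variance, which is exactly the structure the ANOVA assumes.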

The estimates of the model parameters are given by \begin{align*} \hat{\mu} &= \bar{y}_{\cdot \cdot},\\ \hat{\tau}_i &= \bar{y}_{i \cdot} - \bar{y}_{\cdot \cdot}. \end{align*}

We compute these estimates with the following R code:

mu.hat = mean(strength)
tau.hat = means - mu.hat

We see that $\hat{\mu} = 15.04$ and

cotton level   $\hat{\tau}$
15                -5.24
20                 0.36
25                 2.56
30                 6.56
35                -4.24
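As a cross-check, these estimates can be reproduced from the group means reported in Part 4 alone, without the raw data. This is a minimal sketch that relies on the design being balanced (equal group sizes), so that the grand mean equals the average of the group means.

```r
# Group means as reported in Part 4 (cotton percentage -> mean tensile strength).
means = c("15" = 9.8, "20" = 15.4, "25" = 17.6, "30" = 21.6, "35" = 10.8)

# With equal group sizes, the grand mean is the average of the group means.
mu.hat = mean(means)

# Treatment effects are deviations of each group mean from the grand mean.
tau.hat = means - mu.hat

print(mu.hat)
print(round(tau.hat, 2))

# The estimates respect the sum-to-zero constraint (up to floating-point error).
stopifnot(abs(sum(tau.hat)) < 1e-8)
```

The final check confirms that the estimated effects satisfy the constraint $\sum_i \hat{\tau}_i = 0$ imposed by the model.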

Part 6

Part (a)

\fbox{\begin{minipage}{\textwidth} Compute the $F_0$ statistic and the $p$-value. Perform the statistical test at level $\alpha = .05$. Provide an interpretation, stated in the context of the problem. \end{minipage}}

model.2 = aov(strength~percent)
summary(model.2)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## percent      4  475.8  118.94   14.76 9.13e-06 ***
## Residuals   20  161.2    8.06                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We see that $F_0 = 14.76$ with $p$-value $= 9.13 \times 10^{-6}$. Since $p < \alpha = .05$, we reject $H_0$: the null model, in which cotton level has no effect on tensile strength, is not compatible with the data. We conclude that the mean tensile strength differs for at least one pair of cotton percentages.
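To show where the entries of the ANOVA table come from, the following sketch rebuilds the $F$ statistic by hand from the group means and variances reported in Part 4. It assumes the balanced design with 5 observations per level, in which case the error mean square is simply the average of the group variances.

```r
# Summary statistics reported in Part 4; n = 5 observations per level.
means = c(9.8, 15.4, 17.6, 21.6, 10.8)
variances = c(11.2, 9.8, 4.3, 6.8, 8.2)
n = 5
a = length(means)

# Treatment sum of squares: n times the squared deviations of the group means
# from the grand mean, on a - 1 degrees of freedom.
ss.treat = n * sum((means - mean(means))^2)
ms.treat = ss.treat / (a - 1)

# With equal group sizes, the error mean square is the average group variance,
# on a * (n - 1) degrees of freedom.
ms.error = mean(variances)

# F statistic and its upper-tail p-value under the F(a - 1, a(n - 1)) distribution.
F0 = ms.treat / ms.error
p.value = pf(F0, df1 = a - 1, df2 = a * (n - 1), lower.tail = FALSE)

print(c(F0 = F0, p.value = p.value))
```

The resulting mean squares and $F$ statistic should agree, up to rounding, with the `percent` and `Residuals` rows of the `summary(model.2)` output.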

Part (b)

\fbox{\begin{minipage}{\textwidth} Compute the $t_0$ statistic and the $p$-value for testing the $30\%$ group versus the $25\%$ group. Provide an interpretation, stated in the context of the problem. \end{minipage}}

All pairwise hypothesis tests \begin{align*} H_0^{(i,j)} &: \tau_i = \tau_j\\ H_A^{(i,j)} &: \tau_i \neq \tau_j \end{align*} may be computed with the following R code: