Suppose that [Z:=(Z_1,Z_2,Z_3)']{.math .inline} is a random vector with mean vector [\mu=E(Z)=(0,1,-1)']{.math .inline} and variance-covariance matrix [\Sigma=\operatorname{Var}(Z)=\begin{pmatrix} 1 & -0.5 & 0 \\ -0.5 & 2 & 1.5 \\ 0 & 1.5 & 3 \end{pmatrix}.]{.math .display}
Part (a)
First, given a matrix [A]{.math .inline}, we denote the [(i,j)]{.math .inline}-th element by [A_{ij}]{.math .inline}. If [A]{.math .inline} is a vector, we simplify this notation and denote the [i]{.math .inline}-th element by [A_i]{.math .inline}.
The expectation of [Z_1-3Z_2-2Z_3]{.math .inline} is given by [E(Z_1-3Z_2-2Z_3)=E(Z_1)-E(3Z_2)-E(2Z_3)=E(Z_1)-3E(Z_2)-2E(Z_3)=\mu_1-3\mu_2-2\mu_3.]{.math .display}
It is given that [\mu=(0,1,-1)']{.math .inline}, and thus [E(Z_1-3Z_2-2Z_3)=0-3(1)-2(-1)=0-3+2=-1.]{.math .display}
Part (b)
We use the theorems [\operatorname{Var}(X+Y)=\operatorname{Var}(X)+\operatorname{Var}(Y)+2\operatorname{Cov}(X,Y)]{.math .display} and [\operatorname{Cov}(aX,bY)=ab\operatorname{Cov}(X,Y).]{.math .display} The variance of [2Z_1+Z_3]{.math .inline} is given by [\operatorname{Var}(2Z_1+Z_3)=\operatorname{Var}(2Z_1)+\operatorname{Var}(Z_3)+2\operatorname{Cov}(2Z_1,Z_3)=2^2\operatorname{Var}(Z_1)+\operatorname{Var}(Z_3)+2\bigl(2\operatorname{Cov}(Z_1,Z_3)\bigr)=4\Sigma_{11}+\Sigma_{33}+4\Sigma_{13}=4(1)+3+4(0)=7.]{.math .display}
Part (c)
We use the computational formula for covariance, [\operatorname{Cov}(X,Y)=E(XY)-E(X)E(Y)]{.math .inline}, which also means [E(XY)=\operatorname{Cov}(X,Y)+E(X)E(Y)]{.math .inline}.
The covariance of [3Z_1-Z_2]{.math .inline} and [Z_2+2Z_3]{.math .inline} is given by [\operatorname{Cov}(3Z_1-Z_2,\,Z_2+2Z_3)=E[(3Z_1-Z_2)(Z_2+2Z_3)]-E(3Z_1-Z_2)E(Z_2+2Z_3)=E(3Z_1Z_2+6Z_1Z_3-Z_2^2-2Z_2Z_3)-(3E(Z_1)-E(Z_2))(E(Z_2)+2E(Z_3))=3E(Z_1Z_2)+6E(Z_1Z_3)-E(Z_2^2)-2E(Z_2Z_3)-(3\mu_1-\mu_2)(\mu_2+2\mu_3).]{.math .display}
Since [E(Z_iZ_j)=\Sigma_{ij}+\mu_i\mu_j]{.math .inline}, we may rewrite the above as [\operatorname{Cov}(3Z_1-Z_2,\,Z_2+2Z_3)=3(\Sigma_{12}+\mu_1\mu_2)+6(\Sigma_{13}+\mu_1\mu_3)-(\Sigma_{22}+\mu_2^2)-2(\Sigma_{23}+\mu_2\mu_3)-(3\mu_1-\mu_2)(\mu_2+2\mu_3).]{.math .display} We are given the values of [\Sigma_{ij}]{.math .inline} and [\mu_i]{.math .inline} for [i,j\in\{1,2,3\}]{.math .inline}. Substituting these values gives [\operatorname{Cov}(3Z_1-Z_2,\,Z_2+2Z_3)=3(-0.5+0\cdot 1)+6(0+0\cdot(-1))-(2+1^2)-2(1.5+1\cdot(-1))-(3\cdot 0-1)(1+2(-1)).]{.math .display} The final calculation yields [\operatorname{Cov}(3Z_1-Z_2,\,Z_2+2Z_3)=-3/2-3-1-1=-13/2.]{.math .display}
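As a quick numerical sanity check (a sketch in R; the vectors `a`, `b`, `c1`, and `c2` below are simply the coefficient vectors of the three linear combinations above), the same quantities can be computed with matrix algebra:
# mean vector and covariance matrix from the problem statement
mu    <- c(0, 1, -1)
Sigma <- matrix(c(1, -0.5, 0,
                  -0.5, 2, 1.5,
                  0, 1.5, 3), nrow = 3, byrow = TRUE)
a  <- c(1, -3, -2)   # coefficients of Z1 - 3*Z2 - 2*Z3
b  <- c(2, 0, 1)     # coefficients of 2*Z1 + Z3
c1 <- c(3, -1, 0)    # coefficients of 3*Z1 - Z2
c2 <- c(0, 1, 2)     # coefficients of Z2 + 2*Z3
sum(a * mu)              # part (a): -1
t(b) %*% Sigma %*% b     # part (b): 7
t(c1) %*% Sigma %*% c2   # part (c): -6.5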
Problem 1.2
Let [\{e_t\}]{.math .inline} be a normal white noise process with mean zero and variance [\sigma^2]{.math .inline}. Consider the process [Y_t=e_te_{t-1}]{.math .inline}.
Part (a)
The expectation of [Y_t]{.math .inline} is given by [E(Y_t)=E(e_te_{t-1}).]{.math .display} By independence, [E(Y_t)]{.math .inline} may be rewritten as [E(Y_t)=E(e_t)E(e_{t-1})=0.]{.math .display}
The variance of [Y_t]{.math .inline} is given by [\operatorname{Var}(Y_t)=E(Y_t^2)-E^2(Y_t).]{.math .display} We showed that [E(Y_t)=0]{.math .inline}, thus the variance of [Y_t]{.math .inline} may be rewritten as [\operatorname{Var}(Y_t)=E\bigl((e_te_{t-1})^2\bigr)=E(e_t^2e_{t-1}^2).]{.math .display} Since [e_t]{.math .inline} and [e_{t-1}]{.math .inline} are independent, [e_t^2]{.math .inline} and [e_{t-1}^2]{.math .inline} are independent, and thus the variance of [Y_t]{.math .inline} may be rewritten as [\operatorname{Var}(Y_t)=E(e_t^2)E(e_{t-1}^2)=(\sigma^2+0^2)(\sigma^2+0^2)=\sigma^4.]{.math .display} We observe that [\{Y_t\}]{.math .inline} has constant mean [0]{.math .inline} and constant variance [\sigma^4]{.math .inline}.
Part (b)
The autocovariance of [Y_t]{.math .inline} and [Y_{t-\ell}]{.math .inline}, [\ell\ge 0]{.math .inline}, is given by [\operatorname{Cov}(Y_t,Y_{t-\ell})=E(Y_tY_{t-\ell})-E(Y_t)E(Y_{t-\ell}).]{.math .display} We have already shown in part (a) that [E(Y_k)=0]{.math .inline} for any [k]{.math .inline}, thus [\operatorname{Cov}(Y_t,Y_{t-\ell})=E(Y_tY_{t-\ell}).]{.math .display} Substituting the definition of [Y_t]{.math .inline} and [Y_{t-\ell}]{.math .inline} into the above covariance gives [\operatorname{Cov}(Y_t,Y_{t-\ell})=E(e_te_{t-1}e_{t-\ell}e_{t-\ell-1}).]{.math .display} We perform a case analysis to derive the autocovariance for different values of [\ell]{.math .inline}.
If [\ell=0]{.math .inline}, then [\operatorname{Cov}(Y_t,Y_t)=\operatorname{Var}(Y_t)=\sigma^4]{.math .inline}.
If [\ell=1]{.math .inline}, then [\operatorname{Cov}(Y_t,Y_{t-1})=E(e_te_{t-1}e_{t-1}e_{t-2})]{.math .inline}. Let [W:=e_{t-1}^2]{.math .inline}, in which case [\operatorname{Cov}(Y_t,Y_{t-1})=E(e_tWe_{t-2})]{.math .inline}. Since these are independent random variables, [\operatorname{Cov}(Y_t,Y_{t-1})=E(e_t)E(W)E(e_{t-2})=0\cdot E(W)\cdot 0=0.]{.math .display}
If [\ell=2]{.math .inline}, then [\operatorname{Cov}(Y_t,Y_{t-2})=E(e_te_{t-1}e_{t-2}e_{t-3}).]{.math .display} Since these are independent random variables, [\operatorname{Cov}(Y_t,Y_{t-2})=E(e_t)E(e_{t-1})E(e_{t-2})E(e_{t-3})=0.]{.math .display} Any [\ell\ge 2]{.math .inline} results in two elements of [\{Y_t\}]{.math .inline} that have no random noise terms in common and thus also have a covariance of zero.
Since the autocovariance function is symmetric about [t]{.math .inline}, [\operatorname{Cov}(Y_t,Y_{t+\ell})=\operatorname{Cov}(Y_t,Y_{t-\ell})]{.math .inline}, and thus we see that the autocovariance function is strictly a function of the lag [\ell]{.math .inline}.
We reparameterize the autocovariance function [\gamma]{.math .inline} with respect to [\ell]{.math .inline}, [\gamma_\ell=\begin{cases}\sigma^4 & \ell=0\\ 0 & \ell\ne 0.\end{cases}]{.math .display} The autocorrelation function [\rho_\ell]{.math .inline} is defined as [\gamma_\ell/\gamma_0]{.math .inline}, thus [\rho_\ell=\begin{cases}1 & \ell=0\\ 0 & \ell\ne 0.\end{cases}]{.math .display}
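A short simulation (a sketch; the sample size and [\sigma]{.math .inline} below are arbitrary choices) illustrates that the sample mean, variance, and ACF of [\{Y_t\}]{.math .inline} behave as derived:
set.seed(1)
n     <- 10000
sigma <- 2
e <- rnorm(n + 1, mean = 0, sd = sigma)   # normal white noise
Y <- e[2:(n + 1)] * e[1:n]                # Y_t = e_t * e_{t-1}
mean(Y)                            # approximately 0
var(Y)                             # approximately sigma^4 = 16
acf(Y, lag.max = 10, plot = FALSE) # near zero at all nonzero lags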
Part (c)
The process is weakly stationary: its mean is the constant [0]{.math .inline}, its variance is the constant [\sigma^4]{.math .inline}, and its autocovariance function depends only on the lag [\ell]{.math .inline}.
Problem 1.3
First, we find an estimator [\hat\beta_1]{.math .inline} given a random sample [\{Y_i\}]{.math .inline} generated from the model [Y_i=\beta_1x_{i1}+\beta_2x_{i2}+\epsilon_i,]{.math .display} except we incorrectly (or approximately) assume [Y_i=\beta_1x_{i1}+\epsilon_i]{.math .inline}. The true statistical error of the [i]{.math .inline}-th random variable [Y_i]{.math .inline} is given by [\epsilon_i=Y_i-\beta_1x_{i1}-\beta_2x_{i2},]{.math .display} and we are interested in minimizing the sum of the squares of the statistical errors [L=\sum_{i=1}^n\epsilon_i^2.]{.math .display} We (incorrectly or approximately) assume [\epsilon_i=Y_i-\beta_1x_{i1}]{.math .inline} and parameterize [L]{.math .inline} with respect to [\beta_1]{.math .inline}, resulting in the function [L(\beta_1)=\sum_{i=1}^n(Y_i-\beta_1x_{i1})^2.]{.math .display} We obtain an estimator for [\beta_1]{.math .inline} by solving for [\hat\beta_1]{.math .inline} in [\left.\frac{\partial L}{\partial\beta_1}\right|_{\hat\beta_1}=0.]{.math .display} Thus, [\begin{aligned}-2\sum_i(Y_i-\hat\beta_1x_{i1})x_{i1}&=0\\ \sum_i(Y_ix_{i1}-\hat\beta_1x_{i1}^2)&=0\\ \hat\beta_1\sum_ix_{i1}^2&=\sum_iY_ix_{i1},\end{aligned}]{.math .display} which finally simplifies to [\hat\beta_1=\frac{\sum_{i=1}^nY_ix_{i1}}{\sum_{i=1}^nx_{i1}^2}.]{.math .display}
We are interested in the bias of [\hat\beta_1]{.math .inline}, denoted by [b(\hat\beta_1):=E(\hat\beta_1-\beta_1).]{.math .display} By the linearity of expectation, the bias may be rewritten as [b(\hat\beta_1)=E(\hat\beta_1)-\beta_1.]{.math .display} The expectation of [\hat\beta_1]{.math .inline} is given by [E(\hat\beta_1)=E\!\left(\frac{\sum_iY_ix_{i1}}{\sum_ix_{i1}^2}\right)=\frac{E\left(\sum_iY_ix_{i1}\right)}{\sum_ix_{i1}^2}=\frac{\sum_iE(Y_ix_{i1})}{\sum_ix_{i1}^2}=\frac{\sum_ix_{i1}E(Y_i)}{\sum_ix_{i1}^2}.]{.math .display} The true model of [\{Y_i\}]{.math .inline} is given by [Y_i=\beta_1x_{i1}+\beta_2x_{i2}+\epsilon_i]{.math .inline}, so [E(\hat\beta_1)=\frac{\sum_ix_{i1}E(\beta_1x_{i1}+\beta_2x_{i2}+\epsilon_i)}{\sum_ix_{i1}^2}=\frac{\sum_ix_{i1}(\beta_1x_{i1}+\beta_2x_{i2})}{\sum_ix_{i1}^2}=\frac{\sum_i\left(\beta_1x_{i1}^2+\beta_2x_{i1}x_{i2}\right)}{\sum_ix_{i1}^2}=\beta_1\frac{\sum_ix_{i1}^2}{\sum_ix_{i1}^2}+\beta_2\frac{\sum_ix_{i1}x_{i2}}{\sum_ix_{i1}^2}=\beta_1+\beta_2\frac{\sum_ix_{i1}x_{i2}}{\sum_ix_{i1}^2}.]{.math .display} The bias is defined as [b(\hat\beta_1)=E(\hat\beta_1)-\beta_1]{.math .inline}, thus [b(\hat\beta_1)=\beta_2\frac{\sum_{i=1}^nx_{i1}x_{i2}}{\sum_{i=1}^nx_{i1}^2}.]{.math .display}
The bias of [\hat\beta_1]{.math .inline} is a function of the observed values [\{x_{i1}\}]{.math .inline} and [\{x_{i2}\}]{.math .inline} and of the value (sign and magnitude) of [\beta_2]{.math .inline}.
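The bias formula can be checked with a small simulation (a sketch; the regressors, coefficients, and error variance below are arbitrary choices, and the design is held fixed across replications):
set.seed(2)
n  <- 50
x1 <- rnorm(n); x2 <- 0.6 * x1 + rnorm(n)   # correlated regressors
beta1 <- 1.5; beta2 <- 2
# average the misspecified estimator over many replications
beta1.hat <- replicate(5000, {
  Y <- beta1 * x1 + beta2 * x2 + rnorm(n)
  sum(Y * x1) / sum(x1^2)                   # regression on x1 only, no intercept
})
mean(beta1.hat) - beta1                     # empirical bias
beta2 * sum(x1 * x2) / sum(x1^2)            # theoretical bias b(beta1.hat)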
Problem 1.4
Consider the simple linear regression model [y=\beta_0+\beta_1x+\epsilon]{.math .inline}, where [\beta_0]{.math .inline} is known. The [\epsilon]{.math .inline}'s are i.i.d. with mean [0]{.math .inline} and variance [\sigma^2]{.math .inline}.
Part (a)
Since [β0]{.math .inline} is known, we only need to find an estimator for [β1]{.math .inline}. The least-squares estimator of [β1]{.math .inline} is defined as
[\hat\beta_1=\operatorname*{arg\,min}_{\beta_1}L(\beta_1\mid\beta_0),]{.math .display} where [L(\beta_1\mid\beta_0)=\sum_{i=1}^n(Y_i-\beta_0-\beta_1x_i)^2]{.math .inline}. We obtain the estimator by solving for [\hat\beta_1]{.math .inline} in [\left.\frac{\partial L}{\partial\beta_1}\right|_{\hat\beta_1}=0.]{.math .display}
Thus, [\begin{aligned}-2\sum_{i=1}^n(Y_i-\beta_0-\hat\beta_1x_i)x_i&=0\\ \sum_{i=1}^n(x_iY_i-\beta_0x_i-\hat\beta_1x_i^2)&=0\\ \sum_{i=1}^nx_iY_i-\beta_0\sum_{i=1}^nx_i-\hat\beta_1\sum_{i=1}^nx_i^2&=0\\ \hat\beta_1\sum_{i=1}^nx_i^2&=\sum_{i=1}^nx_iY_i-\beta_0\sum_{i=1}^nx_i,\end{aligned}]{.math .display} which finally simplifies to [\hat\beta_1=\frac{\sum_{i=1}^nx_iY_i-\beta_0\sum_{i=1}^nx_i}{\sum_{i=1}^nx_i^2}.]{.math .display}
Of course, when [\{Y_i\}=\{y_i\}]{.math .inline}, [\hat\beta_1]{.math .inline} realizes the particular value [\hat\beta_1=\frac{\sum_{i=1}^nx_iy_i-\beta_0\sum_{i=1}^nx_i}{\sum_{i=1}^nx_i^2}.]{.math .display}
The expectation of [\hat\beta_1]{.math .inline} is given by [\begin{aligned}E(\hat\beta_1)&=E\!\left(\frac{\sum_ix_iY_i-\beta_0\sum_ix_i}{\sum_ix_i^2}\right)\\ &=\left(\sum_ix_i^2\right)^{-1}E\!\left(\sum_ix_iY_i-\beta_0\sum_ix_i\right)\\ &=\left(\sum_ix_i^2\right)^{-1}\left(\sum_ix_iE(\beta_0+\beta_1x_i+\epsilon_i)-\beta_0\sum_ix_i\right)\\ &=\left(\sum_ix_i^2\right)^{-1}\left(\beta_0\sum_ix_i+\beta_1\sum_ix_i^2-\beta_0\sum_ix_i\right)\\ &=\beta_1\left(\sum_ix_i^2\right)^{-1}\left(\sum_ix_i^2\right)=\beta_1,\end{aligned}]{.math .display} which shows that the estimator is unbiased (as expected).
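As a sketch of how this estimator can be computed in practice (simulated data; the values of [\beta_0]{.math .inline}, [\beta_1]{.math .inline}, and [\sigma]{.math .inline} below are arbitrary), note that it coincides with an ordinary regression of [Y_i-\beta_0]{.math .inline} on [x_i]{.math .inline} through the origin:
set.seed(3)
n <- 40; beta0 <- 2; beta1 <- 0.5; sigma <- 1
x <- runif(n, 0, 10)
Y <- beta0 + beta1 * x + rnorm(n, sd = sigma)
# closed-form estimator with beta0 known
beta1.hat <- (sum(x * Y) - beta0 * sum(x)) / sum(x^2)
# equivalent lm() call: regress (Y - beta0) on x with no intercept
coef(lm(I(Y - beta0) ~ 0 + x))
beta1.hat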
Part (b)
To construct the confidence interval, we must find the standard deviation of [\hat\beta_1]{.math .inline}. The variance is given by [\operatorname{Var}(\hat\beta_1)=\operatorname{Var}\!\left(\frac{\sum_{i=1}^nx_iY_i-\beta_0\sum_{i=1}^nx_i}{\sum_{i=1}^nx_i^2}\right)=\left(\sum_{i=1}^nx_i^2\right)^{-2}\operatorname{Var}\!\left(\sum_{i=1}^nx_iY_i-\beta_0\sum_{i=1}^nx_i\right).]{.math .display}
Since [\beta_0]{.math .inline} and [x_i]{.math .inline} for [i=1,\dots,n]{.math .inline} are constants, the above simplifies to
[\begin{aligned}\operatorname{Var}(\hat\beta_1)&=\left(\sum_{i=1}^nx_i^2\right)^{-2}\operatorname{Var}\!\left(\sum_{i=1}^nx_iY_i\right)=\left(\sum_{i=1}^nx_i^2\right)^{-2}\sum_{i=1}^n\operatorname{Var}(x_iY_i)\\ &=\left(\sum_{i=1}^nx_i^2\right)^{-2}\sum_{i=1}^nx_i^2\operatorname{Var}(Y_i)=\left(\sum_{i=1}^nx_i^2\right)^{-2}\sum_{i=1}^nx_i^2\operatorname{Var}(\beta_0+\beta_1x_i+\epsilon_i)\\ &=\left(\sum_{i=1}^nx_i^2\right)^{-2}\sum_{i=1}^nx_i^2\operatorname{Var}(\epsilon_i)=\left(\sum_{i=1}^nx_i^2\right)^{-2}\sum_{i=1}^nx_i^2\,\sigma^2,\end{aligned}]{.math .display} which finally yields the result [\operatorname{Var}(\hat\beta_1)=\frac{\sigma^2}{\sum_{i=1}^nx_i^2}.]{.math .display} The standard error is therefore [\operatorname{SE}(\hat\beta_1)=\frac{\sigma}{\sqrt{\sum_{i=1}^nx_i^2}}.]{.math .display}
Since [\hat\beta_1]{.math .inline} is a linear combination of the [Y_i]{.math .inline}, it follows (assuming the errors are normally distributed, or appealing to a large-sample normal approximation) that [\hat\beta_1]{.math .inline} is normally distributed with mean [\beta_1]{.math .inline} and variance [\operatorname{Var}(\hat\beta_1)]{.math .inline}. Thus, a [100(1-\alpha)\%]{.math .inline} confidence interval for [\beta_1]{.math .inline} is [\beta_1\in\left[\hat\beta_1-z_{\alpha/2}\operatorname{SE}(\hat\beta_1),\ \hat\beta_1+z_{\alpha/2}\operatorname{SE}(\hat\beta_1)\right],]{.math .display} or, substituting the expression for the standard error, [\beta_1\in\left[\hat\beta_1-\frac{z_{\alpha/2}\,\sigma}{\sqrt{\sum_ix_i^2}},\ \hat\beta_1+\frac{z_{\alpha/2}\,\sigma}{\sqrt{\sum_ix_i^2}}\right].]{.math .display}
To compare this estimator with the estimator for unknown [\beta=(\beta_0,\beta_1)']{.math .inline}, we choose to use the matrix equations for this part as a demonstration of a simpler alternative approach. The unbiased least-squares estimator of [\beta]{.math .inline} is given by [\hat\beta_f=(X'X)^{-1}X'Y,]{.math .display} where [X=\begin{pmatrix}1 & x_1\\ 1 & x_2\\ \vdots & \vdots\\ 1 & x_n\end{pmatrix}]{.math .display} and [Y=\begin{pmatrix}Y_1\\ \vdots\\ Y_n\end{pmatrix}.]{.math .display} Performing the matrix calculations, we see that [X'X=\begin{pmatrix}n & \sum_ix_i\\ \sum_ix_i & \sum_ix_i^2\end{pmatrix}]{.math .display} and [(X'X)^{-1}=\frac{1}{n\sum_ix_i^2-\left(\sum_ix_i\right)^2}\begin{pmatrix}\sum_ix_i^2 & -\sum_ix_i\\ -\sum_ix_i & n\end{pmatrix}.]{.math .display} The variance-covariance matrix of [\hat\beta_f]{.math .inline} is therefore [\Sigma=\operatorname{Var}(\hat\beta_f)=\sigma^2(X'X)^{-1},]{.math .display} and therefore the variance of [\hat\beta_{1,f}]{.math .inline} is given by [\operatorname{Var}(\hat\beta_{1,f})=\Sigma_{22}=\frac{n\sigma^2}{n\sum_ix_i^2-\left(\sum_ix_i\right)^2}=\frac{\sigma^2}{\sum_ix_i^2-\frac{1}{n}\left(\sum_ix_i\right)^2}.]{.math .display}
Recall that [\operatorname{Var}(\hat\beta_1)=\frac{\sigma^2}{\sum_ix_i^2}]{.math .inline}. The ratio of [\operatorname{Var}(\hat\beta_1)]{.math .inline} to [\operatorname{Var}(\hat\beta_{1,f})]{.math .inline} is therefore [w=\frac{\sum_ix_i^2-\frac{1}{n}\left(\sum_ix_i\right)^2}{\sum_ix_i^2}=1-\frac{\left(\sum_ix_i\right)^2}{n\sum_ix_i^2}=1-\frac{n\bar x^2}{\sum_ix_i^2},]{.math .display} which implies [0<w<1]{.math .inline}, i.e., [\operatorname{Var}(\hat\beta_{1,f})]{.math .inline} is larger than [\operatorname{Var}(\hat\beta_1)]{.math .inline}. In particular, since [w=\frac{\operatorname{Var}(\hat\beta_1)}{\operatorname{Var}(\hat\beta_{1,f})}]{.math .inline}, [\sigma_{\hat\beta_1}=\sqrt{w}\,\sigma_{\hat\beta_{1,f}},]{.math .display} and thus we may conclude that the confidence interval based on [\hat\beta_1]{.math .inline} is a factor [\sqrt{w}<1]{.math .inline} as wide as the confidence interval based on [\hat\beta_{1,f}]{.math .inline}.
It was immediately obvious that knowing the value of [β0]{.math .inline} should reduce the uncertainty of [β1]{.math .inline} given a sample, but I thought this proof was fairly interesting.
Finally, to answer the question, since a larger variance yields a larger confidence interval, the confidence interval for [β^1]{.math .inline} is narrower than the confidence interval for [β^1,f]{.math .inline}.
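A brief numerical check of this comparison (a sketch; the x values and [\sigma^2]{.math .inline} below are arbitrary):
set.seed(4)
x <- runif(30, 0, 10)
sigma2 <- 1
var.known   <- sigma2 / sum(x^2)                           # Var(beta1.hat), beta0 known
var.unknown <- sigma2 / (sum(x^2) - sum(x)^2 / length(x))  # Var(beta1.hat.f), beta0 unknown
w <- var.known / var.unknown
w          # between 0 and 1
sqrt(w)    # ratio of confidence-interval widths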
Part 2
Problem 2.1
The EmployeeData data set gives the number of employees (in thousands) for a metal fabricator and one of their primary vendors for each month over a 5-year period. The data are provided in a .txt file on Blackboard and can be read into R using the read.table command.
Part (a)
emp_data <- read.table("EmployeeData.txt", header=TRUE)
# fit a multiple regression model
ols.fit <- lm(metal~vendor, data=emp_data)
# get details from the regression output
summary(ols.fit)
##
## Call:
## lm(formula = metal ~ vendor, data = emp_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2348 -1.2393 -0.0311 1.0022 3.7077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.847911 3.299962 0.863 0.392
## vendor 0.122442 0.009423 12.994 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.59 on 58 degrees of freedom
## Multiple R-squared: 0.7443, Adjusted R-squared: 0.7399
## F-statistic: 168.8 on 1 and 58 DF, p-value: < 2.2e-16
# get the anova table
anova(ols.fit)
## Analysis of Variance Table
##
## Response: metal
## Df Sum Sq Mean Sq F value Pr(>F)
## vendor 1 426.72 426.72 168.83 < 2.2e-16 ***
## Residuals 58 146.59 2.53
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Part (b)
# vendor and metal seem to be positively correlated.
with(emp_data,plot(vendor,metal))
abline(ols.fit)
The residuals do not appear to be i.i.d. normally distributed around [0]{.math .inline}.
Part (d)
library("lmtest")
dwtest(ols.fit)
##
## Durbin-Watson test
##
## data: ols.fit
## DW = 0.35924, p-value < 2.2e-16
## alternative hypothesis: true autocorrelation is greater than 0
# with(emp_data,dwtest(metal~vendor))
A test statistic [DW]{.math .inline} near [2]{.math .inline} is consistent with no first-order autocorrelation, while values well below [2]{.math .inline} indicate positive autocorrelation. In this case the statistic is quite small ([0.359]{.math .inline}) and the p-value is essentially zero, so we reject the null hypothesis of no autocorrelation in favor of positive autocorrelation in the residuals.
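For reference (a sketch using base R only), the Durbin-Watson statistic is just a simple function of the OLS residuals:
# Durbin-Watson statistic computed directly from the OLS residuals
res <- residuals(ols.fit)
sum(diff(res)^2) / sum(res^2)   # should reproduce DW = 0.35924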
Part (e)
# calculate phi for the Cochrane-Orcutt method
N <- nrow(emp_data)
phi.hat <- lm(ols.fit$residuals[2:N] ~ 0 + ols.fit$residuals[1:(N-1)])$coeff
# transform y and x according to the Cochrane-Orcutt method
y.trans <- emp_data$metal[2:N] - phi.hat*emp_data$metal[1:(N-1)]
x.trans <- emp_data$vendor[2:N] - phi.hat*emp_data$vendor[1:(N-1)]
# fit OLS regression with transformed data
coch.or=lm(y.trans~x.trans)
summary(coch.or)
##
## Call:
## lm(formula = y.trans ~ x.trans)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1944 -0.4425 0.1461 0.5125 1.2218
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.87560 0.78655 6.199 6.78e-08 ***
## x.trans 0.04795 0.01300 3.688 0.000505 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7342 on 57 degrees of freedom
## Multiple R-squared: 0.1927, Adjusted R-squared: 0.1785
## F-statistic: 13.6 on 1 and 57 DF, p-value: 0.0005054
acf(coch.or$residual)
The standard error of the simple linear regression model for the intercept was [3.299962]{.math .inline}. The standard error for the intercept in the Cochrane-Orcutt model is significantly smaller at [0.78655]{.math .inline}.
The standard error of the simple linear regression model for the slope is [0.009423]{.math .inline}. The standard error for the slope in the Cochrane-Orcutt model is slightly larger at [0.01300]{.math .inline}.
The ACF of the Cochrane-Orcutt residuals exhibits far less autocorrelation, with most lags falling within the confidence bands.
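As an optional cross-check (this assumes the orcutt package is available; it is not required for the assignment), the fully iterated Cochrane-Orcutt procedure can be run directly on the original fit:
# optional cross-check of the manual two-step transformation above
library(orcutt)
coch.iter <- cochrane.orcutt(ols.fit)
coch.iter   # iterated Cochrane-Orcutt estimates for comparison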
Problem 2.2
The following analyses are based on the data in the HomePrice.txt file on Blackboard. You may read the data into R using the read.table command. The HomePrice dataset contains the variables used below, which are renamed t, Y, X1, and X2 in the code.
Part (a)
hp_data <- read.table("HomePrice.txt", header=TRUE)
colnames(hp_data) = c("t","Y","X1","X2")
# fit a multiple regression model
hp_model <- lm(Y~X1+X2, data=hp_data)
# get details from the regression output
summary(hp_model)
##
## Call:
## lm(formula = Y ~ X1 + X2, data = hp_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -228421 -38178 -5506 25494 383423
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.027e+05 1.265e+04 -8.121 3.39e-15 ***
## X1 1.560e+02 4.871e+00 32.019 < 2e-16 ***
## X2 1.151e+00 2.964e-01 3.882 0.000117 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 78070 on 519 degrees of freedom
## Multiple R-squared: 0.6808, Adjusted R-squared: 0.6796
## F-statistic: 553.5 on 2 and 519 DF, p-value: < 2.2e-16
# get the anova table
anova(hp_model)
## Analysis of Variance Table
##
## Response: Y
## Df Sum Sq Mean Sq F value Pr(>F)
## X1 1 6.6555e+12 6.6555e+12 1091.875 < 2.2e-16 ***
## X2 1 9.1880e+10 9.1880e+10 15.073 0.0001168 ***
## Residuals 519 3.1635e+12 6.0955e+09
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Part (b)
# studentized residuals
r.unweighted = rstudent(hp_model)
# the following plot of the fitted values vs the residuals suggests
# non-constant variance. the variance seems to be increasing with respect
# to y. the residuals do seem to have a zero mean though.
plot(hp_model$fitted, r.unweighted)
# variance seems more constant with respect to home price (X1)
plot(hp_data$X1, r.unweighted)
plot(hp_data$X2, r.unweighted)
The studentized residuals of the regression appear to have non-constant variance with respect to the fitted values [\hat y]{.math .inline}, [X_1]{.math .inline}, and [X_2]{.math .inline}. In particular, they exhibit the fanning-out pattern typically associated with non-constant (increasing) variance.
Part (c)
# the absolute value of the OLS residuals.
abs_residuals = abs(residuals(hp_model))
# we're fitting
# s(i) = gamma0 + gamma1 y(i) + zeta(i)
# where zeta(i) is the random error.
abs_residuals_fit=lm(abs_residuals~hp_model$fitted)
Part (d)
# weighted least squares: the weights are the reciprocal of the squared fitted
# values from the absolute-residual regression, i.e. 1 / (estimated SD)^2.
wts=1/(fitted(abs_residuals_fit))^2
# fit the weighted regression model to the data
hp_model.weighted=lm(Y~X1+X2, data=hp_data,weights=wts)
anova(hp_model.weighted)
## Analysis of Variance Table
##
## Response: Y
## Df Sum Sq Mean Sq F value Pr(>F)
## X1 1 4095.8 4095.8 1754.0376 <2e-16 ***
## X2 1 0.7 0.7 0.3068 0.5799
## Residuals 519 1211.9 2.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(hp_model.weighted)
##
## Call:
## lm(formula = Y ~ X1 + X2, data = hp_data, weights = wts)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -9.4644 -0.9364 -0.2118 0.6141 8.0706
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8918.7876 3619.3749 -2.464 0.0141 *
## X1 123.1438 3.3186 37.107 <2e-16 ***
## X2 -0.1274 0.2300 -0.554 0.5799
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.528 on 519 degrees of freedom
## Multiple R-squared: 0.7717, Adjusted R-squared: 0.7708
## F-statistic: 877.2 on 2 and 519 DF, p-value: < 2.2e-16
The studentized residuals of the weighted regression appear to have a more constant variance with respect to the fitted values [\hat y]{.math .inline}, [X_1]{.math .inline}, and [X_2]{.math .inline}; in particular, they no longer exhibit the fanning-out pattern typically associated with non-constant variance.
In conclusion, the weighted regression produces residuals that are more representative of white noise with zero mean and constant variance.
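One way to back up this claim visually (a sketch using base R only) is to plot the studentized residuals of the weighted fit against its fitted values:
# studentized residuals of the weighted fit vs fitted values;
# the fan shape from the unweighted fit should be much less pronounced
r.weighted <- rstudent(hp_model.weighted)
plot(fitted(hp_model.weighted), r.weighted)
abline(h = 0, lty = 2)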