\newcommand{\var}{\operatorname{Var}} \newcommand{\expect}{\operatorname{E}} \newcommand{\corr}{\operatorname{Corr}} \newcommand{\cov}{\operatorname{Cov}} \newcommand{\ssr}{\operatorname{SSR}} \newcommand{\se}{\operatorname{SE}} \newcommand{\mat}[1]{\bm{#1}} \newcommand{\eval}[2]{\left. #1 \right\vert_{#2}} \newcommand{\argmin}{\operatorname{arg\,min}}
Problem 1
Consider the process $Y_t = e_t - 0.5\,e_{t-1} - 0.3\,e_{t-2}$, where all the $e_t$ values are independent white noise with variance $\sigma^2$.
Preliminary analysis
In general, a second-order moving average process has the form
$$ Y_t = \mu + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2}, $$
where $e_t$ is i.i.d. WN with mean $0$ and variance $\sigma^2$.
We denote such a moving average process by $\operatorname{MA}(2)$, which we model with the following R function:
MA2 <- function(mu, theta1, theta2, sigma=1)
{
  # Returns a generator for the MA(2) process
  # Y_t = mu + e_t - theta1 e_{t-1} - theta2 e_{t-2}.
  return(function(N)
  {
    # N observations require N + 2 innovations.
    et <- rnorm(n=N+2, mean=0, sd=sigma)
    yt <- mu +
      et[1:N] -
      theta1 * et[2:(N+1)] -
      theta2 * et[3:(N+2)]
    return(yt)
  })
}
This function takes three parameters, $\mu$, $\theta_1$, and $\theta_2$, and optionally a fourth parameter $\sigma$, and returns an anonymous function that accepts a single parameter, $N$, which specifies the number of points to generate from the process.
Matching $Y_t = e_t - 0.5\,e_{t-1} - 0.3\,e_{t-2}$ and the general form term by term, we see that $\mu = 0$, $\theta_1 = 0.5$, and $\theta_2 = 0.3$; in other words, $Y_t \sim \operatorname{MA}(2)$.
Part (a)
\fbox{Find $\expect(Y_t)$.} A moving average process of order $q$ with $\mu = 0$ has an expectation of zero. However, just this once, we will manually derive the characteristic. By the linearity of expectation, \begin{align} \expect(Y_t) &= \expect(e_t) - 0.5 \expect(e_{t-1}) - 0.3 \expect(e_{t-2}) \\ &= 0 - 0.5 (0) - 0.3 (0) = 0. \end{align}
Part (b)
\fbox{Find $\cov(Y_t,Y_t) = \var(Y_t)$.}
Since $Y_t \sim \operatorname{MA}(2)$ with $\theta_1 = 0.5$ and $\theta_2 = 0.3$, the variance is given by
$$ \var(Y_t) = \sigma^2(1 + \theta_1^2 + \theta_2^2) = \sigma^2(1 + 0.5^2 + 0.3^2) = 1.34 \sigma^2. $$
However, we may manually derive it using the computational variance formula, \begin{align*} \var(Y_t) &= \expect(Y_t^2) - \expect^2(Y_t) \\ &= \expect\!\left[(e_t - 0.5 e_{t-1} - 0.3 e_{t-2})^2\right] \\ &= \expect\!\left(e_t^2 - e_t e_{t-1} - \frac{3}{5} e_t e_{t-2} + \frac{1}{4} e_{t-1}^2 + \frac{3}{10} e_{t-1} e_{t-2} + \frac{9}{100} e_{t-2}^2\right) \\ &= \sigma^2 - 0 - \frac{3}{5}(0) + \frac{1}{4} \sigma^2 + \frac{3}{10}(0) + \frac{9}{100}\sigma^2 \\ &= \sigma^2 \left(1 + \frac{1}{4} + \frac{9}{100}\right) \\ &= 1.34 \sigma^2. \end{align*}
Part (c)
\fbox{Find $\gamma_k = \cov(Y_t,Y_{t+k})$ and, from this, find the ACF, $\rho_k$.}
Since $Y_t \sim \operatorname{MA}(2)$, the covariance of $Y_t$ and $Y_{t+k}$ is given by \begin{align} \cov(Y_t,Y_{t+k}) = \begin{cases} \var(Y_t) & k = 0, \\ \sigma^2(-\theta_1 + \theta_1 \theta_2) & k = 1, \\ \sigma^2(-\theta_2) & k = 2, \\ 0 & k > 2. \end{cases} \end{align}
We already derived the variance of $Y_t$ and, since $\theta_1 = 0.5$ and $\theta_2 = 0.3$, we may rewrite the above as \begin{align} \gamma_k = \begin{cases} 1.34 \sigma^2 & k = 0, \\ -0.35 \sigma^2 & k = 1, \\ -0.3 \sigma^2 & k = 2, \\ 0 & k > 2 \end{cases} \end{align} and since $\rho_k = \gamma_k / \gamma_0$, \begin{equation} \rho_k = \begin{cases} 1 & k = 0, \\ -0.261 & k = 1, \\ -0.224 & k = 2, \\ 0 & k > 2. \end{cases} \end{equation}
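As a sanity check on these values, we can compare them against the theoretical ACF computed by stats::ARMAacf. Note that ARMAacf writes the MA part with plus signs, so our $\theta_1 = 0.5$ and $\theta_2 = 0.3$ enter with negated sign:

```r
# Theoretical ACF of Y_t = e_t - 0.5 e_{t-1} - 0.3 e_{t-2};
# ARMAacf uses the opposite sign convention for MA coefficients.
rho <- ARMAacf(ma=c(-0.5, -0.3), lag.max=4)
round(rho, 3)  # lags 0..4: 1, -0.261, -0.224, 0, 0
```

This agrees with the hand-derived $\rho_1 = -0.35/1.34$ and $\rho_2 = -0.3/1.34$, and confirms the cutoff after lag $2$.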
Part (d)
\fbox{ \begin{minipage}{.9\textwidth} Generate time series datasets of length $N = 200$ according to this process. Plot the observed time series and the sample ACF and PACF. Do the plots agree with what you know to be true? \end{minipage}}
In code, we model $Y_t$ with:
Yt <- MA2(theta1=.5, theta2=.3, mu=0)
We sample $N = 200$ values from $Y_t$ with
y200 <- Yt(N=200)
We plot the realization of the time series with the following R code:
plot(y200,type="l")
points(y200)
The mean appears to be centered around $0$ with a constant variance, as expected.
The sample ACF and PACF are given by:
par(mfrow=c(1,2),oma=c(0,0,0,0))
acf(y200)
pacf(y200)
As expected, since Yt models $\operatorname{MA}(2)$, the sample ACF cuts off after lag $2$. The PACF is not as informative, other than suggesting that the data are not a good fit for an autoregressive model.
Problem 2
\fbox{\begin{minipage}{.9\textwidth} Consider the process: $Y_t = \phi Y_{t-1} + e_t$, where all the $e_t$ values are independent white noise with variance $\sigma^2$. \end{minipage}}
Preliminary analysis
In general, a first-order autoregressive process is given by
$$ Y_t = \delta + \phi Y_{t-1} + e_t, $$
where $e_t$ is i.i.d. WN with mean $0$ and variance $\sigma^2$.
We denote such an autoregressive process by $\operatorname{AR}(1)$, which we model with the following R function:
AR1 <- function(phi, delta, sigma=1)
{
  # Returns a generator for the AR(1) process
  # Y_t = delta + phi Y_{t-1} + e_t.
  return(function(N)
  {
    yt <- vector(length=N)
    yt[1] <- 0 # arbitrary starting value; no burn-in is performed
    for (i in 2:N)
    {
      et <- rnorm(n=1, mean=0, sd=sigma)
      yt[i] <- delta + phi*yt[i-1] + et
    }
    return(yt)
  })
}
This function takes three parameters, $\phi$, $\delta$, and $\sigma$, and returns an anonymous function that accepts a single parameter, $N$, which specifies the number of points to generate from the process.
Observe that a stationary autoregressive process (with $|\phi| < 1$) has a mean of $\delta / (1 - \phi)$ and a variance of $\sigma^2 / (1 - \phi^2)$.
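These follow from stationarity: taking expectations and variances of both sides of the defining equation, and using the fact that $\expect(Y_t) = \expect(Y_{t-1})$ and $\var(Y_t) = \var(Y_{t-1})$ for a stationary process,
\begin{align*}
\expect(Y_t) = \delta + \phi \expect(Y_t) &\implies \expect(Y_t) = \frac{\delta}{1 - \phi}, \\
\var(Y_t) = \phi^2 \var(Y_t) + \sigma^2 &\implies \var(Y_t) = \frac{\sigma^2}{1 - \phi^2}.
\end{align*}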
Part (a)
\fbox{Show that if $\phi = 1$, the process cannot be stationary.}
To be stationary, the variance must be constant. The variance of $Y_t$ when $\phi = 1$ is given by \begin{align*} \var(Y_t) &= \var(Y_{t-1}) + \sigma^2 \\ &= \var(Y_{t-2}) + 2 \sigma^2 \\ &\vdots \\ &= \var(Y_{t-(t-2)}) + (t-2) \sigma^2 \\ &= \var(Y_{t-(t-1)}) + (t-1) \sigma^2 \\ &= t \sigma^2, \end{align*} where the last step takes $Y_1 = e_1$, so that $\var(Y_1) = \sigma^2$. Thus, if $\phi = 1$, $Y_t$ has a variance of $t \sigma^2$, which is a function of time, and the process is thus non-stationary.
Part (b)
\fbox{Take $\phi = -0.6$ and calculate the ACF, $\rho_k$.}
For a time series that models $\operatorname{AR}(1)$, the autocorrelation function is given by $\rho_k = \phi^k$. Thus, since $Y_t$ models $\operatorname{AR}(1)$ with $\phi = -0.6$, $Y_t$ has the autocorrelation function
$$ \rho_k = (-0.6)^k $$
for $k \geq 0$.
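As a quick check, stats::ARMAacf reproduces this geometric decay with alternating sign:

```r
# Theoretical ACF of an AR(1) with phi = -0.6; equals (-0.6)^k at lag k.
rho <- ARMAacf(ar=-0.6, lag.max=5)
all.equal(unname(rho), (-0.6)^(0:5))  # TRUE
```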
Part (c)
\fbox{\begin{minipage}{.9\textwidth} Take $N = 200$. Generate time series datasets of length $N$ according to the process. Plot the observed time series and the sample ACF and PACF. Do the plots agree with what you know to be true? \end{minipage}}
First, we know that $Y_t = -0.6 Y_{t-1} + e_t$. Matching this term by term to the general $\operatorname{AR}(1)$ form shows that $\delta = 0$ (and therefore $\expect(Y_t) = 0$).
In code, we model $Y_t$ with:
Yt <- AR1(phi=-0.6, delta=0, sigma=1)
We sample $N = 200$ values from $Y_t$ with:
y200 <- Yt(N=200)
We plot the realization of the time series with the following R code:
plot(y200,type="l")
points(y200)
The mean appears to be centered around $0$ with a constant variance, as expected. Also, since $\phi$ is negative, $Y_t$ and $Y_{t-1}$ are negatively correlated, and thus the series exhibits jittery up-and-down movements, as expected.
The sample ACF and PACF are given by:
par(mfrow=c(1,2),oma=c(0,0,0,0))
acf(y200)
pacf(y200)
The sample PACF cuts off (with a single outlier) after lag $1$, as expected of an $\operatorname{AR}(1)$ process.
Part (d)
\fbox{\begin{minipage}{.8\textwidth} What happens when we increase the sample size? Repeat part (c) when $N = 1000$. Comment on your findings. \end{minipage}}
We sample $N = 1000$ values from $Y_t$ with
y1000 <- Yt(N=1000)
We plot the realization of the time series with the following R code:
plot(y1000,type="l")
points(y1000)
The mean appears to be centered around $0$ with a constant variance, as expected.
The sample ACF and PACF are given by:
par(mfrow=c(1,2),oma=c(0,0,0,0))
acf(y1000)
pacf(y1000)
We see the same general pattern as before in each case, except due to the increased sample size the confidence intervals have narrowed.
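The narrowing is quantitative: under the white-noise null hypothesis, the bands drawn by acf are approximately $\pm 1.96/\sqrt{N}$, so
$$ \pm \frac{1.96}{\sqrt{200}} \approx \pm 0.139 \qquad\text{versus}\qquad \pm \frac{1.96}{\sqrt{1000}} \approx \pm 0.062. $$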
Problem 3
A data set of consecutive measurements from a machine tool is in the \emph{deere3} object in the TSA package.
Preliminary steps
We load the requisite time series data with the following R code:
library(TSA)
data(deere3)
Part (a)
\fbox{\begin{minipage}{.8\textwidth} Plot the time series. What basic pattern do you see from the plot? Might a stationary model be appropriate for this plot? \end{minipage}}
We plot the time series with the following R code:
plot(deere3)
A stationary model may be appropriate since the series fluctuates randomly around some central tendency and the fluctuations seem relatively constant (albeit quite large).
Part (b)
\fbox{\begin{minipage}{.8\textwidth} Plot the sample ACF and PACF. Tentatively specify the type of model (AR, MA, or ARMA) as well as the order(s) of the model. Write up detailed notes that describe how you decided on the model. \end{minipage}}
par(mfrow=c(1,2),oma=c(0,0,0,0))
acf(deere3)
pacf(deere3)
We seek a parsimonious model that sufficiently explains the data. Tentatively, we choose an $\operatorname{AR}(1)$ model.
The sample ACF decays and the sample PACF drops off to values compatible with zero after lag $1$. Thus, the sample ACF and PACF are compatible with an $\operatorname{AR}(1)$ process. Due to its parsimony and compatibility with the data, we choose $\operatorname{AR}(1)$.
Part (c)
\fbox{\begin{minipage}{.8\textwidth} Fit an $\operatorname{AR}(1)$ model using the arima function in R, use it to forecast the next ten values of the series, and provide the forecasted values. \end{minipage}}
We seek to fit the time series to a first-order autoregressive model of the form
$$ Y_t = \delta + \phi Y_{t-1} + e_t, $$
where $e_t$ is white noise with mean $0$ and variance $\sigma^2$, and $|\phi| < 1$.
We perform the fit using the following R code:
ar1 <- arima(deere3,order=c(1,0,0)) # fit an AR(1) model
ar1
##
## Call:
## arima(x = deere3, order = c(1, 0, 0))
##
## Coefficients:
## ar1 intercept
## 0.5255 124.3832
## s.e. 0.1108 394.2066
##
## sigma^2 estimated as 2069355: log likelihood = -495.51, aic = 995.02
We see that $\hat\phi = 0.5255$, $\hat\mu = 124.3832$, $\hat\sigma^2 = 2069355$, and, since arima reports the process mean as the intercept, $\hat\delta = \hat\mu(1 - \hat\phi) \approx 59.0198$. Thus, we estimate that
$$ Y_t = 59.0198 + 0.5255\,Y_{t-1} + e_t. $$
We perform a $10$-step-ahead forecast with the following R code:
predict(ar1,n.ahead=10)$pred
## Time Series:
## Start = 58
## End = 67
## Frequency = 1
## [1] -335.145917 -117.120758 -2.538374 57.680010 89.327578 105.959850
## [7] 114.700885 119.294706 121.708973 122.977783
For small look-aheads, since nearby values in the time series are correlated, nearby previous values should have some measurable effect on the forecast. However, we expect that, as the look-ahead time for the forecast goes to infinity, the forecast converges to the estimated process mean $\hat\mu = 124.3832$.
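This follows from the AR(1) forecast recursion: writing $\hat Y_t(\ell)$ for the $\ell$-step-ahead forecast made at time $t$,
$$ \hat Y_t(\ell) = \hat\mu + \hat\phi^{\ell} \left( Y_t - \hat\mu \right) \to \hat\mu \quad\text{as } \ell \to \infty, $$
since $|\hat\phi| = 0.5255 < 1$.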
Additional experimentation
For fun, we construct a generative autoregressive model of order $1$ with:
Xt <- AR1(phi=0.5255, delta=59.0198284, sigma=sqrt(2069355))
If we sample a large number of points from this generative model and then fit an $\operatorname{AR}(1)$ model to it, we get estimates similar to before:
arima(Xt(N=1000),order=c(1,0,0))
##
## Call:
## arima(x = Xt(N = 1000), order = c(1, 0, 0))
##
## Coefficients:
## ar1 intercept
## 0.5434 180.1728
## s.e. 0.0265 98.9118
##
## sigma^2 estimated as 2044761: log likelihood = -8684.51, aic = 17373.02
Problem 4
A data set of durations until payment for consecutive orders from a Winegrad distributor is in the \emph{days} object in the TSA package.
Preliminary steps
We load the data with the following R code:
library(TSA)
data(days)
Part (a)
\fbox{\begin{minipage}{.8\textwidth} Plot the time series. What basic pattern do you see from the plot? Might a stationary model be appropriate for this plot? \end{minipage}}
We plot the time series with the following R code:
plot(days)
A stationary model may be appropriate since the series fluctuates randomly around some central tendency and the fluctuations seem relatively constant (albeit quite large).
Part (b)
\fbox{\begin{minipage}{.8\textwidth} Plot the sample ACF and PACF. Tentatively specify the type of model (AR, MA, or ARMA) as well as the order(s) of the model. Write up detailed notes that describe how you decided on the model. \end{minipage}}
The sample ACF and PACF are given by:
par(mfrow=c(1,2),oma=c(0,0,0,0))
acf(days)
pacf(days)
There does not seem to be much of a pattern to the data. The ACF and PACF suggest that the data is compatible with the hypothesis that the values in the time series are linearly uncorrelated.
We choose a model that is a white noise process with a non-zero mean. We might think of this as a degenerate case of the ARMA model, denoted by $\operatorname{ARMA}(0,0)$.
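To see concretely what fitting $\operatorname{ARMA}(0,0)$ means, here is a minimal sketch on hypothetical simulated white noise (not the days data): arima with order c(0,0,0) estimates only the intercept, which coincides with the sample mean.

```r
set.seed(42)
w <- rnorm(n=130, mean=28, sd=7)  # hypothetical white noise with non-zero mean
fit <- arima(w, order=c(0,0,0))   # ARMA(0,0): intercept-only model
coef(fit)["intercept"]            # approximately mean(w)
```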
Model selection
In what follows, we analyze model selection for this time series in more depth. Given a set of candidate models that may plausibly explain the data, one strategy for model selection is to choose the model with the lowest Akaike information criterion (AIC) on the given data set.
In particular, suppose the set of candidate models is given by
$$ M = \{ \operatorname{ARMA}(p,q) : p \in P,\ q \in Q \}, $$
where $P$ and $Q$ are subsets of the natural numbers, and we let the selected model be defined as
$$ m^* = \argmin_{m \in M} \operatorname{AIC}(m). $$
If $M$ is a relatively small set, we can simply perform a brute-force exhaustive search through it. We let $P = \{0,1,2,3\}$ and $Q = \{0,1,2,3\}$, and thus $|M| = 16$. We perform the exhaustive search with the following R code:
P <- c(0,1,2,3)
Q <- c(0,1,2,3)
aics <- matrix(nrow=length(P)*length(Q),ncol=3)
colnames(aics) <- c("AIC","p","q")
i <- 1
for (p in P)
{
for (q in Q)
{
aics[i,1] <- arima(days,order=c(p,0,q))$aic
aics[i,2] <- p
aics[i,3] <- q
i <- i + 1
}
}
aics <- aics[order(aics[,1],decreasing=FALSE),]
aics
## AIC p q
## [1,] 886.2239 1 2
## [2,] 886.9111 0 0
## [3,] 886.9546 0 2
## [4,] 887.0030 2 1
## [5,] 887.2617 3 0
## [6,] 887.3115 2 0
## [7,] 887.6010 1 0
## [8,] 887.8065 0 3
## [9,] 887.8736 0 1
## [10,] 887.9656 2 2
## [11,] 888.5414 3 1
## [12,] 888.8066 3 2
## [13,] 888.9095 1 1
## [14,] 890.0033 2 3
## [15,] 890.8047 3 3
## [16,] 890.8777 1 3
cat("m* = ARIMA(",aics[1,2],",",aics[1,3],")\n")
## m* = ARIMA( 1 , 2 )
We see that $m^* = \operatorname{ARMA}(1,2)$. However, the next best model according to AIC on $M$ is $\operatorname{ARMA}(0,0)$. The \emph{third} best model is $\operatorname{ARMA}(0,2)$, which is identical to $\operatorname{MA}(2)$.
Given the simplicity of white noise (with a non-zero mean), as an ad hoc decision I remain inclined to accept $\operatorname{ARMA}(0,0)$ as the most likely model for the data. In other words, my \emph{prior} on $M$ more heavily weighs the simpler models with smaller $p$ and $q$ than simply choosing the minimum AIC on $M$ would.
Part (c)
\fbox{\begin{minipage}{.8\textwidth} Fit an $\operatorname{MA}(2)$ model using the arima function in R and use it to forecast the next ten values of the series, and list the forecasted values. \end{minipage}}
We seek to fit the time series to a second-order moving average model of the form
$$ Y_t = \mu + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2}, $$
where $e_t$ is white noise with mean $0$ and variance $\sigma^2$.
We perform the fit using the following R code:
ma2 <- arima(days,order=c(0,0,2))
ma2
##
## Call:
## arima(x = days, order = c(0, 0, 2))
##
## Coefficients:
## ma1 ma2 intercept
## 0.1113 0.1557 28.6931
## s.e. 0.0894 0.0884 0.7946
##
## sigma^2 estimated as 51.33: log likelihood = -440.48, aic = 886.95
We see that $\hat\mu = 28.6931$, $\hat\theta_1 = -0.1113$, and $\hat\theta_2 = -0.1557$ (R's arima writes the MA part with plus signs, so its reported coefficients are the negatives of our $\theta$'s). Thus, we estimate that
$$ Y_t = 28.6931 + e_t + 0.1113\,e_{t-1} + 0.1557\,e_{t-2}. $$
We perform a $10$-step-ahead forecast with the following R code:
predict(ma2,n.ahead=10)$pred
## Time Series:
## Start = 131
## End = 140
## Frequency = 1
## [1] 33.43453 27.67666 28.69310 28.69310 28.69310 28.69310 28.69310 28.69310
## [9] 28.69310 28.69310
We see that the first two forecasts are a function of the last two observations. After that, the forecast is the estimated mean $\hat\mu = 28.6931$, which is consistent with the assumption of a constant-mean stationary process.
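This is exactly the behavior an MA(2) predicts: innovations more than two steps in the past have no effect, so for look-ahead $\ell$,
$$ \hat Y_t(\ell) = \hat\mu + \expect\!\left(e_{t+\ell} - \theta_1 e_{t+\ell-1} - \theta_2 e_{t+\ell-2} \mid e_1, \ldots, e_t\right) = \hat\mu \quad\text{for } \ell > 2, $$
while for $\ell = 1, 2$ the conditional expectation retains the known innovations $e_t$ and $e_{t-1}$, which is why only the first two forecasts differ from $\hat\mu$.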
Additional experimentation
For fun, we construct a generative moving average model of order $2$ with:
Zt <- MA2(mu=28.6931,theta1=-0.1113,theta2=-0.1557,sigma=51.33)
If we sample a large number of points from this generative model and then fit an $\operatorname{MA}(2)$ model to it, we get coefficient estimates similar to before. (Note that MA2's sigma parameter is a standard deviation; passing the estimated \emph{variance} $51.33$ directly means the fitted $\sigma^2$ below is near $51.33^2 \approx 2635$ rather than $51.33$. Passing sigma=sqrt(51.33) would reproduce the original scale.)
par(mfrow=c(1,2),oma=c(0,0,0,0))
zt <- Zt(N=1000)
acf(zt)
pacf(zt)
arima(zt,order=c(0,0,2))
##
## Call:
## arima(x = zt, order = c(0, 0, 2))
##
## Coefficients:
## ma1 ma2 intercept
## 0.0842 0.1451 27.4917
## s.e. 0.0313 0.0320 1.9668
##
## sigma^2 estimated as 2562: log likelihood = -5343.15, aic = 10692.3