
Time Series Analysis - STAT 478 - Problem Set 5

\newcommand{\var}{\operatorname{Var}} \newcommand{\expect}{\operatorname{E}} \newcommand{\corr}{\operatorname{Corr}} \newcommand{\cov}{\operatorname{Cov}} \newcommand{\ssr}{\operatorname{SSR}} \newcommand{\se}{\operatorname{SE}} \newcommand{\mat}[1]{\bm{#1}} \newcommand{\eval}[2]{\left. #1 \right\vert_{#2}} \newcommand{\argmin}{\operatorname*{arg\,min}}

Problem 1

Consider the $\operatorname{MA}(2)$ process, where all the $\{e_t\}$ values are independent white noise with variance $\sigma^2$.

$$ Y_t = e_t - 0.5 e_{t-1} - 0.3 e_{t-2} $$

Preliminary analysis

In general, an $\operatorname{MA}(2)$ process has the form

$$ Y_t = \mu + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} $$

where $\{e_t\}$ is i.i.d. white noise with mean $0$ and variance $\sigma^2$.

We denote such a moving average process by $\operatorname{MA}(2; \mu, \theta_1, \theta_2, \sigma)$, which we model with the following R function:

MA2 <- function(mu, theta1, theta2, sigma=1)
{
   return(function(N)
   {
      # generate N+2 white-noise values so every Y_t has two lagged errors;
      # et[i+2] plays the role of e_t, et[i+1] of e_{t-1}, and et[i] of e_{t-2}
      et <- rnorm(n=N+2, mean=0, sd=sigma)
      yt <- mu +
            et[3:(N+2)] -
            theta1 * et[2:(N+1)] -
            theta2 * et[1:N]
      return(yt)
   })
}

This function takes three parameters, $\mu$, $\theta_1$, and $\theta_2$, and optionally a fourth parameter $\sigma$ (default $1$), and returns a function that accepts a single parameter, N, which specifies the number of points to generate from the process.

Matching $e_t - 0.5 e_{t-1} - 0.3 e_{t-2}$ and $\mu + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2}$ term by term, we see that $\mu = 0$, $\theta_1 = 0.5$, and $\theta_2 = 0.3$, or in other words

$$ \{Y_t\} \sim \operatorname{MA}(2; \mu=0, \theta_1=0.5, \theta_2=0.3, \sigma). $$

Part (a)

\fbox{Find $\expect(Y_t)$.}

A moving average process of order $2$ with $\mu = 0$ has an expectation of zero. However, just this once, we will derive this property manually. By the linearity of expectation,
\begin{align*}
\expect(Y_t) &= \expect(e_t) - 0.5 \expect(e_{t-1}) - 0.3 \expect(e_{t-2})\\
&= 0 - 0.5(0) - 0.3(0) = 0.
\end{align*}

Part (b)

\fbox{Find $\cov(Y_t,Y_t) = \var(Y_t)$.}

Since $\{Y_t\} \sim \operatorname{MA}(2; \mu=0, \theta_1=0.5, \theta_2=0.3, \sigma)$, the variance is given by

$$ \var(Y_t) = \sigma^2(1 + \theta_1^2 + \theta_2^2) = \sigma^2(1 + 0.5^2 + 0.3^2) = 1.34 \sigma^2. $$

However, we may derive it manually using the computational variance formula. Since $\expect(Y_t) = 0$, the squared-mean term vanishes and
\begin{align*}
\var(Y_t) &= \expect(Y_t^2) - \expect^2(Y_t)\\
&= \expect(e_t - 0.5 e_{t-1} - 0.3 e_{t-2})^2\\
&= \expect\!\left(e_t^2 - e_t e_{t-1} - \frac{3}{5} e_t e_{t-2} + \frac{1}{4} e_{t-1}^2 + \frac{3}{10} e_{t-1} e_{t-2} + \frac{9}{100} e_{t-2}^2\right)\\
&= \sigma^2 - 0 - \frac{3}{5}(0) + \frac{1}{4}\sigma^2 + \frac{3}{10}(0) + \frac{9}{100}\sigma^2\\
&= \sigma^2 \left(1 + \frac{1}{4} + \frac{9}{100}\right)\\
&= 1.34\,\sigma^2.
\end{align*}

Part (c)

\fbox{Find $\gamma_k = \cov(Y_t,Y_{t+k})$ and, from this, find the ACF, $\rho_k$.}

Since $\{Y_t\} \sim \operatorname{MA}(2; \mu=0, \theta_1=0.5, \theta_2=0.3, \sigma)$, the covariance of $Y_t$ and $Y_{t+k}$ is given by
\begin{align*}
\cov(Y_t,Y_{t+k}) = \begin{cases}
\var(Y_t) & k = 0,\\
\sigma^2(-\theta_1 + \theta_1 \theta_2) & k = 1,\\
\sigma^2(-\theta_2) & k = 2,\\
0 & k > 2.
\end{cases}
\end{align*}

We already derived the variance of $Y_t$ and, since $\theta_1 = 0.5$ and $\theta_2 = 0.3$, we may rewrite the above as
\begin{align*}
\gamma_k = \begin{cases}
1.34\,\sigma^2 & k = 0,\\
-0.35\,\sigma^2 & k = 1,\\
-0.3\,\sigma^2 & k = 2,\\
0 & k > 2,
\end{cases}
\end{align*}
and since $\rho_k = \gamma_k / \gamma_0$,
\begin{equation*}
\rho_k = \begin{cases}
1 & k = 0,\\
-0.261 & k = 1,\\
-0.224 & k = 2,\\
0 & k > 2.
\end{cases}
\end{equation*}
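As an aside, these hand-derived values can be sanity-checked numerically with base R's ARMAacf (from the stats package). Note that ARMAacf parameterizes the MA part with plus signs, $Y_t = e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2}$, so our minus-convention coefficients enter negated:

```r
# Theoretical ACF of Y_t = e_t - 0.5 e_{t-1} - 0.3 e_{t-2}.
# ARMAacf() uses the plus convention, so pass ma = c(-0.5, -0.3).
rho <- ARMAacf(ma = c(-0.5, -0.3), lag.max = 4)
round(rho, 3)
# lags 0..4: 1.000 -0.261 -0.224 0.000 0.000
```

This agrees with $\rho_1 = -0.35/1.34 \approx -0.261$ and $\rho_2 = -0.3/1.34 \approx -0.224$ above.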

Part (d)

\fbox{\begin{minipage}{.9\textwidth} Generate time series datasets of length $n=200$ according to this $\operatorname{MA}(2)$ process. Plot the observed time series and the sample ACF and PACF. Do the plots agree with what you know to be true? \end{minipage}}

In code, we model $\{Y_t\} \sim \operatorname{MA}(2; \mu=0, \theta_1=0.5, \theta_2=0.3, \sigma=1)$ with:

Yt <- MA2(theta1=.5, theta2=.3, mu=0)

We sample $n=200$ values from $\{Y_t\}$ with:

y200 <- Yt(N=200)

We plot the realization of the time series with the following R code:

plot(y200,type="l")
points(y200)

The series appears to be centered around $0$ with constant variance, as expected.

The sample ACF and PACF are given by:

par(mfrow=c(1,2),oma=c(0,0,0,0))
acf(y200)
pacf(y200)

As expected, since $\{Y_t\}$ is an $\operatorname{MA}(2)$ process, the sample ACF cuts off after lag $k=2$. The PACF is not as informative, other than suggesting that an autoregressive model would not fit $\{Y_t\}$ well.

Problem 2

\fbox{\begin{minipage}{.9\textwidth} Consider the $\operatorname{AR}(1)$ process $Y_t = \phi Y_{t-1} + e_t$, where all the $\{e_t\}$ values are independent white noise with variance $\sigma^2$. \end{minipage}}

Preliminary analysis

In general, an $\operatorname{AR}(1)$ process is given by

$$ Y_t = \Delta + \phi Y_{t-1} + e_t $$

where $\{e_t\}$ is i.i.d. white noise with mean $0$ and variance $\sigma^2$.

We denote such an autoregressive process by $\operatorname{AR}(1; \phi, \Delta, \sigma)$, which we model with the following R function:

AR1 <- function(phi, delta, sigma=1)
{
   return(function(N)
   {
      yt <- vector(length=N)
      yt[1] <- 0  # start at 0; for |phi| < 1 the transient decays quickly
      for (i in 2:N)
      {
         et <- rnorm(n=1, mean=0, sd=sigma)
         yt[i] <- delta + phi*yt[i-1] + et
      }
      return(yt)
   })
}

This function takes two parameters, $\phi$ and $\Delta$, and optionally a third parameter $\sigma$ (default $1$), and returns a function that accepts a single parameter, N, which specifies the number of points to generate from the process.

Observe that a stationary autoregressive process $\operatorname{AR}(1; \phi, \Delta, \sigma)$ has mean $\mu = \Delta / (1-\phi)$ and variance $\sigma_{Y_t}^2 = \sigma^2 / (1-\phi^2)$.
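Both identities follow by taking expectations and variances of both sides of $Y_t = \Delta + \phi Y_{t-1} + e_t$ and using stationarity (so $\expect(Y_t) = \expect(Y_{t-1}) = \mu$ and $\var(Y_t) = \var(Y_{t-1}) = \sigma_{Y_t}^2$, valid when $|\phi| < 1$):

```latex
\begin{align*}
\mu = \Delta + \phi\mu
  &\implies \mu = \frac{\Delta}{1-\phi},\\
\sigma_{Y_t}^2 = \phi^2\,\sigma_{Y_t}^2 + \sigma^2
  &\implies \sigma_{Y_t}^2 = \frac{\sigma^2}{1-\phi^2}.
\end{align*}
```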

Part (a)

\fbox{Show that if $|\phi| = 1$, the process cannot be stationary.}

To be stationary, the variance must be constant over time. Taking $Y_1 = e_1$ (so that $\var(Y_1) = \sigma^2$), the variance of $Y_t$ when $|\phi| = 1$ is given by
\begin{align*}
\var(Y_t) &= \var(Y_{t-1}) + \sigma^2\\
&= \var(Y_{t-2}) + 2\sigma^2\\
&\;\;\vdots\\
&= \var(Y_{t-(t-2)}) + (t-2)\sigma^2\\
&= \var(Y_{t-(t-1)}) + (t-1)\sigma^2\\
&= t\sigma^2.
\end{align*}
Thus, if $|\phi| = 1$, $Y_t$ has a variance of $t\sigma^2$, which is a function of time, so the process is non-stationary.
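To illustrate (a small simulation sketch, not part of the derivation): with $\phi = 1$ and $\Delta = 0$ the process is a random walk, and the sample variance across many simulated paths grows linearly in $t$:

```r
# Simulate many random-walk paths (phi = 1, delta = 0, sigma = 1);
# each column of `paths` is one path of length t_max.
set.seed(478)
t_max <- 100; reps <- 5000
paths <- apply(matrix(rnorm(t_max * reps), t_max, reps), 2, cumsum)
# Var(Y_t) is approximately t * sigma^2, so this ratio should be near 100/25 = 4.
var(paths[100, ]) / var(paths[25, ])
```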

Part (b)

\fbox{Take $\phi = -0.6$ and find the ACF, $\rho_k$.}

For a time series that follows an $\operatorname{AR}(1)$ model, the autocorrelation function is $\rho_k = \phi^k$. Thus, since $\{Y_t\} \sim \operatorname{AR}(1; \phi=-0.6, \Delta=0, \sigma)$, $\{Y_t\}$ has the autocorrelation function

$$ \rho_k = (-0.6)^k $$

for $k = 0, 1, 2, \ldots$.
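As a quick numerical aside with stats::ARMAacf, both the theoretical ACF and PACF behave as claimed: the ACF alternates in sign as $(-0.6)^k$, and the PACF cuts off after lag $1$:

```r
# Theoretical ACF of an AR(1) with phi = -0.6: rho_k = (-0.6)^k.
round(ARMAacf(ar = -0.6, lag.max = 3), 3)
# lags 0..3: 1.000 -0.600 0.360 -0.216

# Theoretical PACF: phi at lag 1, then exactly zero.
round(ARMAacf(ar = -0.6, lag.max = 3, pacf = TRUE), 3)
# lags 1..3: -0.600 0.000 0.000
```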

Part (c)

\fbox{\begin{minipage}{.9\textwidth} Take $\phi = -0.6$. Generate time series datasets of length $n = 200$ according to the $\operatorname{AR}(1)$ process. Plot the observed time series and the sample ACF and PACF. Do the plots agree with what you know to be true? \end{minipage}}

First, we know that $Y_t = \phi Y_{t-1} + e_t$. Matching this term by term to $\Delta + \phi Y_{t-1} + e_t$ shows that $\Delta = 0$ (and therefore $\expect(Y_t) = 0$).

In code, we model $\{Y_t\} \sim \operatorname{AR}(1; \phi=-0.6, \Delta=0, \sigma=1)$ with:

Yt <- AR1(phi=-0.6, delta=0, sigma=1)

We sample $n=200$ values from $\{Y_t\}$ with:

y200 <- Yt(N=200)

We plot the realization of the time series with the following R code:

plot(y200,type="l")
points(y200)

The series appears to be centered around $0$ with constant variance, as expected. Also, since $\phi$ is negative, $Y_t$ and $Y_{t+1}$ are negatively correlated, so the series exhibits jittery up-and-down movements, as expected.

The sample ACF and PACF are given by:

par(mfrow=c(1,2),oma=c(0,0,0,0))
acf(y200)
pacf(y200)

The sample PACF cuts off (with a single outlier) after lag $1$, as expected of an $\operatorname{AR}(1)$ process.

Part (d)

\fbox{\begin{minipage}{.8\textwidth} What happens when we increase the sample size? Repeat part (c) when $n = 1000$. Comment on your findings. \end{minipage}}

We sample $n=1000$ values from $\{Y_t\}$ with:

y1000 <- Yt(N=1000)

We plot the realization of the time series with the following R code:

plot(y1000,type="l")
points(y1000)

The series appears to be centered around $0$ with constant variance, as expected.

The sample ACF and PACF are given by:

par(mfrow=c(1,2),oma=c(0,0,0,0))
acf(y1000)
pacf(y1000)

We see the same general pattern as before in each case, except that, due to the increased sample size, the confidence bands have narrowed.

Problem 3

A data set of $57$ consecutive measurements from a machine tool is in the \emph{deere3} object in the TSA package.

Preliminary steps

We load the requisite time series data with the following R code:

library(TSA)
data(deere3)

Part (a)

\fbox{\begin{minipage}{.8\textwidth} Plot the time series. What basic pattern do you see from the plot? Might a stationary model be appropriate for this plot? \end{minipage}}

We plot the time series with the following R code:

plot(deere3)

A stationary model may be appropriate since the series fluctuates randomly around some central tendency, and the fluctuations seem relatively constant (albeit quite large).

Part (b)

\fbox{\begin{minipage}{.8\textwidth} Plot the sample ACF and PACF. Tentatively specify the type of model (AR, MA, or ARMA) as well as the order(s) of the model. Write up detailed notes that describe how you decided on the model. \end{minipage}}

par(mfrow=c(1,2),oma=c(0,0,0,0))
acf(deere3)
pacf(deere3)

We seek a parsimonious model that sufficiently explains the data. Tentatively, we choose an $\operatorname{AR}(1)$ model.

The sample ACF decays and the sample PACF drops off to values compatible with $0$ after lag $k=1$. Thus, the sample ACF and PACF are compatible with an $\operatorname{AR}(1)$ process. Due to its parsimony and compatibility with the data, we choose $\operatorname{AR}(1)$.

Part (c)

\fbox{\begin{minipage}{.8\textwidth} Fit an $\operatorname{AR}(1)$ model using the arima function in R, use it to forecast the next ten values of the series, and provide the forecasted values. \end{minipage}}

We seek to fit the time series to a first-order autoregressive model of the form

$$ Y_t = \delta + \phi Y_{t-1} + e_t $$

where $e_t$ is white noise with mean $0$ and variance $\sigma^2$, and $\delta = \mu(1-\phi)$.

We perform the fit using the following R code:

ar1 <- arima(deere3,order=c(1,0,0)) # fit an AR(1) model
ar1
## 
## Call:
## arima(x = deere3, order = c(1, 0, 0))
## 
## Coefficients:
##          ar1  intercept
##       0.5255   124.3832
## s.e.  0.1108   394.2066
## 
## sigma^2 estimated as 2069355:  log likelihood = -495.51,  aic = 995.02

We see that $\hat{\mu} = 124.3832$ (the value arima labels intercept is the process mean, not $\hat\delta$), $\hat{\phi} = 0.5255$, $\hat{\sigma}^2 = 2069355$, and $\hat{\delta} = \hat{\mu}(1-\hat{\phi}) = 59.02$. Thus, we estimate that
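The arithmetic for $\hat\delta$, using the rounded values printed above, can be checked directly:

```r
# arima()'s "intercept" is the process mean mu-hat, so the AR(1)
# intercept is delta-hat = mu-hat * (1 - phi-hat).
phi_hat <- 0.5255
mu_hat  <- 124.3832
mu_hat * (1 - phi_hat)
# 59.0198284
```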

$$ \hat{Y}_t = 59.02 + 0.5255\, Y_{t-1} + e_t. $$

We perform a $10$-step-ahead forecast with the following R code:

predict(ar1,n.ahead=10)$pred
## Time Series:
## Start = 58 
## End = 67 
## Frequency = 1 
##  [1] -335.145917 -117.120758   -2.538374   57.680010   89.327578  105.959850
##  [7]  114.700885  119.294706  121.708973  122.977783

For small lead times, since nearby values in the time series are correlated, the most recent observations have a measurable effect on the forecast. However, as the lead time goes to infinity, the forecast converges to $\hat\mu = 124.3832$.
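This convergence can be made precise: for an $\operatorname{AR}(1)$ fit, the minimum-MSE $h$-step forecast satisfies (a standard identity, stated here for context)

```latex
\hat{Y}_t(h) = \hat\mu + \hat\phi^{\,h}\left(Y_t - \hat\mu\right),
\qquad\text{so}\qquad
\hat{Y}_t(h) - \hat\mu = \hat\phi\,\bigl(\hat{Y}_t(h-1) - \hat\mu\bigr) \to 0.
```

For example, $124.38 + 0.5255\,(-335.15 - 124.38) \approx -117.1$, which matches the second forecasted value above up to rounding.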

Additional experimentation

For fun, we construct a generative autoregressive model of order $1$ with:

Xt <- AR1(phi=0.5255, delta=59.0198284, sigma=sqrt(2069355))

If we sample a large number of points from this generative model and then fit an $\operatorname{AR}(1)$ model to it, we get estimates similar to before:

arima(Xt(N=1000),order=c(1,0,0))
## 
## Call:
## arima(x = Xt(N = 1000), order = c(1, 0, 0))
## 
## Coefficients:
##          ar1  intercept
##       0.5434   180.1728
## s.e.  0.0265    98.9118
## 
## sigma^2 estimated as 2044761:  log likelihood = -8684.51,  aic = 17373.02

Problem 4

A data set of durations until payment for $130$ consecutive orders from a Winegrad distributor is in the days object in the TSA package.

Preliminary steps

We load the data with the following R code:

library(TSA)
data(days)

Part (a)

\fbox{\begin{minipage}{.8\textwidth} Plot the time series. What basic pattern do you see from the plot? Might a stationary model be appropriate for this plot? \end{minipage}}

We plot the time series with the following R code:

plot(days)

A stationary model may be appropriate since the series fluctuates randomly around some central tendency, and the fluctuations seem relatively constant (albeit quite large).

Part (b)

\fbox{\begin{minipage}{.8\textwidth} Plot the sample ACF and PACF. Tentatively specify the type of model (AR, MA, or ARMA) as well as the order(s) of the model. Write up detailed notes that describe how you decided on the model. \end{minipage}}

The sample ACF and PACF are given by:

par(mfrow=c(1,2),oma=c(0,0,0,0))
acf(days)
pacf(days)

There does not seem to be much of a pattern in the data. The ACF and PACF suggest that the data are compatible with the hypothesis that the values in the time series are linearly uncorrelated.

We choose a model that is a white noise process with a non-zero mean. We might think of this as a degenerate case of the ARMA model, denoted by $\operatorname{ARMA}(0,0)$.

Model selection

In what follows, we analyze model selection for this time series in more depth. Given a set of candidate models that may plausibly explain the data, one model-selection strategy is to choose the model with the lowest Akaike information criterion (AIC) on the given data set.

In particular, suppose the set of candidate models $M$ is given by

$$ M = \left\{ \operatorname{ARMA}(p,q) \,\middle\vert\, p \in P,\, q \in Q \right\} $$

where $P$ and $Q$ are subsets of the natural numbers, and we let the selected model $m^*$ be defined as

$$ m^* = \argmin_{m \in M} \operatorname{AIC}(m). $$

If $M$ is a relatively small set, we can simply perform a brute-force exhaustive search through it. We let $P = \{0,1,2,3\}$ and $Q = \{0,1,2,3\}$, so $|M| = 16$. We perform the exhaustive search with the following R code:

P <- c(0,1,2,3)
Q <- c(0,1,2,3)
aics <- matrix(nrow=length(P)*length(Q),ncol=3)
colnames(aics) <- c("AIC","p","q")
i <- 1
for (p in P)
{
   for (q in Q)
   {
      aics[i,1] <- arima(days,order=c(p,0,q))$aic
      aics[i,2] <- p
      aics[i,3] <- q
      i <- i + 1
   }
}   
aics <- aics[order(aics[,1],decreasing=FALSE),]
aics
##            AIC p q
##  [1,] 886.2239 1 2
##  [2,] 886.9111 0 0
##  [3,] 886.9546 0 2
##  [4,] 887.0030 2 1
##  [5,] 887.2617 3 0
##  [6,] 887.3115 2 0
##  [7,] 887.6010 1 0
##  [8,] 887.8065 0 3
##  [9,] 887.8736 0 1
## [10,] 887.9656 2 2
## [11,] 888.5414 3 1
## [12,] 888.8066 3 2
## [13,] 888.9095 1 1
## [14,] 890.0033 2 3
## [15,] 890.8047 3 3
## [16,] 890.8777 1 3
cat("m* = ARIMA(",aics[1,2],",",aics[1,3],")\n")
## m* = ARIMA( 1 , 2 )

We see that $m^* = \operatorname{ARMA}(1,2)$. However, the next best model according to AIC on $M$ is $\operatorname{ARMA}(0,0)$. The \emph{third} best model is $\operatorname{ARMA}(0,2)$, which is identical to $\operatorname{MA}(2)$.

Given the simplicity of white noise (with a non-zero mean), as an ad hoc decision I remain inclined to accept $\operatorname{ARMA}(0,0)$ as the most likely model for the data; the AIC differences among the top few models are well under one unit, so the data barely distinguish them. In other words, my \emph{prior} on $M$ weighs simpler models (smaller $p$ and $q$) more heavily than simply choosing the minimum AIC on $M$ would.

Part (c)

\fbox{\begin{minipage}{.8\textwidth} Fit an $\operatorname{MA}(2)$ model using the arima function in R, use it to forecast the next ten values of the series, and list the forecasted values. \end{minipage}}

We seek to fit the time series to a second-order moving average model of the form

$$ Y_t = \mu + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} $$

where $e_t$ is white noise with mean $0$ and variance $\sigma^2$.

We perform the fit using the following R code:

ma2 <- arima(days,order=c(0,0,2))
ma2
## 
## Call:
## arima(x = days, order = c(0, 0, 2))
## 
## Coefficients:
##          ma1     ma2  intercept
##       0.1113  0.1557    28.6931
## s.e.  0.0894  0.0884     0.7946
## 
## sigma^2 estimated as 51.33:  log likelihood = -440.48,  aic = 886.95

We see that $\hat\mu = 28.6931$. Note that R's arima parameterizes the MA part with plus signs ($Y_t = \mu + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2}$), so in our minus convention the reported coefficients correspond to $\hat\theta_1 = -0.1113$ and $\hat\theta_2 = -0.1557$. Thus, we estimate that

$$ \hat{Y}_t = 28.6931 + e_t + 0.1113\, e_{t-1} + 0.1557\, e_{t-2}. $$

We perform a $10$-step-ahead forecast with the following R code:

predict(ma2,n.ahead=10)$pred
## Time Series:
## Start = 131 
## End = 140 
## Frequency = 1 
##  [1] 33.43453 27.67666 28.69310 28.69310 28.69310 28.69310 28.69310 28.69310
##  [9] 28.69310 28.69310

We see that the first two forecasts depend on the residuals from the last two observations. After that, the forecast is $\hat\mu = 28.6931$, which is consistent with a stationary process with a constant mean.
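In general, for an $\operatorname{MA}(2)$ fit written in R's plus convention, the minimum-MSE forecasts depend on the final two residuals $\hat{e}_t$ and $\hat{e}_{t-1}$ and then revert exactly to the mean:

```latex
\begin{align*}
\hat{Y}_t(1) &= \hat\mu + \hat\theta_1 \hat{e}_t + \hat\theta_2 \hat{e}_{t-1},\\
\hat{Y}_t(2) &= \hat\mu + \hat\theta_2 \hat{e}_t,\\
\hat{Y}_t(h) &= \hat\mu \quad\text{for } h > 2.
\end{align*}
```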

Additional experimentation

For fun, we construct a generative moving average model of order $2$ with:

Zt <- MA2(mu=28.6931,theta1=-0.1113,theta2=-0.1557,sigma=51.33)

If we sample a large number of points from this generative model and then fit an $\operatorname{MA}(2)$ model to it, we recover similar estimates of the MA coefficients and mean. (One caveat: arima reported $\hat\sigma^2 = 51.33$, a variance, so the call above should arguably pass sigma=sqrt(51.33); since the MA coefficients are scale-invariant, the fit below still recovers them, but its reported variance is on the order of $51.33^2$.)

par(mfrow=c(1,2),oma=c(0,0,0,0))
zt <- Zt(N=1000)
acf(zt)
pacf(zt)

arima(zt,order=c(0,0,2))
## 
## Call:
## arima(x = zt, order = c(0, 0, 2))
## 
## Coefficients:
##          ma1     ma2  intercept
##       0.0842  0.1451    27.4917
## s.e.  0.0313  0.0320     1.9668
## 
## sigma^2 estimated as 2562:  log likelihood = -5343.15,  aic = 10692.3