Time Series: Dynamic Regression

class: center, middle, inverse, title-slide

# Time Series: Dynamic Regression
## FPP3, Chapter 10
### Robert W. Walker
### AGSM
### 2021-03-19

---

# An Overview

Expanding the space of relevant models still operates in the circle of chapter 5.

![Forecasting Diagram](img/Workflow.PNG)

---
### An Overview

Expanding the space of relevant models still operates in the circle of chapter 5.

+ Linear Models: tidying up  
+ Univariate 1: Exponential Smoothing
+ Univariate 2: ARIMA models
+ Dynamic Regression **Tying it Together**

![Forecasting Diagram](img/Workflow.PNG)

---

## Packages

Getting started

```r
library(tidyverse)
library(fpp3)
library(purrr)
library(gganimate)
library(seasonal)
library(tidyquant)
library(magrittr)
vic_elec_daily <- vic_elec %>%
  filter(year(Time) == 2014) %>%
  index_by(Date = date(Time)) %>%
  summarise(
    Demand = sum(Demand)/1e3,
    Temperature = max(Temperature),
    Holiday = any(Holiday)
  ) %>%
  mutate(Day_Type = case_when(
    Holiday ~ "Holiday",
    wday(Date) %in% 2:6 ~ "Weekday",
    TRUE ~ "Weekend"
  ))
```
---

# The magic of `forecast`

The method:
1. Tidy  
2. Visualise  
3. Model  
  a. Specify  
  b. Estimate  
  c. Evaluate  
  d. Visualize  
4. Forecast

---

# Models

```
3. Model  
  a. Specify  
  b. Estimate  
  c. Evaluate  
  d. Visualize  
```

---
# First: The Linear Model

`$$y_{t} = \beta_{0} + \beta_{1}x_{t} + \epsilon_{t}$$`

---
### Strengths and Limitations

Strengths:
+ Trends can be linear or nonlinear [knots].  
+ Seasonality can be naturally accommodated.  
+ Can accommodate covariates and lag structures.

Weaknesses:
+ Persistence of unknown origin and the need for decay.  This will be inherently nonlinear.

---
## Second: Exponential Smoothing

Describe the outcome as two separate but combineable forms: a Forecast equation `$$\hat{y}_{t+h|t} = S_{t}$$` and a Smoothing Equation `$$S_{t} = \alpha y_{t} + (1-\alpha) S_{t-1}$$`.

If we combine them, we get:

`$$\hat{y}_{t+h|t} = \alpha y_{t} + (1-\alpha) S_{t-1}$$`

---

## The Model: `ETS`

the `error` component (either `M` multiplicative or `A` additive)

the `trend`

the `season`

Some comments on options -- the optimization method.

---
# ARIMA

AR: autoregressive (lagged observations as inputs)  
I: integrated  
MA: moving average (lagged errors as inputs)

Use Case:
An ARIMA model is rarely interpretable in terms of visible data structures like trend and seasonality. But it can capture a huge range of time series patterns.

---
# Regression with ARIMA errors

---
## Regression with ARIMA errors

`$$y_t = \beta_0 + \beta_1 x_{1,t} + \dots + \beta_k x_{k,t} + \varepsilon_t$$`

where

* `$y_t$` modeled as function of `$k$` explanatory variables
`$x_{1,t},\dots,x_{k,t}$`.
  * In regression, we assume that `$\varepsilon_t$` was  WN.
  * Now we want to allow `$\varepsilon_t$` to be autocorrelated.

---
### Example: ARIMA(1,1,1) errors

`$$y_t = \beta_0 + \beta_1 x_{1,t} + \dots + \beta_k x_{k,t} + \eta_t,\\ (1-\phi_1B)(1-B)\eta_t = (1+\theta_1B)\varepsilon_t$$`

where `$\varepsilon_t$` is white noise.

---
## Residuals and errors

### Example: `$\eta_t$` = ARIMA(1,1,1)
`$$y_t = \beta_0 + \beta_1 x_{1,t} + \dots + \beta_k x_{k,t} + \eta_t,\\
    (1-\phi_1B)(1-B)\eta_t = (1+\theta_1B)\varepsilon_t,$$`

* Be careful in distinguishing `$\eta_t$` from `$\varepsilon_t$`.
  * Only the errors `$\varepsilon_t$` are assumed to be white noise.
  * In ordinary regression, `$\eta_t$` is assumed to be white noise and so `$\eta_t = \varepsilon_t$`.

---
## Estimation

If we minimize `$\sum \eta_t^2$` (by using ordinary regression):

1. Estimated coefficients `$\hat{\beta}_0,\dots,\hat{\beta}_k$` are no longer optimal as some information ignored;
  2. Statistical tests associated with the model (e.g., t-tests on the coefficients) are incorrect.
  3. `$p$`-values for coefficients usually too small (`spurious regression`).
  4. AIC of fitted models misleading.

* Minimizing `$\sum \varepsilon_t^2$` avoids these problems.
 * Maximizing likelihood similar to minimizing `$\sum \varepsilon_t^2$`.

---
class: inverse

## Stationarity

**Regression with ARMA errors**

`$$y_t = \beta_0 + \beta_1 x_{1,t} + \dots + \beta_k x_{k,t} + \eta_t,$$`
where `$\eta_t$` is an ARMA process.

* All variables in the model must be stationary.
  * If we estimate the model while any of these are non-stationary, the estimated coefficients can be incorrect.
  * Difference variables until all stationary.
  * If necessary, apply same differencing to all variables.

---
## Stationarity

Model with ARIMA(1,1,1) errors
`$$y_t = \beta_0 + \beta_1 x_{1,t} + \dots + \beta_k x_{k,t} + \eta_t,\\
       (1-\phi_1B)(1-B)\eta_t = (1+\theta_1B)\varepsilon_t,$$`
       
Equivalent to model with ARIMA(1,0,1) errors

`$$y'_t = \beta_1 x'_{1,t} + \dots + \beta_k x'_{k,t} + \eta'_t,\\
        (1-\phi_1B)\eta'_t = (1+\theta_1B)\varepsilon_t,$$`

where `$y'_t=y_t-y_{t-1}$`, `$x'_{t,i}=x_{t,i}-x_{t-1,i}$` and  `$\eta'_t=\eta_t-\eta_{t-1}$`.

---
## Regression with ARIMA errors

Any regression with an ARIMA
error can be rewritten as a regression with an ARMA error by differencing all
variables with the same differencing operator as in the ARIMA model.

Original data

`$$y_t  = \beta_0 + \beta_1 x_{1,t} + \dots + \beta_k x_{k,t} + \eta_t\\
  \mbox{where}\quad
       \phi(B)(1-B)^d\eta_t = \theta(B)\varepsilon_t$$`

After differencing all variables
`$$y'_t  = \beta_1 x'_{1,t} + \dots + \beta_k x'_{k,t} + \eta'_t.\\
  \mbox{where}\quad
        \phi(B)\eta_t = \theta(B)\varepsilon_t$$`

`$$\text{and}\quad  y_t' = (1-B)^dy_t$$`

---
## Model selection

* Fit regression model with automatically selected ARIMA errors. (R will take care of differencing before estimation.)
  * Check that `$\varepsilon_t$` series looks like white noise.

---
### Selecting predictors

* AICc can be calculated for final model.
  * Repeat procedure for all subsets of predictors to be considered, and select model with lowest AICc value.

---
## US personal consumption and income

```r
us_change %>%
  gather(key='variable', value='value') %>%
  ggplot(aes(y=value, x=Quarter, group=variable, colour=variable)) +
  geom_line() + facet_grid(variable ~ ., scales='free_y') +
  xlab("Year") + ylab("") +
  ggtitle("Quarterly changes in US consumption and personal income") +
  guides(colour="none")
```

---
## US personal consumption and income

* No need for transformations or further differencing.
  *  Increase in income does not necessarily translate into instant increase in consumption (e.g., after the loss of a job, it may take a few months for expenses to be reduced to allow for the new circumstances). We will ignore this for now.

---
## US personal consumption and income

```r
fit <- us_change %>% model(ARIMA(Consumption ~ Income))
report(fit)
```

```
Series: Consumption 
Model: LM w/ ARIMA(1,0,2) errors

Coefficients:
         ar1      ma1     ma2  Income  intercept
      0.7070  -0.6172  0.2066  0.1976     0.5949
s.e.  0.1068   0.1218  0.0741  0.0462     0.0850

sigma^2 estimated as 0.3113:  log likelihood=-163.04
AIC=338.07   AICc=338.51   BIC=357.8
```

**Write down the equations for the fitted model.**

---
## US personal consumption and income

```r
residuals(fit, type='regression') %>%
  gg_tsdisplay(.resid, plot_type = 'partial') +
  ggtitle("Regression errors")
```

---
## US personal consumption and income

```r
residuals(fit, type='response') %>%
  gg_tsdisplay(.resid, plot_type = 'partial') +
  ggtitle("ARIMA errors")
```

---
## US personal consumption and income

```r
augment(fit) %>%
  features(.resid, ljung_box, dof = 5, lag = 12)
```

```
# A tibble: 1 x 3
  .model                      lb_stat lb_pvalue
  <chr>                         <dbl>     <dbl>
1 ARIMA(Consumption ~ Income)    5.54     0.595
```

---
## US personal consumption and income

```r
us_change_future <- new_data(us_change, 8) %>%
  mutate(Income = mean(us_change$Income))
forecast(fit, new_data = us_change_future) %>%
  autoplot(us_change) +
  labs(x = "Year", y = "Percentage change",
       title = "Forecasts from regression with ARIMA(1,0,2) errors")
```

---
## Forecasting

* To forecast a regression model with ARIMA errors, we need to forecast the
regression part of the model and the ARIMA part of the model and combine the
results.
  * Some predictors are known into the future (e.g., time, dummies).
  * Separate forecasting models may be needed for other predictors.
  * Forecast intervals ignore the uncertainty in forecasting the predictors.

---
## Daily electricity demand

Model daily electricity demand as a function of temperature using quadratic regression with ARMA errors.

```r
vic_elec_daily %>%
  ggplot(aes(x=Temperature, y=Demand, colour=Day_Type)) +
  geom_point() +
  labs(x = "Maximum temperature", y = "Electricity demand (GW)")
```

---
## Daily electricity demand

```r
vic_elec_daily %>%
  gather("var", "value", Demand, Temperature) %>%
  ggplot(aes(x = Date, y = value)) + geom_line() +
  facet_grid(vars(var), scales = "free_y")
```

---
## Daily electricity demand

```r
fit <- vic_elec_daily %>%
  model(ARIMA(Demand ~ Temperature + I(Temperature^2) +
                (Day_Type=="Weekday")))
report(fit)
```

```
Series: Demand 
Model: LM w/ ARIMA(2,1,2)(2,0,0)[7] errors

Coefficients:
          ar1     ar2      ma1      ma2    sar1    sar2  Temperature
      -0.1093  0.7226  -0.0182  -0.9381  0.1958  0.4175      -7.6135
s.e.   0.0779  0.0739   0.0494   0.0493  0.0525  0.0570       0.4482
      I(Temperature^2)  Day_Type == "Weekday"TRUE
                0.1810                    30.4040
s.e.            0.0085                     1.3254

sigma^2 estimated as 44.91:  log likelihood=-1206.11
AIC=2432.21   AICc=2432.84   BIC=2471.18
```

---
## Daily electricity demand

```r
gg_tsresiduals(fit)
```

---
## Daily electricity demand

```r
augment(fit) %>%
  features(.resid, ljung_box, dof = 9, lag = 14)
```

```
# A tibble: 1 x 3
  .model                                                       lb_stat lb_pvalue
  <chr>                                                          <dbl>     <dbl>
1 "ARIMA(Demand ~ Temperature + I(Temperature^2) + (Day_Type …    28.4 0.0000304
```

---
## Daily electricity demand

```r
# Forecast one day ahead
vic_next_day <- new_data(vic_elec_daily, 1) %>%
  mutate(Temperature = 26, Day_Type = "Holiday")
forecast(fit, vic_next_day)
```

```
# A fable: 1 x 6 [1D]
# Key:     .model [1]
  .model                        Date           Demand .mean Temperature Day_Type
  <chr>                         <date>         <dist> <dbl>       <dbl> <chr>   
1 "ARIMA(Demand ~ Temperature … 2015-01-01 N(161, 45)  161.          26 Holiday 
```

---
## Daily electricity demand

```r
vic_elec_future <- new_data(vic_elec_daily, 14) %>%
  mutate(
    Temperature = 26,
    Holiday = c(TRUE, rep(FALSE, 13)),
    Day_Type = case_when(
      Holiday ~ "Holiday",
      wday(Date) %in% 2:6 ~ "Weekday",
      TRUE ~ "Weekend"
    )
  )
```

---
## Daily electricity demand

```r
forecast(fit, new_data=vic_elec_future) %>%
  autoplot(vic_elec_daily) + ylab("Electricity demand (GW)")
```

---
# Stochastic and deterministic trends

---
## Stochastic \& deterministic trends

Deterministic trend
`$$y_t = \beta_0 + \beta_1 t + \eta_t$$`
where `$\eta_t$` is ARMA process.

Stochastic trend
`$$y_t = \beta_0 + \beta_1 t + \eta_t$$`
where `$\eta_t$` is ARIMA process with `$d\ge1$`.

Difference both sides until `$\eta_t$` is stationary:
`$$y'_t = \beta_1 + \eta'_t$$`
where `$\eta'_t$` is ARMA process.

---
## International visitors

```r
aus_visitors <- as_tsibble(fpp2::austa)
aus_visitors %>%
  autoplot(value) +
  labs(x = "Year", y = "millions of people",
       title = "Total annual international visitors to Australia")
```

---
## International visitors

**Deterministic trend**

```r
fit_deterministic <- aus_visitors %>%
  model(Deterministic = ARIMA(value ~ trend() + pdq(d = 0)))
report(fit_deterministic)
```

```
Series: value 
Model: LM w/ ARIMA(2,0,0) errors

Coefficients:
         ar1      ar2  trend()  intercept
      1.1127  -0.3805   0.1710     0.4156
s.e.  0.1600   0.1585   0.0088     0.1897

sigma^2 estimated as 0.02979:  log likelihood=13.6
AIC=-17.2   AICc=-15.2   BIC=-9.28
```

---

`$$y_t = 0.42 + 0.17 t + \eta_t$$`
`$$\eta_t = 1.11 \eta_{t-1} -0.38 \eta_{t-2} + \varepsilon_t$$`
`$$\varepsilon_t \sim \text{NID}(0,0.0298).$$`

---
## International visitors

**Stochastic trend**

```r
fit_stochastic <- aus_visitors %>%
  model(Stochastic = ARIMA(value ~ pdq(d=1)))
report(fit_stochastic)
```

```
Series: value 
Model: ARIMA(0,1,1) w/ drift

Coefficients:
         ma1  constant
      0.3006    0.1735
s.e.  0.1647    0.0390

sigma^2 estimated as 0.03376:  log likelihood=10.62
AIC=-15.24   AICc=-14.46   BIC=-10.57
```

---

`$$y_t-y_{t-1} = 0.173 + \varepsilon_t + 0.301 \varepsilon_{t-1} \\
  y_t = y_0 + 0.173 t + \eta_t \\
  \eta_t = \eta_{t-1} + 0.301 \varepsilon_{t-1} + \varepsilon_{t}\\
  \varepsilon_t \sim \text{NID}(0,0.034).$$`

---
## International visitors

```r
fc_deterministic <- forecast(fit_deterministic, h = 10)
fc_stochastic <- forecast(fit_stochastic, h = 10)
rbind(fc_deterministic, fc_stochastic) %>%
  autoplot(aus_visitors) +
  facet_grid(vars(.model)) +
  labs(x = "Year", y = "Visitors to Australia (millions)",
       title = "Forecasts from trend models") +
  guides(colour = FALSE)
```

---
## International visitors

---
## Forecasting with trend

* Point forecasts are almost identical, but prediction intervals differ.
  * Stochastic trends have much wider prediction intervals because the errors are non-stationary.
  * Be careful of forecasting with deterministic trends too far ahead.

---
# Dynamic harmonic regression

---
## Dynamic harmonic regression

**Combine Fourier terms with ARIMA errors**

### Advantages
   * it allows any length seasonality;
   * for data with more than one seasonal period, you can include Fourier terms of different frequencies;
   * the seasonal pattern is smooth for small values of `$K$` (but more wiggly seasonality can be handled by increasing `$K$`);
   * the short-term dynamics are easily handled with a simple ARMA error.

### Disadvantages
 * seasonality is assumed to be fixed

---
## Eating-out expenditure

```r
aus_cafe <- aus_retail %>% filter(
    Industry == "Cafes, restaurants and takeaway food services",
    year(Month) %in% 2004:2018
  ) %>% summarise(Turnover = sum(Turnover))
aus_cafe %>% autoplot(Turnover)
```

---
## Eating-out expenditure

```r
fit <- aus_cafe %>% model(
    `K = 1` = ARIMA(log(Turnover) ~ fourier(K = 1) + PDQ(0,0,0)),
    `K = 2` = ARIMA(log(Turnover) ~ fourier(K = 2) + PDQ(0,0,0)),
    `K = 3` = ARIMA(log(Turnover) ~ fourier(K = 3) + PDQ(0,0,0)),
    `K = 4` = ARIMA(log(Turnover) ~ fourier(K = 4) + PDQ(0,0,0)),
    `K = 5` = ARIMA(log(Turnover) ~ fourier(K = 5) + PDQ(0,0,0)),
    `K = 6` = ARIMA(log(Turnover) ~ fourier(K = 6) + PDQ(0,0,0)))
glance(fit)
```

|.model |    sigma2|  log_lik|       AIC|      AICc|       BIC|
|:------|---------:|--------:|---------:|---------:|---------:|
|K = 1  | 0.0017471| 317.2353| -616.4707| -615.4056| -587.7842|
|K = 2  | 0.0010732| 361.8533| -699.7066| -697.8271| -661.4579|
|K = 3  | 0.0007609| 393.6062| -763.2125| -761.3329| -724.9638|
|K = 4  | 0.0005386| 426.7838| -821.5676| -818.2096| -770.5694|
|K = 5  | 0.0003173| 473.7344| -919.4688| -916.9078| -874.8454|
|K = 6  | 0.0003163| 474.0307| -920.0614| -917.5004| -875.4380|

---
## Eating-out expenditure

```r
cafe_plot <- function(...){
  fit %>%
    select(...) %>%
    forecast() %>% autoplot(aus_cafe) +
    ggtitle(sprintf("Log transformed %s, fourier(K = %s)", model_sum(select(fit,...)[[1]][[1]]), deparse(..1))) +
    geom_label(
      aes(x = yearmonth("2007 Jan"), y = 4250, label = paste0("AICc = ", format(AICc))),
      data = glance(select(fit,...))
    ) +
    geom_line(aes(y = .fitted), colour = "red", augment(select(fit, ...))) +
    ylim(c(1500, 5100))
}
```

---
## Eating-out expenditure

---
## Eating-out expenditure

---
## Eating-out expenditure

---
## Eating-out expenditure

---
## Eating-out expenditure

<img src="index_files/figure-html/cafe6-1.png" width="576" />
---
## Example: weekly gasoline products

```r
fit <- us_gasoline %>%
  model(ARIMA(Barrels ~ fourier(K = 13) + PDQ(0,0,0)))
report(fit)
```

```
Series: Barrels 
Model: LM w/ ARIMA(0,1,1) errors

Coefficients:
          ma1  fourier(K = 13)C1_52  fourier(K = 13)S1_52
      -0.8934               -0.1121               -0.2300
s.e.   0.0132                0.0123                0.0122
      fourier(K = 13)C2_52  fourier(K = 13)S2_52
                    0.0420                0.0317
s.e.                0.0099                0.0099
      fourier(K = 13)C3_52  fourier(K = 13)S3_52
                    0.0832                0.0346
s.e.                0.0094                0.0094
      fourier(K = 13)C4_52  fourier(K = 13)S4_52
                    0.0185                0.0398
s.e.                0.0092                0.0092
      fourier(K = 13)C5_52  fourier(K = 13)S5_52
                   -0.0315                0.0009
s.e.                0.0091                0.0091
      fourier(K = 13)C6_52  fourier(K = 13)S6_52
                   -0.0522                 0.000
s.e.                0.0090                 0.009
      fourier(K = 13)C7_52  fourier(K = 13)S7_52
                   -0.0173                0.0053
s.e.                0.0090                0.0090
      fourier(K = 13)C8_52  fourier(K = 13)S8_52
                    0.0074                0.0048
s.e.                0.0090                0.0090
      fourier(K = 13)C9_52  fourier(K = 13)S9_52
                   -0.0024               -0.0035
s.e.                0.0090                0.0090
      fourier(K = 13)C10_52  fourier(K = 13)S10_52
                     0.0151                -0.0037
s.e.                 0.0090                 0.0090
      fourier(K = 13)C11_52  fourier(K = 13)S11_52
                    -0.0144                 0.0191
s.e.                 0.0090                 0.0090
      fourier(K = 13)C12_52  fourier(K = 13)S12_52
                    -0.0227                -0.0052
s.e.                 0.0090                 0.0090
      fourier(K = 13)C13_52  fourier(K = 13)S13_52  intercept
                    -0.0035                 0.0038     0.0014
s.e.                 0.0090                 0.0090     0.0007

sigma^2 estimated as 0.06168:  log likelihood=-21.96
AIC=101.93   AICc=103.24   BIC=253.04
```

---
## Example: weekly gasoline products

```r
forecast(fit, h = "3 years") %>%
  autoplot(us_gasoline)
```