Choice and Forecasting: Week 11

Robert W. Walker

Housekeeping

All of the data for today can be acquired using

load(url("https://github.com/robertwwalker/xaringan/raw/main/CMF-Week-11/data/FullWorkspace.RData"))

Plan

  1. Finish last time’s work
  2. Regression [on the quick]
  3. Exponential Smoothing Models
  4. ARIMA Models

Decomposition forecasting

The decomposition_model() function allows separate approaches to the seasonal and seasonally adjusted components. The latter is often combined with a method covered later, such as Holt’s method or an ARIMA model.

Two Examples

us_retail_employment <- us_employment %>%
    filter(year(Month) >= 1990, Title == "Retail Trade")
fit_dcmp <- us_retail_employment %>%
    model(stlf = decomposition_model(STL(Employed ~ trend(window = 7), robust = TRUE),
        NAIVE(season_adjust)))
fit_dcmp %>%
    forecast() %>%
    autoplot(us_retail_employment) + labs(y = "Number of people", title = "US retail employment")

fit_dcmp %>%
    gg_tsresiduals()

fit_dcmp <- us_retail_employment %>%
    model(stlf = decomposition_model(STL(Employed ~ trend(window = 7), robust = TRUE),
        RW(season_adjust)))
fit_dcmp %>%
    forecast() %>%
    autoplot(us_retail_employment) + labs(y = "Number of people", title = "US retail employment")

fit_dcmp %>%
    gg_tsresiduals()

Evaluating Forecast Accuracy of Points

  • On training and test [usually 20%]
    • filter and slice
  • Scale dependent errors
    • MAE: mean absolute error
    • RMSE: root mean squared error
  • Percentage errors
    • MAPE: mean absolute percentage error
    • sMAPE: symmetric MAPE
  • Scaled errors [def in 5.8]
    • MASE: mean absolute scaled error
    • RMSSE: root mean square scaled error

accuracy
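
A minimal sketch of a train/test split and the point-accuracy measures above, assuming the us_retail_employment tsibble defined earlier; the 2017 cutoff and the SNAIVE benchmark are arbitrary choices.

train <- us_retail_employment %>%
    filter(year(Month) <= 2017)  # training set; 2018 is held out
fc <- train %>%
    model(snaive = SNAIVE(Employed)) %>%
    forecast(h = "1 year")  # forecast the held-out year
fc %>%
    accuracy(us_retail_employment)  # ME, RMSE, MAE, MAPE, MASE, ...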

Evaluating Accuracy of Distributions

accuracy(., list(...))

  • Quantile scores quantile_score
  • Winkler scores winkler_score
  • Continuous ranked probability scores CRPS
  • Skill scores skill_score() with a statistic for comparison (a sketch follows).
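
A sketch of the distributional measures, reusing the fc forecasts and us_retail_employment data from the sketch above; level = 80 for the Winkler score is an arbitrary choice.

fc %>%
    accuracy(us_retail_employment, list(winkler = winkler_score), level = 80)
fc %>%
    accuracy(us_retail_employment, list(crps = CRPS, skill = skill_score(CRPS)))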

Time Series Cross-Validation

Multiple training and test sets based on cross-validation ideas. NB: Dates are discontinuous.

G22 <- tidyquant::tq_get("GOOG", from = "2021-01-01") %>%
    select(date, symbol, close) %>%
    mutate(Date = row_number()) %>%
    as_tsibble(index = Date)
G22_tr <- G22 %>%
    stretch_tsibble(.init = 90, .step = 1) %>%
    relocate(date, symbol, .id)
G22_tr %>%
    model(rw = RW(close ~ drift())) %>%
    forecast(h = 1) %>%
    accuracy(G22)
# A tibble: 1 × 10
  .model .type     ME  RMSE   MAE    MPE  MAPE  MASE RMSSE    ACF1
  <chr>  <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>   <dbl>
1 rw     Test  -0.221  2.47  1.82 -0.194  1.50  1.05  1.04 -0.0402

Judgmental Forecasts

  1. Limitations
  2. Key principles
  3. Delphi method
  4. Analogy
  5. Scenarios
  6. New product forecasting
  7. Judgmental adjustments

Key Principles

  • Set the forecasting task clearly and concisely
  • Implement a systematic approach
  • Document and justify
  • Systematically evaluate forecasts
  • Segregate forecasters and users

Delphi

Analogies

Structured Analogies

Adjustments

  • Use them sparingly.
  • Make them structured.

The Linear Model

y_{t} = \beta_{0} + \beta_{1}x_{t} + \epsilon_{t}

A Simplified Example

load("USEmployment.RData")
us_employment %>%
    data.frame() %>%
    group_by(Series_ID) %>%
    summarise(Title = first(Title)) %>%
    mutate(series_id = Series_ID) %>%
    ungroup() %>%
    select(-Series_ID) -> Names.List
US.Employment.T <- left_join(US.Employment, Names.List, by = c(series_id = "series_id")) %>%
    mutate(YM = yearmonth(date)) %>%
    rename(Employed = value) %>%
    as_tsibble(., index = YM, key = Title)
USET <- US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Retail Trade")) %>%
    as_tsibble(., index = YM, key = Title)

A Seasonal Linear Model

y_{t} = \alpha + \beta_{M}Month_{t} + \epsilon_{t}. The fitted/predicted value is \hat{y}_{t} = \hat{\alpha} + \hat{\beta}_{M}Month_{t} and the residual is:

\hat{e}_{t} = y_{t} - \hat{y}_{t} or, expanding and distributing: \hat{e}_{t} = y_{t} - \hat{\alpha} - \hat{\beta}_{M}Month_{t}

A Model

April is the baseline.

USRTM <- USET %>%
    mutate(Month = month(YM, label = TRUE)) %>%
    model(TSLM(Employed ~ as.character(Month)))
USRTM %>%
    report()
Series: Employed 
Model: TSLM 

Residuals:
   Min     1Q Median     3Q    Max 
 -2025   -376    225    630   1201 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)            14485.25     160.38   90.32  < 2e-16 ***
as.character(Month)Aug   257.11     226.81    1.13  0.25772    
as.character(Month)Dec   871.74     226.81    3.84  0.00014 ***
as.character(Month)Feb   -20.64     226.81   -0.09  0.92753    
as.character(Month)Jan   188.11     225.03    0.84  0.40373    
as.character(Month)Jul   241.95     226.81    1.07  0.28680    
as.character(Month)Jun   230.43     226.81    1.02  0.31032    
as.character(Month)Mar     1.29     226.81    0.01  0.99548    
as.character(Month)May   118.56     226.81    0.52  0.60147    
as.character(Month)Nov   687.00     226.81    3.03  0.00263 ** 
as.character(Month)Oct   318.75     226.81    1.41  0.16077    
as.character(Month)Sep   185.33     226.81    0.82  0.41440    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 893 on 361 degrees of freedom
Multiple R-squared: 0.0798, Adjusted R-squared: 0.0518
F-statistic: 2.85 on 11 and 361 DF, p-value: 0.001

augment

augment(USRTM) %>%
    ggplot(aes(x = YM)) + geom_line(aes(y = Employed, colour = "Data")) + geom_line(aes(y = .fitted,
    colour = "Fitted")) + scale_color_manual(values = c(Data = "black", Fitted = "red"))

The residual

USRTM %>%
    gg_tsresiduals()

A Trend?

fit_USET <- USET %>%
    mutate(Month = as.character(month(YM, label = TRUE))) %>%
    model(TSLM(Employed ~ trend() + Month))
fit_USET %>%
    report()
Series: Employed 
Model: TSLM 

Residuals:
   Min     1Q Median     3Q    Max 
 -2497   -434     81    443   1052 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 13316.60     112.44  118.43  < 2e-16 ***
trend()         6.35       0.27   23.51  < 2e-16 ***
MonthAug      231.70     142.64    1.62     0.11    
MonthDec      820.93     142.65    5.75  1.9e-08 ***
MonthFeb       -7.94     142.64   -0.06     0.96    
MonthJan      169.06     141.52    1.19     0.23    
MonthJul      222.89     142.64    1.56     0.12    
MonthJun      217.73     142.64    1.53     0.13    
MonthMar        7.64     142.63    0.05     0.96    
MonthMay      112.21     142.63    0.79     0.43    
MonthNov      642.54     142.65    4.50  9.0e-06 ***
MonthOct      280.64     142.64    1.97     0.05 *  
MonthSep      153.57     142.64    1.08     0.28    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 562 on 360 degrees of freedom
Multiple R-squared: 0.637,  Adjusted R-squared: 0.625
F-statistic: 52.7 on 12 and 360 DF, p-value: <2e-16

Residuals

fit_USET %>%
    gg_tsresiduals()

Diagnostics for residuals

Seek to falsify two desirable characteristics:

  1. Residuals should be uncorrelated [remaining time series information]
  2. Residuals should have mean zero [otherwise bias – systematically wrong].

Additional desiderata include constant variance and normality, though the latter, as typically assumed, implies the former.

How Does it Look?

augment(fit_USET) %>%
    select(YM, Employed, .fitted) %>%
    pivot_longer(c(Employed, .fitted)) %>%
    autoplot()

Knots

fit_USET2 <- USET %>%
    mutate(Month = as.character(month(YM, label = TRUE))) %>%
    model(TSLM(Employed ~ trend(knots = yearmonth(c("2001-09", "2008-10", "2020-03"))) +
        Month))
augment(fit_USET2) %>%
    select(YM, Employed, .fitted) %>%
    pivot_longer(c(Employed, .fitted)) %>%
    autoplot()
fit_USET2 %>%
    gg_tsresiduals()

This Is Cheating

fit_USET2 <- USET %>%
    mutate(Year = as.character(year(YM)), Month = as.character(month(YM, label = TRUE))) %>%
    model(TSLM(Employed ~ Year + Month))
fit_USET2 %>%
    report()
Series: Employed 
Model: TSLM 

Residuals:
   Min     1Q Median     3Q    Max 
 -1477    -45      5     55    766 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 12930.29      53.32  242.52  < 2e-16 ***
Year1991     -286.03      64.78   -4.42  1.4e-05 ***
Year1992     -354.45      64.78   -5.47  8.8e-08 ***
Year1993     -161.91      64.78   -2.50   0.0129 *  
Year1994      308.46      64.78    4.76  2.9e-06 ***
Year1995      714.59      64.78   11.03  < 2e-16 ***
Year1996      960.43      64.78   14.83  < 2e-16 ***
Year1997     1206.93      64.78   18.63  < 2e-16 ***
Year1998     1427.48      64.78   22.04  < 2e-16 ***
Year1999     1788.32      64.78   27.61  < 2e-16 ***
Year2000     2098.04      64.78   32.39  < 2e-16 ***
Year2001     2056.84      64.78   31.75  < 2e-16 ***
Year2002     1843.42      64.78   28.46  < 2e-16 ***
Year2003     1736.08      64.78   26.80  < 2e-16 ***
Year2004     1877.15      64.78   28.98  < 2e-16 ***
Year2005     2098.93      64.78   32.40  < 2e-16 ***
Year2006     2172.87      64.78   33.54  < 2e-16 ***
Year2007     2339.96      64.78   36.12  < 2e-16 ***
Year2008     2103.23      64.78   32.47  < 2e-16 ***
Year2009     1342.36      64.78   20.72  < 2e-16 ***
Year2010     1260.46      64.78   19.46  < 2e-16 ***
Year2011     1488.07      64.78   22.97  < 2e-16 ***
Year2012     1661.25      64.78   25.64  < 2e-16 ***
Year2013     1899.17      64.78   29.32  < 2e-16 ***
Year2014     2177.82      64.78   33.62  < 2e-16 ***
Year2015     2425.35      64.78   37.44  < 2e-16 ***
Year2016     2645.97      64.78   40.85  < 2e-16 ***
Year2017     2660.14      64.78   41.06  < 2e-16 ***
Year2018     2600.68      64.78   40.15  < 2e-16 ***
Year2019     2434.29      64.78   37.58  < 2e-16 ***
Year2020     1678.04      64.78   25.90  < 2e-16 ***
Year2021     2072.78     167.39   12.38  < 2e-16 ***
MonthAug      257.11      40.30    6.38  6.0e-10 ***
MonthDec      871.74      40.30   21.63  < 2e-16 ***
MonthFeb      -20.64      40.30   -0.51   0.6089    
MonthJan      171.93      40.30    4.27  2.6e-05 ***
MonthJul      241.95      40.30    6.00  5.1e-09 ***
MonthJun      230.43      40.30    5.72  2.4e-08 ***
MonthMar        1.29      40.30    0.03   0.9745    
MonthMay      118.56      40.30    2.94   0.0035 ** 
MonthNov      687.00      40.30   17.05  < 2e-16 ***
MonthOct      318.75      40.30    7.91  4.0e-14 ***
MonthSep      185.33      40.30    4.60  6.1e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 159 on 330 degrees of freedom
Multiple R-squared: 0.973,  Adjusted R-squared: 0.97
F-statistic:  288 on 42 and 330 DF, p-value: <2e-16
augment(fit_USET2) %>%
    select(YM, Employed, .fitted) %>%
    pivot_longer(c(Employed, .fitted)) %>%
    autoplot()
fit_USET2 %>%
    gg_tsresiduals()

Back to the Story

The core ideas carry over from any regression context.

y_{t} = \alpha + \beta*X_{t} + \epsilon_{t}

with data subscripted by time. Patterns over time will be key, but there is nothing special here beyond the fact that y is a linear function of x and therefore inherits whatever time patterns x carries.

The first part reviews linear regression. Least squares is the core tool for estimation.

Evaluation

  1. fit %>% gg_tsresiduals() and augment(fit) %>% features(.innov, ljung_box, lag = ??, dof = ??) for time series information in residuals.
  2. plot of .innov or .resid against the predictors (see the sketch below). This may require joining the residuals to the original tsibble, e.g. data %>% left_join(residuals(fit), by="Index").
  3. plot of residuals [.resid] against fitted values [.fitted].
  4. outliers and influence are always worthwhile to consider. Regular regression diagnostics are relevant here. It is just a tibble with an index: timetk.
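
A minimal sketch of item 2, assuming the fit_USET model and USET tsibble from the earlier slides; it joins the residuals back to the data and plots them against the Month predictor.

USET %>%
    mutate(Month = as.character(month(YM, label = TRUE))) %>%
    left_join(residuals(fit_USET), by = c("YM", "Title")) %>%
    ggplot(aes(x = Month, y = .resid)) + geom_boxplot()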

Handy predictors

  1. trend()
  2. season(): dummy/binary variables, e.g. month(label = TRUE) or weekday, and the dummy-variable trap
  3. interventions, trading days, holidays, and distributed lags.
  4. the Fourier series (see the sketch after this list).
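
A sketch comparing the season() dummies with a Fourier representation on the retail series from above; K = 3 is an arbitrary choice of the number of Fourier pairs.

USET %>%
    model(dummies = TSLM(Employed ~ trend() + season()),
        fourier = TSLM(Employed ~ trend() + fourier(K = 3))) %>%
    glance() %>%
    select(.model, adj_r_squared, AICc, CV)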

Predictor Selection

Fit criterion. A reference.

Best subsets and stepwise.

Inference and stepwise.

Forecasting

Ex ante and ex post forecasts; dynamic models come up in chapter 10.

Prediction intervals and their dynamism merit attention.

Forecasting and Scenarios

Take the model of quarterly consumption as a function of income.

fit_cons <- us_change %>%
    model(TSLM(Consumption ~ Income))
new_cons <- scenarios(`Average increase` = new_data(us_change, 4) %>%
    mutate(Income = mean(us_change$Income)), `Extreme increase` = new_data(us_change,
    4) %>%
    mutate(Income = c(mean(us_change$Income), mean(us_change$Income) + 3, mean(us_change$Income) +
        6, mean(us_change$Income) + 9)), names_to = "Scenario")
fcast <- forecast(fit_cons, new_cons)
us_change %>%
    autoplot(Consumption) + autolayer(fcast) + labs(y = "% change in US consumption")

Knots and their Importance

This section points out splines and ties.

Correlation and Causation

ARE NOT THE SAME.

A Bit More on Linear Models

Nonlinearity: the text argues against polynomial (nonlinear) trends. So now what? Knots, i.e., piecewise linear trends. Trends are not always monotonic; witness the example of population in Afghanistan. NB: extrapolating such piecewise trends probably gets unrealistic pretty fast.

global_economy %>%
    filter(Country == "Afghanistan") %>%
    autoplot(Population/1e+06) + labs(y = "Population (millions)") + geom_ribbon(aes(xmin = 1979.98,
    xmax = 1989.13), fill = "pink", alpha = 0.4) + annotate("text", x = 1984.5,
    y = 10, label = "Soviet-Afghan war", col = "red", size = 3)

Compare

fit <- global_economy %>%
    filter(Country == "Afghanistan") %>%
    model(linear = TSLM(Population ~ trend()), piecewise = TSLM(Population ~
        trend(knots = c(1980, 1989))))
fit %>%
    report()

Compare

# A tibble: 2 × 16
  Country     .model r_squ…¹ adj_r…²  sigma2 stati…³  p_value    df log_lik
  <fct>       <chr>    <dbl>   <dbl>   <dbl>   <dbl>    <dbl> <int>   <dbl>
1 Afghanistan linear   0.838   0.835 1.02e13    290. 8.37e-24     2   -950.
2 Afghanistan piece…   0.999   0.999 9.05e10  12929. 4.34e-77     4   -812.
# … with 7 more variables: AIC <dbl>, AICc <dbl>, BIC <dbl>, CV <dbl>,
#   deviance <dbl>, df.residual <int>, rank <int>, and abbreviated
#   variable names ¹​r_squared, ²​adj_r_squared, ³​statistic

Handling Multiple Models

A mable stores models. The rows are given by the key and the columns are given by the model names. For diagnostics on a particular model/subset, we will need a filter/select combination.

global_economy %>%
  model(
    linear = TSLM(Population ~ trend()),
    piecewise = TSLM(Population ~ trend(knots = c(1980, 1989)))
  ) %>%
  filter(Country=="Afghanistan") %>%
  select(piecewise) %>%
  gg_tsresiduals()

Residual Assessment

fit1 <- global_economy %>%
    filter(Country == "Afghanistan") %>%
    model(linear = TSLM(Population ~ trend()))
gg_tsresiduals(fit1)

Residual Assessment

fit2 <- global_economy %>%
    filter(Country == "Afghanistan") %>%
    model(piecewise = TSLM(Population ~ trend(knots = c(1980, 1989))))
gg_tsresiduals(fit2)

Residual Assessment

Fit Assessment

Forecasts?

fc <- fit %>%
    forecast(h = "5 years")
autoplot(fc) + autolayer(global_economy %>%
    filter(Country == "Afghanistan"), Population)

Simple Exponential Smoothing

Appropriate for absence of seasons and trends. We could use naive (most recent observation receives all the weight, \alpha=1) and average (all values equally weighted, \alpha=0) methods for such data. Simple exponential smoothing occupies the space between with parameter \alpha such that

\hat{y}_{T+1|T} = \alpha y_{T} + \alpha(1 - \alpha)y_{T-1} + \alpha(1-\alpha)^{2}y_{T-2} + \ldots + \alpha(1-\alpha)^{k}y_{T-k}

Weighted Averages and the Component Form

Describe the outcome as two separate but combinable forms: a Forecast equation \hat{y}_{t+h|t} = S_{t} and a Smoothing Equation S_{t} = \alpha y_{t} + (1-\alpha) S_{t-1}.

If we combine them, we get:

\hat{y}_{t+h|t} = \alpha y_{t} + (1-\alpha) S_{t-1}

The Model: ETS

the error component (either M multiplicative or A additive)

the trend

the season

Some comments on options – the optimization method.

Damping

Flatten the trend into the future. Expand Holt’s trend model with an additional damping parameter – \phi.

\hat{y}_{t+h|t} = S_{t} + (\phi + \phi^2 + \ldots + \phi^{h})b_{t}

S_{t} = \alpha y_{t} + (1-\alpha)(S_{t-1} + \phi b_{t-1})

b_{t} = \beta^{*}(S_{t} - S_{t-1}) + (1-\beta^{*})\phi b_{t-1}

The Holt-Winters Method(s)

Describe the outcome as three separate but [additively] combinable forms: a Forecast equation \hat{y}_{t+h|t} = S_{t} + hb_{t} + s_{t + h - m(k+1)}

a Level Equation S_{t} = \alpha(y_{t} - s_{t-m}) + (1-\alpha)(S_{t-1} + b_{t-1}), a trend equation b_{t} = \beta^{*}(S_{t} - S_{t-1}) + (1-\beta^{*})b_{t-1}, and a seasonal equation s_{t} = \gamma(y_{t} - S_{t-1} - b_{t-1}) + (1-\gamma)s_{t-m}
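
A sketch of the additive Holt-Winters method in fable, assuming the USET retail employment tsibble from the regression slides above; the two-year horizon is arbitrary.

USET %>%
    model(hw_additive = ETS(Employed ~ error("A") + trend("A") + season("A"))) %>%
    forecast(h = "2 years") %>%
    autoplot(USET)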

Multiplicative?

Holt-Winters

Damping

Add the damping terms in the trend to the previous:

HW-Damp

A Taxonomy [and the code]

A glorious taxonomy for the models that all fits in one simple code expression.

model(ETS(outcome ~ error(method = c("A", "M")) +
    trend(method = c("N", "A", "Ad")) +
    season(method = c("N", "A", "M"))))

Model Selection and IC [and Caveats]

NB: Multiplicative errors and negative values [also, next slide]

Information criteria [IC] provide a measure of goodness of fit for models that may differ in number of parameters or underlying form. Minimize the chosen criterion.

ETS will do this automagically if not given guidance.

aus_business <- tourism %>%
    filter(Purpose == "Business") %>%
    summarise(Trips = sum(Trips)/1000)
fit <- aus_business %>%
    model(ETS(Trips))
report(fit)

Result

Series: Trips 
Model: ETS(M,N,A) 
  Smoothing parameters:
    alpha = 0.473 
    gamma = 1e-04 

  Initial states:
 l[0]   s[0] s[-1]  s[-2]  s[-3]
 3.96 0.0727  0.39 0.0983 -0.561

  sigma^2:  0.0057

 AIC AICc  BIC 
 164  165  180 

What does it look like?

components(fit) %>%
    autoplot() + labs(title = "ETS(M,N,A) components")

Innovations and Residuals [Multiplicative]

fit %>%
    augment() %>%
    select(.innov, .resid) %>%
    pivot_longer(c(.innov, .resid)) %>%
    autoplot()

Forecast Variances

Not all forecast variances are known analytically, but we can always simulate distributions of step-ahead sample paths and summarise them.
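
A sketch of such simulated paths, assuming the ETS fit of aus_business from the earlier slides; 20 bootstrapped paths over 8 quarters are arbitrary choices.

fit %>%
    generate(h = 8, times = 20, bootstrap = TRUE) %>%
    ggplot(aes(x = Quarter, y = .sim, group = .rep)) + geom_line(alpha = 0.5) +
    labs(title = "Simulated future sample paths")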

A Simplified Example

load(url("https://github.com/robertwwalker/rww-science/raw/master/static/xaringan/CH7HA/USEmployment.RData"))
us_employment %>%
    data.frame() %>%
    group_by(Series_ID) %>%
    summarise(Title = first(Title)) %>%
    mutate(series_id = Series_ID) %>%
    ungroup() %>%
    select(-Series_ID) -> Names.List
US.Employment.T <- left_join(US.Employment, Names.List, by = c(series_id = "series_id")) %>%
    mutate(YM = yearmonth(date)) %>%
    rename(Employed = value) %>%
    as_tsibble(., index = YM, key = Title)
USET <- US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Retail Trade")) %>%
    as_tsibble(., index = YM, key = Title)
USEM <- US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Manufacturing")) %>%
    as_tsibble(., index = YM, key = Title)

A Model

USETS <- USET %>%
    model(ETS(Employed))
USETS %>%
    report()
Series: Employed 
Model: ETS(M,N,A) 
  Smoothing parameters:
    alpha = 0.967 
    gamma = 1e-04 

  Initial states:
  l[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5] s[-6] s[-7] s[-8] s[-9] s[-10]
 13428  584   407  41.5 -89.7 -7.18 -12.9 -19.3  -128  -225  -231   -253
 s[-11]
  -65.9

  sigma^2:  1e-04

 AIC AICc  BIC 
5872 5874 5931 

augment

augment(USETS) %>%
    select(.innov, .resid) %>%
    pivot_longer(c(.innov, .resid)) %>%
    autoplot()

The residual

USETS %>%
    gg_tsresiduals()

They don’t always end up MNA

USETSM <- USEM %>%
    model(ETS(Employed))
USETSM %>%
    report()
Series: Employed 
Model: ETS(A,Ad,A) 
  Smoothing parameters:
    alpha = 0.836 
    beta  = 0.128 
    gamma = 0.164 
    phi   = 0.929 

  Initial states:
  l[0]  b[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5] s[-6] s[-7] s[-8] s[-9]
 17951 -41.1 24.7  33.6  50.4  80.6    99  17.2  81.3 -37.8 -85.8   -66
 s[-10] s[-11]
  -89.2   -108

  sigma^2:  7864

 AIC AICc  BIC 
5573 5575 5644 
USETSM %>%
    augment() %>%
    select(Employed, .fitted) %>%
    pivot_longer(c(Employed, .fitted)) %>%
    autoplot()

The flow:

  1. Stationarity and Differencing
  2. Non-seasonal ARIMA models
  3. Estimation and order selection
  4. ARIMA in R
  5. Forecasting ARIMA models
  6. Seasonal ARIMA models
  7. Comparison: ETS vs. ARIMA

Stationarity and Differencing

Definition:

If y_{t} is a stationary time series, then for all s, the distribution of (y_{t}, \dots, y_{t+s}) does not depend on t.

This implies that the mean (roughly horizontal), the variance (constant variance), and the covariances (no predictable long-term patterns) between elements of the series are constant.

Stationary

library(hrbrthemes)
gafa_stock %>%
    filter(Symbol == "GOOG", year(Date) == 2018) %>%
    autoplot(Close) + ylab("Google closing stock price") + xlab("Day") + theme_ipsum_rc()

Stationary?

gafa_stock %>%
    filter(Symbol == "GOOG", year(Date) == 2018) %>%
    autoplot(difference(Close)) + ylab("Google closing stock price") + xlab("Day") +
    theme_ipsum_rc()

Stationary?

as_tsibble(fma::strikes) %>%
    autoplot(value) + ylab("Number of strikes") + xlab("Year") + theme_ipsum_rc()

Stationary?

as_tsibble(fma::hsales) %>%
    autoplot(value) + labs(x = "Year", y = "Total sales", title = "Sales of new one-family houses, USA") +
    theme_ipsum_rc()

Stationary?

as_tsibble(fma::eggs) %>%
    autoplot(value) + labs(x = "Year", y = "$", title = "Price of a dozen eggs in 1993 dollars") +
    theme_ipsum_rc()

Stationary?

aus_livestock %>%
    filter(Animal == "Pigs", State == "Victoria", year(Month) >= 2010) %>%
    autoplot(Count/1000) + labs(x = "Year", y = "thousands", title = "Number of pigs slaughtered in Victoria") +
    theme_ipsum_rc()

Stationary?

pelt %>%
    autoplot(Lynx) + labs(x = "Year", y = "Number trapped", title = "Annual Canadian Lynx Trappings") +
    theme_ipsum_rc()

Definition

If \{y_t\} is a stationary time series, then for all s, the distribution of (y_t,\dots,y_{t+s}) does not depend on t.

Transformations help to stabilize the variance.

For ARIMA modelling, we also need to stabilize the mean.

Non-stationarity in the mean

Identifying non-stationary series

  • time plot.
  • The ACF of stationary data drops to zero relatively quickly
  • The ACF of non-stationary data decreases slowly.
  • For non-stationary data, the value of r_1 is often large and positive.

Let’s Apply These

Note the need to re-index the stock data (trading days are irregularly spaced)….

Google Price

google_2018 <- gafa_stock %>%
    filter(Symbol == "GOOG", year(Date) == 2018) %>%
    mutate(trading_day = row_number()) %>%
    update_tsibble(index = trading_day, regular = TRUE)
google_2018 %>%
    autoplot(Close) + theme_ipsum_rc()

google_2018 %>%
    ACF(Close) %>%
    autoplot() + theme_ipsum_rc()

Google change

gafa_stock %>%
    filter(Symbol == "GOOG", year(Date) == 2018) %>%
    autoplot(difference(Close)) + labs(y = "Change in Google closing stock price ( $USD)",
    x = "Day") + theme_ipsum_rc()

google_2018 %>%
    ACF(difference(Close)) %>%
    autoplot() + theme_ipsum_rc()

Conclusion

The price is unlikely to be stationary; the change appears to be.

Differencing

Definition:
- The differenced series is the change between each observation in the original series: y'_t = y_t - y_{t-1}.

  • Differencing helps to stabilize the mean.

  • The differenced series will have only T-1 values since it is not possible to calculate a difference y_1' for the first observation.

Sometimes one isn’t enough

Instead of the change, we need to model the change in the change – a second difference (a sketch follows below).

y^{''}_{t} = y^{'}_{t} - y^{'}_{t - 1}

y^{''}_{t} = (y_t - y_{t-1}) - (y_{t-1}-y_{t-2})

y^{''}_{t} = y_t - 2y_{t-1} +y_{t-2}

  • y_t'' will have T-2 values.
  • In practice, it is almost never necessary to go beyond second-order differences.
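
A sketch of a second difference in tsibble code, using the google_2018 series defined earlier; differences = 2 applies the first difference twice.

google_2018 %>%
    autoplot(difference(Close, differences = 2)) + labs(y = "Second difference of Google close")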

Seasonal differencing

A seasonal difference is the difference between an observation and the corresponding observation from the previous year. y^{'seasonal}_t = y_t - y_{t-m} where m= number of seasons.

  • For monthly data m=12.
  • For quarterly data m=4.

Electricity Generation

The oscillation appears more variable in the latter half than the former half.

usmelec <- fpp2::usmelec %>%
    as_tsibble() %>%
    mutate(Generation = value)
usmelec %>%
    autoplot() + theme_ipsum_rc()

Electricity Generation (log)

usmelec %>%
    autoplot(log(Generation)) + theme_ipsum_rc()

Electricity Generation (seasonal difference and log)

usmelec %>%
    autoplot(log(Generation) %>%
        difference(12)) + theme_ipsum_rc()

Electricity Generation (seasonal difference and log)

Patterns seem to remain.

usmelec %>%
    ACF(log(Generation) %>%
        difference(12)) %>%
    autoplot() + theme_ipsum_rc()

Electricity Generation (seasonal difference and log with difference)

usmelec %>%
    autoplot(log(Generation) %>%
        difference(12) %>%
        difference()) + theme_ipsum_rc()

Electricity Generation (seasonal difference and log with difference)

usmelec %>%
    ACF(log(Generation) %>%
        difference(12) %>%
        difference()) %>%
    autoplot() + theme_ipsum_rc()

Electricity production

  • Seasonally differenced series is closer to being stationary.
  • Remaining non-stationarity can be removed with further first difference.

If y'_t = y_t - y_{t-12} denotes the seasonally differenced series, then the twice-differenced series is

y^*_t = y'_t - y'_{t-1}

y^*_t = (y_t - y_{t-12}) - (y_{t-1} - y_{t-13})

y^*_t = y_t - y_{t-1} - y_{t-12} + y_{t-13}

Seasonal differencing

When both seasonal and first differences are applied…

  • it makes no difference which is done first – the result will be the same.
  • If seasonality is strong, we recommend that seasonal differencing be done first because sometimes the resulting series will be stationary and there will be no need for a further first difference.

It is important that if differencing is used, the differences are interpretable.

Interpretation of differencing

  • first differences are the change between one observation and the next;
  • seasonal differences are the change between one year to the next.

But taking lag 3 differences for yearly data, for example, results in a model which cannot be sensibly interpreted.

Unit root tests

Statistical tests can suggest the required order of differencing.

  1. Augmented Dickey Fuller test: null hypothesis is that the data are non-stationary and non-seasonal.
  2. Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: null hypothesis is that the data are stationary and non-seasonal.
  3. Other tests available for seasonal data.

You May Recall from Time Series features

KPSS test

Google prices won’t work; differences are indicated.

google_2018 %>%
    features(Close, features = list(unitroot_kpss, unitroot_ndiffs))
# A tibble: 1 × 4
  Symbol kpss_stat kpss_pvalue ndiffs
  <chr>      <dbl>       <dbl>  <int>
1 GOOG       0.573      0.0252      1

Automatically selecting differences

STL decomposition: y_t = T_t+S_t+R_t

Seasonal strength F_s = \max\big(0, 1-\frac{\text{Var}(R_t)}{\text{Var}(S_t+R_t)}\big)

If F_s > 0.64, do one seasonal difference.

usmelec %>%
    mutate(log_gen = log(Generation)) %>%
    features(log_gen, list(unitroot_nsdiffs, feat_stl))
# A tibble: 1 × 10
  nsdiffs trend_…¹ seaso…² seaso…³ seaso…⁴ spikin…⁵ linea…⁶ curva…⁷ stl_e…⁸
    <int>    <dbl>   <dbl>   <dbl>   <dbl>    <dbl>   <dbl>   <dbl>   <dbl>
1       1    0.994   0.943       7       4 1.40e-12    5.65  -0.911   0.207
# … with 1 more variable: stl_e_acf10 <dbl>, and abbreviated variable
#   names ¹​trend_strength, ²​seasonal_strength_year, ³​seasonal_peak_year,
#   ⁴​seasonal_trough_year, ⁵​spikiness, ⁶​linearity, ⁷​curvature, ⁸​stl_e_acf1

Automatically selecting differences

usmelec %>%
    mutate(log_gen = log(Generation)) %>%
    features(log_gen, unitroot_nsdiffs)
# A tibble: 1 × 1
  nsdiffs
    <int>
1       1
usmelec %>%
    mutate(d_log_gen = difference(log(Generation), 12)) %>%
    features(d_log_gen, unitroot_ndiffs)
# A tibble: 1 × 1
  ndiffs
   <int>
1      1

Try it out with tourism

For the tourism dataset, compute the total number of trips and find an appropriate differencing (after transformation if necessary) to obtain stationary data.

tourism %>%
    summarise(Trips = sum(Trips)) %>%
    autoplot() + labs(title = "Total trips in tourism", x = "Quarter", y = "Trips") +
    theme_ipsum_rc()
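
One possible approach, sketched below: check the unit-root features; if nsdiffs suggests a seasonal difference, re-check ndiffs on the seasonally differenced series, as in the usmelec example above. Whether a transformation is needed is left to visual inspection.

tourism %>%
    summarise(Trips = sum(Trips)) %>%
    features(Trips, list(unitroot_nsdiffs, unitroot_ndiffs))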

The Backshift Operator

A very useful notational device is the backward shift operator, B, which is used as follows: B y_{t} = y_{t - 1}

In other words, B, operating on y_{t}, has the effect of shifting the data back one period. Two applications of B to y_{t} shifts the data back two periods: B(By_{t}) = B^{2}y_{t} = y_{t-2}

For monthly data, if we wish to shift attention to “the same month last year”, then B^{12} is used, and the notation is B^{12}y_{t} = y_{t-12}.

Backshift notation

The backward shift operator is convenient for describing the process of differencing.

A first difference can be written as y'_{t} = y_{t} - y_{t-1} = y_t - By_{t} = (1 - B)y_{t}

Note that a first difference is represented by (1 - B).

Similarly, if second-order differences (i.e., first differences of first differences) have to be computed, then:

y''_{t} = y_{t} - 2y_{t - 1} + y_{t - 2} = (1 - B)^{2} y_{t}

Backshift notation

  • Second-order difference is denoted (1- B)^{2}.
  • Second-order difference is not the same as a second difference, which would be denoted 1- B^{2};
  • In general, a dth-order difference can be written as (1 - B)^{d} y_{t}
  • A seasonal difference followed by a first difference can be written as (1-B)(1-B^m)y_t

Backshift notation

The “backshift” notation is convenient because the terms can be multiplied together to see the combined effect.

(1-B)(1-B^m)y_t = (1 - B - B^m + B^{m+1})y_t

(1-B)(1-B^m)y_t = y_t-y_{t-1}-y_{t-m}+y_{t-m-1}

For monthly data, m=12 and we obtain the same result as earlier.

Non-seasonal ARIMA models

Autoregressive models

Autoregressive (AR) models:

y_{t} = c + \phi_{1}y_{t - 1} + \phi_{2}y_{t - 2} + \cdots + \phi_{p}y_{t - p} + \varepsilon_{t}

where \varepsilon_t is white noise. This is a multiple regression with lagged values of y_t as predictors.

AR(1) model

y_{t} = 18 -0.8 y_{t - 1} + \varepsilon_{t}

\varepsilon_t\sim N(0,1), T=100.

AR(1) model

y_{t} = c + \phi_1 y_{t - 1} + \varepsilon_{t}

  • When \phi_1=0, y_t is equivalent to WN
  • When \phi_1=1 and c=0, y_t is equivalent to a RW
  • When \phi_1=1 and c\ne0, y_t is equivalent to a RW with drift
  • When \phi_1<0, y_t tends to oscillate between positive and negative values.

AR(2) model

y_t = 8 + 1.3y_{t-1} - 0.7 y_{t-2} + \varepsilon_t

\varepsilon_t\sim N(0,1), \qquad T=100.
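
The series plotted on these AR slides can be simulated along the same lines as the MA examples below; a sketch (the seed is arbitrary, and the added constants are the implied means, 18/(1 + 0.8) = 10 and 8/(1 - 1.3 + 0.7) = 20):

set.seed(1)
p1 <- tsibble(idx = seq_len(100), sim = 18/(1 + 0.8) + arima.sim(list(ar = -0.8), n = 100),
    index = idx) %>%
    autoplot(sim) + ylab("") + ggtitle("AR(1)")
p2 <- tsibble(idx = seq_len(100), sim = 8/(1 - 1.3 + 0.7) + arima.sim(list(ar = c(1.3, -0.7)),
    n = 100), index = idx) %>%
    autoplot(sim) + ylab("") + ggtitle("AR(2)")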

p2

Stationarity conditions

We normally restrict autoregressive models to stationary data, and then some constraints on the values of the parameters are required.

General condition for stationarity

Complex roots of 1-\phi_1 z - \phi_2 z^2 - \dots - \phi_pz^p lie outside the unit circle on the complex plane.

  • For p=1: -1<\phi_1<1.
  • For p=2: -1<\phi_2<1, \qquad \phi_2+\phi_1 < 1, \qquad \phi_2 -\phi_1 < 1.
  • More complicated conditions hold for p\ge3.
  • Estimation software takes care of this.

Moving Average (MA) models

Moving Average (MA) models: y_{t} = c + \varepsilon_t + \theta_{1}\varepsilon_{t - 1} + \theta_{2}\varepsilon_{t - 2} + \cdots + \theta_{q}\varepsilon_{t - q}, where \varepsilon_t is white noise.

This is a multiple regression with past errors as predictors. Don’t confuse this with moving average smoothing!

set.seed(2)
p1 <- tsibble(idx = seq_len(100), sim = 20 + arima.sim(list(ma = 0.8), n = 100),
    index = idx) %>%
    autoplot(sim) + ylab("") + ggtitle("MA(1)")
p2 <- tsibble(idx = seq_len(100), sim = arima.sim(list(ma = c(-1, +0.8)), n = 100),
    index = idx) %>%
    autoplot(sim) + ylab("") + ggtitle("MA(2)")

p1 | p2

MA(1) model

y_t = 20 + \varepsilon_t + 0.8 \varepsilon_{t-1}

\varepsilon_t\sim N(0,1) and T=100.

p1

MA(2) model

y_t = \varepsilon_t -\varepsilon_{t-1} + 0.8 \varepsilon_{t-2}

\varepsilon_t\sim N(0,1) and T=100.

p2

MA(\infty) models

It is possible to write any stationary AR( p ) process as an MA( \infty ) process.

Example: AR(1)

y_t = \phi_1y_{t-1} + \varepsilon_t
y_t = \phi_1(\phi_1y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t
y_t = \phi_1^2y_{t-2} + \phi_1 \varepsilon_{t-1} + \varepsilon_t
y_t = \phi_1^3y_{t-3} + \phi_1^2\varepsilon_{t-2} + \phi_1 \varepsilon_{t-1} + \varepsilon_t

Provided -1 < \phi_1 < 1: y_t = \varepsilon_t + \phi_1 \varepsilon_{t-1} + \phi_1^2 \varepsilon_{t-2} + \phi_1^3 \varepsilon_{t-3} + \cdots

Invertibility

  • Any MA( q ) process can be written as an AR( \infty ) process if we impose some constraints on the MA parameters.
  • Then the MA model is called invertible.
  • Invertible models have some mathematical properties that make them easier to use in practice.
  • Invertibility of an ARIMA model is equivalent to forecastability of an ETS model.

Invertibility

General condition for invertibility: Complex roots of 1+\theta_1 z + \theta_2 z^2 + \dots + \theta_qz^q lie outside the unit circle on the complex plane.

  • For q=1: -1<\theta_1<1.
  • For q=2: -1<\theta_2<1 \qquad \theta_2+\theta_1 >-1 \qquad \theta_1 -\theta_2 < 1
  • More complicated conditions hold for q\ge3.
  • Estimation software takes care of this.

ARMA models

Autoregressive Moving Average models:

y_{t} = c + \phi_{1}y_{t - 1} + \cdots + \phi_{p}y_{t - p} + \theta_{1}\varepsilon_{t - 1} + \cdots + \theta_{q}\varepsilon_{t - q} + \varepsilon_{t}.

  • Predictors include both lagged values of y_t and lagged errors.
  • Conditions on coefficients ensure stationarity.
  • Conditions on coefficients ensure invertibility.

Autoregressive Integrated Moving Average models

  • Combine ARMA model with differencing.
  • (1-B)^d y_t follows an ARMA model.

ARIMA models

Autoregressive Integrated Moving Average models

ARIMA( p, d, q ) model

AR: p = order of the autoregressive part
I: d = degree of first differencing involved
MA: q = order of the moving average part.

  • White noise model: ARIMA(0,0,0)
  • Random walk: ARIMA(0,1,0) with no constant
  • Random walk with drift: ARIMA(0,1,0) with a constant
  • AR( p ): ARIMA( p,0,0)
  • MA( q ): ARIMA(0,0,q )

Backshift notation for ARIMA

  • ARMA model:

y_{t} = c + \phi_{1}By_{t} + \cdots + \phi_pB^py_{t} + \varepsilon_{t} + \theta_{1}B\varepsilon_{t} + \cdots + \theta_qB^q\varepsilon_{t}

or

(1-\phi_1B - \cdots - \phi_p B^p) y_t = c + (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t

ARIMA(1,1,1) model

\underbrace{(1 - \phi_{1} B)}_{\text{AR(1)}} \underbrace{(1 - B)}_{\text{First Difference}} y_{t} = c + \underbrace{(1 + \theta_{1} B) \varepsilon_{t}}_{\text{MA(1)}}

Written out: y_t = c + y_{t-1} + \phi_1 y_{t-1}- \phi_1 y_{t-2} + \theta_1\varepsilon_{t-1} + \varepsilon_t

R model

  • Intercept form

(1-\phi_1B - \cdots - \phi_p B^p) y_t' = c + (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t

  • Mean form

(1-\phi_1B - \cdots - \phi_p B^p)(y_t' - \mu) = (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t

  • y_t' = (1-B)^d y_t
  • \mu is the mean of y_t'.
  • c = \mu(1-\phi_1 - \cdots - \phi_p ).
  • fable uses intercept form

US consumption expenditure

us_change %>%
    autoplot(Consumption) + xlab("Year") + ylab("Quarterly percentage change") +
    ggtitle("US consumption") + theme_ipsum_rc()

US personal consumption

fit <- us_change %>%
    model(arima = ARIMA(Consumption ~ PDQ(0, 0, 0)))
report(fit)
Series: Consumption 
Model: ARIMA(1,0,3) w/ mean 

Coefficients:
        ar1     ma1     ma2     ma3  constant
      0.573  -0.362  0.0925  0.1934    0.3160
s.e.  0.150   0.161  0.0787  0.0824    0.0371

sigma^2 estimated as 0.3334:  log likelihood=-170
AIC=352   AICc=352   BIC=372

ARIMA(1,0,3) model:

y_t = 0.316 + 0.573y_{t-1} -0.362 \varepsilon_{t-1} + 0.0925 \varepsilon_{t-2} + 0.193 \varepsilon_{t-3} + \varepsilon_{t}

where \varepsilon_t is white noise with a standard deviation of 0.577 = \sqrt{0.333}.

US personal consumption

fit %>%
    forecast(h = 10) %>%
    autoplot(tail(us_change, 80)) + theme_ipsum_rc()

Understanding ARIMA models

  • If c=0 and d=0, the long-term forecasts will go to zero.
  • If c=0 and d=1, the long-term forecasts will go to a non-zero constant.
  • If c=0 and d=2, the long-term forecasts will follow a straight line.
  • If c\ne0 and d=0, the long-term forecasts will go to the mean of the data.
  • If c\ne0 and d=1, the long-term forecasts will follow a straight line.
  • If c\ne0 and d=2, the long-term forecasts will follow a quadratic trend.

Understanding ARIMA models

Forecast variance and d

  • The higher the value of d, the more rapidly the prediction intervals increase in size.
  • For d=0, the long-term forecast standard deviation will go to the standard deviation of the historical data.

Cyclic behaviour

  • For cyclic forecasts, p\ge2 and some restrictions on coefficients are required.
  • If p=2, we need \phi_1^2+4\phi_2<0. The cycles then have an average length of

(2\pi)/\left[\text{arccos}(-\phi_1(1-\phi_2)/(4\phi_2))\right].
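
A quick computation of the implied cycle length, assuming the AR(2) coefficients from the simulated example above (phi1 = 1.3, phi2 = -0.7); acos() is base R's arccosine.

phi1 <- 1.3
phi2 <- -0.7
# phi1^2 + 4*phi2 = -1.11 < 0, so the cyclic condition holds
(2 * pi)/acos(-phi1 * (1 - phi2)/(4 * phi2))  # roughly 9.5 periods per cycle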

Estimation and order selection

ML Estimation

Having identified the model order, we need to estimate the parameters c, \phi_1,\dots,\phi_p, \theta_1,\dots,\theta_q.

  • MLE is very similar to least squares estimation obtained by minimizing \sum_{t=1}^T e_t^2
  • The ARIMA() function allows CLS or MLE estimation.
  • Non-linear optimization must be used in either case.
  • Different software will give different estimates.

Partial autocorrelations

Partial autocorrelations measure relationship between y_{t} and y_{t - k}, when the effects of other time lags 1, 2, 3, \dots, k - 1 are removed.

\alpha_k = \text{the } k\text{th partial autocorrelation coefficient, equal to the estimate of } \phi_k \text{ in the regression}

y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_k y_{t-k}.

  • Varying number of terms on RHS gives \alpha_k for different values of k.
  • \alpha_1=\rho_1
    • same critical values of \pm 1.96/\sqrt{T} as for ACF.
  • Last significant \alpha_k indicates the order of an AR model.

Mink trapping

mink <- as_tsibble(fma::mink)
mink %>%
    autoplot(value) + xlab("Year") + ylab("Minks trapped (thousands)") + ggtitle("Annual number of minks trapped") +
    theme_ipsum_rc()

Mink trapping

p1 <- mink %>%
    ACF(value) %>%
    autoplot()
p2 <- mink %>%
    PACF(value) %>%
    autoplot()
p1 | p2

Mink trapping

mink %>%
    gg_tsdisplay(value, plot_type = "partial")

ACF and PACF interpretation

AR(1)

\rho_k = \phi_1^k \qquad \text{ for } k=1,2,\dots

\alpha_1 = \phi_1 \qquad \alpha_k = 0 \qquad \text{for } k=2,3,\dots

So we have an AR(1) model when

  • autocorrelations exponentially decay
  • there is a single significant partial autocorrelation.

ACF and PACF interpretation

AR(p)

  • ACF dies out in an exponential or damped sine-wave manner
  • PACF has all zero spikes beyond the pth spike

So we have an AR(p) model when

  • the ACF is exponentially decaying or sinusoidal
  • there is a significant spike at lag p in PACF, but none beyond p

ACF and PACF interpretation

MA(1)

\rho_1 = \dfrac{\theta_1}{1+\theta_1^2} \qquad \rho_k = 0 \qquad \text{ for } k=2,3,\dots

\alpha_k = -\dfrac{(-\theta_1)^k}{1 + \theta_1^2 + \cdots + \theta_1^{2k}}

So we have an MA(1) model when

  • the PACF is exponentially decaying and
  • there is a single significant spike in ACF

ACF and PACF interpretation

MA( q )

  • PACF dies out in an exponential or damped sine-wave manner
  • ACF has all zero spikes beyond the qth spike

So we have an MA( q ) model when

  • the PACF is exponentially decaying or sinusoidal
  • there is a significant spike at lag q in ACF, but none beyond q

Information criteria

Akaike’s Information Criterion (AIC): \text{AIC} = -2 \log(L) + 2(p+q+k+1), where L is the likelihood of the data, and k=1 if c\ne0 and k=0 if c=0

Corrected AIC: \text{AICc} = \text{AIC} + \displaystyle\frac{2(p+q+k+1)(p+q+k+2)}{T-p-q-k-2}.

Bayesian Information Criterion: \text{BIC} = \text{AIC} + [\log(T)-2](p+q+k+1).

Good models are obtained by minimizing either the AIC, AICc or BIC. HA prefer to use the AICc
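
A sketch of an IC comparison in fable, using the us_change data from the scenario slides; both candidates share d = 0 (IC values are not comparable across different orders of differencing), and the orders themselves are arbitrary.

us_change %>%
    model(arima103 = ARIMA(Consumption ~ pdq(1, 0, 3) + PDQ(0, 0, 0)),
        arima202 = ARIMA(Consumption ~ pdq(2, 0, 2) + PDQ(0, 0, 0))) %>%
    glance() %>%
    select(.model, AIC, AICc, BIC)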

ARIMA modelling in R

How does ARIMA() work?

A non-seasonal ARIMA process

\phi(B)(1-B)^dy_{t} = c + \theta(B)\varepsilon_t

Need to select appropriate orders: p,q, d

Hyndman and Khandakar (JSS, 2008) algorithm:

  • Select no. differences d and D via KPSS test and seasonal strength measure.
  • Select p,q by minimising AICc.
  • Use stepwise search to traverse model space.

How does ARIMA() work?

\text{AICc} = -2 \log(L) + 2(p+q+k+1)\left[1 + \frac{(p+q+k+2)}{T-p-q-k-2}\right].

where L is the maximised likelihood fitted to the differenced data, k=1 if c\neq 0 and k=0 otherwise.

Step 1: Select current model (with smallest AICc) from:
ARIMA(2,d,2)
ARIMA(0,d,0)
ARIMA(1,d,0)
ARIMA(0,d,1)

Step 2: Consider variations of the current model:
  • vary one of p, q from the current model by \pm 1;
  • vary both p and q from the current model by \pm 1;
  • include/exclude c from the current model.

Model with lowest AICc becomes current model.

Repeat Step 2 until no lower AICc can be found.

Choosing your own model

web_usage <- as_tsibble(WWWusage)
web_usage %>%
    gg_tsdisplay(value, plot_type = "partial")

Choosing your own model

web_usage %>%
    mutate(diff = difference(value)) %>%
    gg_tsdisplay(diff, plot_type = "partial")

Choosing your own model

fit <- web_usage %>%
    model(arima = ARIMA(value ~ pdq(3, 1, 0)))
report(fit)
Series: value 
Model: ARIMA(3,1,0) 

Coefficients:
        ar1     ar2     ar3
      1.151  -0.661  0.3407
s.e.  0.095   0.135  0.0941

sigma^2 estimated as 9.656:  log likelihood=-252
AIC=512   AICc=512   BIC=522

Choosing your own model

web_usage %>%
    model(ARIMA(value ~ pdq(d = 1))) %>%
    report()
Series: value 
Model: ARIMA(1,1,1) 

Coefficients:
         ar1     ma1
      0.6504  0.5256
s.e.  0.0842  0.0896

sigma^2 estimated as 9.995:  log likelihood=-254
AIC=514   AICc=515   BIC=522

Choosing your own model

web_usage %>%
    model(ARIMA(value ~ pdq(d = 1), stepwise = FALSE, approximation = FALSE)) %>%
    report()
Series: value 
Model: ARIMA(3,1,0) 

Coefficients:
        ar1     ar2     ar3
      1.151  -0.661  0.3407
s.e.  0.095   0.135  0.0941

sigma^2 estimated as 9.656:  log likelihood=-252
AIC=512   AICc=512   BIC=522

Choosing your own model

gg_tsresiduals(fit)

Choosing your own model

augment(fit) %>%
    features(.resid, ljung_box, lag = 10, dof = 3)
# A tibble: 1 × 3
  .model lb_stat lb_pvalue
  <chr>    <dbl>     <dbl>
1 arima     4.49     0.722

Choosing your own model

fit %>%
    forecast(h = 10) %>%
    autoplot(web_usage)

Modelling procedure with ARIMA()

  1. Plot the data. Identify any unusual observations.
  2. If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
  3. If the data are non-stationary: take first differences of the data until the data are stationary.
  4. Examine the ACF/PACF: Is an AR(p) or MA(q) model appropriate?
  5. Try your chosen model(s), and use the AICc to search for a better model.
  6. Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
  7. Once the residuals look like white noise, calculate forecasts.

Automatic modelling procedure with ARIMA()

  1. Plot the data. Identify any unusual observations.
  2. If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
  3. Use ARIMA to automatically select a model.
  4. Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
  5. Once the residuals look like white noise, calculate forecasts.

Modelling procedure

link here

Seasonally adjusted electrical equipment

elecequip <- as_tsibble(fpp2::elecequip)
dcmp <- elecequip %>%
    model(STL(value ~ season(window = "periodic"))) %>%
    components() %>%
    select(-.model)
dcmp %>%
    as_tsibble %>%
    autoplot(season_adjust) + xlab("Year") + ylab("Seasonally adjusted new orders index")

Seasonally adjusted electrical equipment

  1. Time plot shows sudden changes, particularly big drop in 2008/2009 due to global economic environment. Otherwise nothing unusual and no need for data adjustments.
  2. No evidence of changing variance, so no Box-Cox transformation.
  3. Data are clearly non-stationary, so we take first differences.

Seasonally adjusted electrical equipment

dcmp %>%
    mutate(diff = difference(season_adjust)) %>%
    gg_tsdisplay(diff, plot_type = "partial")

Seasonally adjusted electrical equipment

  1. PACF is suggestive of AR(3). So initial candidate model is ARIMA(3,1,0). No other obvious candidates.
  2. Fit ARIMA(3,1,0) model along with variations: ARIMA(4,1,0), ARIMA(2,1,0), ARIMA(3,1,1), etc. ARIMA(3,1,1) has smallest AICc value.

Seasonally adjusted electrical equipment

fit <- dcmp %>%
    model(arima = ARIMA(season_adjust))
report(fit)
Series: season_adjust 
Model: ARIMA(3,1,0) 

Coefficients:
          ar1      ar2     ar3
      -0.3418  -0.0426  0.3185
s.e.   0.0681   0.0725  0.0682

sigma^2 estimated as 9.639:  log likelihood=-494
AIC=996   AICc=996   BIC=1009

Seasonally adjusted electrical equipment

fit <- dcmp %>%
    model(arima = ARIMA(season_adjust, approximation = FALSE))
report(fit)
Series: season_adjust 
Model: ARIMA(3,1,1) 

Coefficients:
         ar1     ar2     ar3     ma1
      0.0044  0.0916  0.3698  -0.392
s.e.  0.2201  0.0984  0.0669   0.243

sigma^2 estimated as 9.577:  log likelihood=-493
AIC=995   AICc=996   BIC=1012

Seasonally adjusted electrical equipment

  1. The ACF plot of the residuals from the ARIMA(3,1,1) model looks like white noise.

Seasonally adjusted electrical equipment

augment(fit) %>%
    features(.resid, ljung_box, lag = 24, dof = 4)
# A tibble: 1 × 3
  .model lb_stat lb_pvalue
  <chr>    <dbl>     <dbl>
1 arima     24.0     0.241

Seasonally adjusted electrical equipment

fit %>%
    forecast %>%
    autoplot(dcmp)

Forecasting

Point forecasts

  1. Rearrange ARIMA equation so y_t is on LHS.
  2. Rewrite equation by replacing t by T+h.
  3. On RHS, replace future observations by their forecasts, future errors by zero, and past errors by corresponding residuals.

Start with h=1. Repeat for h=2,3,\dots.

Intervals

Intervals

Let’s Try….

For the GDP data (from global_economy):

  • fit a suitable ARIMA model to the logged data for all countries
  • check the residual diagnostics for Austria;
  • produce forecasts of your fitted model for Austria (one possible approach is sketched below).
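
One possible approach, assuming log(GDP) is a suitable transformation; fitting every country can take a little while and countries with missing GDP produce null models, and the ten-year horizon is arbitrary.

fit_gdp <- global_economy %>%
    model(arima = ARIMA(log(GDP)))
fit_gdp %>%
    filter(Country == "Austria") %>%
    gg_tsresiduals()
fit_gdp %>%
    filter(Country == "Austria") %>%
    forecast(h = "10 years") %>%
    autoplot(global_economy %>%
        filter(Country == "Austria"))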

Seasonal ARIMA models

Seasonal ARIMA models

\text{ARIMA}~\underbrace{(p, d, q)}_{\text{Non-seasonal part of the model}}~\underbrace{(P, D, Q)_{m}}_{\text{Seasonal part of the model}}

where m = number of observations per year.

Seasonal ARIMA models

E.g., ARIMA (1, 1, 1)(1, 1, 1)_{4} model (without constant)

(1 - \phi_{1}B)(1 - \Phi_{1}B^{4}) (1 - B) (1 - B^{4})y_{t} ~= ~ (1 + \theta_{1}B) (1 + \Theta_{1}B^{4})\varepsilon_{t}.

Common ARIMA models

The US Census Bureau uses the following models most often:

ARIMA(0,1,1)(0,1,1)_m with log transformation
ARIMA(0,1,2)(0,1,1)_m with log transformation
ARIMA(2,1,0)(0,1,1)_m with log transformation
ARIMA(0,2,2)(0,1,1)_m with log transformation
ARIMA(2,1,2)(0,1,1)_m with no transformation

Seasonal ARIMA models

The seasonal part of an AR or MA model will be seen in the seasonal lags of the PACF and ACF.

ARIMA(0,0,0)(0,0,1)_{12} will show:

  • a spike at lag 12 in the ACF but no other significant spikes.
  • The PACF will show exponential decay in the seasonal lags; that is, at lags 12, 24, 36, ….

ARIMA(0,0,0)(1,0,0)_{12} will show:

  • exponential decay in the seasonal lags of the ACF
  • a single significant spike at lag 12 in the PACF.

US electricity production

usmelec %>%
    autoplot(log(Generation)) + theme_ipsum_rc()

US electricity production

usmelec %>%
    autoplot(log(Generation) %>%
        difference(12)) + theme_ipsum_rc()

US electricity production

usmelec %>%
    autoplot(log(Generation) %>%
        difference(12) %>%
        difference()) + theme_ipsum_rc()

US electricity production

usmelec %>%
    gg_tsdisplay(log(Generation) %>%
        difference(12) %>%
        difference(), plot_type = "partial")

US electricity production

  • d=1 and D=1 seems necessary
  • P=0 and Q=1 suggested by seasonal lags
  • p=0 and q=3 suggested by non-seasonal lags.

US electricity production

usmelec %>%
    model(arima = ARIMA(log(Generation) ~ pdq(0, 1, 3) + PDQ(0, 1, 1))) %>%
    report()
Series: Generation 
Model: ARIMA(0,1,3)(0,1,1)[12] 
Transformation: log(Generation) 

Coefficients:
          ma1      ma2      ma3     sma1
      -0.4266  -0.2496  -0.0439  -0.8358
s.e.   0.0462   0.0516   0.0430   0.0262

sigma^2 estimated as 0.0006904:  log likelihood=1045
AIC=-2080   AICc=-2080   BIC=-2059

US electricity production

usmelec %>%
    model(arima = ARIMA(log(Generation))) %>%
    report()
Series: Generation 
Model: ARIMA(1,1,1)(2,1,1)[12] 
Transformation: log(Generation) 

Coefficients:
         ar1      ma1    sar1     sar2     sma1
      0.4116  -0.8483  0.0100  -0.1017  -0.8204
s.e.  0.0617   0.0348  0.0561   0.0529   0.0357

sigma^2 estimated as 0.0006841:  log likelihood=1047
AIC=-2082   AICc=-2082   BIC=-2057

US electricity production

fit <- usmelec %>%
    model(arima = ARIMA(log(Generation)))
gg_tsresiduals(fit)

US electricity production

augment(fit) %>%
    features(.resid, ljung_box, lag = 24, dof = 5)
# A tibble: 1 × 3
  .model lb_stat lb_pvalue
  <chr>    <dbl>     <dbl>
1 arima     44.5  0.000799

US electricity production

usmelec %>%
    model(arima = ARIMA(log(Generation))) %>%
    forecast(h = "3 years") %>%
    autoplot(usmelec) + theme_ipsum_rc()

US electricity production

usmelec %>%
    model(arima = ARIMA(log(Generation))) %>%
    forecast(h = "3 years") %>%
    autoplot({
        usmelec %>%
            filter(year(index) > 2010)
    }) + theme_ipsum_rc()

ARIMA vs ETS

ARIMA vs ETS

  • Myth that ARIMA models are more general than exponential smoothing.

  • Linear exponential smoothing models are all special cases of ARIMA models.

  • Non-linear exponential smoothing models have no equivalent ARIMA counterparts.

  • Many ARIMA models have no exponential smoothing counterparts.

  • ETS models are all non-stationary. Models with seasonality or non-damped trend (or both) have two unit roots; all other models have one unit root.

ARIMA vs ETS

Equivalences

ETS model     ARIMA model                Parameters
ETS(A,N,N)    ARIMA(0,1,1)               \theta_1 = \alpha - 1
ETS(A,A,N)    ARIMA(0,2,2)               \theta_1 = \alpha + \beta - 2, \theta_2 = 1 - \alpha
ETS(A,Ad,N)   ARIMA(1,1,2)               \phi_1 = \phi, \theta_1 = \alpha + \phi\beta - 1 - \phi, \theta_2 = (1 - \alpha)\phi
ETS(A,N,A)    ARIMA(0,0,m)(0,1,0)_m
ETS(A,A,A)    ARIMA(0,1,m+1)(0,1,0)_m
ETS(A,Ad,A)   ARIMA(1,0,m+1)(0,1,0)_m
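
A sketch of the first equivalence in the table, assuming the aus_business series from the ETS slides; with \theta_1 = \alpha - 1 (and no constant in the ARIMA model), the two fits should produce very similar flat point forecasts.

aus_business %>%
    model(ets_ann = ETS(Trips ~ error("A") + trend("N") + season("N")),
        arima011 = ARIMA(Trips ~ 0 + pdq(0, 1, 1) + PDQ(0, 0, 0))) %>%
    forecast(h = "2 years") %>%
    autoplot(aus_business, level = NULL)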

Save Stuff

save.image("./data/FullWorkspace.RData")