Choice and Forecasting: Week 10

Robert W. Walker

Some Data for Today and General Considerations

Panel data. Multiple time series are often described as a panel, a cross-section of time series, or a time series of cross-sections. The data structure has two [non-overlapping] indices: one for the unit and one for time. Let’s review, and discuss a bit, exactly what we mean.
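
A tiny sketch of that structure [the units and values here are made up] in the tsibble form used throughout: Title is the key [the unit] and YM is the index [time].

library(fpp3)
# A toy panel: two units [the key] observed over the same three months [the index]
toy <- tibble(Title = rep(c("Unit A", "Unit B"), each = 3),
    YM = yearmonth("2020-01") + rep(0:2, times = 2),
    Employed = c(10, 11, 12, 20, 21, 22)) %>%
    as_tsibble(index = YM, key = Title)
toy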

Extending the Data

fredr is amazing.

library(fpp3)
library(tidyverse)
# Pull every series in us_employment from FRED by its Series_ID
# [requires an API key set via fredr::fredr_set_key()]
US.Employment <- map_dfr(unique(us_employment$Series_ID),
    ~fredr::fredr_series_observations(.))
save(US.Employment, file = "USEmployment.RData")
# Or load a saved copy [from GitHub or locally]
load(url("https://github.com/robertwwalker/xaringan/raw/master/CMF-Week-9/USEmployment.RData"))
load("USEmployment.RData")
# Build a Series_ID-to-Title lookup from us_employment
us_employment %>%
    data.frame() %>%
    group_by(Series_ID) %>%
    summarise(Title = first(Title)) %>%
    mutate(series_id = Series_ID) %>%
    ungroup() %>%
    select(-Series_ID) -> Names.List
# Attach the titles, build a yearmonth index, and declare the tsibble
US.Employment.T <- left_join(US.Employment, Names.List, by = c(series_id = "series_id")) %>%
    mutate(YM = yearmonth(date)) %>%
    rename(Employed = value) %>%
    as_tsibble(., index = YM, key = Title)

Additional Features

For much of the study of time series, the key issue is one known as stationarity. For now, we will wave our hands a bit; the details come in Chapter 5 and, more fully, in Chapter 9. But we want to compute things first and build out the details later. Let’s take my new retail employment data.

A Recreation on New Data

EMPN <- US.Employment.T %>%
    filter(YM > yearmonth("1990-01") & Title == "Retail Trade") %>%
    as_tsibble(index = YM)
EMPO <- us_employment %>%
    filter(Title == "Retail Trade" & Month > yearmonth("1990-01")) %>%
    as_tsibble(., index = Month)
Plot1 <- ggplot(EMPN, aes(x = YM, y = Employed)) + geom_line(color = "red") +
    geom_line(data = EMPO, aes(x = Month, y = Employed), inherit.aes = FALSE)
Plot1

Data are Revised Occasionally

library(patchwork)
dcmp <- EMPO %>%
    model(stl = STL(Employed))
Plot2 <- components(dcmp) %>%
    autoplot()
dcmp <- EMPN %>%
    model(stl = STL(Employed))
Plot3 <- components(dcmp) %>%
    autoplot()
Plot1/(Plot2 + Plot3)

Three Sectors

USET <- US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Retail Trade", "Financial Activities",
        "Manufacturing")) %>%
    as_tsibble(., index = YM, key = Title)
USET %>%
    autoplot(Employed)

Retail (season)

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Retail Trade")) %>%
    as_tsibble(., index = YM) %>%
    gg_season(Employed)

Retail (subseries)

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Retail Trade")) %>%
    as_tsibble(., index = YM) %>%
    gg_subseries(Employed)

Retail (lag)

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Retail Trade")) %>%
    as_tsibble(., index = YM) %>%
    gg_lag(Employed)

Manufacturing

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Manufacturing")) %>%
    as_tsibble(., index = YM) %>%
    gg_season(Employed)

Manufacturing

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Manufacturing")) %>%
    as_tsibble(., index = YM) %>%
    gg_subseries(Employed)

Manufacturing

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Manufacturing")) %>%
    as_tsibble(., index = YM) %>%
    gg_lag(Employed)

Financial

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Financial Activities")) %>%
    as_tsibble(., index = YM) %>%
    gg_season(Employed)

Financial

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Financial Activities")) %>%
    as_tsibble(., index = YM) %>%
    gg_subseries(Employed)

Financial

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Financial Activities")) %>%
    as_tsibble(., index = YM) %>%
    gg_lag(Employed)

Features: Summary

The features command is the magic tool for tidy summaries and statistics of time series in this index/key format. For example, a basic summary:

USET %>%
    features(Employed, features = list(mean = mean, min = min, max = max, sd = sd,
        quantile))
# A tibble: 3 × 10
  Title         mean    min    max    sd   `0%`  `25%`  `50%`  `75%` `100%`
  <chr>        <dbl>  <dbl>  <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Financial …  7767.  6472   8846   640.  6472   7363   7876   8226.  8846 
2 Manufactur… 14554. 11340  17870  2241. 11340  12333. 14219  17088  17870 
3 Retail Tra… 14746. 12548. 16394.  915. 12548. 14336. 14962. 15387. 16394.

Features: Correlation Features

Learning about the time series properties: feat_acf summarizes the autocorrelations of the original series and of its first and second differences, along with the first seasonal autocorrelation.

USET %>%
    features(Employed, features = feat_acf)
# A tibble: 3 × 8
  Title                 acf1 acf10 diff1_…¹ diff1…² diff2…³ diff2…⁴ seaso…⁵
  <chr>                <dbl> <dbl>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 Financial Activities 0.990  8.94   0.283    0.165  -0.313   0.415   0.883
2 Manufacturing        0.995  9.35   0.0466   0.128  -0.499   0.505   0.925
3 Retail Trade         0.951  7.29   0.133    0.377  -0.198   0.305   0.876
# … with abbreviated variable names ¹​diff1_acf1, ²​diff1_acf10,
#   ³​diff2_acf1, ⁴​diff2_acf10, ⁵​season_acf1
USET %>%
    group_by(Title) %>%
    ACF(Employed) %>%
    autoplot()

For Contrast: Ford Returns

library(tidyquant)
Ford <- tq_get("F", from = "2000-01-01")
FordT <- Ford %>%
    as_tsibble(index = date)
FordT %>%
    autoplot(adjusted)

FC <- Ford %>%
    tq_transmute(adjusted, mutate_fun = periodReturn, period = "monthly") %>%
    mutate(YM = yearmonth(date)) %>%
    as_tsibble(., index = YM)
FC %>%
    autoplot(monthly.returns)

Ford’s ACF

The patterns at lags 6/7 and 12/13 are interesting….

library(patchwork)
FC1 <- FC %>%
    ACF(monthly.returns) %>%
    autoplot()
FC2 <- FC %>%
    PACF(monthly.returns) %>%
    autoplot()
FC1 + FC2

Decomposition Features

USET %>%
    features(Employed, feat_stl)
# A tibble: 3 × 10
  Title     trend…¹ seaso…² seaso…³ seaso…⁴ spiki…⁵ linea…⁶ curva…⁷ stl_e…⁸
  <chr>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 Financia…   0.999   0.820       6       3  1.28e1  10603.  -3014.   0.718
2 Manufact…   0.999   0.555       8       3  8.41e3 -39200.   4365.   0.612
3 Retail T…   0.983   0.834      11       3  1.11e5  13098.  -6397.   0.506
# … with 1 more variable: stl_e_acf10 <dbl>, and abbreviated variable
#   names ¹​trend_strength, ²​seasonal_strength_year, ³​seasonal_peak_year,
#   ⁴​seasonal_trough_year, ⁵​spikiness, ⁶​linearity, ⁷​curvature, ⁸​stl_e_acf1

With More Data

NUSET8k <- US.Employment.T %>%
    data.frame() %>%
    group_by(Title) %>%
    summarise(MaxE = max(Employed)) %>%
    arrange(desc(MaxE)) %>%
    filter(MaxE > 8000 & MaxE < 120000)
USET8k <- left_join(NUSET8k, US.Employment.T) %>%
    as_tsibble(., index = YM, key = Title)

An Improvement on the Trend/Season

Features

Features 1

Features 2

Details on the other statistics follow further below.

library(kableExtra)
USET8k %>%
    features(Employed, feat_stl) %>%
    knitr::kable(format = "html") %>%
    scroll_box(width = "100%", height = "300px")
Title trend_strength seasonal_strength_year seasonal_peak_year seasonal_trough_year spikiness linearity curvature stl_e_acf1 stl_e_acf10
Construction 0.999 0.963 8 2 2.10e+02 53918 -1084.7 0.576 0.526
Durable Goods 0.994 0.215 9 7 3.42e+03 1295 -40394.6 0.750 1.251
Education and Health Services 1.000 0.715 11 7 7.21e+03 217955 58537.7 0.529 0.580
Education and Health Services: Health Care 0.999 0.354 0 4 1.59e+04 46025 -436.7 0.495 0.533
Education and Health Services: Health Care and Social Assistance 0.999 0.313 0 4 6.19e+04 63326 17.6 0.517 0.584
Financial Activities 1.000 0.870 7 4 6.92e-01 78437 -273.1 0.723 0.879
Goods-Producing 0.996 0.812 9 2 1.12e+04 43343 -64790.2 0.734 1.081
Government 1.000 0.981 11 7 3.94e+02 190941 -19815.3 0.599 0.538
Government: Local Government 1.000 0.986 5 7 3.14e+02 96008 -15450.2 0.640 0.692
Government: Local Government Education 1.000 0.996 3 7 3.54e+01 54936 -8479.9 0.538 0.480
Leisure and Hospitality 0.997 0.607 7 4 5.47e+05 136395 22427.3 0.522 0.604
Leisure and Hospitality: Accommodation and Food Services 0.971 0.473 8 4 5.27e+06 31615 -1157.7 0.499 0.541
Leisure and Hospitality: Food Services and Drinking Places 0.975 0.404 6 4 3.26e+06 29889 -377.9 0.476 0.497
Manufacturing 0.997 0.434 9 2 5.19e+03 -8507 -64155.3 0.772 1.262
Private Service-Providing 1.000 0.543 0 4 1.50e+07 906495 115965.6 0.545 0.623
Professional and Business Services 1.000 0.675 11 1 3.62e+03 187091 41544.7 0.626 0.810
Professional and Business Services: Administrative and Support Services 0.995 0.808 11 1 1.84e+04 21351 -8027.1 0.617 0.797
Professional and Business Services: Administrative and Waste Services 0.995 0.810 11 1 1.93e+04 22521 -8025.8 0.617 0.804
Professional and Business Services: Professional and Technical Services 1.000 0.691 2 5 1.81e+02 29125 -1380.0 0.626 0.719
Retail Trade 1.000 0.881 0 4 5.78e+03 135654 -6877.1 0.511 0.473
Trade, Transportation, and Utilities 1.000 0.845 0 4 2.02e+04 211628 -7668.1 0.566 0.583

coef_hurst

A measure of the degree to which adjacent observations depend on one another over time. The statistic takes values between zero and one: values near one indicate very strong dependence through time, while values near 0.5 indicate little long-range dependence.

USET %>%
    features(Employed, coef_hurst)
# A tibble: 3 × 2
  Title                coef_hurst
  <chr>                     <dbl>
1 Financial Activities      1.00 
2 Manufacturing             1.00 
3 Retail Trade              0.999

Middling for Ford

FC %>%
    features(monthly.returns, features = coef_hurst)
# A tibble: 1 × 1
  coef_hurst
       <dbl>
1      0.500

feat_spectral

Spectral entropy measures how forecastable a series is: values near zero indicate a series dominated by trend and seasonality, while values near one indicate something close to white noise.

USET %>%
    features(Employed, feat_spectral)
# A tibble: 3 × 2
  Title                spectral_entropy
  <chr>                           <dbl>
1 Financial Activities            0.263
2 Manufacturing                   0.180
3 Retail Trade                    0.412
FC %>%
    features(monthly.returns, features = feat_spectral)
# A tibble: 1 × 1
  spectral_entropy
             <dbl>
1            0.989

The Absence of Correlation

Ljung-Box modifies the idea in the Box-Pierce statistic for assessing whether or not a given series [or transformation thereof] is essentially uncorrelated. In both cases, we will get to the details next week [chapter 5]. For now, the idea is that a suitably scaled sum of the first k squared autocorrelations is approximately chi-squared with k degrees of freedom when the series is white noise. Large correlations reveal dependence.
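
Before the feasts versions below, a rough by-hand sketch of both statistics for Ford’s monthly returns [using ten lags here rather than the feasts defaults, so the numbers will differ]:

# Portmanteau statistics computed directly from the first 10 autocorrelations
r <- acf(FC$monthly.returns, lag.max = 10, plot = FALSE)$acf[-1]
n <- nrow(FC)
bp <- n * sum(r^2)                        # Box-Pierce
lb <- n * (n + 2) * sum(r^2/(n - 1:10))   # Ljung-Box
c(bp = bp, lb = lb, bp_p = 1 - pchisq(bp, 10), lb_p = 1 - pchisq(lb, 10))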

USET %>%
    features(Employed, features = list(box_pierce, ljung_box))
# A tibble: 3 × 5
  Title                bp_stat bp_pvalue lb_stat lb_pvalue
  <chr>                  <dbl>     <dbl>   <dbl>     <dbl>
1 Financial Activities    365.         0    368.         0
2 Manufacturing           368.         0    371.         0
3 Retail Trade            337.         0    339.         0
FC %>%
    features(monthly.returns, features = list(box_pierce, ljung_box))
# A tibble: 1 × 4
  bp_stat bp_pvalue lb_stat lb_pvalue
    <dbl>     <dbl>   <dbl>     <dbl>
1  0.0387     0.844  0.0391     0.843

feat_pacf

USET %>%
    features(Employed, feat_pacf)
# A tibble: 3 × 5
  Title                pacf5 diff1_pacf5 diff2_pacf5 season_pacf
  <chr>                <dbl>       <dbl>       <dbl>       <dbl>
1 Financial Activities 0.987       0.712       1.00      -0.0555
2 Manufacturing        0.994       0.238       0.791      0.0348
3 Retail Trade         1.08        0.834       1.06      -0.0188
FC %>%
    features(monthly.returns, features = feat_pacf)
# A tibble: 1 × 4
    pacf5 diff1_pacf5 diff2_pacf5 season_pacf
    <dbl>       <dbl>       <dbl>       <dbl>
1 0.00618       0.662        1.23       0.109

Unit Roots

The stationarity issue from earlier now gets real attention. Can we reasonably think of the characteristics of the series as fixed? There are a few means of assessment, with details deferred to Chapter 9.

USET %>%
    features(Employed, features = list(unitroot_kpss, unitroot_pp, unitroot_ndiffs,
        unitroot_nsdiffs)) %>%
    knitr::kable(format = "html")
Title kpss_stat kpss_pvalue pp_stat pp_pvalue ndiffs nsdiffs
Financial Activities 4.63 0.01 -1.193 0.100 1 1
Manufacturing 5.68 0.01 -0.938 0.100 1 0
Retail Trade 3.91 0.01 -2.636 0.089 1 1
FC %>%
    features(monthly.returns, features = list(unitroot_kpss, unitroot_pp, unitroot_ndiffs,
        unitroot_nsdiffs))
# A tibble: 1 × 6
  kpss_stat kpss_pvalue pp_stat pp_pvalue ndiffs nsdiffs
      <dbl>       <dbl>   <dbl>     <dbl>  <int>   <int>
1    0.0899         0.1   -16.7      0.01      0       0
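
The ndiffs column suggests one ordinary difference for each of the three employment series. A quick sketch of what that difference looks like [Manufacturing is chosen only for illustration]:

USET %>%
    filter(Title == "Manufacturing") %>%
    autoplot(difference(Employed))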

Tiling

A reminder: tiled windows do not overlap. var_tiled_mean [stability] is the variance of the means computed over tiled windows; var_tiled_var [lumpiness] is the variance of the tile variances.

USET %>%
    features(Employed, features = list(var_tiled_mean, var_tiled_var))
# A tibble: 3 × 3
  Title                var_tiled_mean var_tiled_var
  <chr>                         <dbl>         <dbl>
1 Financial Activities          1.02      0.0000411
2 Manufacturing                 1.03      0.0000923
3 Retail Trade                  0.922     0.0136   
FC %>%
    features(monthly.returns, features = list(var_tiled_mean, var_tiled_var))
# A tibble: 1 × 2
  var_tiled_mean var_tiled_var
           <dbl>         <dbl>
1          0.118          2.60

Detecting Shifts

These features locate the largest shift between consecutive sliding windows: shift_level_max is the biggest shift in the mean [shift_level_index marks where it happens], with analogous measures for shifts in variance and for distributional shifts measured by Kullback-Leibler divergence.

USET %>%
    features(Employed, features = list(shift_level_max, shift_var_max, shift_kl_max)) %>%
    kable(format = "html")
Title shift_level_max shift_level_index shift_var_max shift_var_index shift_kl_max shift_kl_index
Financial Activities 371 229 24037 233 0.299 227
Manufacturing 1559 228 417020 235 0.522 227
Retail Trade 777 226 788931 354 1.841 227
FC %>%
    features(monthly.returns, features = list(shift_level_max, shift_var_max,
        shift_kl_max)) %>%
    kable(format = "html")
shift_level_max shift_level_index shift_var_max shift_var_index shift_kl_max shift_kl_index
0.258 110 0.194 113 37.2 112

Crossings and Flat Spots

n_crossing_points counts how many times a series crosses its median; longest_flat_spot is the longest stretch over which the series stays essentially unchanged.

USET %>%
    features(Employed, features = list(n_crossing_points, longest_flat_spot)) %>%
    kable(format = "html")
Title n_crossing_points longest_flat_spot
Financial Activities 5 40
Manufacturing 11 52
Retail Trade 31 10
FC %>%
    features(monthly.returns, features = list(n_crossing_points, longest_flat_spot)) %>%
    kable(format = "html")
n_crossing_points longest_flat_spot
131 8

ARCH

What proportion of the current squared residual is explained by prior squared residuals? stat_arch_lm reports the R² from that regression; if the variance explained is large, volatility is persistent. There is a chi-square test statistic built on it as well.

USET %>%
    features(Employed, features = stat_arch_lm) %>%
    kable(format = "html")
Title stat_arch_lm
Financial Activities 0.989
Manufacturing 0.972
Retail Trade 0.917
FC %>%
    features(monthly.returns, features = stat_arch_lm) %>%
    kable(format = "html")
stat_arch_lm
0.051
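
A rough by-hand analogue of that number for Ford [a sketch only; it assumes the feature amounts to regressing the squared, demeaned returns on 12 of their own lags, which may not match the exact defaults in feasts]:

# Engle-style ARCH LM idea: R^2 from regressing squared returns on their own lags
y2 <- (FC$monthly.returns - mean(FC$monthly.returns))^2
X <- embed(y2, 13)  # column 1 is the current value; columns 2-13 are lags 1-12
fit <- lm(X[, 1] ~ X[, -1])
summary(fit)$r.squared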

The Box-Cox

guerrero chooses the Box-Cox lambda that makes the variation [particularly the seasonal variation] most nearly constant across the series; a lambda near one amounts to no transformation, while a lambda near zero approaches a log.

USET %>%
    features(Employed, features = guerrero) %>%
    kable(format = "html")
Title lambda_guerrero
Financial Activities 0.948
Manufacturing 1.037
Retail Trade 1.186
FC %>%
    features(monthly.returns, features = guerrero) %>%
    kable(format = "html")
lambda_guerrero
0.609
USET %>%
    features(Employed, features = guerrero)
# A tibble: 3 × 2
  Title                lambda_guerrero
  <chr>                          <dbl>
1 Financial Activities           0.948
2 Manufacturing                  1.04 
3 Retail Trade                   1.19 

Filtered Manufacturing

USET %>%
    filter(Title == "Manufacturing") %>%
    mutate(Filt = box_cox(Employed, 1.0369662)) %>%
    select(YM, Filt, Employed) %>%
    pivot_longer(c(Filt, Employed)) %>%
    autoplot(value)

USET %>%
    filter(Title == "Financial Activities") %>%
    autoplot(box_cox(Employed, 0.9481456))

USET %>%
    filter(Title == "Retail Trade") %>%
    autoplot(box_cox(Employed, 1.1860464))

FC %>%
    features(monthly.returns, features = guerrero)
# A tibble: 1 × 1
  lambda_guerrero
            <dbl>
1           0.609
FC %>%
    autoplot(box_cox(monthly.returns, 0.6857523))

Australian Tourism

The example is great.

Principal Components

Let’s walk through this example.

Ex. PC

Principal Components

tourism_features <- tourism %>%
    features(Trips, feature_set(pkgs = "feasts"))
library(broom)
pcs <- tourism_features %>%
    select(-State, -Region, -Purpose) %>%
    prcomp(scale = TRUE) %>%
    augment(tourism_features)
pcs %>%
    ggplot(aes(x = .fittedPC1, y = .fittedPC2, col = Purpose)) + geom_point() +
    theme(aspect.ratio = 1)
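
A common follow-up is to pull out the series that sit far out on the first component; the cutoff of 10 here is arbitrary and purely illustrative.

pcs %>%
    filter(.fittedPC1 > 10) %>%
    select(Region, State, Purpose, .fittedPC1, .fittedPC2)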

The Forecasting Workflow

Diagram

Four Simple Methods

  • MEAN() or the mean method
  • NAIVE() or naive
  • SNAIVE() or seasonal naive
  • RW() or random walk; adding drift, as in RW(y ~ drift()), gives the drift method

An Example

# Set training data from 1992 to 2006
train <- aus_production %>%
    filter_index("1992 Q1" ~ "2006 Q4")
# Fit the models
beer_fit <- train %>%
    model(Mean = MEAN(Beer), Naïve = NAIVE(Beer), `Seasonal naïve` = SNAIVE(Beer))
# Generate forecasts for 14 quarters
beer_fc <- beer_fit %>%
    forecast(h = 14)
# Plot forecasts against actual values
beer_fc %>%
    autoplot(train, level = NULL) + autolayer(filter_index(aus_production, "2007 Q1" ~
    .), colour = "black") + labs(y = "Megalitres", title = "Forecasts for quarterly beer production") +
    guides(colour = guide_legend(title = "Forecast"))

A Google stock example

Note the reindexing.

# Re-index based on trading days
google_stock <- gafa_stock %>%
    filter(Symbol == "GOOG", year(Date) >= 2015) %>%
    mutate(day = row_number()) %>%
    update_tsibble(index = day, regular = TRUE)
# Filter the year of interest
google_2015 <- google_stock %>%
    filter(year(Date) == 2015)
# Fit the models
google_fit <- google_2015 %>%
    model(Mean = MEAN(Close), Naïve = NAIVE(Close), Drift = NAIVE(Close ~ drift()))
# Produce forecasts for the trading days in January 2016
google_jan_2016 <- google_stock %>%
    filter(yearmonth(Date) == yearmonth("2016 Jan"))
google_fc <- google_fit %>%
    forecast(new_data = google_jan_2016)
# Plot the forecasts
google_fc %>%
    autoplot(google_2015, level = NULL) + autolayer(google_jan_2016, Close,
    colour = "black") + labs(y = "$US", title = "Google daily closing stock prices",
    subtitle = "(Jan 2015 - Jan 2016)") + guides(colour = guide_legend(title = "Forecast"))

augment

augment() adds three new columns to the modelled data:

  • The fitted value,
  • the residual,
  • and the innovation residual.

The latter two are identical unless a transformation is used, in which case the residual is on the original scale [y] while the innovation residual is on the transformed scale [w = f(y)].
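
A quick sketch of the difference [the log transformation here is purely illustrative]: .resid is on the price scale while .innov is on the log scale.

google_2015 %>%
    model(NAIVE(log(Close))) %>%
    augment() %>%
    select(day, Close, .fitted, .resid, .innov)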

Google stock price residuals

google_2015 %>%
    model(NAIVE(Close)) %>%
    gg_tsresiduals()

Two portmanteau tests

BP Test

LBQ Test

aug <- google_2015 %>%
    model(NAIVE(Close)) %>%
    augment()
aug %>%
    features(.innov, box_pierce, lag = 10, dof = 0)
# A tibble: 1 × 4
  Symbol .model       bp_stat bp_pvalue
  <chr>  <chr>          <dbl>     <dbl>
1 GOOG   NAIVE(Close)    7.74     0.654
aug %>%
    features(.innov, ljung_box, lag = 10, dof = 0)
# A tibble: 1 × 4
  Symbol .model       lb_stat lb_pvalue
  <chr>  <chr>          <dbl>     <dbl>
1 GOOG   NAIVE(Close)    7.91     0.637

Uncertainty and Intervals

Intervals

google_2015 %>%
    model(NAIVE(Close)) %>%
    forecast(h = 10) %>%
    hilo()
# A tsibble: 10 x 7 [1]
# Key:       Symbol, .model [1]
   Symbol .model         day        Close .mean        `80%`        `95%`
   <chr>  <chr>        <dbl>       <dist> <dbl>       <hilo>       <hilo>
 1 GOOG   NAIVE(Close)   253  N(759, 125)  759. [745, 773]80 [737, 781]95
 2 GOOG   NAIVE(Close)   254  N(759, 250)  759. [739, 779]80 [728, 790]95
 3 GOOG   NAIVE(Close)   255  N(759, 376)  759. [734, 784]80 [721, 797]95
 4 GOOG   NAIVE(Close)   256  N(759, 501)  759. [730, 788]80 [715, 803]95
 5 GOOG   NAIVE(Close)   257  N(759, 626)  759. [727, 791]80 [710, 808]95
 6 GOOG   NAIVE(Close)   258  N(759, 751)  759. [724, 794]80 [705, 813]95
 7 GOOG   NAIVE(Close)   259  N(759, 876)  759. [721, 797]80 [701, 817]95
 8 GOOG   NAIVE(Close)   260 N(759, 1002)  759. [718, 799]80 [697, 821]95
 9 GOOG   NAIVE(Close)   261 N(759, 1127)  759. [716, 802]80 [693, 825]95
10 GOOG   NAIVE(Close)   262 N(759, 1252)  759. [714, 804]80 [690, 828]95

Plotting Forecasts

google_2015 %>%
    model(NAIVE(Close)) %>%
    forecast(h = 10) %>%
    autoplot(google_2015) + labs(title = "Google daily closing stock price",
    y = "$US")

Bootstrap Paths

A bit on the bootstrap: rather than assuming normally distributed errors, we resample the model’s residuals to simulate plausible future sample paths. Using bootstrap intervals is integrated into fpp3.

fit <- google_2015 %>%
    model(NAIVE(Close))
sim <- fit %>%
    generate(h = 30, times = 5, bootstrap = TRUE)
sim
# A tsibble: 150 x 5 [1]
# Key:       Symbol, .model, .rep [5]
   Symbol .model         day .rep   .sim
   <chr>  <chr>        <dbl> <chr> <dbl>
 1 GOOG   NAIVE(Close)   253 1      763.
 2 GOOG   NAIVE(Close)   254 1      739.
 3 GOOG   NAIVE(Close)   255 1      737.
 4 GOOG   NAIVE(Close)   256 1      737.
 5 GOOG   NAIVE(Close)   257 1      737.
 6 GOOG   NAIVE(Close)   258 1      761.
 7 GOOG   NAIVE(Close)   259 1      764.
 8 GOOG   NAIVE(Close)   260 1      772.
 9 GOOG   NAIVE(Close)   261 1      780.
10 GOOG   NAIVE(Close)   262 1      777.
# … with 140 more rows

Plot them [this can be automated with forecast(h = 30, bootstrap = TRUE), as below]

fc <- fit %>%
    forecast(h = 30, bootstrap = TRUE)
autoplot(fc, google_2015) + labs(title = "Google daily closing stock price",
    y = "$US")

Transformations with forecasting

Just like smart prediction, the fable package handles inverting transformations: forecast a transformed series and the forecasts are returned on the original scale. The bigger issue is the need for bias adjustment of the back-transformed point forecasts.
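
A minimal sketch reusing google_2015 from above [again, the log transformation is purely illustrative]: the model is fit to log(Close), and the forecasts come back on the original price scale.

google_2015 %>%
    model(NAIVE(log(Close))) %>%
    forecast(h = 10) %>%
    autoplot(google_2015) + labs(title = "NAIVE on log(Close), back-transformed", y = "$US")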

Decomposition forecasting

The decomposition_model function allows a separate modelling approach for the seasonal and the seasonally adjusted components. The latter is often handled with a model covered later, such as Holt’s method or an ARIMA model.

Two Examples

us_retail_employment <- us_employment %>%
    filter(year(Month) >= 1990, Title == "Retail Trade")
fit_dcmp <- us_retail_employment %>%
    model(stlf = decomposition_model(STL(Employed ~ trend(window = 7), robust = TRUE),
        NAIVE(season_adjust)))
fit_dcmp %>%
    forecast() %>%
    autoplot(us_retail_employment) + labs(y = "Number of people", title = "US retail employment")

fit_dcmp %>%
    gg_tsresiduals()

fit_dcmp <- us_retail_employment %>%
    model(stlf = decomposition_model(STL(Employed ~ trend(window = 7), robust = TRUE),
        RW(season_adjust)))
fit_dcmp %>%
    forecast() %>%
    autoplot(us_retail_employment) + labs(y = "Number of people", title = "US retail employment")

fit_dcmp %>%
    gg_tsresiduals()

Evaluating Forecast Accuracy of Points

  • On training and test [usually 20%]
    • filter and slice
  • Scale dependent errors
    • MAE: mean absolute error
    • RMSE: root mean squared error
  • Percentage errors
    • MAPE: mean absolute percentage error
    • sMAPE: symmetric MAPE
  • Scaled errors [def in 5.8]
    • MASE: mean absolute scaled error
    • RMSSE: root mean square scaled error

accuracy
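
A sketch using the beer forecasts fit above; accuracy() scores the forecasts against the actual values it finds in the data supplied.

accuracy(beer_fc, aus_production) %>%
    select(.model, .type, RMSE, MAE, MAPE, MASE, RMSSE)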

Evaluating Accuracy of Distributions

accuracy(., list(...))

  • Quantile scores quantile_score
  • Winkler scores winkler_score
  • Continuous ranked probability scores CRPS
  • Skill scores skill_score() with a statistic for comparison [a sketch follows below].
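
A sketch with the Google forecasts from above, using the fabletools measures named in the list; note that winkler_score needs a level passed along [80 is assumed here], which is why it sits in its own call.

google_fc %>%
    accuracy(google_stock, list(winkler = winkler_score), level = 80)
google_fc %>%
    accuracy(google_stock, list(crps = CRPS, skill = skill_score(CRPS)))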

Time Series Cross-Validation

Multiple training and test sets are built by stretching the training window forward one observation at a time, in the spirit of cross-validation. NB: the trading-day dates are discontinuous, hence the row-number index below.

G22 <- tidyquant::tq_get("GOOG", from = "2021-01-01") %>%
    select(date, symbol, close) %>%
    mutate(Date = row_number()) %>%
    as_tsibble(index = Date)
G22_tr <- G22 %>%
    stretch_tsibble(.init = 90, .step = 1) %>%
    relocate(date, symbol, .id)
G22_tr %>%
    model(rw = RW(close ~ drift())) %>%
    forecast(h = 1) %>%
    accuracy(G22)
# A tibble: 1 × 10
  .model .type     ME  RMSE   MAE    MPE  MAPE  MASE RMSSE    ACF1
  <chr>  <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>   <dbl>
1 rw     Test  -0.258  2.46  1.82 -0.233  1.48  1.05  1.04 -0.0491

Judgmental Forecasts

  1. Limitations
  2. Key principles
  3. Delphi method
  4. Analogy
  5. Scenarios
  6. New product forecasting
  7. Judgmental adjustments

Key Principles

  • Set the forecasting task clearly and concisely
  • Implement a systematic approach
  • Document and justify
  • Systematically evaluate forecasts
  • Segregate forecasters and users

Delphi

Analogies

Structured Analogies

Adjustments

  • Use them sparingly.
  • Make them structured.

The Path Forward: Main Models

  1. Develop time series regression models
  2. Exponential smoothing
  3. ARIMA models
  4. Dynamic regression models
  • Hierarchies and groups
  • Advanced methods
  • Practical issues