Choice and Forecasting: Week 9

Robert W. Walker

Getting Started

load(url("https://github.com/robertwwalker/xaringan/raw/master/CMF-Week-9/data/FullWorkspace.RData"))

Autocorrelation and autocovariance

Lag plots and autocorrelation

Example: Beer production

new_production <- aus_production %>%
    filter(year(Quarter) >= 1992)
new_production
# A tsibble: 74 x 7 [1Q]
   Quarter  Beer Tobacco Bricks Cement Electricity   Gas
     <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
 1 1992 Q1   443    5777    383   1289       38332   117
 2 1992 Q2   410    5853    404   1501       39774   151
 3 1992 Q3   420    6416    446   1539       42246   175
 4 1992 Q4   532    5825    420   1568       38498   129
 5 1993 Q1   433    5724    394   1450       39460   116
 6 1993 Q2   421    6036    462   1668       41356   149
 7 1993 Q3   410    6570    475   1648       42949   163
 8 1993 Q4   512    5675    443   1863       40974   138
 9 1994 Q1   449    5311    421   1468       40162   127
10 1994 Q2   381    5717    475   1755       41199   159
# … with 64 more rows

Example: Beer production

new_production %>%
    gg_lag(Beer)

Example: Beer production

new_production %>%
    gg_lag(Beer, geom = "point")

Lagged scatterplots

Each graph shows y_t plotted against y_{t-k} for different values of k.

  • The autocorrelations are the correlations associated with these scatterplots.
  • ACF (autocorrelation function):
    • r_1=\text{Correlation}(y_{t}, y_{t-1})
    • r_2=\text{Correlation}(y_{t}, y_{t-2})
    • r_3=\text{Correlation}(y_{t}, y_{t-3})
    • etc.

Autocorrelation

Covariance and correlation: measure extent of linear relationship between two variables (y and X).

Autocovariance and autocorrelation: measure linear relationship between lagged values of a time series y.

We measure the relationship between:

  • y_{t} and y_{t-1}
  • y_{t} and y_{t-2}
  • y_{t} and y_{t-3}
  • etc.

Autocorrelation

We denote the sample autocovariance at lag k by c_k and the sample autocorrelation at lag k by r_k. Then define

c_k = \frac{1}{T}\sum_{t=k+1}^T (y_t-\bar{y})(y_{t-k}-\bar{y}), \qquad r_{k} = c_k/c_0

  • r_1 indicates how successive values of y relate to each other
  • r_2 indicates how y values two periods apart relate to each other
  • r_k is (almost) the same as the sample correlation between y_t and y_{t-k}; the sketch below verifies r_1 by hand.
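As a quick check, here is a minimal sketch (assuming new_production from above is in scope) that computes c_0, c_1, and hence r_1 by hand; it should match the lag-1 value of -0.102 in the ACF output below.

y <- new_production$Beer
T <- length(y)
ybar <- mean(y)
c0 <- sum((y - ybar)^2)/T
c1 <- sum((y[2:T] - ybar) * (y[1:(T - 1)] - ybar))/T
c1/c0  # r_1, approximately -0.102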

Autocorrelation

Results for first 9 lags for beer data:

new_production <- aus_production %>%
    filter(year(Quarter) >= 1992)
new_production %>%
    ACF(Beer, lag_max = 9)
# A tsibble: 9 x 2 [1Q]
    lag     acf
  <lag>   <dbl>
1    1Q -0.102 
2    2Q -0.657 
3    3Q -0.0603
4    4Q  0.869 
5    5Q -0.0892
6    6Q -0.635 
7    7Q -0.0542
8    8Q  0.832 
9    9Q -0.108 

Autocorrelation

Results for first 9 lags for beer data:

new_production %>%
    ACF(Beer, lag_max = 9) %>%
    autoplot()
  • Together, the autocorrelations at lags 1, 2, …, make up the autocorrelation function or ACF.
  • The plot is known as a correlogram.

Autocorrelation

new_production %>%
    ACF(Beer) %>%
    autoplot()
  • r_{4} higher than for the other lags. This is due to the seasonal pattern in the data: the peaks tend to be 4 quarters apart and the troughs tend to be 2 quarters apart.
  • r_2 is more negative than for the other lags because troughs tend to be 2 quarters behind peaks.

Trend and seasonality in ACF plots

  • When data have a trend, the autocorrelations for small lags tend to be large and positive.
  • When data are seasonal, the autocorrelations will be larger at the seasonal lags (i.e., at multiples of the seasonal frequency)
  • When data are trended and seasonal, you see a combination of these effects; the simulation sketch below illustrates this.
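A quick simulated illustration (illustrative code, not from the text): a series with a linear trend and period-12 seasonality produces slowly decaying autocorrelations with bumps at the seasonal lags.

sim <- tsibble(t = 1:120, y = 0.1 * (1:120) + 2 * sin(2 * pi * (1:120)/12) +
    rnorm(120), index = t)
sim %>%
    ACF(y, lag_max = 36) %>%
    autoplot()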

Autocorrelation functions

AutoCorrelation Functions by Allison Horst

US retail trade employment

retail <- us_employment %>%
    filter(Title == "Retail Trade", year(Month) >= 1980)
retail %>%
    autoplot(Employed)

US retail trade employment

retail %>%
    ACF(Employed, lag_max = 48) %>%
    autoplot()

Google stock price

google_2015 <- gafa_stock %>%
    filter(Symbol == "GOOG", year(Date) == 2015) %>%
    select(Date, Close)
google_2015
# A tsibble: 252 x 2 [!]
   Date       Close
   <date>     <dbl>
 1 2015-01-02  522.
 2 2015-01-05  511.
 3 2015-01-06  499.
 4 2015-01-07  498.
 5 2015-01-08  500.
 6 2015-01-09  493.
 7 2015-01-12  490.
 8 2015-01-13  493.
 9 2015-01-14  498.
10 2015-01-15  499.
# … with 242 more rows

Google stock price

google_2015 %>%
    autoplot(Close)

Google stock price

google_2015 %>%
    ACF(Close, lag_max = 100)
# A tsibble: 100 x 2 [1]
     lag   acf
   <lag> <dbl>
 1     1 0.982
 2     2 0.959
 3     3 0.937
 4     4 0.918
 5     5 0.901
 6     6 0.883
 7     7 0.865
 8     8 0.849
 9     9 0.834
10    10 0.818
# … with 90 more rows

Google stock price

google_2015 %>%
    ACF(Close, lag_max = 100) %>%
    autoplot()

Which is which?

White noise

Example: White noise

set.seed(30)
wn <- tsibble(t = 1:50, y = rnorm(50), index = t)
wn %>%
    autoplot(y)

White noise data is uncorrelated across time with zero mean and constant variance.

(Technically, we require independence as well.)

Example: White noise

wn %>% ACF(y)
r_{1} r_{2} r_{3} r_{4} r_{5} r_{6} r_{7} r_{8} r_{9} r_{10}
0.014 -0.163 0.163 -0.259 -0.198 0.064 -0.139 -0.032 0.199 -0.024
  • Sample autocorrelations for white noise series.
  • Expect each autocorrelation to be close to zero.
  • Blue lines show 95% critical values.

Sampling distribution of autocorrelations

Sampling distribution of r_k for white noise data is asymptotically N(0,1/T).

  • 95% of all r_k for white noise must lie within \pm 1.96/\sqrt{T}.
  • If this is not the case, the series is probably not WN.
  • Common to plot lines at \pm 1.96/\sqrt{T} when plotting the ACF. These are the critical values; the sketch below computes them for wn.
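A minimal sketch of those bounds for the wn series above (T = 50):

T <- nrow(wn)
c(-1.96, 1.96)/sqrt(T)  # approximately -0.277 and 0.277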

Example: Pigs slaughtered

pigs <- aus_livestock %>%
    filter(State == "Victoria", Animal == "Pigs", year(Month) >= 2014)
pigs %>%
    autoplot(Count/1000) + labs(y = "Thousands", title = "Number of pigs slaughtered in Victoria")

Example: Pigs slaughtered

pigs %>%
    ACF(Count) %>%
    autoplot()

Example: Pigs slaughtered

Monthly total number of pigs slaughtered in the state of Victoria, Australia, from January 2014 through December 2018 (Source: Australian Bureau of Statistics.)

  • Difficult to detect pattern in time plot.
  • ACF shows significant autocorrelation at lags 2 and 12.
  • These indicate some slight seasonality.

These show the series is not a white noise series.

Let’s Try One

You can compute the daily changes in the Google stock price in 2018 using

dgoog <- gafa_stock %>%
    filter(Symbol == "GOOG", year(Date) >= 2018) %>%
    mutate(diff = difference(Close))

Does diff look like white noise?
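One way to check (a sketch, not the only route): trading days are irregularly spaced, so re-index by trading day before computing the ACF; a Ljung-Box test adds a formal check.

dgoog <- gafa_stock %>%
    filter(Symbol == "GOOG", year(Date) >= 2018) %>%
    mutate(trading_day = row_number()) %>%
    update_tsibble(index = trading_day, regular = TRUE) %>%
    mutate(diff = difference(Close))
dgoog %>%
    ACF(diff) %>%
    autoplot()
dgoog %>%
    features(diff, ljung_box, lag = 10)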

Chapter 3: Decomposition

Transformations and adjustments

Getting started

library(tidyverse)
library(fpp3)
library(purrr)
library(gganimate)

Filter Australia

global_economy %>%
    filter(Country == "Australia") %>%
    autoplot(GDP)

How to Adjust?

Per capita adjustments

global_economy %>%
    filter(Country == "Australia") %>%
    autoplot(GDP/Population) + hrbrthemes::theme_ipsum_rc()

Have a look….

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

global_economy %>%
    mutate(GDPPC = GDP/Population) %>%
    select(Country, Year, GDPPC) %>%
    top_n(., 10, wt = GDPPC)
# A tsibble: 10 x 3 [1Y]
# Key:       Country [2]
   Country        Year   GDPPC
   <fct>         <dbl>   <dbl>
 1 Liechtenstein  2013 173528.
 2 Liechtenstein  2014 179308.
 3 Liechtenstein  2015 167591.
 4 Liechtenstein  2016 164993.
 5 Monaco         2007 167125.
 6 Monaco         2008 180640.
 7 Monaco         2013 172589.
 8 Monaco         2014 185153.
 9 Monaco         2015 163369.
10 Monaco         2016 168011.

Plot

global_economy %>%
    autoplot(GDP/Population) + guides(color = FALSE) + hrbrthemes::theme_ipsum_rc()

Inflation adjustments

global_economy <- tsibbledata::global_economy
print_retail <- aus_retail %>%
    filter(Industry == "Newspaper and book retailing") %>%
    group_by(Industry) %>%
    index_by(Year = year(Month)) %>%
    summarise(Turnover = sum(Turnover))
aus_economy <- filter(global_economy, Code == "AUS")
print_retail %>%
    left_join(aus_economy, by = "Year") %>%
    mutate(Adj_turnover = Turnover/CPI) %>%
    pivot_longer(c(Turnover, Adj_turnover), names_to = "Type", values_to = "Turnover") %>%
    ggplot(aes(x = Year, y = Turnover)) + geom_line() + facet_grid(vars(Type),
    scales = "free_y") + xlab("Years") + ylab(NULL) + ggtitle("Turnover: Australian print media industry") +
    hrbrthemes::theme_ipsum_rc()

Mathematical transformations

If the data show different variation at different levels of the series, then a transformation can be useful.

Denote original observations as y_1,\dots,y_n and transformed observations as w_1, \dots, w_n.

Transformations
Square root w_t = \sqrt{y_t}
Cube root w_t = \sqrt[3]{y_t}
Logarithm w_t = \log(y_t)

Logarithms, in particular, are useful because they are more interpretable: changes in a log value are relative (percent) changes on the original scale.
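A quick numeric illustration (base R only): for small changes, differences in logs are close to proportional changes.

y <- c(100, 102, 105)
diff(log(y))  # 0.0198 0.0290
diff(y)/head(y, -1)  # 0.0200 0.0294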

Mathematical transformations

food <- aus_retail %>%
    filter(Industry == "Food retailing") %>%
    summarise(Turnover = sum(Turnover))

Mathematical transformations

food %>%
    autoplot(sqrt(Turnover)) + labs(y = "Square root turnover") + hrbrthemes::theme_ipsum_rc()

Mathematical transformations

food %>%
    autoplot(Turnover^(1/3)) + labs(y = "Cube root turnover") + hrbrthemes::theme_ipsum_rc()

Mathematical transformations

food %>%
    autoplot(log(Turnover)) + labs(y = "Log turnover") + hrbrthemes::theme_ipsum_rc()

Mathematical transformations

food %>%
    autoplot(-1/Turnover) + labs(y = "Inverse turnover") + hrbrthemes::theme_ipsum_rc()

Box-Cox transformations

Each of these transformations is close to a member of the family of Box-Cox transformations (a hand-rolled version appears after the list below):

w_t = \left\{\begin{array}{ll} \log(y_t), & \quad \lambda = 0; \\ (y_t^\lambda-1)/\lambda , & \quad \lambda \ne 0. \end{array}\right.

  • \lambda=1: (No substantive transformation)
  • \lambda=\frac12: (Square root plus linear transformation)
  • \lambda=0: (Natural logarithm)
  • \lambda=-1: (Inverse plus 1)
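A hand-rolled version of the formula above; this is a sketch (fabletools::box_cox is the real implementation and may treat negative values differently).

box_cox_simple <- function(y, lambda) {
    if (lambda == 0) log(y) else (y^lambda - 1)/lambda
}
box_cox_simple(c(1, 10, 100), lambda = 0)  # natural logs
box_cox_simple(c(1, 10, 100), lambda = 0.5)  # scaled square roots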

Box-Cox transformations

food %>%
    mutate(!!!set_names(map(seq(0, 1, 0.01), ~expr(fabletools::box_cox(Turnover,
        !!.x))), seq(0, 1, 0.01))) %>%
    select(-Turnover) %>%
    pivot_longer(-Month, names_to = "lambda", values_to = "Turnover") %>%
    mutate(lambda = as.numeric(lambda)) %>%
    ggplot(aes(x = Month, y = Turnover)) + geom_line() + transition_states(1 -
    lambda, state_length = 0) + view_follow() + ggtitle("Box-CoxT(lambda = {format(1 - as.numeric(closest_state), digits = 2)})") +
    hrbrthemes::theme_ipsum_rc() -> my.anim
# save_animation('./img/Anim1.gif')

Animation

Box-Cox transformations

food %>%
    features(Turnover, features = guerrero)
# A tibble: 1 × 1
  lambda_guerrero
            <dbl>
1          0.0895
  • This attempts to balance the seasonal fluctuations and random variation across the series.
  • Always check the results.
  • A low value of \lambda can give extremely large prediction intervals.

Box-Cox transformations

food %>%
    autoplot(box_cox(Turnover, 0.0895)) + labs(y = "Box-Cox transformed turnover") +
    hrbrthemes::theme_ipsum_rc()

Transformations

  • Often no transformation needed.
  • Simple transformations are easier to explain and work well enough.
  • Transformations can have very large effect on PI.
  • If the data contain zeros, then don’t take logs.
  • log1p(), i.e. \log(1+x), can be useful for data with zeros.
  • If some data are negative, no power transformation is possible unless a constant is added to all values.
  • Choosing logs is a simple way to force forecasts to be positive
  • Transformations must be reversed to obtain forecasts on the original scale. (Handled automatically by fable.)

Try this out…

  1. For the following series, find an appropriate transformation in order to stabilise the variance.
  • United States GDP from global_economy
  • Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock
  • Victorian Electricity Demand from vic_elec.
  • Gas production from aus_production
  2. Why is a Box-Cox transformation unhelpful for the canadian_gas data?

Time series components

Time series patterns

Recall

  • Trend pattern exists when there is a long-term increase or decrease in the data.

  • Cyclic pattern exists when data exhibit rises and falls that are not of fixed period (duration usually of at least 2 years).

  • Seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week).

A Note on DeSeasoning

us_retail_employment <- us_employment %>%
    filter(year(Month) >= 1990, Title == "Retail Trade") %>%
    select(-Series_ID)
dcmp <- us_retail_employment %>%
    model(STL(Employed))
autoplot(us_retail_employment, Employed, color = "gray") + autolayer(components(dcmp),
    season_adjust, color = "blue") + labs(y = "Persons (thousands)", title = "Total employment in US retail")

Moving Averages

The general idea is a moving window. We will set .before and .after as follows.

aus_exports <- global_economy %>%
    filter(Country == "Australia") %>%
    mutate(`5-MA` = slider::slide_dbl(Exports, mean, .before = 2, .after = 2,
        .complete = TRUE))
aus_exports
# A tsibble: 58 x 10 [1Y]
# Key:       Country [1]
   Country  Code   Year     GDP Growth   CPI Imports Exports Popul…¹ `5-MA`
   <fct>    <fct> <dbl>   <dbl>  <dbl> <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
 1 Austral… AUS    1960 1.86e10  NA     7.96    14.1    13.0  1.03e7   NA  
 2 Austral… AUS    1961 1.96e10   2.49  8.14    15.0    12.4  1.05e7   NA  
 3 Austral… AUS    1962 1.99e10   1.30  8.12    12.6    13.9  1.07e7   13.5
 4 Austral… AUS    1963 2.15e10   6.21  8.17    13.8    13.0  1.10e7   13.5
 5 Austral… AUS    1964 2.38e10   6.98  8.40    13.8    14.9  1.12e7   13.6
 6 Austral… AUS    1965 2.59e10   5.98  8.69    15.3    13.2  1.14e7   13.4
 7 Austral… AUS    1966 2.73e10   2.38  8.98    15.1    12.9  1.17e7   13.3
 8 Austral… AUS    1967 3.04e10   6.30  9.29    13.9    12.9  1.18e7   12.7
 9 Austral… AUS    1968 3.27e10   5.10  9.52    14.5    12.3  1.20e7   12.6
10 Austral… AUS    1969 3.66e10   7.04  9.83    13.3    12.0  1.23e7   12.6
# … with 48 more rows, and abbreviated variable name ¹​Population
autoplot(aus_exports, Exports) + autolayer(aus_exports, `5-MA`, color = "red") +
    labs(y = "Exports (% of GDP)", title = "Total Australian exports") + guides(colour = guide_legend(title = "series")) +
    hrbrthemes::theme_ipsum_rc()
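As a by-hand check (assuming aus_exports from above): the first non-missing 5-MA, for 1962, is the mean of the 1960–1964 values.

mean(aus_exports$Exports[1:5])  # about 13.5, matching the 1962 `5-MA` above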

We can even have moving averages of moving averages.

aus_exports2 <- aus_exports %>%
    mutate(`2x5-MA` = slider::slide_dbl(`5-MA`, mean, .before = 1, .after = 0,
        .complete = TRUE))
aus_exports2
# A tsibble: 58 x 11 [1Y]
# Key:       Country [1]
   Country  Code   Year     GDP Growth   CPI Imports Exports Popul…¹ `5-MA`
   <fct>    <fct> <dbl>   <dbl>  <dbl> <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
 1 Austral… AUS    1960 1.86e10  NA     7.96    14.1    13.0  1.03e7   NA  
 2 Austral… AUS    1961 1.96e10   2.49  8.14    15.0    12.4  1.05e7   NA  
 3 Austral… AUS    1962 1.99e10   1.30  8.12    12.6    13.9  1.07e7   13.5
 4 Austral… AUS    1963 2.15e10   6.21  8.17    13.8    13.0  1.10e7   13.5
 5 Austral… AUS    1964 2.38e10   6.98  8.40    13.8    14.9  1.12e7   13.6
 6 Austral… AUS    1965 2.59e10   5.98  8.69    15.3    13.2  1.14e7   13.4
 7 Austral… AUS    1966 2.73e10   2.38  8.98    15.1    12.9  1.17e7   13.3
 8 Austral… AUS    1967 3.04e10   6.30  9.29    13.9    12.9  1.18e7   12.7
 9 Austral… AUS    1968 3.27e10   5.10  9.52    14.5    12.3  1.20e7   12.6
10 Austral… AUS    1969 3.66e10   7.04  9.83    13.3    12.0  1.23e7   12.6
# … with 48 more rows, 1 more variable: `2x5-MA` <dbl>, and abbreviated
#   variable name ¹​Population
autoplot(aus_exports2, Exports) + autolayer(aus_exports2, `5-MA`, color = "red") +
    autolayer(aus_exports2, `2x5-MA`, color = "blue") + labs(y = "Exports (% of GDP)",
    title = "Total Australian exports") + guides(colour = guide_legend(title = "series")) +
    hrbrthemes::theme_ipsum_rc()

Time series decomposition

y_t = f(S_t, T_t, R_t)

where

  • y_t = data at period t
  • T_t = trend-cycle component at period t
  • S_t = seasonal component at period t
  • R_t = remainder component at period t

Additive decomposition: y_t = S_t + T_t + R_t.

Multiplicative decomposition: y_t = S_t \times T_t \times R_t.

Time series decomposition

  • Additive model appropriate if magnitude of seasonal fluctuations does not vary with level.
  • If seasonal fluctuations are proportional to the level of the series, then a multiplicative model is appropriate.
  • Multiplicative decomposition more prevalent with economic series
  • Alternative: use a Box-Cox transformation, and then use additive decomposition.
  • Logs turn multiplicative relationship into an additive relationship:

y_t = S_t \times T_t \times R_t \quad\Rightarrow\quad \log y_t = \log S_t + \log T_t + \log R_t.

US Retail Employment

us_retail_employment <- us_employment %>%
    filter(year(Month) >= 1990, Title == "Retail Trade") %>%
    select(-Series_ID)
us_retail_employment
# A tsibble: 357 x 3 [1M]
      Month Title        Employed
      <mth> <chr>           <dbl>
 1 1990 Jan Retail Trade   13256.
 2 1990 Feb Retail Trade   12966.
 3 1990 Mar Retail Trade   12938.
 4 1990 Apr Retail Trade   13012.
 5 1990 May Retail Trade   13108.
 6 1990 Jun Retail Trade   13183.
 7 1990 Jul Retail Trade   13170.
 8 1990 Aug Retail Trade   13160.
 9 1990 Sep Retail Trade   13113.
10 1990 Oct Retail Trade   13185.
# … with 347 more rows

US Retail Employment

us_retail_employment %>%
    autoplot(Employed) + xlab("Year") + ylab("Persons (thousands)") + ggtitle("Total employment in US retail") +
    hrbrthemes::theme_ipsum_rc()
USREDC <- us_retail_employment %>%
    model(classical_decomposition(Employed, type = "additive")) %>%
    components()
USREDC
# A dable: 357 x 7 [1M]
# Key:     .model [1]
# :        Employed = trend + seasonal + random
   .model                       Month Emplo…¹  trend seaso…² random seaso…³
   <chr>                        <mth>   <dbl>  <dbl>   <dbl>  <dbl>   <dbl>
 1 "classical_decomposition… 1990 Jan  13256.    NA   -75.5   NA     13331.
 2 "classical_decomposition… 1990 Feb  12966.    NA  -273.    NA     13239.
 3 "classical_decomposition… 1990 Mar  12938.    NA  -253.    NA     13191.
 4 "classical_decomposition… 1990 Apr  13012.    NA  -190.    NA     13203.
 5 "classical_decomposition… 1990 May  13108.    NA   -88.9   NA     13197.
 6 "classical_decomposition… 1990 Jun  13183.    NA   -10.4   NA     13193.
 7 "classical_decomposition… 1990 Jul  13170. 13178.  -13.3    5.65  13183.
 8 "classical_decomposition… 1990 Aug  13160. 13161.   -9.99   8.80  13169.
 9 "classical_decomposition… 1990 Sep  13113. 13141.  -87.4   59.9   13201.
10 "classical_decomposition… 1990 Oct  13185. 13117.   34.6   33.8   13151.
# … with 347 more rows, and abbreviated variable names ¹​Employed,
#   ²​seasonal, ³​season_adjust
autoplot(USREDC) + labs(title = "Classical additive decomposition of total US retail employment") +
    hrbrthemes::theme_ipsum_rc()

US Retail Employment

us_retail_employment %>%
    model(stl = STL(Employed))
# A mable: 1 x 1
      stl
  <model>
1   <STL>

US Retail Employment

dcmp <- us_retail_employment %>%
    model(stl = STL(Employed))
components(dcmp)
# A dable: 357 x 7 [1M]
# Key:     .model [1]
# :        Employed = trend + season_year + remainder
   .model    Month Employed  trend season_year remainder season_adjust
   <chr>     <mth>    <dbl>  <dbl>       <dbl>     <dbl>         <dbl>
 1 stl    1990 Jan   13256. 13288.      -33.0      0.836        13289.
 2 stl    1990 Feb   12966. 13269.     -258.     -44.6          13224.
 3 stl    1990 Mar   12938. 13250.     -290.     -22.1          13228.
 4 stl    1990 Apr   13012. 13231.     -220.       1.05         13232.
 5 stl    1990 May   13108. 13211.     -114.      11.3          13223.
 6 stl    1990 Jun   13183. 13192.      -24.3     15.5          13207.
 7 stl    1990 Jul   13170. 13172.      -23.2     21.6          13193.
 8 stl    1990 Aug   13160. 13151.       -9.52    17.8          13169.
 9 stl    1990 Sep   13113. 13131.      -39.5     22.0          13153.
10 stl    1990 Oct   13185. 13110.       61.6     13.2          13124.
# … with 347 more rows

US Retail Employment

us_retail_employment %>%
    autoplot(Employed, color = "gray") + autolayer(components(dcmp), trend,
    color = "red") + xlab("Year") + ylab("Persons (thousands)") + ggtitle("Total employment in US retail") +
    hrbrthemes::theme_ipsum_rc()

US Retail Employment

components(dcmp) %>%
    autoplot() + xlab("Year") + hrbrthemes::theme_ipsum_rc()

US Retail Employment

components(dcmp) %>%
    gg_subseries(season_year) + hrbrthemes::theme_ipsum_rc()

Seasonal adjustment

  • Useful by-product of decomposition: an easy way to calculate seasonally adjusted data.
  • Additive decomposition: seasonally adjusted data given by y_t - S_t = T_t + R_t (checked by hand in the sketch below)
  • Multiplicative decomposition: seasonally adjusted data given by y_t / S_t = T_t \times R_t
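A minimal check of the additive identity, using the STL components in dcmp from above:

components(dcmp) %>%
    mutate(sa_by_hand = Employed - season_year) %>%
    select(Month, season_adjust, sa_by_hand)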

US Retail Employment

us_retail_employment %>%
    autoplot(Employed, color = "gray") + autolayer(components(dcmp), season_adjust,
    color = "blue") + xlab("Year") + ylab("Persons (thousands)") + ggtitle("Total employment in US retail") +
    hrbrthemes::theme_ipsum_rc()

Seasonal adjustment

  • We use estimates of S based on past values to seasonally adjust a current value.
  • Seasonally adjusted series reflect remainders as well as trend. Therefore they are not “smooth” and “downturns” or “upturns” can be misleading.
  • It is better to use the trend-cycle component to look for turning points.

History of time series decomposition

History of time series decomposition

  • Classical method originated in 1920s.
  • Census II method introduced in 1957. Basis for X-11 method and variants (including X-12-ARIMA, X-13-ARIMA)
  • STL method introduced in 1983
  • TRAMO/SEATS introduced in 1990s.

National Statistics Offices

  • ABS uses X-12-ARIMA
  • US Census Bureau uses X-13ARIMA-SEATS
  • Statistics Canada uses X-12-ARIMA
  • ONS (UK) uses X-12-ARIMA
  • EuroStat use X-13ARIMA-SEATS

X-11 decomposition

Advantages

  • Relatively robust to outliers
  • Completely automated choices for trend and seasonal changes
  • Very widely tested on economic data over a long period of time.

Disadvantages

  • No prediction/confidence intervals
  • Ad hoc method with no underlying model
  • Only developed for quarterly and monthly data
X11_dcmp <- us_retail_employment %>%
    model(seats = feasts:::X_13ARIMA_SEATS(Employed)) %>%
    components()
X11_dcmp
# A dable: 357 x 7 [1M]
# Key:     .model [1]
# :        Employed = f(trend, seasonal, irregular)
   .model    Month Employed  trend seasonal irregular season_adjust
   <chr>     <mth>    <dbl>  <dbl>    <dbl>     <dbl>         <dbl>
 1 seats  1990 Jan   13256. 13261.    0.999     1.00         13266.
 2 seats  1990 Feb   12966. 13243.    0.980     0.999        13235.
 3 seats  1990 Mar   12938. 13236.    0.977     1.00         13238.
 4 seats  1990 Apr   13012. 13233.    0.983     1.00         13235.
 5 seats  1990 May   13108. 13222.    0.991     1.00         13223.
 6 seats  1990 Jun   13183. 13206.    0.998     1.00         13205.
 7 seats  1990 Jul   13170. 13187.    0.999     1.00         13190.
 8 seats  1990 Aug   13160. 13165.    1.00      1.00         13162.
 9 seats  1990 Sep   13113. 13145.    0.998     1.00         13146.
10 seats  1990 Oct   13185. 13129.    1.00      1.00         13126.
# … with 347 more rows
autoplot(X11_dcmp) + hrbrthemes::theme_ipsum_rc()

Extensions: X-12-ARIMA and X-13-ARIMA

  • The X-11, X-12-ARIMA and X-13-ARIMA methods are based on Census II decomposition.
  • These allow adjustments for trading days and other explanatory variables.
  • Known outliers can be omitted.
  • Level shifts and ramp effects can be modelled.
  • Missing values estimated and replaced.
  • Holiday factors (e.g., Easter, Labour Day) can be estimated.

X-13ARIMA-SEATS

Advantages

  • Model-based
  • Smooth trend estimate
  • Allows estimates at end points
  • Allows changing seasonality
  • Developed for economic data

Disadvantages

  • Only developed for quarterly and monthly data

The details.

seats_dcmp <- us_retail_employment %>%
    model(seats = feasts:::SEATS(Employed)) %>%
    components()
seats_dcmp
# A dable: 357 x 7 [1M]
# Key:     .model [1]
# :        Employed = trend * seasonal * irregular
   .model    Month Employed  trend seasonal irregular season_adjust
   <chr>     <mth>    <dbl>  <dbl>    <dbl>     <dbl>         <dbl>
 1 seats  1990 Jan   13256. 13261.    0.999     1.00         13266.
 2 seats  1990 Feb   12966. 13243.    0.980     0.999        13235.
 3 seats  1990 Mar   12938. 13236.    0.977     1.00         13238.
 4 seats  1990 Apr   13012. 13233.    0.983     1.00         13235.
 5 seats  1990 May   13108. 13222.    0.991     1.00         13223.
 6 seats  1990 Jun   13183. 13206.    0.998     1.00         13205.
 7 seats  1990 Jul   13170. 13187.    0.999     1.00         13190.
 8 seats  1990 Aug   13160. 13165.    1.00      1.00         13162.
 9 seats  1990 Sep   13113. 13145.    0.998     1.00         13146.
10 seats  1990 Oct   13185. 13129.    1.00      1.00         13126.
# … with 347 more rows

A Plot

autoplot(seats_dcmp) + labs(title = "SEATS decomposition of total US retail employment") +
    hrbrthemes::theme_ipsum_rc()

STL decomposition

STL decomposition

  • STL: “Seasonal and Trend decomposition using Loess”
  • Very versatile and robust.
  • Unlike X-12-ARIMA, STL will handle any type of seasonality.
  • Seasonal component allowed to change over time, and rate of change controlled by user.
  • Smoothness of trend-cycle also controlled by user.
  • Robust to outliers
  • No trading day or calendar adjustments.
  • Decomposition is additive only.
  • Take logs to get a multiplicative decomposition (sketched below).
  • Use Box-Cox transformations to get other decompositions.
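A minimal sketch of the log route, assuming us_retail_employment from above; the components come back on the log scale.

us_retail_employment %>%
    model(STL(log(Employed))) %>%
    components() %>%
    autoplot()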

STL decomposition

us_retail_employment %>%
    model(STL(Employed ~ season(window = 9), robust = TRUE)) %>%
    components() %>%
    autoplot() + ggtitle("STL decomposition: US retail employment")

STL decomposition

s_windows <- seq(5, 55, by = 2)
stl_defs <- purrr::map(s_windows, function(s_window) {
    STL(Employed ~ season(window = s_window), robust = TRUE)
})
names(stl_defs) <- sprintf("season(window=%02d)", s_windows)

us_retail_employment %>%
    model(!!!stl_defs) %>%
    components() %>%
    as_tibble() %>%
    pivot_longer(Employed:remainder, names_to = "component", names_ptypes = list(component = factor(levels = c("Employed",
        "trend", "season_year", "remainder"))), values_to = "Employed") %>%
    ggplot(aes(x = Month, y = Employed)) + geom_line() + facet_grid(rows = vars(component),
    scales = "free_y") + labs(title = "STL decomposition of US retail employment",
    subtitle = "{closest_state}") + transition_states(.model)

STL decomposition

us_retail_employment %>%
    model(STL(Employed ~ season(window = 5))) %>%
    components()

us_retail_employment %>%
    model(STL(Employed ~ trend(window = 15) + season(window = "periodic"), robust = TRUE)) %>%
    components()
  • trend(window = ?) controls wiggliness of trend component.
  • season(window = ?) controls variation on seasonal component.
  • season(window = 'periodic') is equivalent to an infinite window.

STL decomposition

us_retail_employment %>%
    model(STL(Employed)) %>%
    components() %>%
    autoplot()
  • STL() chooses season(window=13) by default
  • Can include transformations.

STL decomposition

  • Algorithm that updates trend and seasonal components iteratively.
  • Starts with \hat{T}_t=0
  • Uses a mixture of loess and moving averages to successively refine the trend and seasonal estimates.
  • The trend window controls loess bandwidth applied to deseasonalised values.
  • The season window controls loess bandwidth applied to detrended subseries.
  • Robustness weights based on remainder.
  • Defaults: season window = 13; trend window = nextodd(ceiling((1.5*period)/(1-(1.5/s.window)))), computed in the sketch below.
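A sketch of that default trend-window rule for monthly data (period = 12, season window = 13); nextodd here is a small helper mirroring the one inside stats::stl, not an exported function.

nextodd <- function(x) ifelse(x%%2 == 0, x + 1, x)
period <- 12
s.window <- 13
nextodd(ceiling((1.5 * period)/(1 - (1.5/s.window))))  # 21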

When things go wrong

The ABS stuff-up

employed
# A tsibble: 440 x 4 [1M]
       Time Month  Year Employed
      <mth> <ord> <dbl>    <dbl>
 1 1978 Feb Feb    1978    5986.
 2 1978 Mar Mar    1978    6041.
 3 1978 Apr Apr    1978    6054.
 4 1978 May May    1978    6038.
 5 1978 Jun Jun    1978    6031.
 6 1978 Jul Jul    1978    6036.
 7 1978 Aug Aug    1978    6005.
 8 1978 Sep Sep    1978    6024.
 9 1978 Oct Oct    1978    6046.
10 1978 Nov Nov    1978    6034.
# … with 430 more rows

The ABS stuff-up

Details:

employed %>%
    autoplot(Employed) + ggtitle("Total employed") + ylab("Thousands") + xlab("Year")

The ABS stuff-up

employed %>%
    filter(Year >= 2005) %>%
    autoplot(Employed) + ggtitle("Total employed") + ylab("Thousands") + xlab("Year")

The ABS stuff-up

employed %>%
    filter(Year >= 2005) %>%
    gg_season(Employed) + ggtitle("Total employed") + ylab("Thousands")

The ABS stuff-up

employed %>%
    mutate(diff = difference(Employed)) %>%
    filter(Month == "Sep") %>%
    ggplot(aes(y = diff, x = 1)) + geom_boxplot() + coord_flip() + ggtitle("Sep - Aug: total employed") +
    xlab("") + ylab("Thousands") + scale_x_continuous(breaks = NULL, labels = NULL)

The ABS stuff-up

dcmp <- employed %>%
    filter(Year >= 2005) %>%
    model(stl = STL(Employed ~ season(window = 11), robust = TRUE))
components(dcmp) %>%
    autoplot()

The ABS stuff-up

components(dcmp) %>%
    filter(year(Time) == 2013) %>%
    gg_season(season_year) + ggtitle("Seasonal component") + guides(colour = "none")

The ABS stuff-up

components(dcmp) %>%
    as_tsibble() %>%
    autoplot(season_adjust)

The ABS stuff-up

  • August 2014 employment numbers higher than expected.
  • Supplementary survey usually conducted in August for employed people.
  • Most likely, some employed people were claiming to be unemployed in August to avoid supplementary questions.
  • Supplementary survey not run in 2014, so no motivation to lie about employment.
  • In previous years, seasonal adjustment fixed the problem.
  • The ABS has now adopted a new method to avoid the bias.

Some Data for Today and General Considerations

Panel data. Multiple time series are often described as a panel, a cross-section of time series, or a time series of cross-sections. The data structure has two [non-overlapping] indices. Let’s review, and discuss a bit, what exactly we mean.

Extending the Data

fredr is amazing.

US.Employment <- map_dfr(c(rownames(table(us_employment$Series_ID))),
    ~fredr::fredr_series_observations(.))
save(US.Employment, file = "USEmployment.RData")
load(url("https://github.com/robertwwalker/xaringan/raw/master/CMF-Week-9/USEmployment.RData"))
load("USEmployment.RData")
us_employment %>%
    data.frame() %>%
    group_by(Series_ID) %>%
    summarise(Title = first(Title)) %>%
    mutate(series_id = Series_ID) %>%
    ungroup() %>%
    select(-Series_ID) -> Names.List
US.Employment.T <- left_join(US.Employment, Names.List, by = c(series_id = "series_id")) %>%
    mutate(YM = yearmonth(date)) %>%
    rename(Employed = value) %>%
    as_tsibble(., index = YM, key = Title)

Additional Features

For much of the study of time series, the key issue is one known as stationarity. For now, we will wave our hands a bit; the details arrive in Chapter 5 and, more fully, in Chapter 9. But we want to compute things first and then build out the details. Let’s take my new retail employment data.

A Recreation on New Data

EMPN <- US.Employment.T %>%
    filter(YM > yearmonth("1990-01") & Title == "Retail Trade") %>%
    as_tsibble(index = YM)
EMPO <- us_employment %>%
    filter(Title == "Retail Trade" & Month > yearmonth("1990-01")) %>%
    as_tsibble(., index = Month)
Plot1 <- ggplot(EMPN, aes(x = YM, y = Employed)) + geom_line(color = "red") +
    geom_line(data = EMPO, aes(x = Month, y = Employed), inherit.aes = FALSE)
Plot1

Data are Revised Occasionally

library(patchwork)
dcmp <- EMPO %>%
    model(stl = STL(Employed))
Plot2 <- components(dcmp) %>%
    autoplot()
dcmp <- EMPN %>%
    model(stl = STL(Employed))
Plot3 <- components(dcmp) %>%
    autoplot()
Plot1/(Plot2 + Plot3)

Three Sectors

USET <- US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Retail Trade", "Financial Activities",
        "Manufacturing")) %>%
    as_tsibble(., index = YM, key = Title)
USET %>%
    autoplot(Employed)

Retail (season)

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Retail Trade")) %>%
    as_tsibble(., index = YM) %>%
    gg_season(Employed)

Retail (subseries)

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Retail Trade")) %>%
    as_tsibble(., index = YM) %>%
    gg_subseries(Employed)

Retail (lag)

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Retail Trade")) %>%
    as_tsibble(., index = YM) %>%
    gg_lag(Employed)

Manufacturing

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Manufacturing")) %>%
    as_tsibble(., index = YM) %>%
    gg_season(Employed)

Manufacturing

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Manufacturing")) %>%
    as_tsibble(., index = YM) %>%
    gg_subseries(Employed)

Manufacturing

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Manufacturing")) %>%
    as_tsibble(., index = YM) %>%
    gg_lag(Employed)

Financial

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Financial Activities")) %>%
    as_tsibble(., index = YM) %>%
    gg_season(Employed)

Financial

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Financial Activities")) %>%
    as_tsibble(., index = YM) %>%
    gg_subseries(Employed)

Financial

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Financial Activities")) %>%
    as_tsibble(., index = YM) %>%
    gg_lag(Employed)

Features: Summary

The features command is the magic tool for tidy summaries and statistics of time series in this index/key format. For example, a basic summary:

USET %>%
    features(Employed, features = list(mean = mean, min = min, max = max, sd = sd,
        quantile))
# A tibble: 3 × 10
  Title         mean    min    max    sd   `0%`  `25%`  `50%`  `75%` `100%`
  <chr>        <dbl>  <dbl>  <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Financial …  7767.  6472   8846   640.  6472   7363   7876   8226.  8846 
2 Manufactur… 14554. 11340  17870  2241. 11340  12333. 14219  17088  17870 
3 Retail Tra… 14746. 12548. 16394.  915. 12548. 14336. 14962. 15387. 16394.

Features: Correlation Features

Learning about the time series properties

USET %>%
    features(Employed, features = feat_acf)
# A tibble: 3 × 8
  Title                 acf1 acf10 diff1_…¹ diff1…² diff2…³ diff2…⁴ seaso…⁵
  <chr>                <dbl> <dbl>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 Financial Activities 0.990  8.94   0.283    0.165  -0.313   0.415   0.883
2 Manufacturing        0.995  9.35   0.0466   0.128  -0.499   0.505   0.925
3 Retail Trade         0.951  7.29   0.133    0.377  -0.198   0.305   0.876
# … with abbreviated variable names ¹​diff1_acf1, ²​diff1_acf10,
#   ³​diff2_acf1, ⁴​diff2_acf10, ⁵​season_acf1
USET %>%
    group_by(Title) %>%
    ACF(Employed) %>%
    autoplot()

For Contrast: Ford Returns

library(tidyquant)
Ford <- tq_get("F", from = "2000-01-01")
FordT <- Ford %>%
    as_tsibble(index = date)
FordT %>%
    autoplot(adjusted)

FC <- Ford %>%
    tq_transmute(adjusted, mutate_fun = periodReturn, period = "monthly") %>%
    mutate(YM = yearmonth(date)) %>%
    as_tsibble(., index = YM)
FC %>%
    autoplot(monthly.returns)

Ford’s ACF

The 6/7 and 12/13 patterns are interesting….

library(patchwork)
FC1 <- FC %>%
    ACF(monthly.returns) %>%
    autoplot()
FC2 <- FC %>%
    PACF(monthly.returns) %>%
    autoplot()
FC1 + FC2

Decomposition Features

USET %>%
    features(Employed, feat_stl)
# A tibble: 3 × 10
  Title     trend…¹ seaso…² seaso…³ seaso…⁴ spiki…⁵ linea…⁶ curva…⁷ stl_e…⁸
  <chr>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 Financia…   0.999   0.820       6       3  1.28e1  10603.  -3014.   0.718
2 Manufact…   0.999   0.555       8       3  8.41e3 -39200.   4365.   0.612
3 Retail T…   0.983   0.834      11       3  1.11e5  13098.  -6397.   0.506
# … with 1 more variable: stl_e_acf10 <dbl>, and abbreviated variable
#   names ¹​trend_strength, ²​seasonal_strength_year, ³​seasonal_peak_year,
#   ⁴​seasonal_trough_year, ⁵​spikiness, ⁶​linearity, ⁷​curvature, ⁸​stl_e_acf1

With More Data

NUSET8k <- US.Employment.T %>%
    data.frame() %>%
    group_by(Title) %>%
    summarise(MaxE = max(Employed)) %>%
    arrange(desc(MaxE)) %>%
    filter(MaxE > 8000 & MaxE < 120000)
USET8k <- left_join(NUSET8k, US.Employment.T) %>%
    as_tsibble(., index = YM, key = Title)

An Improvement on the Trend/Season

The details are at the bottom for other statistics.

library(kableExtra)
USET8k %>%
    features(Employed, feat_stl) %>%
    knitr::kable(format = "html") %>%
    scroll_box(width = "100%", height = "300px")
Title trend_strength seasonal_strength_year seasonal_peak_year seasonal_trough_year spikiness linearity curvature stl_e_acf1 stl_e_acf10
Construction 0.999 0.963 8 2 2.10e+02 53918 -1084.7 0.576 0.526
Durable Goods 0.994 0.215 9 7 3.42e+03 1295 -40394.6 0.750 1.251
Education and Health Services 1.000 0.715 11 7 7.21e+03 217955 58537.7 0.529 0.580
Education and Health Services: Health Care 0.999 0.354 0 4 1.59e+04 46025 -436.7 0.495 0.533
Education and Health Services: Health Care and Social Assistance 0.999 0.313 0 4 6.19e+04 63326 17.6 0.517 0.584
Financial Activities 1.000 0.870 7 4 6.92e-01 78437 -273.1 0.723 0.879
Goods-Producing 0.996 0.812 9 2 1.12e+04 43343 -64790.2 0.734 1.081
Government 1.000 0.981 11 7 3.94e+02 190941 -19815.3 0.599 0.538
Government: Local Government 1.000 0.986 5 7 3.14e+02 96008 -15450.2 0.640 0.692
Government: Local Government Education 1.000 0.996 3 7 3.54e+01 54936 -8479.9 0.538 0.480
Leisure and Hospitality 0.997 0.607 7 4 5.47e+05 136395 22427.3 0.522 0.604
Leisure and Hospitality: Accommodation and Food Services 0.971 0.473 8 4 5.27e+06 31615 -1157.7 0.499 0.541
Leisure and Hospitality: Food Services and Drinking Places 0.975 0.404 6 4 3.26e+06 29889 -377.9 0.476 0.497
Manufacturing 0.997 0.434 9 2 5.19e+03 -8507 -64155.3 0.772 1.262
Private Service-Providing 1.000 0.543 0 4 1.50e+07 906495 115965.6 0.545 0.623
Professional and Business Services 1.000 0.675 11 1 3.62e+03 187091 41544.7 0.626 0.810
Professional and Business Services: Administrative and Support Services 0.995 0.808 11 1 1.84e+04 21351 -8027.1 0.617 0.797
Professional and Business Services: Administrative and Waste Services 0.995 0.810 11 1 1.93e+04 22521 -8025.8 0.617 0.804
Professional and Business Services: Professional and Technical Services 1.000 0.691 2 5 1.81e+02 29125 -1380.0 0.626 0.719
Retail Trade 1.000 0.881 0 4 5.78e+03 135654 -6877.1 0.511 0.473
Trade, Transportation, and Utilities 1.000 0.845 0 4 2.02e+04 211628 -7668.1 0.566 0.583

coef_hurst

A measure of the degree to which adjacent observations depend on one another over time. Generically, this statistic takes values between zero and one with one indicating very high levels of dependence through time.

USET %>%
    features(Employed, coef_hurst)
# A tibble: 3 × 2
  Title                coef_hurst
  <chr>                     <dbl>
1 Financial Activities      1.00 
2 Manufacturing             1.00 
3 Retail Trade              0.999

Middling for Ford

FC %>%
    features(monthly.returns, features = coef_hurst)
# A tibble: 1 × 1
  coef_hurst
       <dbl>
1      0.500

feat_spectral

USET %>%
    features(Employed, feat_spectral)
# A tibble: 3 × 2
  Title                spectral_entropy
  <chr>                           <dbl>
1 Financial Activities            0.263
2 Manufacturing                   0.180
3 Retail Trade                    0.412
FC %>%
    features(monthly.returns, features = feat_spectral)
# A tibble: 1 × 1
  spectral_entropy
             <dbl>
1            0.988

The Absence of Correlation

Ljung-Box modifies the idea in the Box-Pierce statistic for assessing whether or not a given series [or a transformation thereof] is essentially uncorrelated. In both cases, we will get to the details next week [Chapter 5]. For now, the idea is that a weighted sum of the first k squared autocorrelations is, under the null of no autocorrelation, approximately chi-squared distributed with k degrees of freedom. Large correlations reveal dependence.
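For reference, with \ell lags the Ljung-Box statistic is Q^* = T(T+2)\sum_{k=1}^{\ell} r_k^2/(T-k), compared against a \chi^2_{\ell} distribution; Box-Pierce uses the simpler Q = T\sum_{k=1}^{\ell} r_k^2.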

USET %>%
    features(Employed, features = list(box_pierce, ljung_box))
# A tibble: 3 × 5
  Title                bp_stat bp_pvalue lb_stat lb_pvalue
  <chr>                  <dbl>     <dbl>   <dbl>     <dbl>
1 Financial Activities    365.         0    368.         0
2 Manufacturing           368.         0    371.         0
3 Retail Trade            337.         0    339.         0
FC %>%
    features(monthly.returns, features = list(box_pierce, ljung_box))
# A tibble: 1 × 4
  bp_stat bp_pvalue lb_stat lb_pvalue
    <dbl>     <dbl>   <dbl>     <dbl>
1  0.0359     0.850  0.0363     0.849

feat_pacf

USET %>%
    features(Employed, feat_pacf)
# A tibble: 3 × 5
  Title                pacf5 diff1_pacf5 diff2_pacf5 season_pacf
  <chr>                <dbl>       <dbl>       <dbl>       <dbl>
1 Financial Activities 0.987       0.712       1.00      -0.0555
2 Manufacturing        0.994       0.238       0.791      0.0348
3 Retail Trade         1.08        0.834       1.06      -0.0188
FC %>%
    features(monthly.returns, features = feat_pacf)
# A tibble: 1 × 4
    pacf5 diff1_pacf5 diff2_pacf5 season_pacf
    <dbl>       <dbl>       <dbl>       <dbl>
1 0.00612       0.632        1.04       0.109

Unit Roots

The stationarity issue from earlier is given much attention. Can we reasonably think of a series’ characteristics as fixed? There are several means of assessment, with details deferred to Chapter 9. Note that the null hypotheses differ: KPSS tests a null of stationarity, while Phillips-Perron tests a null of a unit root.

USET %>%
    features(Employed, features = list(unitroot_kpss, unitroot_pp, unitroot_ndiffs,
        unitroot_nsdiffs)) %>%
    knitr::kable(format = "html")
Title kpss_stat kpss_pvalue pp_stat pp_pvalue ndiffs nsdiffs
Financial Activities 4.63 0.01 -1.193 0.100 1 1
Manufacturing 5.68 0.01 -0.938 0.100 1 0
Retail Trade 3.91 0.01 -2.636 0.089 1 1
FC %>%
    features(monthly.returns, features = list(unitroot_kpss, unitroot_pp, unitroot_ndiffs,
        unitroot_nsdiffs))
# A tibble: 1 × 6
  kpss_stat kpss_pvalue pp_stat pp_pvalue ndiffs nsdiffs
      <dbl>       <dbl>   <dbl>     <dbl>  <int>   <int>
1    0.0895         0.1   -16.6      0.01      0       0

Tiling

A reminder

USET %>%
    features(Employed, features = list(var_tiled_mean, var_tiled_var))
# A tibble: 3 × 3
  Title                var_tiled_mean var_tiled_var
  <chr>                         <dbl>         <dbl>
1 Financial Activities          1.02      0.0000411
2 Manufacturing                 1.03      0.0000923
3 Retail Trade                  0.922     0.0136   
FC %>%
    features(monthly.returns, features = list(var_tiled_mean, var_tiled_var))
# A tibble: 1 × 2
  var_tiled_mean var_tiled_var
           <dbl>         <dbl>
1          0.118          2.59

Detecting Shifts

USET %>%
    features(Employed, features = list(shift_level_max, shift_var_max, shift_kl_max)) %>%
    kable(format = "html")
Title shift_level_max shift_level_index shift_var_max shift_var_index shift_kl_max shift_kl_index
Financial Activities 371 229 24037 233 0.299 227
Manufacturing 1559 228 417020 235 0.522 227
Retail Trade 777 226 788931 354 1.841 227
FC %>%
    features(monthly.returns, features = list(shift_level_max, shift_var_max,
        shift_kl_max)) %>%
    kable(format = "html")
shift_level_max shift_level_index shift_var_max shift_var_index shift_kl_max shift_kl_index
0.258 110 0.194 113 36.8 112

Crossings and Flat Spots

USET %>%
    features(Employed, features = list(n_crossing_points, longest_flat_spot)) %>%
    kable(format = "html")
Title n_crossing_points longest_flat_spot
Financial Activities 5 40
Manufacturing 11 52
Retail Trade 31 10
FC %>%
    features(monthly.returns, features = list(n_crossing_points, longest_flat_spot)) %>%
    kable(format = "html")
n_crossing_points longest_flat_spot
129 8

ARCH

What proportion of the current squared residual is explained by prior squared residuals? This reports an R^2; if the variance explained is large, volatility is persistent. (There is also a chi-squared test statistic.)
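A minimal sketch of the idea, using one lag only (feasts’ stat_arch_lm uses several lags, so the numbers will differ): regress the squared demeaned returns on their first lag and report the R^2, assuming FC from above.

e2 <- (FC$monthly.returns - mean(FC$monthly.returns))^2
summary(lm(e2[-1] ~ head(e2, -1)))$r.squared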

USET %>%
    features(Employed, features = stat_arch_lm) %>%
    kable(format = "html")
Title stat_arch_lm
Financial Activities 0.989
Manufacturing 0.972
Retail Trade 0.917
FC %>%
    features(monthly.returns, features = stat_arch_lm) %>%
    kable(format = "html")
stat_arch_lm
0.05

The Box-Cox

USET %>%
    features(Employed, features = guerrero) %>%
    kable(format = "html")
Title lambda_guerrero
Financial Activities 0.948
Manufacturing 1.037
Retail Trade 1.186
FC %>%
    features(monthly.returns, features = guerrero) %>%
    kable(format = "html")
lambda_guerrero
0.78

Filtered Manufacturing

USET %>%
    filter(Title == "Manufacturing") %>%
    mutate(Filt = box_cox(Employed, 1.0369662)) %>%
    select(YM, Filt, Employed) %>%
    pivot_longer(c(Filt, Employed)) %>%
    autoplot(value)

USET %>%
    filter(Title == "Financial Activities") %>%
    autoplot(box_cox(Employed, 0.9481456))

USET %>%
    filter(Title == "Retail Trade") %>%
    autoplot(box_cox(Employed, 1.1860464))

FC %>%
    features(monthly.returns, features = guerrero)
# A tibble: 1 × 1
  lambda_guerrero
            <dbl>
1           0.780
FC %>%
    autoplot(box_cox(monthly.returns, 0.780))

Australian Tourism

The example is great.

Principal Components

Let’s walk through this example.

Ex. PC