Choice and Forecasting: Week 9

Robert W. Walker

Getting Started

load(url("https://github.com/robertwwalker/xaringan/raw/master/CMF-Week-9/data/FullWorkspace.RData"))

Autocorrelation and autocovariance

Lag plots and autocorrelation

Example: Beer production

new_production <- aus_production %>%
    filter(year(Quarter) >= 1992)
new_production
# A tsibble: 74 x 7 [1Q]
   Quarter  Beer Tobacco Bricks Cement Electricity   Gas
     <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
 1 1992 Q1   443    5777    383   1289       38332   117
 2 1992 Q2   410    5853    404   1501       39774   151
 3 1992 Q3   420    6416    446   1539       42246   175
 4 1992 Q4   532    5825    420   1568       38498   129
 5 1993 Q1   433    5724    394   1450       39460   116
 6 1993 Q2   421    6036    462   1668       41356   149
 7 1993 Q3   410    6570    475   1648       42949   163
 8 1993 Q4   512    5675    443   1863       40974   138
 9 1994 Q1   449    5311    421   1468       40162   127
10 1994 Q2   381    5717    475   1755       41199   159
# … with 64 more rows

Example: Beer production

new_production %>%
    gg_lag(Beer)

Example: Beer production

new_production %>%
    gg_lag(Beer, geom = "point")

Lagged scatterplots

Each graph shows y_t plotted against y_{t-k} for different values of k.

  • The autocorrelations are the correlations associated with these scatterplots.
  • ACF (autocorrelation function):
    • r_1=\text{Correlation}(y_{t}, y_{t-1})
    • r_2=\text{Correlation}(y_{t}, y_{t-2})
    • r_3=\text{Correlation}(y_{t}, y_{t-3})
    • etc.

Autocorrelation

Covariance and correlation: measure extent of linear relationship between two variables (y and X).

Autocovariance and autocorrelation: measure linear relationship between lagged values of a time series y.

We measure the relationship between:

  • y_{t} and y_{t-1}
  • y_{t} and y_{t-2}
  • y_{t} and y_{t-3}
  • etc.

Autocorrelation

We denote the sample autocovariance at lag k by c_k and the sample autocorrelation at lag k by r_k. Then define

c_k = \frac{1}{T}\sum_{t=k+1}^T (y_t-\bar{y})(y_{t-k}-\bar{y}), \qquad r_{k} = c_k/c_0

  • r_1 indicates how successive values of y relate to each other
  • r_2 indicates how y values two periods apart relate to each other
  • r_k is (almost) the same as the sample correlation between y_t and y_{t-k}; the sketch below verifies r_1 by hand.
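As a quick check, here is a minimal sketch (assuming new_production from above is in scope) that computes c_0, c_1, and hence r_1 by hand; it should match the lag-1 value of -0.102 in the ACF output below.

y <- new_production$Beer
T <- length(y)
ybar <- mean(y)
c0 <- sum((y - ybar)^2)/T
c1 <- sum((y[2:T] - ybar) * (y[1:(T - 1)] - ybar))/T
c1/c0  # r_1, approximately -0.102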

Autocorrelation

Results for first 9 lags for beer data:

new_production <- aus_production %>%
    filter(year(Quarter) >= 1992)
new_production %>%
    ACF(Beer, lag_max = 9)
# A tsibble: 9 x 2 [1Q]
    lag     acf
  <lag>   <dbl>
1    1Q -0.102 
2    2Q -0.657 
3    3Q -0.0603
4    4Q  0.869 
5    5Q -0.0892
6    6Q -0.635 
7    7Q -0.0542
8    8Q  0.832 
9    9Q -0.108 

Autocorrelation

Results for first 9 lags for beer data:

new_production %>%
    ACF(Beer, lag_max = 9) %>%
    autoplot()
  • Together, the autocorrelations at lags 1, 2, …, make up the autocorrelation function or ACF.
  • The plot is known as a correlogram.

Autocorrelation

new_production %>%
    ACF(Beer) %>%
    autoplot()
  • r_{4} higher than for the other lags. This is due to the seasonal pattern in the data: the peaks tend to be 4 quarters apart and the troughs tend to be 2 quarters apart.
  • r_2 is more negative than for the other lags because troughs tend to be 2 quarters behind peaks.

Trend and seasonality in ACF plots

  • When data have a trend, the autocorrelations for small lags tend to be large and positive.
  • When data are seasonal, the autocorrelations will be larger at the seasonal lags (i.e., at multiples of the seasonal frequency)
  • When data are trended and seasonal, you see a combination of these effects; the simulation sketch below illustrates this.
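A quick simulated illustration (illustrative code, not from the text): a series with a linear trend and period-12 seasonality produces slowly decaying autocorrelations with bumps at the seasonal lags.

sim <- tsibble(t = 1:120, y = 0.1 * (1:120) + 2 * sin(2 * pi * (1:120)/12) +
    rnorm(120), index = t)
sim %>%
    ACF(y, lag_max = 36) %>%
    autoplot()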

Autocorrelation functions

AutoCorrelation Functions by Allison Horst

US retail trade employment

retail <- us_employment %>%
    filter(Title == "Retail Trade", year(Month) >= 1980)
retail %>%
    autoplot(Employed)

US retail trade employment

retail %>%
    ACF(Employed, lag_max = 48) %>%
    autoplot()

Google stock price

google_2015 <- gafa_stock %>%
    filter(Symbol == "GOOG", year(Date) == 2015) %>%
    select(Date, Close)
google_2015
# A tsibble: 252 x 2 [!]
   Date       Close
   <date>     <dbl>
 1 2015-01-02  522.
 2 2015-01-05  511.
 3 2015-01-06  499.
 4 2015-01-07  498.
 5 2015-01-08  500.
 6 2015-01-09  493.
 7 2015-01-12  490.
 8 2015-01-13  493.
 9 2015-01-14  498.
10 2015-01-15  499.
# … with 242 more rows

Google stock price

google_2015 %>%
    autoplot(Close)

Google stock price

google_2015 %>%
    ACF(Close, lag_max = 100)
# A tsibble: 100 x 2 [1]
     lag   acf
   <lag> <dbl>
 1     1 0.982
 2     2 0.959
 3     3 0.937
 4     4 0.918
 5     5 0.901
 6     6 0.883
 7     7 0.865
 8     8 0.849
 9     9 0.834
10    10 0.818
# … with 90 more rows

Google stock price

google_2015 %>%
    ACF(Close, lag_max = 100) %>%
    autoplot()

Which is which?

White noise

Example: White noise

set.seed(30)
wn <- tsibble(t = 1:50, y = rnorm(50), index = t)
wn %>%
    autoplot(y)

White noise data is uncorrelated across time with zero mean and constant variance.

(Technically, we require independence as well.)

Example: White noise

wn %>% ACF(y)
r_{1} r_{2} r_{3} r_{4} r_{5} r_{6} r_{7} r_{8} r_{9} r_{10}
0.014 -0.163 0.163 -0.259 -0.198 0.064 -0.139 -0.032 0.199 -0.024
  • Sample autocorrelations for white noise series.
  • Expect each autocorrelation to be close to zero.
  • Blue lines show 95% critical values.

Sampling distribution of autocorrelations

Sampling distribution of r_k for white noise data is asymptotically N(0,1/T).

  • 95% of all r_k for white noise must lie within \pm 1.96/\sqrt{T}.
  • If this is not the case, the series is probably not WN.
  • Common to plot lines at \pm 1.96/\sqrt{T} when plotting the ACF. These are the critical values; the sketch below computes them for wn.
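A minimal sketch of those bounds for the wn series above (T = 50):

T <- nrow(wn)
c(-1.96, 1.96)/sqrt(T)  # approximately -0.277 and 0.277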

Example: Pigs slaughtered

pigs <- aus_livestock %>%
    filter(State == "Victoria", Animal == "Pigs", year(Month) >= 2014)
pigs %>%
    autoplot(Count/1000) + labs(y = "Thousands", title = "Number of pigs slaughtered in Victoria")

Example: Pigs slaughtered

pigs %>%
    ACF(Count) %>%
    autoplot()

Example: Pigs slaughtered

Monthly total number of pigs slaughtered in the state of Victoria, Australia, from January 2014 through December 2018 (Source: Australian Bureau of Statistics.)

  • Difficult to detect pattern in time plot.
  • ACF shows significant autocorrelation at lags 2 and 12.
  • These indicate some slight seasonality.

These show the series is not a white noise series.

Let’s Try One

You can compute the daily changes in the Google stock price in 2018 using

dgoog <- gafa_stock %>%
    filter(Symbol == "GOOG", year(Date) >= 2018) %>%
    mutate(diff = difference(Close))

Does diff look like white noise?
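One way to check (a sketch, not the only route): trading days are irregularly spaced, so re-index by trading day before computing the ACF; a Ljung-Box test adds a formal check.

dgoog <- gafa_stock %>%
    filter(Symbol == "GOOG", year(Date) >= 2018) %>%
    mutate(trading_day = row_number()) %>%
    update_tsibble(index = trading_day, regular = TRUE) %>%
    mutate(diff = difference(Close))
dgoog %>%
    ACF(diff) %>%
    autoplot()
dgoog %>%
    features(diff, ljung_box, lag = 10)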

Chapter 3: Decomposition

Transformations and adjustments

Getting started

library(tidyverse)
library(fpp3)
library(purrr)
library(gganimate)

Filter Australia

global_economy %>%
    filter(Country == "Australia") %>%
    autoplot(GDP)

How to Adjust?

Per capita adjustments

global_economy %>%
    filter(Country == "Australia") %>%
    autoplot(GDP/Population) + hrbrthemes::theme_ipsum_rc()

Have a look….

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

global_economy %>%
    mutate(GDPPC = GDP/Population) %>%
    select(Country, Year, GDPPC) %>%
    top_n(., 10, wt = GDPPC)
# A tsibble: 10 x 3 [1Y]
# Key:       Country [2]
   Country        Year   GDPPC
   <fct>         <dbl>   <dbl>
 1 Liechtenstein  2013 173528.
 2 Liechtenstein  2014 179308.
 3 Liechtenstein  2015 167591.
 4 Liechtenstein  2016 164993.
 5 Monaco         2007 167125.
 6 Monaco         2008 180640.
 7 Monaco         2013 172589.
 8 Monaco         2014 185153.
 9 Monaco         2015 163369.
10 Monaco         2016 168011.

Plot

global_economy %>%
    autoplot(GDP/Population) + guides(color = FALSE) + hrbrthemes::theme_ipsum_rc()

Inflation adjustments

global_economy <- tsibbledata::global_economy
print_retail <- aus_retail %>%
    filter(Industry == "Newspaper and book retailing") %>%
    group_by(Industry) %>%
    index_by(Year = year(Month)) %>%
    summarise(Turnover = sum(Turnover))
aus_economy <- filter(global_economy, Code == "AUS")
print_retail %>%
    left_join(aus_economy, by = "Year") %>%
    mutate(Adj_turnover = Turnover/CPI) %>%
    pivot_longer(c(Turnover, Adj_turnover), names_to = "Type", values_to = "Turnover") %>%
    ggplot(aes(x = Year, y = Turnover)) + geom_line() + facet_grid(vars(Type),
    scales = "free_y") + xlab("Years") + ylab(NULL) + ggtitle("Turnover: Australian print media industry") +
    hrbrthemes::theme_ipsum_rc()

Mathematical transformations

If the data show different variation at different levels of the series, then a transformation can be useful.

Denote original observations as y_1,\dots,y_n and transformed observations as w_1, \dots, w_n.

Transformations
Square root w_t = \sqrt{y_t}
Cube root w_t = \sqrt[3]{y_t}
Logarithm w_t = \log(y_t)

Logarithms, in particular, are useful because they are more interpretable: changes in a log value are relative (percent) changes on the original scale.
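A quick numeric illustration (base R only): for small changes, differences in logs are close to proportional changes.

y <- c(100, 102, 105)
diff(log(y))  # 0.0198 0.0290
diff(y)/head(y, -1)  # 0.0200 0.0294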

Mathematical transformations

food <- aus_retail %>%
    filter(Industry == "Food retailing") %>%
    summarise(Turnover = sum(Turnover))

Mathematical transformations

food %>%
    autoplot(sqrt(Turnover)) + labs(y = "Square root turnover") + hrbrthemes::theme_ipsum_rc()

Mathematical transformations

food %>%
    autoplot(Turnover^(1/3)) + labs(y = "Cube root turnover") + hrbrthemes::theme_ipsum_rc()

Mathematical transformations

food %>%
    autoplot(log(Turnover)) + labs(y = "Log turnover") + hrbrthemes::theme_ipsum_rc()

Mathematical transformations

food %>%
    autoplot(-1/Turnover) + labs(y = "Inverse turnover") + hrbrthemes::theme_ipsum_rc()

Box-Cox transformations

Each of these transformations is close to a member of the family of Box-Cox transformations (a hand-rolled version appears after the list below):

w_t = \left\{\begin{array}{ll} \log(y_t), & \quad \lambda = 0; \\ (y_t^\lambda-1)/\lambda , & \quad \lambda \ne 0. \end{array}\right.

  • \lambda=1: (No substantive transformation)
  • \lambda=\frac12: (Square root plus linear transformation)
  • \lambda=0: (Natural logarithm)
  • \lambda=-1: (Inverse plus 1)
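A hand-rolled version of the formula above; this is a sketch (fabletools::box_cox is the real implementation and may treat negative values differently).

box_cox_simple <- function(y, lambda) {
    if (lambda == 0) log(y) else (y^lambda - 1)/lambda
}
box_cox_simple(c(1, 10, 100), lambda = 0)  # natural logs
box_cox_simple(c(1, 10, 100), lambda = 0.5)  # scaled square roots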

Box-Cox transformations

food %>%
    mutate(!!!set_names(map(seq(0, 1, 0.01), ~expr(fabletools::box_cox(Turnover,
        !!.x))), seq(0, 1, 0.01))) %>%
    select(-Turnover) %>%
    pivot_longer(-Month, names_to = "lambda", values_to = "Turnover") %>%
    mutate(lambda = as.numeric(lambda)) %>%
    ggplot(aes(x = Month, y = Turnover)) + geom_line() + transition_states(1 -
    lambda, state_length = 0) + view_follow() + ggtitle("Box-CoxT(lambda = {format(1 - as.numeric(closest_state), digits = 2)})") +
    hrbrthemes::theme_ipsum_rc() -> my.anim
# save_animation('./img/Anim1.gif')

Animation

Box-Cox transformations

food %>%
    features(Turnover, features = guerrero)
# A tibble: 1 × 1
  lambda_guerrero
            <dbl>
1          0.0895
  • This attempts to balance the seasonal fluctuations and random variation across the series.
  • Always check the results.
  • A low value of \lambda can give extremely large prediction intervals.

Box-Cox transformations

food %>%
    autoplot(box_cox(Turnover, 0.0895)) + labs(y = "Box-Cox transformed turnover") +
    hrbrthemes::theme_ipsum_rc()

Transformations

  • Often no transformation needed.
  • Simple transformations are easier to explain and work well enough.
  • Transformations can have very large effect on PI.
  • If the data contain zeros, then don’t take logs.
  • log1p(), i.e. \log(1+x), can be useful for data with zeros.
  • If some data are negative, no power transformation is possible unless a constant is added to all values.
  • Choosing logs is a simple way to force forecasts to be positive
  • Transformations must be reversed to obtain forecasts on the original scale. (Handled automatically by fable.)

Try this out…

  1. For the following series, find an appropriate transformation in order to stabilise the variance.
  • United States GDP from global_economy
  • Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock
  • Victorian Electricity Demand from vic_elec.
  • Gas production from aus_production
  2. Why is a Box-Cox transformation unhelpful for the canadian_gas data?

Time series components

Time series patterns

Recall

  • Trend pattern exists when there is a long-term increase or decrease in the data.

  • Cyclic pattern exists when data exhibit rises and falls that are not of fixed period (duration usually of at least 2 years).

  • Seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week).

A Note on DeSeasoning

us_retail_employment <- us_employment %>%
    filter(year(Month) >= 1990, Title == "Retail Trade") %>%
    select(-Series_ID)
dcmp <- us_retail_employment %>%
    model(STL(Employed))
autoplot(us_retail_employment, Employed, color = "gray") + autolayer(components(dcmp),
    season_adjust, color = "blue") + labs(y = "Persons (thousands)", title = "Total employment in US retail")

Moving Averages

The general idea is a moving window. We will set .before and .after as follows.

aus_exports <- global_economy %>%
    filter(Country == "Australia") %>%
    mutate(`5-MA` = slider::slide_dbl(Exports, mean, .before = 2, .after = 2,
        .complete = TRUE))
aus_exports
# A tsibble: 58 x 10 [1Y]
# Key:       Country [1]
   Country  Code   Year     GDP Growth   CPI Imports Exports Popul…¹ `5-MA`
   <fct>    <fct> <dbl>   <dbl>  <dbl> <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
 1 Austral… AUS    1960 1.86e10  NA     7.96    14.1    13.0  1.03e7   NA  
 2 Austral… AUS    1961 1.96e10   2.49  8.14    15.0    12.4  1.05e7   NA  
 3 Austral… AUS    1962 1.99e10   1.30  8.12    12.6    13.9  1.07e7   13.5
 4 Austral… AUS    1963 2.15e10   6.21  8.17    13.8    13.0  1.10e7   13.5
 5 Austral… AUS    1964 2.38e10   6.98  8.40    13.8    14.9  1.12e7   13.6
 6 Austral… AUS    1965 2.59e10   5.98  8.69    15.3    13.2  1.14e7   13.4
 7 Austral… AUS    1966 2.73e10   2.38  8.98    15.1    12.9  1.17e7   13.3
 8 Austral… AUS    1967 3.04e10   6.30  9.29    13.9    12.9  1.18e7   12.7
 9 Austral… AUS    1968 3.27e10   5.10  9.52    14.5    12.3  1.20e7   12.6
10 Austral… AUS    1969 3.66e10   7.04  9.83    13.3    12.0  1.23e7   12.6
# … with 48 more rows, and abbreviated variable name ¹​Population
autoplot(aus_exports, Exports) + autolayer(aus_exports, `5-MA`, color = "red") +
    labs(y = "Exports (% of GDP)", title = "Total Australian exports") + guides(colour = guide_legend(title = "series")) +
    hrbrthemes::theme_ipsum_rc()
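As a by-hand check (assuming aus_exports from above): the first non-missing 5-MA, for 1962, is the mean of the 1960–1964 values.

mean(aus_exports$Exports[1:5])  # about 13.5, matching the 1962 `5-MA` above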

We can even have moving averages of moving averages.

aus_exports2 <- aus_exports %>%
    mutate(`2x5-MA` = slider::slide_dbl(`5-MA`, mean, .before = 1, .after = 0,
        .complete = TRUE))
aus_exports2
# A tsibble: 58 x 11 [1Y]
# Key:       Country [1]
   Country  Code   Year     GDP Growth   CPI Imports Exports Popul…¹ `5-MA`
   <fct>    <fct> <dbl>   <dbl>  <dbl> <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
 1 Austral… AUS    1960 1.86e10  NA     7.96    14.1    13.0  1.03e7   NA  
 2 Austral… AUS    1961 1.96e10   2.49  8.14    15.0    12.4  1.05e7   NA  
 3 Austral… AUS    1962 1.99e10   1.30  8.12    12.6    13.9  1.07e7   13.5
 4 Austral… AUS    1963 2.15e10   6.21  8.17    13.8    13.0  1.10e7   13.5
 5 Austral… AUS    1964 2.38e10   6.98  8.40    13.8    14.9  1.12e7   13.6
 6 Austral… AUS    1965 2.59e10   5.98  8.69    15.3    13.2  1.14e7   13.4
 7 Austral… AUS    1966 2.73e10   2.38  8.98    15.1    12.9  1.17e7   13.3
 8 Austral… AUS    1967 3.04e10   6.30  9.29    13.9    12.9  1.18e7   12.7
 9 Austral… AUS    1968 3.27e10   5.10  9.52    14.5    12.3  1.20e7   12.6
10 Austral… AUS    1969 3.66e10   7.04  9.83    13.3    12.0  1.23e7   12.6
# … with 48 more rows, 1 more variable: `2x5-MA` <dbl>, and abbreviated
#   variable name ¹​Population
autoplot(aus_exports2, Exports) + autolayer(aus_exports2, `5-MA`, color = "red") +
    autolayer(aus_exports2, `2x5-MA`, color = "blue") + labs(y = "Exports (% of GDP)",
    title = "Total Australian exports") + guides(colour = guide_legend(title = "series")) +
    hrbrthemes::theme_ipsum_rc()

Time series decomposition

y_t = f(S_t, T_t, R_t)

where

  • y_t = data at period t
  • T_t = trend-cycle component at period t
  • S_t = seasonal component at period t
  • R_t = remainder component at period t

Additive decomposition: y_t = S_t + T_t + R_t.

Multiplicative decomposition: y_t = S_t \times T_t \times R_t.

Time series decomposition

  • Additive model appropriate if magnitude of seasonal fluctuations does not vary with level.
  • If seasonal fluctuations are proportional to the level of the series, then a multiplicative model is appropriate.
  • Multiplicative decomposition more prevalent with economic series
  • Alternative: use a Box-Cox transformation, and then use additive decomposition.
  • Logs turn multiplicative relationship into an additive relationship:

y_t = S_t \times T_t \times R_t \quad\Rightarrow\quad \log y_t = \log S_t + \log T_t + \log R_t.

US Retail Employment

us_retail_employment <- us_employment %>%
    filter(year(Month) >= 1990, Title == "Retail Trade") %>%
    select(-Series_ID)
us_retail_employment
# A tsibble: 357 x 3 [1M]
      Month Title        Employed
      <mth> <chr>           <dbl>
 1 1990 Jan Retail Trade   13256.
 2 1990 Feb Retail Trade   12966.
 3 1990 Mar Retail Trade   12938.
 4 1990 Apr Retail Trade   13012.
 5 1990 May Retail Trade   13108.
 6 1990 Jun Retail Trade   13183.
 7 1990 Jul Retail Trade   13170.
 8 1990 Aug Retail Trade   13160.
 9 1990 Sep Retail Trade   13113.
10 1990 Oct Retail Trade   13185.
# … with 347 more rows

US Retail Employment

us_retail_employment %>%
    autoplot(Employed) + xlab("Year") + ylab("Persons (thousands)") + ggtitle("Total employment in US retail") +
    hrbrthemes::theme_ipsum_rc()
USREDC <- us_retail_employment %>%
    model(classical_decomposition(Employed, type = "additive")) %>%
    components()
USREDC
# A dable: 357 x 7 [1M]
# Key:     .model [1]
# :        Employed = trend + seasonal + random
   .model                       Month Emplo…¹  trend seaso…² random seaso…³
   <chr>                        <mth>   <dbl>  <dbl>   <dbl>  <dbl>   <dbl>
 1 "classical_decomposition… 1990 Jan  13256.    NA   -75.5   NA     13331.
 2 "classical_decomposition… 1990 Feb  12966.    NA  -273.    NA     13239.
 3 "classical_decomposition… 1990 Mar  12938.    NA  -253.    NA     13191.
 4 "classical_decomposition… 1990 Apr  13012.    NA  -190.    NA     13203.
 5 "classical_decomposition… 1990 May  13108.    NA   -88.9   NA     13197.
 6 "classical_decomposition… 1990 Jun  13183.    NA   -10.4   NA     13193.
 7 "classical_decomposition… 1990 Jul  13170. 13178.  -13.3    5.65  13183.
 8 "classical_decomposition… 1990 Aug  13160. 13161.   -9.99   8.80  13169.
 9 "classical_decomposition… 1990 Sep  13113. 13141.  -87.4   59.9   13201.
10 "classical_decomposition… 1990 Oct  13185. 13117.   34.6   33.8   13151.
# … with 347 more rows, and abbreviated variable names ¹​Employed,
#   ²​seasonal, ³​season_adjust
autoplot(USREDC) + labs(title = "Classical additive decomposition of total US retail employment") +
    hrbrthemes::theme_ipsum_rc()

US Retail Employment

us_retail_employment %>%
    model(stl = STL(Employed))
# A mable: 1 x 1
      stl
  <model>
1   <STL>

US Retail Employment

dcmp <- us_retail_employment %>%
    model(stl = STL(Employed))
components(dcmp)
# A dable: 357 x 7 [1M]
# Key:     .model [1]
# :        Employed = trend + season_year + remainder
   .model    Month Employed  trend season_year remainder season_adjust
   <chr>     <mth>    <dbl>  <dbl>       <dbl>     <dbl>         <dbl>
 1 stl    1990 Jan   13256. 13288.      -33.0      0.836        13289.
 2 stl    1990 Feb   12966. 13269.     -258.     -44.6          13224.
 3 stl    1990 Mar   12938. 13250.     -290.     -22.1          13228.
 4 stl    1990 Apr   13012. 13231.     -220.       1.05         13232.
 5 stl    1990 May   13108. 13211.     -114.      11.3          13223.
 6 stl    1990 Jun   13183. 13192.      -24.3     15.5          13207.
 7 stl    1990 Jul   13170. 13172.      -23.2     21.6          13193.
 8 stl    1990 Aug   13160. 13151.       -9.52    17.8          13169.
 9 stl    1990 Sep   13113. 13131.      -39.5     22.0          13153.
10 stl    1990 Oct   13185. 13110.       61.6     13.2          13124.
# … with 347 more rows

US Retail Employment

us_retail_employment %>%
    autoplot(Employed, color = "gray") + autolayer(components(dcmp), trend,
    color = "red") + xlab("Year") + ylab("Persons (thousands)") + ggtitle("Total employment in US retail") +
    hrbrthemes::theme_ipsum_rc()

US Retail Employment

components(dcmp) %>%
    autoplot() + xlab("Year") + hrbrthemes::theme_ipsum_rc()

US Retail Employment

components(dcmp) %>%
    gg_subseries(season_year) + hrbrthemes::theme_ipsum_rc()

Seasonal adjustment

  • Useful by-product of decomposition: an easy way to calculate seasonally adjusted data.
  • Additive decomposition: seasonally adjusted data given by y_t - S_t = T_t + R_t (checked by hand in the sketch below)
  • Multiplicative decomposition: seasonally adjusted data given by y_t / S_t = T_t \times R_t
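A minimal check of the additive identity, using the STL components in dcmp from above:

components(dcmp) %>%
    mutate(sa_by_hand = Employed - season_year) %>%
    select(Month, season_adjust, sa_by_hand)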

US Retail Employment

us_retail_employment %>%
    autoplot(Employed, color = "gray") + autolayer(components(dcmp), season_adjust,
    color = "blue") + xlab("Year") + ylab("Persons (thousands)") + ggtitle("Total employment in US retail") +
    hrbrthemes::theme_ipsum_rc()

Seasonal adjustment

  • We use estimates of S based on past values to seasonally adjust a current value.
  • Seasonally adjusted series reflect remainders as well as trend. Therefore they are not “smooth” and “downturns” or “upturns” can be misleading.
  • It is better to use the trend-cycle component to look for turning points.

History of time series decomposition

History of time series decomposition

  • Classical method originated in 1920s.
  • Census II method introduced in 1957. Basis for X-11 method and variants (including X-12-ARIMA, X-13-ARIMA)
  • STL method introduced in 1983
  • TRAMO/SEATS introduced in 1990s.

National Statistics Offices

  • ABS uses X-12-ARIMA
  • US Census Bureau uses X-13ARIMA-SEATS
  • Statistics Canada uses X-12-ARIMA
  • ONS (UK) uses X-12-ARIMA
  • EuroStat use X-13ARIMA-SEATS

X-11 decomposition

Advantages

  • Relatively robust to outliers
  • Completely automated choices for trend and seasonal changes
  • Very widely tested on economic data over a long period of time.

Disadvantages

  • No prediction/confidence intervals
  • Ad hoc method with no underlying model
  • Only developed for quarterly and monthly data
X11_dcmp <- us_retail_employment %>%
    model(seats = feasts:::X_13ARIMA_SEATS(Employed)) %>%
    components()
X11_dcmp
# A dable: 357 x 7 [1M]
# Key:     .model [1]
# :        Employed = f(trend, seasonal, irregular)
   .model    Month Employed  trend seasonal irregular season_adjust
   <chr>     <mth>    <dbl>  <dbl>    <dbl>     <dbl>         <dbl>
 1 seats  1990 Jan   13256. 13261.    0.999     1.00         13266.
 2 seats  1990 Feb   12966. 13243.    0.980     0.999        13235.
 3 seats  1990 Mar   12938. 13236.    0.977     1.00         13238.
 4 seats  1990 Apr   13012. 13233.    0.983     1.00         13235.
 5 seats  1990 May   13108. 13222.    0.991     1.00         13223.
 6 seats  1990 Jun   13183. 13206.    0.998     1.00         13205.
 7 seats  1990 Jul   13170. 13187.    0.999     1.00         13190.
 8 seats  1990 Aug   13160. 13165.    1.00      1.00         13162.
 9 seats  1990 Sep   13113. 13145.    0.998     1.00         13146.
10 seats  1990 Oct   13185. 13129.    1.00      1.00         13126.
# … with 347 more rows
autoplot(X11_dcmp) + hrbrthemes::theme_ipsum_rc()

Extensions: X-12-ARIMA and X-13-ARIMA

  • The X-11, X-12-ARIMA and X-13-ARIMA methods are based on Census II decomposition.
  • These allow adjustments for trading days and other explanatory variables.
  • Known outliers can be omitted.
  • Level shifts and ramp effects can be modelled.
  • Missing values estimated and replaced.
  • Holiday factors (e.g., Easter, Labour Day) can be estimated.

X-13ARIMA-SEATS

Advantages

  • Model-based
  • Smooth trend estimate
  • Allows estimates at end points
  • Allows changing seasonality
  • Developed for economic data

Disadvantages

  • Only developed for quarterly and monthly data

The details.

seats_dcmp <- us_retail_employment %>%
    model(seats = feasts:::SEATS(Employed)) %>%
    components()
seats_dcmp
# A dable: 357 x 7 [1M]
# Key:     .model [1]
# :        Employed = trend * seasonal * irregular
   .model    Month Employed  trend seasonal irregular season_adjust
   <chr>     <mth>    <dbl>  <dbl>    <dbl>     <dbl>         <dbl>
 1 seats  1990 Jan   13256. 13261.    0.999     1.00         13266.
 2 seats  1990 Feb   12966. 13243.    0.980     0.999        13235.
 3 seats  1990 Mar   12938. 13236.    0.977     1.00         13238.
 4 seats  1990 Apr   13012. 13233.    0.983     1.00         13235.
 5 seats  1990 May   13108. 13222.    0.991     1.00         13223.
 6 seats  1990 Jun   13183. 13206.    0.998     1.00         13205.
 7 seats  1990 Jul   13170. 13187.    0.999     1.00         13190.
 8 seats  1990 Aug   13160. 13165.    1.00      1.00         13162.
 9 seats  1990 Sep   13113. 13145.    0.998     1.00         13146.
10 seats  1990 Oct   13185. 13129.    1.00      1.00         13126.
# … with 347 more rows

A Plot

autoplot(seats_dcmp) + labs(title = "SEATS decomposition of total US retail employment") +
    hrbrthemes::theme_ipsum_rc()

STL decomposition

STL decomposition

  • STL: “Seasonal and Trend decomposition using Loess”
  • Very versatile and robust.
  • Unlike X-12-ARIMA, STL will handle any type of seasonality.
  • Seasonal component allowed to change over time, and rate of change controlled by user.
  • Smoothness of trend-cycle also controlled by user.
  • Robust to outliers
  • No trading day or calendar adjustments.
  • Decomposition is additive only.
  • Take logs to get a multiplicative decomposition (sketched below).
  • Use Box-Cox transformations to get other decompositions.
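A minimal sketch of the log route, assuming us_retail_employment from above; the components come back on the log scale.

us_retail_employment %>%
    model(STL(log(Employed))) %>%
    components() %>%
    autoplot()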

STL decomposition

us_retail_employment %>%
    model(STL(Employed ~ season(window = 9), robust = TRUE)) %>%
    components() %>%
    autoplot() + ggtitle("STL decomposition: US retail employment")

STL decomposition

s_windows <- seq(5, 55, by = 2)
stl_defs <- purrr::map(s_windows, function(s_window) {
    STL(Employed ~ season(window = s_window), robust = TRUE)
})
names(stl_defs) <- sprintf("season(window=%02d)", s_windows)

us_retail_employment %>%
    model(!!!stl_defs) %>%
    components() %>%
    as_tibble() %>%
    pivot_longer(Employed:remainder, names_to = "component", names_ptypes = list(component = factor(levels = c("Employed",
        "trend", "season_year", "remainder"))), values_to = "Employed") %>%
    ggplot(aes(x = Month, y = Employed)) + geom_line() + facet_grid(rows = vars(component),
    scales = "free_y") + labs(title = "STL decomposition of US retail employment",
    subtitle = "{closest_state}") + transition_states(.model)

STL decomposition

us_retail_employment %>%
    model(STL(Employed ~ season(window = 5))) %>%
    components()

us_retail_employment %>%
    model(STL(Employed ~ trend(window = 15) + season(window = "periodic"), robust = TRUE)) %>%
    components()
  • trend(window = ?) controls wiggliness of trend component.
  • season(window = ?) controls variation on seasonal component.
  • season(window = 'periodic') is equivalent to an infinite window.

STL decomposition

us_retail_employment %>%
    model(STL(Employed)) %>%
    components() %>%
    autoplot()
  • STL() chooses season(window=13) by default
  • Can include transformations.

STL decomposition

  • Algorithm that updates trend and seasonal components iteratively.
  • Starts with \hat{T}_t=0
  • Uses a mixture of loess and moving averages to successively refine the trend and seasonal estimates.
  • The trend window controls loess bandwidth applied to deseasonalised values.
  • The season window controls loess bandwidth applied to detrended subseries.
  • Robustness weights based on remainder.
  • Defaults: season window = 13; trend window = nextodd(ceiling((1.5*period)/(1-(1.5/s.window)))), computed in the sketch below.
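A sketch of that default trend-window rule for monthly data (period = 12, season window = 13); nextodd here is a small helper mirroring the one inside stats::stl, not an exported function.

nextodd <- function(x) ifelse(x%%2 == 0, x + 1, x)
period <- 12
s.window <- 13
nextodd(ceiling((1.5 * period)/(1 - (1.5/s.window))))  # 21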

When things go wrong

The ABS stuff-up

employed
# A tsibble: 440 x 4 [1M]
       Time Month  Year Employed
      <mth> <ord> <dbl>    <dbl>
 1 1978 Feb Feb    1978    5986.
 2 1978 Mar Mar    1978    6041.
 3 1978 Apr Apr    1978    6054.
 4 1978 May May    1978    6038.
 5 1978 Jun Jun    1978    6031.
 6 1978 Jul Jul    1978    6036.
 7 1978 Aug Aug    1978    6005.
 8 1978 Sep Sep    1978    6024.
 9 1978 Oct Oct    1978    6046.
10 1978 Nov Nov    1978    6034.
# … with 430 more rows

The ABS stuff-up

Details:

employed %>%
    autoplot(Employed) + ggtitle("Total employed") + ylab("Thousands") + xlab("Year")

The ABS stuff-up

employed %>%
    filter(Year >= 2005) %>%
    autoplot(Employed) + ggtitle("Total employed") + ylab("Thousands") + xlab("Year")

The ABS stuff-up

employed %>%
    filter(Year >= 2005) %>%
    gg_season(Employed) + ggtitle("Total employed") + ylab("Thousands")

The ABS stuff-up

employed %>%
    mutate(diff = difference(Employed)) %>%
    filter(Month == "Sep") %>%
    ggplot(aes(y = diff, x = 1)) + geom_boxplot() + coord_flip() + ggtitle("Sep - Aug: total employed") +
    xlab("") + ylab("Thousands") + scale_x_continuous(breaks = NULL, labels = NULL)

The ABS stuff-up

dcmp <- employed %>%
    filter(Year >= 2005) %>%
    model(stl = STL(Employed ~ season(window = 11), robust = TRUE))
components(dcmp) %>%
    autoplot()

The ABS stuff-up

components(dcmp) %>%
    filter(year(Time) == 2013) %>%
    gg_season(season_year) + ggtitle("Seasonal component") + guides(colour = "none")

The ABS stuff-up

components(dcmp) %>%
    as_tsibble() %>%
    autoplot(season_adjust)

The ABS stuff-up

  • August 2014 employment numbers higher than expected.
  • Supplementary survey usually conducted in August for employed people.
  • Most likely, some employed people were claiming to be unemployed in August to avoid supplementary questions.
  • Supplementary survey not run in 2014, so no motivation to lie about employment.
  • In previous years, seasonal adjustment fixed the problem.
  • The ABS has now adopted a new method to avoid the bias.

Some Data for Today and General Considerations

Panel data. Multiple time series are often described as a panel, a cross-section of time series, or a time series of cross-sections. The data structure has two [non-overlapping] indices. Let’s review, and discuss a bit, what exactly we mean.

Extending the Data

fredr is amazing.

US.Employment <- map_dfr(c(rownames(table(us_employment$Series_ID))),
    ~fredr::fredr_series_observations(.))
save(US.Employment, file = "USEmployment.RData")
load(url("https://github.com/robertwwalker/xaringan/raw/master/CMF-Week-9/USEmployment.RData"))
load("USEmployment.RData")
us_employment %>%
    data.frame() %>%
    group_by(Series_ID) %>%
    summarise(Title = first(Title)) %>%
    mutate(series_id = Series_ID) %>%
    ungroup() %>%
    select(-Series_ID) -> Names.List
US.Employment.T <- left_join(US.Employment, Names.List, by = c(series_id = "series_id")) %>%
    mutate(YM = yearmonth(date)) %>%
    rename(Employed = value) %>%
    as_tsibble(., index = YM, key = Title)

Additional Features

For much of the study of time series, the key issue is one known as stationarity. For now, we will wave our hands a bit; the details arrive in Chapter 5 and, more fully, in Chapter 9. But we want to compute things first and then build out the details. Let’s take my new retail employment data.

A Recreation on New Data

EMPN <- US.Employment.T %>%
    filter(YM > yearmonth("1990-01") & Title == "Retail Trade") %>%
    as_tsibble(index = YM)
EMPO <- us_employment %>%
    filter(Title == "Retail Trade" & Month > yearmonth("1990-01")) %>%
    as_tsibble(., index = Month)
Plot1 <- ggplot(EMPN, aes(x = YM, y = Employed)) + geom_line(color = "red") +
    geom_line(data = EMPO, aes(x = Month, y = Employed), inherit.aes = FALSE)
Plot1

Data are Revised Occasionally

library(patchwork)
dcmp <- EMPO %>%
    model(stl = STL(Employed))
Plot2 <- components(dcmp) %>%
    autoplot()
dcmp <- EMPN %>%
    model(stl = STL(Employed))
Plot3 <- components(dcmp) %>%
    autoplot()
Plot1/(Plot2 + Plot3)

Three Sectors

USET <- US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Retail Trade", "Financial Activities",
        "Manufacturing")) %>%
    as_tsibble(., index = YM, key = Title)
USET %>%
    autoplot(Employed)

Retail (season)

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Retail Trade")) %>%
    as_tsibble(., index = YM) %>%
    gg_season(Employed)

Retail (subseries)

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Retail Trade")) %>%
    as_tsibble(., index = YM) %>%
    gg_subseries(Employed)

Retail (lag)

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Retail Trade")) %>%
    as_tsibble(., index = YM) %>%
    gg_lag(Employed)

Manufacturing

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Manufacturing")) %>%
    as_tsibble(., index = YM) %>%
    gg_season(Employed)

Manufacturing

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Manufacturing")) %>%
    as_tsibble(., index = YM) %>%
    gg_subseries(Employed)

Manufacturing

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Manufacturing")) %>%
    as_tsibble(., index = YM) %>%
    gg_lag(Employed)

Financial

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Financial Activities")) %>%
    as_tsibble(., index = YM) %>%
    gg_season(Employed)

Financial

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Financial Activities")) %>%
    as_tsibble(., index = YM) %>%
    gg_subseries(Employed)

Financial

US.Employment.T %>%
    filter(YM > yearmonth("1990-01"), Title %in% c("Financial Activities")) %>%
    as_tsibble(., index = YM) %>%
    gg_lag(Employed)

Features: Summary

The features command is the magic tool for tidy summaries and statistics of time series in this index/key format. For example, a basic summary:

USET %>%
    features(Employed, features = list(mean = mean, min = min, max = max, sd = sd,
        quantile))
# A tibble: 3 × 10
  Title         mean    min    max    sd   `0%`  `25%`  `50%`  `75%` `100%`
  <chr>        <dbl>  <dbl>  <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Financial …  7767.  6472   8846   640.  6472   7363   7876   8226.  8846 
2 Manufactur… 14554. 11340  17870  2241. 11340  12333. 14219  17088  17870 
3 Retail Tra… 14746. 12548. 16394.  915. 12548. 14336. 14962. 15387. 16394.

Features: Correlation Features

Learning about the time series properties

USET %>%
    features(Employed, features = feat_acf)
# A tibble: 3 × 8
  Title                 acf1 acf10 diff1_…¹ diff1…² diff2…³ diff2…⁴ seaso…⁵
  <chr>                <dbl> <dbl>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 Financial Activities 0.990  8.94   0.283    0.165  -0.313   0.415   0.883
2 Manufacturing        0.995  9.35   0.0466   0.128  -0.499   0.505   0.925
3 Retail Trade         0.951  7.29   0.133    0.377  -0.198   0.305   0.876
# … with abbreviated variable names ¹​diff1_acf1, ²​diff1_acf10,
#   ³​diff2_acf1, ⁴​diff2_acf10, ⁵​season_acf1
USET %>%
    group_by(Title) %>%
    ACF(Employed) %>%
    autoplot()

For Contrast: Ford Returns

library(tidyquant)
Ford <- tq_get("F", from = "2000-01-01")
FordT <- Ford %>%
    as_tsibble(index = date)
FordT %>%
    autoplot(adjusted)

FC <- Ford %>%
    tq_transmute(adjusted, mutate_fun = periodReturn, period = "monthly") %>%
    mutate(YM = yearmonth(date)) %>%
    as_tsibble(., index = YM)
FC %>%
    autoplot(monthly.returns)

Ford’s ACF

The 6/7 and 12/13 patterns are interesting….

library(patchwork)
FC1 <- FC %>%
    ACF(monthly.returns) %>%
    autoplot()
FC2 <- FC %>%
    PACF(monthly.returns) %>%
    autoplot()
FC1 + FC2

Decomposition Features

USET %>%
    features(Employed, feat_stl)
# A tibble: 3 × 10
  Title     trend…¹ seaso…² seaso…³ seaso…⁴ spiki…⁵ linea…⁶ curva…⁷ stl_e…⁸
  <chr>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 Financia…   0.999   0.820       6       3  1.28e1  10603.  -3014.   0.718
2 Manufact…   0.999   0.555       8       3  8.41e3 -39200.   4365.   0.612
3 Retail T…   0.983   0.834      11       3  1.11e5  13098.  -6397.   0.506
# … with 1 more variable: stl_e_acf10 <dbl>, and abbreviated variable
#   names ¹​trend_strength, ²​seasonal_strength_year, ³​seasonal_peak_year,
#   ⁴​seasonal_trough_year, ⁵​spikiness, ⁶​linearity, ⁷​curvature, ⁸​stl_e_acf1

With More Data

NUSET8k <- US.Employment.T %>%
    data.frame() %>%
    group_by(Title) %>%
    summarise(MaxE = max(Employed)) %>%
    arrange(desc(MaxE)) %>%
    filter(MaxE > 8000 & MaxE < 120000)
USET8k <- left_join(NUSET8k, US.Employment.T) %>%
    as_tsibble(., index = YM, key = Title)

An Improvement on the Trend/Season

The details are at the bottom for other statistics.

library(kableExtra)
USET8k %>%
    features(Employed, feat_stl) %>%
    knitr::kable(format = "html") %>%
    scroll_box(width = "100%", height = "300px")
Title trend_strength seasonal_strength_year seasonal_peak_year seasonal_trough_year spikiness linearity curvature stl_e_acf1 stl_e_acf10
Construction 0.999 0.963 8 2 2.10e+02 53918 -1084.7 0.576 0.526
Durable Goods 0.994 0.215 9 7 3.42e+03 1295 -40394.6 0.750 1.251
Education and Health Services 1.000 0.715 11 7 7.21e+03 217955 58537.7 0.529 0.580
Education and Health Services: Health Care 0.999 0.354 0 4 1.59e+04 46025 -436.7 0.495 0.533
Education and Health Services: Health Care and Social Assistance 0.999 0.313 0 4 6.19e+04 63326 17.6 0.517 0.584
Financial Activities 1.000 0.870 7 4 6.92e-01 78437 -273.1 0.723 0.879
Goods-Producing 0.996 0.812 9 2 1.12e+04 43343 -64790.2 0.734 1.081
Government 1.000 0.981 11 7 3.94e+02 190941 -19815.3 0.599 0.538
Government: Local Government 1.000 0.986 5 7 3.14e+02 96008 -15450.2 0.640 0.692
Government: Local Government Education 1.000 0.996 3 7 3.54e+01 54936 -8479.9 0.538 0.480
Leisure and Hospitality 0.997 0.607 7 4 5.47e+05 136395 22427.3 0.522 0.604
Leisure and Hospitality: Accommodation and Food Services 0.971 0.473 8 4 5.27e+06 31615 -1157.7 0.499 0.541
Leisure and Hospitality: Food Services and Drinking Places 0.975 0.404 6 4 3.26e+06 29889 -377.9 0.476 0.497
Manufacturing 0.997 0.434 9 2 5.19e+03 -8507 -64155.3 0.772 1.262
Private Service-Providing 1.000 0.543 0 4 1.50e+07 906495 115965.6 0.545 0.623
Professional and Business Services 1.000 0.675 11 1 3.62e+03 187091 41544.7 0.626 0.810
Professional and Business Services: Administrative and Support Services 0.995 0.808 11 1 1.84e+04 21351 -8027.1 0.617 0.797
Professional and Business Services: Administrative and Waste Services 0.995 0.810 11 1 1.93e+04 22521 -8025.8 0.617 0.804
Professional and Business Services: Professional and Technical Services 1.000 0.691 2 5 1.81e+02 29125 -1380.0 0.626 0.719
Retail Trade 1.000 0.881 0 4 5.78e+03 135654 -6877.1 0.511 0.473
Trade, Transportation, and Utilities 1.000 0.845 0 4 2.02e+04 211628 -7668.1 0.566 0.583

coef_hurst

A measure of the degree to which adjacent observations depend on one another over time. Generically, this statistic takes values between zero and one with one indicating very high levels of dependence through time.

USET %>%
    features(Employed, coef_hurst)
# A tibble: 3 × 2
  Title                coef_hurst
  <chr>                     <dbl>
1 Financial Activities      1.00 
2 Manufacturing             1.00 
3 Retail Trade              0.999

Middling for Ford

FC %>%
    features(monthly.returns, features = coef_hurst)
# A tibble: 1 × 1
  coef_hurst
       <dbl>
1      0.500

feat_spectral

USET %>%
    features(Employed, feat_spectral)
# A tibble: 3 × 2
  Title                spectral_entropy
  <chr>                           <dbl>
1 Financial Activities            0.263
2 Manufacturing                   0.180
3 Retail Trade                    0.412
FC %>%
    features(monthly.returns, features = feat_spectral)
# A tibble: 1 × 1
  spectral_entropy
             <dbl>
1            0.988

The Absence of Correlation

Ljung-Box modifies the idea in the Box-Pierce statistic for assessing whether or not a given series [or a transformation thereof] is essentially uncorrelated. In both cases, we will get to the details next week [Chapter 5]. For now, the idea is that a weighted sum of the first k squared autocorrelations is, under the null of no autocorrelation, approximately chi-squared distributed with k degrees of freedom. Large correlations reveal dependence.
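For reference, with \ell lags the Ljung-Box statistic is Q^* = T(T+2)\sum_{k=1}^{\ell} r_k^2/(T-k), compared against a \chi^2_{\ell} distribution; Box-Pierce uses the simpler Q = T\sum_{k=1}^{\ell} r_k^2.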

USET %>%
    features(Employed, features = list(box_pierce, ljung_box))
# A tibble: 3 × 5
  Title                bp_stat bp_pvalue lb_stat lb_pvalue
  <chr>                  <dbl>     <dbl>   <dbl>     <dbl>
1 Financial Activities    365.         0    368.         0
2 Manufacturing           368.         0    371.         0
3 Retail Trade            337.         0    339.         0
FC %>%
    features(monthly.returns, features = list(box_pierce, ljung_box))
# A tibble: 1 × 4
  bp_stat bp_pvalue lb_stat lb_pvalue
    <dbl>     <dbl>   <dbl>     <dbl>
1  0.0359     0.850  0.0363     0.849

feat_pacf

USET %>%
    features(Employed, feat_pacf)
# A tibble: 3 × 5
  Title                pacf5 diff1_pacf5 diff2_pacf5 season_pacf
  <chr>                <dbl>       <dbl>       <dbl>       <dbl>
1 Financial Activities 0.987       0.712       1.00      -0.0555
2 Manufacturing        0.994       0.238       0.791      0.0348
3 Retail Trade         1.08        0.834       1.06      -0.0188
FC %>%
    features(monthly.returns, features = feat_pacf)
# A tibble: 1 × 4
    pacf5 diff1_pacf5 diff2_pacf5 season_pacf
    <dbl>       <dbl>       <dbl>       <dbl>
1 0.00612       0.632        1.04       0.109

Unit Roots

The stationarity issue from earlier is given much attention. Can we reasonably think of a series’ characteristics as fixed? There are several means of assessment, with details deferred to Chapter 9. Note that the null hypotheses differ: KPSS tests a null of stationarity, while Phillips-Perron tests a null of a unit root.

USET %>%
    features(Employed, features = list(unitroot_kpss, unitroot_pp, unitroot_ndiffs,
        unitroot_nsdiffs)) %>%
    knitr::kable(format = "html")
Title kpss_stat kpss_pvalue pp_stat pp_pvalue ndiffs nsdiffs
Financial Activities 4.63 0.01 -1.193 0.100 1 1
Manufacturing 5.68 0.01 -0.938 0.100 1 0
Retail Trade 3.91 0.01 -2.636 0.089 1 1
FC %>%
    features(monthly.returns, features = list(unitroot_kpss, unitroot_pp, unitroot_ndiffs,
        unitroot_nsdiffs))
# A tibble: 1 × 6
  kpss_stat kpss_pvalue pp_stat pp_pvalue ndiffs nsdiffs
      <dbl>       <dbl>   <dbl>     <dbl>  <int>   <int>
1    0.0895         0.1   -16.6      0.01      0       0

Tiling

A reminder

USET %>%
    features(Employed, features = list(var_tiled_mean, var_tiled_var))
# A tibble: 3 × 3
  Title                var_tiled_mean var_tiled_var
  <chr>                         <dbl>         <dbl>
1 Financial Activities          1.02      0.0000411
2 Manufacturing                 1.03      0.0000923
3 Retail Trade                  0.922     0.0136   
FC %>%
    features(monthly.returns, features = list(var_tiled_mean, var_tiled_var))
# A tibble: 1 × 2
  var_tiled_mean var_tiled_var
           <dbl>         <dbl>
1          0.118          2.59

Detecting Shifts

USET %>%
    features(Employed, features = list(shift_level_max, shift_var_max, shift_kl_max)) %>%
    kable(format = "html")
Title shift_level_max shift_level_index shift_var_max shift_var_index shift_kl_max shift_kl_index
Financial Activities 371 229 24037 233 0.299 227
Manufacturing 1559 228 417020 235 0.522 227
Retail Trade 777 226 788931 354 1.841 227
FC %>%
    features(monthly.returns, features = list(shift_level_max, shift_var_max,
        shift_kl_max)) %>%
    kable(format = "html")
shift_level_max shift_level_index shift_var_max shift_var_index shift_kl_max shift_kl_index
0.258 110 0.194 113 36.8 112

Crossings and Flat Spots

USET %>%
    features(Employed, features = list(n_crossing_points, longest_flat_spot)) %>%
    kable(format = "html")
Title n_crossing_points longest_flat_spot
Financial Activities 5 40
Manufacturing 11 52
Retail Trade 31 10
FC %>%
    features(monthly.returns, features = list(n_crossing_points, longest_flat_spot)) %>%
    kable(format = "html")
n_crossing_points longest_flat_spot
129 8

ARCH

What proportion of the current squared residual is explained by prior squared residuals? This reports an R^2; if the variance explained is large, volatility is persistent. (There is also a chi-squared test statistic.)
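A minimal sketch of the idea, using one lag only (feasts’ stat_arch_lm uses several lags, so the numbers will differ): regress the squared demeaned returns on their first lag and report the R^2, assuming FC from above.

e2 <- (FC$monthly.returns - mean(FC$monthly.returns))^2
summary(lm(e2[-1] ~ head(e2, -1)))$r.squared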

USET %>%
    features(Employed, features = stat_arch_lm) %>%
    kable(format = "html")
Title stat_arch_lm
Financial Activities 0.989
Manufacturing 0.972
Retail Trade 0.917
FC %>%
    features(monthly.returns, features = stat_arch_lm) %>%
    kable(format = "html")
stat_arch_lm
0.05

The Box-Cox

USET %>%
    features(Employed, features = guerrero) %>%
    kable(format = "html")
Title lambda_guerrero
Financial Activities 0.948
Manufacturing 1.037
Retail Trade 1.186
FC %>%
    features(monthly.returns, features = guerrero) %>%
    kable(format = "html")
lambda_guerrero
0.78

Filtered Manufacturing

USET %>%
    filter(Title == "Manufacturing") %>%
    mutate(Filt = box_cox(Employed, 1.0369662)) %>%
    select(YM, Filt, Employed) %>%
    pivot_longer(c(Filt, Employed)) %>%
    autoplot(value)

USET %>%
    filter(Title == "Financial Activities") %>%
    autoplot(box_cox(Employed, 0.9481456))

USET %>%
    filter(Title == "Retail Trade") %>%
    autoplot(box_cox(Employed, 1.1860464))

FC %>%
    features(monthly.returns, features = guerrero)
# A tibble: 1 × 1
  lambda_guerrero
            <dbl>
1           0.780
FC %>%
    autoplot(box_cox(monthly.returns, 0.780))

Australian Tourism

The example is great.

Principal Components

Let’s walk through this example.

Ex. PC