The decomposition_model() function lets us model the seasonal and seasonally adjusted components separately. The seasonally adjusted component is often handled with a method covered later, such as Holt's method or an ARIMA model.
Two Examples
us_retail_employment <- us_employment %>%
  filter(year(Month) >= 1990, Title == "Retail Trade")
fit_dcmp <- us_retail_employment %>%
  model(stlf = decomposition_model(
    STL(Employed ~ trend(window = 7), robust = TRUE),
    NAIVE(season_adjust)
  ))
fit_dcmp %>%
  forecast() %>%
  autoplot(us_retail_employment) +
  labs(y = "Number of people", title = "US retail employment")
fit_dcmp %>% gg_tsresiduals()
fit_dcmp <- us_retail_employment %>%
  model(stlf = decomposition_model(
    STL(Employed ~ trend(window = 7), robust = TRUE),
    RW(season_adjust)
  ))
fit_dcmp %>%
  forecast() %>%
  autoplot(us_retail_employment) +
  labs(y = "Number of people", title = "US retail employment")
fit_dcmp %>% gg_tsresiduals()
Evaluating Point Forecast Accuracy
Fit on a training set and evaluate on a held-out test set (usually about 20% of the sample).
Use filter() or slice() to construct the split, as sketched below.
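A minimal sketch of the split, assuming the us_retail_employment tsibble from above and an arbitrary cutoff:

# Hold out everything from 2018 onwards as the test set (illustrative cutoff)
train <- us_retail_employment %>%
  filter(year(Month) < 2018)
# Alternatively with slice(): drop the last 24 monthly observations
train <- us_retail_employment %>%
  slice(1:(n() - 24))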
Scale dependent errors
MAE: mean absolute error
RMSE: root mean squared error
Percentage errors
MAPE: mean absolute percentage error
sMAPE: symmetric MAPE
Scaled errors [def in 5.8]
MASE: mean absolute scaled error
RMSSE: root mean square scaled error
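For reference, the standard definitions, with e_t = y_t - \hat{y}_t the test-set forecast error:
\text{MAE} = \text{mean}(|e_t|), \quad \text{RMSE} = \sqrt{\text{mean}(e_t^2)}, \quad \text{MAPE} = \text{mean}(|100 e_t / y_t|)
\text{MASE} = \text{mean}(|e_t|) \Big/ \tfrac{1}{T-m}\sum_{t=m+1}^{T}|y_t - y_{t-m}|
where the MASE denominator is the MAE of the (seasonal) naive method on the training data, with m = 1 for non-seasonal series; RMSSE uses the same idea with squared errors.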
Compute all of these at once with accuracy().
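Continuing the sketch: fit on the training set, forecast over the held-out period, and pass the full data to accuracy(), which scores only the test-set rows.

fit <- train %>%
  model(naive = NAIVE(Employed))
fit %>%
  forecast(h = "2 years") %>%
  accuracy(us_retail_employment)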
Evaluating Accuracy of Distributions
accuracy(., list(...)), supplying one or more of the measures below.
Quantile scores quantile_score
Winkler scores winkler_score
Continuous ranked probability scores CRPS
Skill scores: skill_score() wraps another measure (e.g. CRPS) and reports the improvement relative to a benchmark method; see the sketch below.
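A sketch combining these measures, assuming fc is a fable of forecast distributions over the test period; the probs and level arguments are passed through accuracy() to the measures that use them.

fc %>%
  accuracy(
    us_retail_employment,
    measures = list(
      qs = quantile_score,       # scores a single quantile, here probs = 0.1
      winkler = winkler_score,   # scores an interval, here level = 80
      crps = CRPS,               # averages quantile scores over the distribution
      skill = skill_score(CRPS)  # CRPS improvement over a benchmark method
    ),
    probs = 0.1, level = 80
  )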
Time Series Cross-Validation
Multiple training and test sets, based on cross-validation ideas: each training set extends a little further, and forecasts are evaluated just beyond it. NB: dates are discontinuous across the folds.
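A sketch with stretch_tsibble() from tsibble, again assuming us_retail_employment; .init is the size of the first training set and .step how much each fold grows (folds are keyed by .id).

us_retail_employment %>%
  stretch_tsibble(.init = 36, .step = 12) %>%
  model(NAIVE(Employed)) %>%
  forecast(h = 12) %>%
  accuracy(us_retail_employment)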
The core ideas carry over from any regression context.
y_{t} = \alpha + \beta x_{t} + \epsilon_{t}
with data indexed by time. Patterns over time are key, but otherwise nothing is special: y is a linear function of x and inherits whatever time patterns x carries.
fit %>% gg_tsresiduals() and augment(fit) %>% features(.innov, ljung_box, lag = ??, dof = ??) to check for remaining time series information in the residuals.
plot of .innov or .resid against the predictors. This may require joining the residuals to the original tsibble, e.g. data %>% left_join(residuals(fit), by = "Index").
plot of residuals (.resid) against fitted values (.fitted).
Outliers and influential observations are always worth considering; regular regression diagnostics apply. A tsibble is just a tibble with an index, so tools such as timetk also work.
Handy predictors
trend()
season(): seasonal dummy/binary variables, e.g. month(label = TRUE) or weekday; beware the dummy variable trap (with an intercept, one category must be dropped).
interventions, trading days, holidays, and distributed lags.
the Fourier series; see the sketch below.
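A sketch of these predictors in a TSLM() regression, assuming us_retail_employment; fourier(K = 4) adds four harmonic pairs as a parsimonious alternative to eleven monthly dummies.

fit <- us_retail_employment %>%
  model(
    dummies   = TSLM(Employed ~ trend() + season()),
    harmonics = TSLM(Employed ~ trend() + fourier(K = 4))
  )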
Predictor Selection
A fit criterion, such as AICc, BIC, CV, or adjusted R^2. A reference.
Best subsets and stepwise.
Inference after stepwise selection is problematic: p-values computed on the selected model are overly optimistic.
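A sketch of comparing candidates by fit criteria with glance() from fabletools, assuming the fit mable from above; smaller AICc, BIC, and CV indicate better models.

glance(fit) %>%
  select(.model, adj_r_squared, CV, AIC, AICc, BIC)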
Forecasting
ex ante vs ex post forecasts; dynamic regression models come up in Chapter 10.
prediction intervals, and how they widen with the forecast horizon, merit attention.
Forecasting and Scenarios
Take the model of quarterly consumption as a function of income; scenario forecasts are sketched below.
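A sketch with scenarios() from fabletools, assuming the us_change data that fpp3 uses for this example; each scenario supplies a future path for Income.

fit_cons <- us_change %>%
  model(TSLM(Consumption ~ Income))
fc <- fit_cons %>%
  forecast(new_data = scenarios(
    Increase = new_data(us_change, 4) %>% mutate(Income = 1),
    Decrease = new_data(us_change, 4) %>% mutate(Income = -1)
  ))
fc %>% autoplot(us_change)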
Nonlinearity: the authors argue against nonlinear trends, so now what? Knots, i.e. piecewise linear trends. Trends are not always monotonic; witness the example of population in Afghanistan. NB: but extrapolating this probably gets unrealistic pretty fast.
global_economy %>%
  filter(Country == "Afghanistan") %>%
  autoplot(Population / 1e+06) +
  labs(y = "Population (millions)") +
  annotate("rect", xmin = 1979.98, xmax = 1989.13,
           ymin = -Inf, ymax = Inf, fill = "pink", alpha = 0.4) +
  annotate("text", x = 1984.5, y = 10,
           label = "Soviet-Afghan war", col = "red", size = 3)
A mable stores models: the rows are given by the key and the columns by the model names. For diagnostics on a particular model/subset, we need a filter()/select() combination, as sketched below.
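A sketch, assuming the mable fit is keyed by Country and has a model column named piecewise (a hypothetical name for illustration):

fit %>%
  filter(Country == "Afghanistan") %>%  # pick the row by key
  select(piecewise) %>%                 # pick the model column
  gg_tsresiduals()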
fc <- fit %>% forecast(h = "5 years")
autoplot(fc) +
  autolayer(
    global_economy %>% filter(Country == "Afghanistan"),
    Population
  )
Simple Exponential Smoothing
Appropriate in the absence of seasonality and trend. For such data we could use the naive method (the most recent observation receives all the weight, \alpha = 1) or the average method (all observations equally weighted, \alpha = 0). Simple exponential smoothing occupies the space between, with parameter \alpha such that
we describe the outcome in two separate but combinable equations: a forecast equation \hat{y}_{t+h|t} = S_{t} and a smoothing equation S_{t} = \alpha y_{t} + (1-\alpha) S_{t-1}, with 0 \le \alpha \le 1.
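A sketch of fitting SES via ETS() in fable, assuming the Algerian exports example from fpp3; error("A") + trend("N") + season("N") is the ETS(A,N,N) spelling of simple exponential smoothing.

algeria_economy <- global_economy %>%
  filter(Country == "Algeria")
fit <- algeria_economy %>%
  model(ANN = ETS(Exports ~ error("A") + trend("N") + season("N")))
report(fit)  # estimated alpha and initial level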
If \{y_t\} is a stationary time series, then for all s, the distribution of (y_t, \dots, y_{t+s}) does not depend on t.
This implies that the mean (a roughly horizontal series), the variance (constant spread), and the covariances between elements of the series (no predictable long-term patterns) are all constant.
as_tsibble(fma::strikes) %>%
  autoplot(value) +
  ylab("Number of strikes") +
  xlab("Year") +
  theme_ipsum_rc()
Stationary?
as_tsibble(fma::hsales) %>%
  autoplot(value) +
  labs(x = "Year", y = "Total sales",
       title = "Sales of new one-family houses, USA") +
  theme_ipsum_rc()
Stationary?
as_tsibble(fma::eggs) %>%
  autoplot(value) +
  labs(x = "Year", y = "$",
       title = "Price of a dozen eggs in 1993 dollars") +
  theme_ipsum_rc()
Stationary?
aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria", year(Month) >= 2010) %>%
  autoplot(Count / 1000) +
  labs(x = "Year", y = "thousands",
       title = "Number of pigs slaughtered in Victoria") +
  theme_ipsum_rc()
Stationary?
pelt %>%
  autoplot(Lynx) +
  labs(x = "Year", y = "Number trapped",
       title = "Annual Canadian Lynx Trappings") +
  theme_ipsum_rc()
Definition
If \{y_t\} is a stationary time series, then for all s, the distribution of (y_t,\dots,y_{t+s}) does not depend on t.
Transformations help to stabilize the variance.
For ARIMA modelling, we also need to stabilize the mean.
Non-stationarity in the mean
Identifying non-stationary series
The time plot.
The ACF of stationary data drops to zero relatively quickly.
The ACF of non-stationary data decreases slowly.
For non-stationary data, the value of r_1 is often large and positive.
The price is unlikely to be stationary; the daily change appears to be.
Differencing
Definition:
- The differenced series is the change between each observation in the original series: y'_t = y_t - y_{t-1}.
Differencing helps to stabilize the mean.
The differenced series has only T - 1 values, since the difference y'_1 cannot be computed for the first observation.
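A sketch with difference() from tsibble, assuming the google_2018 stock data used below:

google_2018 %>%
  mutate(diff_close = difference(Close))  # Close_t - Close_{t-1}; first value is NA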
Sometimes one isn’t enough
Instead of the change, we need to model the change in the change: a second difference.
y''_{t} = y'_{t} - y'_{t-1} = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}
y_t'' will have T-2 values.
In practice, it is almost never necessary to go beyond second-order differences.
Seasonal differencing
A seasonal difference is the difference between an observation and the corresponding observation from the previous year: y^{'seasonal}_t = y_t - y_{t-m}, where m is the number of seasons per year (the seasonal period).
For monthly data m=12.
For quarterly data m=4.
Electricity Generation
The oscillation appears more variable in the latter half of the series than in the former half.
The seasonally differenced series is closer to being stationary.
Remaining non-stationarity can be removed with a further first difference.
If y'_t = y_t - y_{t-12} denotes the seasonally differenced series, then the twice-differenced series is
y^*_t = y'_t - y'_{t-1} = (y_t - y_{t-12}) - (y_{t-1} - y_{t-13}) = y_t - y_{t-1} - y_{t-12} + y_{t-13}
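A sketch of both differences in code, assuming the H02 drug sales series from PBS that fpp3 uses for this example; difference(x, 12) is the seasonal difference, and a further difference() gives the doubly differenced series.

PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(Cost = sum(Cost)) %>%
  transmute(
    log_sales     = log(Cost),
    seasonal_diff = difference(log(Cost), 12),             # y_t - y_{t-12}
    double_diff   = difference(difference(log(Cost), 12))  # further first difference
  )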
Seasonal differencing
When both seasonal and first differences are applied…
it makes no difference which is done first; the result will be the same.
If seasonality is strong, we recommend that seasonal differencing be done first, because sometimes the resulting series will be stationary and there will be no need for a further first difference.
It is important that if differencing is used, the differences are interpretable.
Interpretation of differencing
first differences are the change between one observation and the next;
seasonal differences are the change from one year to the next.
But taking lag 3 differences for yearly data, for example, results in a model which cannot be sensibly interpreted.
Unit root tests
Statistical tests can suggest the required order of differencing.
Augmented Dickey-Fuller (ADF) test: the null hypothesis is that the data are non-stationary and non-seasonal.
Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: the null hypothesis is that the data are stationary and non-seasonal.
Other tests available for seasonal data.
You may recall from Time Series Features
KPSS test
Google prices are not stationary as they stand; the test indicates that differencing is needed.
google_2018 %>%
  features(Close, features = list(unitroot_kpss, unitroot_ndiffs))
# A tibble: 1 × 4
Symbol kpss_stat kpss_pvalue ndiffs
<chr> <dbl> <dbl> <int>
1 GOOG 0.573 0.0252 1
For the tourism dataset, compute the total number of trips and find an appropriate differencing (after transformation if necessary) to obtain stationary data.
tourism %>%
  summarise(Trips = sum(Trips)) %>%
  autoplot(Trips) +
  labs(title = "Total trips in tourism", x = "Quarter", y = "Trips") +
  theme_ipsum_rc()
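A sketch of one answer using the unit root features: unitroot_nsdiffs suggests the number of seasonal differences, and unitroot_ndiffs, applied to the seasonally differenced series, the number of further first differences.

total_trips <- tourism %>%
  summarise(Trips = sum(Trips))
total_trips %>%
  features(Trips, unitroot_nsdiffs)    # seasonal differences needed
total_trips %>%
  mutate(Trips = difference(Trips, 4)) %>%
  features(Trips, unitroot_ndiffs)     # further first differences needed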
The Backshift Operator
A very useful notational device is the backward shift operator, B, which is used as follows: B y_{t} = y_{t - 1}
In other words, B, operating on y_{t}, has the effect of shifting the data back one period. Two applications of B to y_{t} shift the data back two periods: B(By_{t}) = B^{2}y_{t} = y_{t-2}
For monthly data, if we wish to shift attention to “the same month last year”, then B^{12} is used, and the notation is B^{12}y_{t} = y_{t-12}.
Backshift notation
The backward shift operator is convenient for describing the process of differencing.
A first difference can be written as y'_{t} = y_{t} - y_{t-1} = y_t - By_{t} = (1 - B)y_{t}
Note that a first difference is represented by (1 - B).
Similarly, if second-order differences (i.e., first differences of first differences) have to be computed, then y''_{t} = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = (1 - B)^{2} y_{t}. In general, a d-th order difference can be written as (1 - B)^{d} y_{t}.
Any MA(q) process can be written as an AR(\infty) process if we impose some constraints on the MA parameters.
Then the MA model is called invertible.
Invertible models have some mathematical properties that make them easier to use in practice.
Invertibility of an ARIMA model is equivalent to forecastability of an ETS model.
Invertibility
General condition for invertibility: the complex roots of 1 + \theta_1 z + \theta_2 z^2 + \dots + \theta_q z^q lie outside the unit circle on the complex plane.
Models with the same order of differencing are compared using the AIC, AIC = -2\log(L) + 2(p + q + k + 1), where L is the maximised likelihood fitted to the differenced data, k=1 if c\neq 0 and k=0 otherwise; the automatic search minimises the bias-corrected AICc.
Step 1: Select current model (with smallest AICc) from:
ARIMA(2,d,2)
ARIMA(0,d,0)
ARIMA(1,d,0)
ARIMA(0,d,1)
Step 2: Consider variations of the current model:
- vary one of p, q from the current model by \pm1;
- vary both p and q from the current model by \pm1;
- include/exclude c from the current model.
Whenever a model with lower AICc is found, it becomes the new current model and Step 2 is repeated.
Manual modelling procedure
1. Plot the data. Identify any unusual observations.
2. If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
3. If the data are non-stationary: take first differences of the data until the data are stationary.
4. Examine the ACF/PACF: Is an AR(p) or MA(q) model appropriate?
5. Try your chosen model(s), and use the AICc to search for a better model.
6. Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
7. Once the residuals look like white noise, calculate forecasts.
Automatic modelling procedure with ARIMA()
1. Plot the data. Identify any unusual observations.
2. If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
3. Use ARIMA() to automatically select a model.
4. Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
5. Once the residuals look like white noise, calculate forecasts.
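A sketch of the automatic procedure, assuming the Egyptian exports series from global_economy as the running example:

fit <- global_economy %>%
  filter(Code == "EGY") %>%
  model(ARIMA(Exports))
report(fit)               # selected orders and coefficients
fit %>% gg_tsresiduals()  # residual checks
fit %>%
  forecast(h = 10) %>%
  autoplot(global_economy %>% filter(Code == "EGY"))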
The time plot shows sudden changes, particularly a big drop in 2008/2009 due to the global economic environment. Otherwise there is nothing unusual and no need for data adjustments.
No evidence of changing variance, so no Box-Cox transformation.
Data are clearly non-stationary, so we take first differences.
The US Census Bureau uses the following models most often:
ARIMA(0,1,1)(0,1,1)_m with log transformation
ARIMA(0,1,2)(0,1,1)_m with log transformation
ARIMA(2,1,0)(0,1,1)_m with log transformation
ARIMA(0,2,2)(0,1,1)_m with log transformation
ARIMA(2,1,2)(0,1,1)_m with no transformation
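A sketch of specifying one of these directly in fable; pdq() fixes the non-seasonal order, PDQ() the seasonal order, and log() supplies the transformation (my_data and y are hypothetical placeholders for your tsibble and variable).

fit <- my_data %>%
  model(census = ARIMA(log(y) ~ pdq(0, 1, 1) + PDQ(0, 1, 1)))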
Seasonal ARIMA models
The seasonal part of an AR or MA model will be seen in the seasonal lags of the PACF and ACF.
ARIMA(0,0,0)(0,0,1)_{12} will show:
a spike at lag 12 in the ACF but no other significant spikes;
exponential decay in the seasonal lags of the PACF, i.e. at lags 12, 24, 36, …
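A sketch to see this: simulate a seasonal MA(1)_{12} process (an MA(12) whose only nonzero coefficient is at lag 12, with an arbitrary value of 0.8) and inspect the ACF/PACF with gg_tsdisplay() from feasts.

library(tsibble)
library(feasts)
set.seed(42)
sim <- tsibble(
  Month = yearmonth("2000 Jan") + 0:239,
  y = as.numeric(arima.sim(list(ma = c(rep(0, 11), 0.8)), n = 240)),
  index = Month
)
sim %>% gg_tsdisplay(y, plot_type = "partial")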