class: center, middle, inverse, title-slide .title[ # Time Series: Reconciliation and Advanced Methods ] .subtitle[ ## FPP3, Chapter 11 and 12 ] .author[ ### Robert W. Walker ] .institute[ ### AGSM ] .date[ ### 2022-12-05 ] --- # An Overview Expanding the space of relevant models still operates in the circle of chapter 5. ![Forecasting Diagram](img/Workflow.PNG) --- ### An Overview Expanding the space of relevant models still operates in the circle of chapter 5. + Linear Models: tidying up + Univariate 1: Exponential Smoothing + Univariate 2: ARIMA models + Dynamic Regression Tying it Together + Hierarchical and Grouped Data: **Forecasting extensions** + Advanced Forecasting **Extending the ties** ![Forecasting Diagram](img/Workflow.PNG) --- ## Packages Getting started ```r library(forecast) library(tidyverse) library(fpp3) library(purrr) library(gganimate) library(seasonal) library(tidyquant) library(magrittr) vic_elec_daily <- vic_elec %>% filter(year(Time) == 2014) %>% index_by(Date = date(Time)) %>% summarise( Demand = sum(Demand)/1e3, Temperature = max(Temperature), Holiday = any(Holiday) ) %>% mutate(Day_Type = case_when( Holiday ~ "Holiday", wday(Date) %in% 2:6 ~ "Weekday", TRUE ~ "Weekend" )) ``` --- # The magic of `forecast` The method: 1. Tidy 2. Visualise 3. Model a. Specify b. Estimate c. Evaluate d. Visualize 4. Forecast --- # Models ``` 3. Model a. Specify b. Estimate c. Evaluate d. Visualize ``` --- # Hierarchies and Groups The key issue is internal consistency. Three approaches: + Bottom up + Top-down + Middle out `reconcile` is magic. --- # Complex seasonality --- ## Examples ```r gasoline <- as_tsibble(fpp2::gasoline) gasoline %>% autoplot(value) + labs(x = "Year", y = "Thousands of barrels per day", title = "Weekly US finished motor gasoline products") ``` <img src="index_files/figure-html/gasolinedata-1.png" width="576" /> --- ## Examples ```r calls <- read_tsv("http://robjhyndman.com/data/callcenter.txt") %>% gather("date", "volume", -X1) %>% transmute( time = X1, date = as.Date(date, format = "%d/%m/%Y"), datetime = as_datetime(date) + time, volume) %>% as_tsibble(index = datetime) calls %>% fill_gaps() %>% autoplot(volume) + labs(x = "Weeks", y = "Call volume", title = "5 minute call volume at North American bank") ``` --- ## Examples --- ## Examples ```r library(sugrrants) calls %>% filter(yearmonth(date) == yearmonth("2003 August")) %>% ggplot(aes(x = time, y = volume)) + geom_line() + facet_calendar(date) + labs(x = "Weeks", y = "Call volume", title = "5 minute call volume at North American bank") ``` --- ## Examples --- ## Examples ```r telec <- read_csv(url("https://raw.githubusercontent.com/robjhyndman/ETC3550Slides/eba103c4c86c0634df6cf3ee4a81dc2803c198b6/data/turkey_elec.csv"), col_names = "Demand") turkey_elec <- telec %>% mutate(Date = seq(ymd("2000-01-01"), ymd("2008-12-31"), by = "day")) %>% as_tsibble(index = Date) turkey_elec %>% autoplot(Demand) + labs(title = "Turkish daily electricity demand", x = "Year", y = "Electricity Demand (GW)") ``` --- ## Examples <img src="index_files/figure-html/turk2-1.png" width="576" /> --- ## TBATS model TBATS `\(\textbf{T}\)`rigonometric terms for seasonality `\(\textbf{B}\)`ox-Cox transformations for heterogeneity `\(\textbf{A}\)`RMA errors for short-term dynamics `\(\textbf{T}\)`rend (possibly damped) `\(\textbf{S}\)`easonal (including multiple and non-integer periods) --- ## TBATS model $$ y_t = \text{observation at time `\(t\)`} $$ Box-Cox If `\(\omega \neq 0\)` then `$$y_t^{(\omega)} = (y_t^\omega-1)/\omega$$` Else `$$\log y_t$$` --- ## TBATS model $$ y_t = \text{observation at time `\(t\)`} $$ Box-Cox: If `\(\omega \neq 0\)` then `\(y_t^{(\omega)} = (y_t^\omega-1)/\omega\)` Else `\(\log y_t\)` `\(M\)` seasonal periods: `$$y_t^{(\omega)} = \ell_{t-1} + \phi b_{t-1} + \sum_{i=1}^M s_{t-m_i}^{(i)} + d_t$$` --- ## TBATS model $$ y_t = \text{observation at time `\(t\)`} $$ Box-Cox: If `\(\omega \neq 0\)` then `\(y_t^{(\omega)} = (y_t^\omega-1)/\omega\)` Else `\(\log y_t\)` `\(M\)` seasonal periods: `$$y_t^{(\omega)} = \ell_{t-1} + \phi b_{t-1} + \sum_{i=1}^M s_{t-m_i}^{(i)} + d_t$$` Local and global trends: `$$\ell_t = \ell_{t-1} + \phi b_{t-1} + \alpha d_t\\ b_t = (1-\phi) b + \phi b_{t-1} + \beta d_{t}$$` --- ## TBATS model `\(y_t = \text{observation at time t}\)` Box-Cox: If `\(\omega \neq 0\)` then `\(y_t^{(\omega)} = (y_t^\omega-1)/\omega\)` Else `\(\log y_t\)` `\(M\)` seasonal periods: `$$y_t^{(\omega)} = \ell_{t-1} + \phi b_{t-1} + \sum_{i=1}^M s_{t-m_i}^{(i)} + d_t$$` Local and global trends: `$$\ell_t = \ell_{t-1} + \phi b_{t-1} + \alpha d_t; b_t = (1-\phi) b + \phi b_{t-1} + \beta d_{t}$$` ARMA Errors: `$$d_t = \sum_{i=1}^p \phi_i d_{t-i} + \sum_{j=1}^q \theta_j \varepsilon_{t-j} + \varepsilon_t\\ s_t^{(i)} = \sum_{j=1}^{k_i} s_{j,t}^{(i)}$$` --- ## Fourier Terms Fourier seasonal terms: `$$s_{j,t}^{(i)} = \phantom{-}s_{j,t-1}^{(i)}\cos \lambda_j^{(i)} + s_{j,t-1}^{*(i)}\sin \lambda_j^{(i)} + \gamma_1^{(i)} d_t$$` and `$$s_{j,t}^{(i)} = -s_{j,t-1}^{(i)}\sin \lambda_j^{(i)} + s_{j,t-1}^{*(i)}\cos \lambda_j^{(i)} + \gamma_2^{(i)} d_t$$` --- ## Complex seasonality ```r fpp2::gasoline %>% tbats() %>% forecast() %>% autoplot() ``` --- ## Complex seasonality <img src="index_files/figure-html/gasoline2-1.png" width="576" /> --- ## Complex seasonality ```r fpp2::calls %>% tbats() %>% forecast() %>% autoplot() ``` <img src="index_files/figure-html/callcentref-1.png" width="576" /> --- ## Complex seasonality <img src="index_files/figure-html/callcentref2-1.png" width="576" /> --- ## Complex seasonality ```r telec <- ts(telec, start=as.Date("2000-01-01"), frequency=1) telec %>% tbats() %>% forecast() %>% autoplot() ``` --- ## Complex seasonality <img src="index_files/figure-html/telecf2-1.png" width="576" /> --- ## TBATS model `\(\textbf{T}\)`rigonometric terms for seasonality `\(\textbf{B}\)`ox-Cox transformations for heterogeneity `\(\textbf{A}\)`RMA errors for short-term dynamics `\(\textbf{T}\)`rend (possibly damped) `\(\textbf{S}\)`easonal (including multiple and non-integer periods) * Handles non-integer seasonality, multiple seasonal periods. * Entirely automated * Prediction intervals often too wide * Very slow on long series --- # Prophet A basic non-linear regression model: `$$y_t = g(t) + s(t) + h(t) + e_{t}$$` where `\(g\)` is piecewise linear *growth*; `\(s\)` is Fourier seasonal terms; and `\(h\)` is holiday dummies/indicators. For **non-daily data** specify the period completely, e.g. 4 for quarters, 12 for months. --- # Vector autoregression 1. The economist's view of causation is built on **Granger causality** which stresses the idea of temporal precedence: **cause must come before effect**. This ran into problems with Lucas's rational expectations thesis and with Sims *Macroeconomics and Reality*. 2. Normally, we conjecture a regression model with inputs and outputs with unidirectional "causation". Drivers and driven. 3. We can address the two problems above with a multiequation system. We choose the set of interrelated variables (k) and the number of relevant lags (p). --- ## VAR doesn't appear to work Using `fpp3` type syntax. It requires using the `fpp2` style. ```r VAR(fpp2::uschange[,c(1:2)]) ``` ``` Consumption Income Consumption 0.4294818 0.2434001 Income 0.2434001 0.8674235 ``` --- # Neural network models --- ## Neural network models ![](img/NNLR1.jpg) --- ## Neural network models ![](img/NNLR2.jpg) --- ## Neural network models ![](img/NN1R.jpg) --- ## Neural network models ![](img/NN2R.jpg) --- ## Neural network models Inputs to hidden neuron `\(j\)` linearly combined: `$$z_j = b_j + \sum_{i=1}^4 w_{i,j} x_i.$$` Modified using nonlinear function such as a sigmoid: `$$s(z) = \frac{1}{1+e^{-z}}$$` This tends to reduce the effect of extreme input values, thus making the network somewhat robust to outliers. --- ## Neural network models * Weights take random values to begin with, which are then updated using the observed data. * There is an element of randomness in the predictions. So the network is usually trained several times using different random starting points, and the results are averaged. * Number of hidden layers, and the number of nodes in each hidden layer, must be specified in advance. --- ## NNAR models * Lagged values of the time series can be used as inputs to a neural network. * NNAR($p,k$): `\(p\)` lagged inputs and `\(k\)` nodes in the single hidden layer. * NNAR($p,0$) model is equivalent to an ARIMA($p,0,0$) model but without stationarity restrictions. * Seasonal NNAR($p,P,k$): inputs `\((y_{t-1},y_{t-2},\dots,y_{t-p},y_{t-m},y_{t-2m},y_{t-Pm})\)` and `\(k\)` neurons in the hidden layer. * NNAR($p,P,0)_m$ model is equivalent to an ARIMA$(p,0,0)(P,0,0)_m$ model but without stationarity restrictions. --- ## NNAR models in R * The `nnetar()` function fits an NNAR($p,P,k)_m$ model. * If `\(p\)` and `\(P\)` are not specified, they are automatically selected. * For non-seasonal time series, default `\(p=\)` optimal number of lags (according to the AIC) for a linear AR($p$) model. * For seasonal time series, defaults are `\(P=1\)` and `\(p\)` is chosen from the optimal linear model fitted to the seasonally adjusted data. * Default `\(k=(p+P+1)/2\)` (rounded to the nearest integer). --- ## Sunspots * Surface of the sun contains magnetic regions that appear as dark spots. * These affect the propagation of radio waves and so telecommunication companies like to predict sunspot activity in order to plan for any future difficulties. * Sunspots follow a cycle of length between 9 and 14 years. --- ## NNAR(9,5) model for sunspots ```r sunspots <- as_tsibble(fpp2::sunspotarea) fit <- sunspots %>% model(NNETAR(value)) fit %>% forecast(h=20) %>% autoplot(sunspots, level = NULL) ``` --- ## Prediction intervals by simulation ```r fit %>% forecast(h=20) %>% autoplot(sunspots) ``` --- # Bootstrapping and bagging