class: center, middle, inverse, title-slide # Time Series: Dynamic Regression ## FPP3, Chapter 10 ### Robert W. Walker ### AGSM ### 2021-03-19 --- # An Overview Expanding the space of relevant models still operates in the circle of chapter 5. ![Forecasting Diagram](img/Workflow.PNG) --- ### An Overview Expanding the space of relevant models still operates in the circle of chapter 5. + Linear Models: tidying up + Univariate 1: Exponential Smoothing + Univariate 2: ARIMA models + Dynamic Regression **Tying it Together** ![Forecasting Diagram](img/Workflow.PNG) --- ## Packages Getting started ```r library(tidyverse) library(fpp3) library(purrr) library(gganimate) library(seasonal) library(tidyquant) library(magrittr) vic_elec_daily <- vic_elec %>% filter(year(Time) == 2014) %>% index_by(Date = date(Time)) %>% summarise( Demand = sum(Demand)/1e3, Temperature = max(Temperature), Holiday = any(Holiday) ) %>% mutate(Day_Type = case_when( Holiday ~ "Holiday", wday(Date) %in% 2:6 ~ "Weekday", TRUE ~ "Weekend" )) ``` --- # The magic of `forecast` The method: 1. Tidy 2. Visualise 3. Model a. Specify b. Estimate c. Evaluate d. Visualize 4. Forecast --- # Models ``` 3. Model a. Specify b. Estimate c. Evaluate d. Visualize ``` --- # First: The Linear Model `$$y_{t} = \beta_{0} + \beta_{1}x_{t} + \epsilon_{t}$$` --- ### Strengths and Limitations Strengths: + Trends can be linear or nonlinear [knots]. + Seasonality can be naturally accommodated. + Can accommodate covariates and lag structures. Weaknesses: + Persistence of unknown origin and the need for decay. This will be inherently nonlinear. --- ## Second: Exponential Smoothing Describe the outcome as two separate but combineable forms: a Forecast equation `$$\hat{y}_{t+h|t} = S_{t}$$` and a Smoothing Equation `$$S_{t} = \alpha y_{t} + (1-\alpha) S_{t-1}$$`. If we combine them, we get: `$$\hat{y}_{t+h|t} = \alpha y_{t} + (1-\alpha) S_{t-1}$$` --- ## The Model: `ETS` the `error` component (either `M` multiplicative or `A` additive) the `trend` the `season` Some comments on options -- the optimization method. --- # ARIMA AR: autoregressive (lagged observations as inputs) I: integrated MA: moving average (lagged errors as inputs) Use Case: An ARIMA model is rarely interpretable in terms of visible data structures like trend and seasonality. But it can capture a huge range of time series patterns. --- # Regression with ARIMA errors --- ## Regression with ARIMA errors `$$y_t = \beta_0 + \beta_1 x_{1,t} + \dots + \beta_k x_{k,t} + \varepsilon_t$$` where * `\(y_t\)` modeled as function of `\(k\)` explanatory variables `\(x_{1,t},\dots,x_{k,t}\)`. * In regression, we assume that `\(\varepsilon_t\)` was WN. * Now we want to allow `\(\varepsilon_t\)` to be autocorrelated. --- ### Example: ARIMA(1,1,1) errors `$$y_t = \beta_0 + \beta_1 x_{1,t} + \dots + \beta_k x_{k,t} + \eta_t,\\ (1-\phi_1B)(1-B)\eta_t = (1+\theta_1B)\varepsilon_t$$` where `\(\varepsilon_t\)` is white noise. --- ## Residuals and errors ### Example: `\(\eta_t\)` = ARIMA(1,1,1) `$$y_t = \beta_0 + \beta_1 x_{1,t} + \dots + \beta_k x_{k,t} + \eta_t,\\ (1-\phi_1B)(1-B)\eta_t = (1+\theta_1B)\varepsilon_t,$$` * Be careful in distinguishing `\(\eta_t\)` from `\(\varepsilon_t\)`. * Only the errors `\(\varepsilon_t\)` are assumed to be white noise. * In ordinary regression, `\(\eta_t\)` is assumed to be white noise and so `\(\eta_t = \varepsilon_t\)`. --- ## Estimation If we minimize `\(\sum \eta_t^2\)` (by using ordinary regression): 1. Estimated coefficients `\(\hat{\beta}_0,\dots,\hat{\beta}_k\)` are no longer optimal as some information ignored; 2. Statistical tests associated with the model (e.g., t-tests on the coefficients) are incorrect. 3. `\(p\)`-values for coefficients usually too small (`spurious regression`). 4. AIC of fitted models misleading. * Minimizing `\(\sum \varepsilon_t^2\)` avoids these problems. * Maximizing likelihood similar to minimizing `\(\sum \varepsilon_t^2\)`. --- class: inverse ## Stationarity **Regression with ARMA errors** `$$y_t = \beta_0 + \beta_1 x_{1,t} + \dots + \beta_k x_{k,t} + \eta_t,$$` where `\(\eta_t\)` is an ARMA process. * All variables in the model must be stationary. * If we estimate the model while any of these are non-stationary, the estimated coefficients can be incorrect. * Difference variables until all stationary. * If necessary, apply same differencing to all variables. --- ## Stationarity Model with ARIMA(1,1,1) errors `$$y_t = \beta_0 + \beta_1 x_{1,t} + \dots + \beta_k x_{k,t} + \eta_t,\\ (1-\phi_1B)(1-B)\eta_t = (1+\theta_1B)\varepsilon_t,$$` Equivalent to model with ARIMA(1,0,1) errors `$$y'_t = \beta_1 x'_{1,t} + \dots + \beta_k x'_{k,t} + \eta'_t,\\ (1-\phi_1B)\eta'_t = (1+\theta_1B)\varepsilon_t,$$` where `\(y'_t=y_t-y_{t-1}\)`, `\(x'_{t,i}=x_{t,i}-x_{t-1,i}\)` and `\(\eta'_t=\eta_t-\eta_{t-1}\)`. --- ## Regression with ARIMA errors Any regression with an ARIMA error can be rewritten as a regression with an ARMA error by differencing all variables with the same differencing operator as in the ARIMA model. Original data `$$y_t = \beta_0 + \beta_1 x_{1,t} + \dots + \beta_k x_{k,t} + \eta_t\\ \mbox{where}\quad \phi(B)(1-B)^d\eta_t = \theta(B)\varepsilon_t$$` After differencing all variables `$$y'_t = \beta_1 x'_{1,t} + \dots + \beta_k x'_{k,t} + \eta'_t.\\ \mbox{where}\quad \phi(B)\eta_t = \theta(B)\varepsilon_t$$` `$$\text{and}\quad y_t' = (1-B)^dy_t$$` --- ## Model selection * Fit regression model with automatically selected ARIMA errors. (R will take care of differencing before estimation.) * Check that `\(\varepsilon_t\)` series looks like white noise. --- ### Selecting predictors * AICc can be calculated for final model. * Repeat procedure for all subsets of predictors to be considered, and select model with lowest AICc value. --- ## US personal consumption and income ```r us_change %>% gather(key='variable', value='value') %>% ggplot(aes(y=value, x=Quarter, group=variable, colour=variable)) + geom_line() + facet_grid(variable ~ ., scales='free_y') + xlab("Year") + ylab("") + ggtitle("Quarterly changes in US consumption and personal income") + guides(colour="none") ``` <img src="index_files/figure-html/usconsump-1.png" width="612" /> --- ## US personal consumption and income * No need for transformations or further differencing. * Increase in income does not necessarily translate into instant increase in consumption (e.g., after the loss of a job, it may take a few months for expenses to be reduced to allow for the new circumstances). We will ignore this for now. --- ## US personal consumption and income ```r fit <- us_change %>% model(ARIMA(Consumption ~ Income)) report(fit) ``` ``` Series: Consumption Model: LM w/ ARIMA(1,0,2) errors Coefficients: ar1 ma1 ma2 Income intercept 0.7070 -0.6172 0.2066 0.1976 0.5949 s.e. 0.1068 0.1218 0.0741 0.0462 0.0850 sigma^2 estimated as 0.3113: log likelihood=-163.04 AIC=338.07 AICc=338.51 BIC=357.8 ``` **Write down the equations for the fitted model.** --- ## US personal consumption and income ```r residuals(fit, type='regression') %>% gg_tsdisplay(.resid, plot_type = 'partial') + ggtitle("Regression errors") ``` <img src="index_files/figure-html/unnamed-chunk-2-1.png" width="576" /> --- ## US personal consumption and income ```r residuals(fit, type='response') %>% gg_tsdisplay(.resid, plot_type = 'partial') + ggtitle("ARIMA errors") ``` <img src="index_files/figure-html/unnamed-chunk-3-1.png" width="576" /> --- ## US personal consumption and income ```r augment(fit) %>% features(.resid, ljung_box, dof = 5, lag = 12) ``` ``` # A tibble: 1 x 3 .model lb_stat lb_pvalue <chr> <dbl> <dbl> 1 ARIMA(Consumption ~ Income) 5.54 0.595 ``` --- ## US personal consumption and income ```r us_change_future <- new_data(us_change, 8) %>% mutate(Income = mean(us_change$Income)) forecast(fit, new_data = us_change_future) %>% autoplot(us_change) + labs(x = "Year", y = "Percentage change", title = "Forecasts from regression with ARIMA(1,0,2) errors") ``` <img src="index_files/figure-html/usconsump3-1.png" width="576" /> --- ## Forecasting * To forecast a regression model with ARIMA errors, we need to forecast the regression part of the model and the ARIMA part of the model and combine the results. * Some predictors are known into the future (e.g., time, dummies). * Separate forecasting models may be needed for other predictors. * Forecast intervals ignore the uncertainty in forecasting the predictors. --- ## Daily electricity demand Model daily electricity demand as a function of temperature using quadratic regression with ARMA errors. ```r vic_elec_daily %>% ggplot(aes(x=Temperature, y=Demand, colour=Day_Type)) + geom_point() + labs(x = "Maximum temperature", y = "Electricity demand (GW)") ``` <img src="index_files/figure-html/unnamed-chunk-5-1.png" width="576" /> --- ## Daily electricity demand ```r vic_elec_daily %>% gather("var", "value", Demand, Temperature) %>% ggplot(aes(x = Date, y = value)) + geom_line() + facet_grid(vars(var), scales = "free_y") ``` <img src="index_files/figure-html/unnamed-chunk-6-1.png" width="576" /> --- ## Daily electricity demand ```r fit <- vic_elec_daily %>% model(ARIMA(Demand ~ Temperature + I(Temperature^2) + (Day_Type=="Weekday"))) report(fit) ``` ``` Series: Demand Model: LM w/ ARIMA(2,1,2)(2,0,0)[7] errors Coefficients: ar1 ar2 ma1 ma2 sar1 sar2 Temperature -0.1093 0.7226 -0.0182 -0.9381 0.1958 0.4175 -7.6135 s.e. 0.0779 0.0739 0.0494 0.0493 0.0525 0.0570 0.4482 I(Temperature^2) Day_Type == "Weekday"TRUE 0.1810 30.4040 s.e. 0.0085 1.3254 sigma^2 estimated as 44.91: log likelihood=-1206.11 AIC=2432.21 AICc=2432.84 BIC=2471.18 ``` --- ## Daily electricity demand ```r gg_tsresiduals(fit) ``` <img src="index_files/figure-html/unnamed-chunk-8-1.png" width="576" /> --- ## Daily electricity demand ```r augment(fit) %>% features(.resid, ljung_box, dof = 9, lag = 14) ``` ``` # A tibble: 1 x 3 .model lb_stat lb_pvalue <chr> <dbl> <dbl> 1 "ARIMA(Demand ~ Temperature + I(Temperature^2) + (Day_Type … 28.4 0.0000304 ``` --- ## Daily electricity demand ```r # Forecast one day ahead vic_next_day <- new_data(vic_elec_daily, 1) %>% mutate(Temperature = 26, Day_Type = "Holiday") forecast(fit, vic_next_day) ``` ``` # A fable: 1 x 6 [1D] # Key: .model [1] .model Date Demand .mean Temperature Day_Type <chr> <date> <dist> <dbl> <dbl> <chr> 1 "ARIMA(Demand ~ Temperature … 2015-01-01 N(161, 45) 161. 26 Holiday ``` --- ## Daily electricity demand ```r vic_elec_future <- new_data(vic_elec_daily, 14) %>% mutate( Temperature = 26, Holiday = c(TRUE, rep(FALSE, 13)), Day_Type = case_when( Holiday ~ "Holiday", wday(Date) %in% 2:6 ~ "Weekday", TRUE ~ "Weekend" ) ) ``` --- ## Daily electricity demand ```r forecast(fit, new_data=vic_elec_future) %>% autoplot(vic_elec_daily) + ylab("Electricity demand (GW)") ``` <img src="index_files/figure-html/unnamed-chunk-12-1.png" width="576" /> --- # Stochastic and deterministic trends --- ## Stochastic \& deterministic trends Deterministic trend `$$y_t = \beta_0 + \beta_1 t + \eta_t$$` where `\(\eta_t\)` is ARMA process. Stochastic trend `$$y_t = \beta_0 + \beta_1 t + \eta_t$$` where `\(\eta_t\)` is ARIMA process with `\(d\ge1\)`. Difference both sides until `\(\eta_t\)` is stationary: `$$y'_t = \beta_1 + \eta'_t$$` where `\(\eta'_t\)` is ARMA process. --- ## International visitors ```r aus_visitors <- as_tsibble(fpp2::austa) aus_visitors %>% autoplot(value) + labs(x = "Year", y = "millions of people", title = "Total annual international visitors to Australia") ``` <img src="index_files/figure-html/unnamed-chunk-13-1.png" width="576" /> --- ## International visitors **Deterministic trend** ```r fit_deterministic <- aus_visitors %>% model(Deterministic = ARIMA(value ~ trend() + pdq(d = 0))) report(fit_deterministic) ``` ``` Series: value Model: LM w/ ARIMA(2,0,0) errors Coefficients: ar1 ar2 trend() intercept 1.1127 -0.3805 0.1710 0.4156 s.e. 0.1600 0.1585 0.0088 0.1897 sigma^2 estimated as 0.02979: log likelihood=13.6 AIC=-17.2 AICc=-15.2 BIC=-9.28 ``` --- `$$y_t = 0.42 + 0.17 t + \eta_t$$` `$$\eta_t = 1.11 \eta_{t-1} -0.38 \eta_{t-2} + \varepsilon_t$$` `$$\varepsilon_t \sim \text{NID}(0,0.0298).$$` --- ## International visitors **Stochastic trend** ```r fit_stochastic <- aus_visitors %>% model(Stochastic = ARIMA(value ~ pdq(d=1))) report(fit_stochastic) ``` ``` Series: value Model: ARIMA(0,1,1) w/ drift Coefficients: ma1 constant 0.3006 0.1735 s.e. 0.1647 0.0390 sigma^2 estimated as 0.03376: log likelihood=10.62 AIC=-15.24 AICc=-14.46 BIC=-10.57 ``` --- `$$y_t-y_{t-1} = 0.173 + \varepsilon_t + 0.301 \varepsilon_{t-1} \\ y_t = y_0 + 0.173 t + \eta_t \\ \eta_t = \eta_{t-1} + 0.301 \varepsilon_{t-1} + \varepsilon_{t}\\ \varepsilon_t \sim \text{NID}(0,0.034).$$` --- ## International visitors ```r fc_deterministic <- forecast(fit_deterministic, h = 10) fc_stochastic <- forecast(fit_stochastic, h = 10) rbind(fc_deterministic, fc_stochastic) %>% autoplot(aus_visitors) + facet_grid(vars(.model)) + labs(x = "Year", y = "Visitors to Australia (millions)", title = "Forecasts from trend models") + guides(colour = FALSE) ``` <img src="index_files/figure-html/unnamed-chunk-16-1.png" width="576" /> --- ## International visitors <img src="index_files/figure-html/unnamed-chunk-17-1.png" width="612" /> --- ## Forecasting with trend * Point forecasts are almost identical, but prediction intervals differ. * Stochastic trends have much wider prediction intervals because the errors are non-stationary. * Be careful of forecasting with deterministic trends too far ahead. --- # Dynamic harmonic regression --- ## Dynamic harmonic regression **Combine Fourier terms with ARIMA errors** ### Advantages * it allows any length seasonality; * for data with more than one seasonal period, you can include Fourier terms of different frequencies; * the seasonal pattern is smooth for small values of `\(K\)` (but more wiggly seasonality can be handled by increasing `\(K\)`); * the short-term dynamics are easily handled with a simple ARMA error. ### Disadvantages * seasonality is assumed to be fixed --- ## Eating-out expenditure ```r aus_cafe <- aus_retail %>% filter( Industry == "Cafes, restaurants and takeaway food services", year(Month) %in% 2004:2018 ) %>% summarise(Turnover = sum(Turnover)) aus_cafe %>% autoplot(Turnover) ``` <img src="index_files/figure-html/cafe-1.png" width="576" /> --- ## Eating-out expenditure ```r fit <- aus_cafe %>% model( `K = 1` = ARIMA(log(Turnover) ~ fourier(K = 1) + PDQ(0,0,0)), `K = 2` = ARIMA(log(Turnover) ~ fourier(K = 2) + PDQ(0,0,0)), `K = 3` = ARIMA(log(Turnover) ~ fourier(K = 3) + PDQ(0,0,0)), `K = 4` = ARIMA(log(Turnover) ~ fourier(K = 4) + PDQ(0,0,0)), `K = 5` = ARIMA(log(Turnover) ~ fourier(K = 5) + PDQ(0,0,0)), `K = 6` = ARIMA(log(Turnover) ~ fourier(K = 6) + PDQ(0,0,0))) glance(fit) ``` |.model | sigma2| log_lik| AIC| AICc| BIC| |:------|---------:|--------:|---------:|---------:|---------:| |K = 1 | 0.0017471| 317.2353| -616.4707| -615.4056| -587.7842| |K = 2 | 0.0010732| 361.8533| -699.7066| -697.8271| -661.4579| |K = 3 | 0.0007609| 393.6062| -763.2125| -761.3329| -724.9638| |K = 4 | 0.0005386| 426.7838| -821.5676| -818.2096| -770.5694| |K = 5 | 0.0003173| 473.7344| -919.4688| -916.9078| -874.8454| |K = 6 | 0.0003163| 474.0307| -920.0614| -917.5004| -875.4380| --- ## Eating-out expenditure ```r cafe_plot <- function(...){ fit %>% select(...) %>% forecast() %>% autoplot(aus_cafe) + ggtitle(sprintf("Log transformed %s, fourier(K = %s)", model_sum(select(fit,...)[[1]][[1]]), deparse(..1))) + geom_label( aes(x = yearmonth("2007 Jan"), y = 4250, label = paste0("AICc = ", format(AICc))), data = glance(select(fit,...)) ) + geom_line(aes(y = .fitted), colour = "red", augment(select(fit, ...))) + ylim(c(1500, 5100)) } ``` <img src="index_files/figure-html/cafe1-1.png" width="576" /> --- ## Eating-out expenditure <img src="index_files/figure-html/cafe2-1.png" width="576" /> --- ## Eating-out expenditure <img src="index_files/figure-html/cafe3-1.png" width="576" /> --- ## Eating-out expenditure <img src="index_files/figure-html/cafe4-1.png" width="576" /> --- ## Eating-out expenditure <img src="index_files/figure-html/cafe5-1.png" width="576" /> --- ## Eating-out expenditure <img src="index_files/figure-html/cafe6-1.png" width="576" /> --- ## Example: weekly gasoline products ```r fit <- us_gasoline %>% model(ARIMA(Barrels ~ fourier(K = 13) + PDQ(0,0,0))) report(fit) ``` ``` Series: Barrels Model: LM w/ ARIMA(0,1,1) errors Coefficients: ma1 fourier(K = 13)C1_52 fourier(K = 13)S1_52 -0.8934 -0.1121 -0.2300 s.e. 0.0132 0.0123 0.0122 fourier(K = 13)C2_52 fourier(K = 13)S2_52 0.0420 0.0317 s.e. 0.0099 0.0099 fourier(K = 13)C3_52 fourier(K = 13)S3_52 0.0832 0.0346 s.e. 0.0094 0.0094 fourier(K = 13)C4_52 fourier(K = 13)S4_52 0.0185 0.0398 s.e. 0.0092 0.0092 fourier(K = 13)C5_52 fourier(K = 13)S5_52 -0.0315 0.0009 s.e. 0.0091 0.0091 fourier(K = 13)C6_52 fourier(K = 13)S6_52 -0.0522 0.000 s.e. 0.0090 0.009 fourier(K = 13)C7_52 fourier(K = 13)S7_52 -0.0173 0.0053 s.e. 0.0090 0.0090 fourier(K = 13)C8_52 fourier(K = 13)S8_52 0.0074 0.0048 s.e. 0.0090 0.0090 fourier(K = 13)C9_52 fourier(K = 13)S9_52 -0.0024 -0.0035 s.e. 0.0090 0.0090 fourier(K = 13)C10_52 fourier(K = 13)S10_52 0.0151 -0.0037 s.e. 0.0090 0.0090 fourier(K = 13)C11_52 fourier(K = 13)S11_52 -0.0144 0.0191 s.e. 0.0090 0.0090 fourier(K = 13)C12_52 fourier(K = 13)S12_52 -0.0227 -0.0052 s.e. 0.0090 0.0090 fourier(K = 13)C13_52 fourier(K = 13)S13_52 intercept -0.0035 0.0038 0.0014 s.e. 0.0090 0.0090 0.0007 sigma^2 estimated as 0.06168: log likelihood=-21.96 AIC=101.93 AICc=103.24 BIC=253.04 ``` --- ## Example: weekly gasoline products ```r forecast(fit, h = "3 years") %>% autoplot(us_gasoline) ``` <img src="index_files/figure-html/gasf-1.png" width="576" /> --- ## 5-minute call centre volume ```r (calls <- readr::read_tsv("http://robjhyndman.com/data/callcenter.txt") %>% rename(time = X1) %>% pivot_longer(-time, names_to = "date", values_to = "volume") %>% mutate( date = as.Date(date, format = "%d/%m/%Y"), datetime = as_datetime(date) + time ) %>% as_tsibble(index = datetime)) ``` ``` # A tsibble: 27,716 x 4 [5m] <UTC> time date volume datetime <time> <date> <dbl> <dttm> 1 07:00 2003-03-03 111 2003-03-03 07:00:00 2 07:05 2003-03-03 113 2003-03-03 07:05:00 3 07:10 2003-03-03 76 2003-03-03 07:10:00 4 07:15 2003-03-03 82 2003-03-03 07:15:00 5 07:20 2003-03-03 91 2003-03-03 07:20:00 6 07:25 2003-03-03 87 2003-03-03 07:25:00 7 07:30 2003-03-03 75 2003-03-03 07:30:00 8 07:35 2003-03-03 89 2003-03-03 07:35:00 9 07:40 2003-03-03 99 2003-03-03 07:40:00 10 07:45 2003-03-03 125 2003-03-03 07:45:00 # … with 27,706 more rows ``` --- ## 5-minute call centre volume ```r calls %>% fill_gaps() %>% autoplot(volume) ``` <img src="index_files/figure-html/calls-plot-1.png" width="576" /> --- ## 5-minute call centre volume ```r calls %>% fill_gaps() %>% gg_season(volume, period = "day", alpha = 0.1) + guides(colour = FALSE) ``` <img src="index_files/figure-html/calls-season-1.png" width="576" /> --- ## 5-minute call centre volume ```r library(sugrrants) calls %>% filter(month(date, label = TRUE) == "Apr") %>% ggplot(aes(x = time, y = volume)) + geom_line() + facet_calendar(date) ``` <img src="index_files/figure-html/calls-cal-1.png" width="576" /> --- ## 5-minute call centre volume ```r calls_mdl <- calls %>% mutate(idx = row_number()) %>% update_tsibble(index = idx) fit <- calls_mdl %>% model(ARIMA(volume ~ fourier(169, K = 10) + pdq(d=0) + PDQ(0,0,0))) report(fit) ``` ``` Series: volume Model: LM w/ ARIMA(1,0,3) errors Coefficients: ar1 ma1 ma2 ma3 fourier(169, K = 10)C1_169 0.9894 -0.7383 -0.0333 -0.0282 -79.0702 s.e. 0.0010 0.0061 0.0075 0.0060 0.7001 fourier(169, K = 10)S1_169 fourier(169, K = 10)C2_169 55.2985 -32.3615 s.e. 0.7006 0.3784 fourier(169, K = 10)S2_169 fourier(169, K = 10)C3_169 13.7417 -9.3180 s.e. 0.3786 0.2725 fourier(169, K = 10)S3_169 fourier(169, K = 10)C4_169 -13.6446 -2.7913 s.e. 0.2726 0.2230 fourier(169, K = 10)S4_169 fourier(169, K = 10)C5_169 -9.5084 2.8975 s.e. 0.2230 0.1957 fourier(169, K = 10)S5_169 fourier(169, K = 10)C6_169 -2.2323 3.3085 s.e. 0.1957 0.1790 fourier(169, K = 10)S6_169 fourier(169, K = 10)C7_169 0.174 0.2968 s.e. 0.179 0.1680 fourier(169, K = 10)S7_169 fourier(169, K = 10)C8_169 0.857 -1.3878 s.e. 0.168 0.1604 fourier(169, K = 10)S8_169 fourier(169, K = 10)C9_169 0.8633 -0.3410 s.e. 0.1604 0.1548 fourier(169, K = 10)S9_169 fourier(169, K = 10)C10_169 -0.9754 0.8050 s.e. 0.1548 0.1507 fourier(169, K = 10)S10_169 intercept -1.1803 192.0789 s.e. 0.1507 1.7688 sigma^2 estimated as 242.5: log likelihood=-115410.9 AIC=230873.8 AICc=230873.9 BIC=231087.8 ``` --- ## 5-minute call centre volume ```r gg_tsresiduals(fit, lag = 338) ``` <img src="index_files/figure-html/callsres-1.png" width="576" /> --- ## 5-minute call centre volume ```r fit %>% forecast(h = 1690) %>% autoplot(calls_mdl) ``` <img src="index_files/figure-html/callsf-1.png" width="576" /> --- ## 5-minute call centre volume ```r fit %>% forecast(h = 1690) %>% autoplot(filter(calls_mdl, idx > 25000)) ``` <img src="index_files/figure-html/callsf2-1.png" width="576" /> --- # Lagged predictors --- ## Lagged predictors Sometimes a change in `\(x_t\)` does not affect `\(y_t\)` instantaneously - `\(y_t=\)` sales, `\(x_t=\)` advertising. - `\(y_t=\)` stream flow, `\(x_t=\)` rainfall. - `\(y_t=\)` size of herd, `\(x_t=\)` breeding stock. * These are dynamic systems with input ($x_t$) and output `\((y_t)\)`. * `\(x_t\)` is often a leading indicator. * There can be multiple predictors. --- ## Lagged predictors The model include present and past values of predictor: `\(x_t,x_{t-1},x_{t-2},\dots.\)` `$$y_t = a + \gamma_0x_t + \gamma_1x_{t-1} + \dots + \gamma_kx_{t-k} + \eta_t$$` where `\(\eta_t\)` is an ARIMA process. **Rewrite model as ** `$$y_{t} = a+ (\gamma_{0} + \gamma_{1} B + \gamma_{2} B^{2} + \dots + \gamma_{k} B^{k}) x_{t} +\eta_t \\ = a+ \gamma(B) x_{t} +\eta_t.$$` * `\(\gamma(B)\)` is called a \textit{transfer function} since it describes how change in `\(x_t\)` is transferred to `\(y_t\)`. * `\(x\)` can influence `\(y\)`, but `\(y\)` is not allowed to influence `\(x\)`. --- ## Example: Insurance quotes and TV adverts ```r insurance <- as_tsibble(fpp2::insurance, pivot_longer = FALSE) %>% rename(Month = index) insurance %>% pivot_longer(c(Quotes, TV.advert)) %>% ggplot(aes(x = Month, y = value)) + geom_line() + facet_grid(vars(name), scales = "free_y") + labs(x = "Year", y = NULL, title = "Insurance advertising and quotations") ``` <img src="index_files/figure-html/tvadvert-1.png" width="576" /> --- ## Example: Insurance quotes and TV adverts ```r insurance %>% mutate( lag1 = lag(TV.advert), lag2 = lag(lag1) ) %>% as_tibble() %>% select(-Month) %>% rename(lag0 = TV.advert) %>% pivot_longer(-Quotes, names_to="Lag", values_to="TV_advert") %>% ggplot(aes(x = TV_advert, y = Quotes)) + geom_point() + facet_grid(. ~ Lag) + labs(title = "Insurance advertising and quotations") ``` <img src="index_files/figure-html/tvadvertpairs-1.png" width="576" /> --- ## Example: Insurance quotes and TV adverts ```r fit <- insurance %>% # Restrict data so models use same fitting period mutate(Quotes = c(NA,NA,NA,Quotes[4:40])) %>% # Estimate models model( ARIMA(Quotes ~ pdq(d = 0) + TV.advert), ARIMA(Quotes ~ pdq(d = 0) + TV.advert + lag(TV.advert)), ARIMA(Quotes ~ pdq(d = 0) + TV.advert + lag(TV.advert) + lag(TV.advert, 2)), ARIMA(Quotes ~ pdq(d = 0) + TV.advert + lag(TV.advert) + lag(TV.advert, 2) + lag(TV.advert, 3)) ) ``` --- ## Example: Insurance quotes and TV adverts ```r glance(fit) ``` | Lag order| sigma2| log_lik| AIC| AICc| BIC| |---------:|---------:|---------:|--------:|--------:|--------:| | 0| 0.2649757| -28.28210| 66.56420| 68.32890| 75.00859| | 1| 0.2094368| -24.04404| 58.08809| 59.85279| 66.53249| | 2| 0.2150429| -24.01627| 60.03254| 62.57799| 70.16581| | 3| 0.2056454| -22.15731| 60.31461| 64.95977| 73.82565| --- ## Example: Insurance quotes and TV adverts ```r fit <- insurance %>% model(ARIMA(Quotes ~ pdq(3, 0, 0) + TV.advert + lag(TV.advert))) report(fit) ``` ``` Series: Quotes Model: LM w/ ARIMA(3,0,0) errors Coefficients: ar1 ar2 ar3 TV.advert lag(TV.advert) intercept 1.4117 -0.9317 0.3591 1.2564 0.1625 2.0393 s.e. 0.1698 0.2545 0.1592 0.0667 0.0591 0.9931 sigma^2 estimated as 0.2165: log likelihood=-23.89 AIC=61.78 AICc=65.28 BIC=73.6 ``` --- `$$y_t = 2.04 + 1.26 x_t + 0.16 x_{t-1} + \eta_t$$` `$$\eta_t = 1.41 \eta_{t-1} -0.932 \eta_{t-2} + 0.359 \eta_{t-3} + \varepsilon_t$$` --- ## Example: Insurance quotes and TV adverts ```r advert_a <- new_data(insurance, 20) %>% mutate(TV.advert = 10) forecast(fit, advert_a) %>% autoplot(insurance) ``` <img src="index_files/figure-html/unnamed-chunk-24-1.png" width="576" /> --- ## Example: Insurance quotes and TV adverts ```r advert_b <- new_data(insurance, 20) %>% mutate(TV.advert = 8) forecast(fit, advert_b) %>% autoplot(insurance) ``` <img src="index_files/figure-html/unnamed-chunk-25-1.png" width="576" /> --- ## Example: Insurance quotes and TV adverts ```r advert_c <- new_data(insurance, 20) %>% mutate(TV.advert = 6) forecast(fit, advert_c) %>% autoplot(insurance) ``` <img src="index_files/figure-html/unnamed-chunk-26-1.png" width="576" />