The ARIMA approach is fundamentally inductive: the workflow uses empirical ACFs and PACFs to guide model selection. Dynamic models instead invoke theory and structure, imposing more stringent assumptions in order to produce estimates.
First, a result: the Aitken Theorem.
In a now-classic paper, Aitken generalized the Gauss-Markov theorem to the class of generalized least squares (GLS) estimators. It is important to note that these are GLS and not FGLS estimators. What is the difference? The two "GLS" estimators considered by Stimson are not, strictly speaking, GLS.
Definition: $\hat{\beta}_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$
Properties
(1) GLS is unbiased.
(2) GLS is consistent.
(3) GLS is asymptotically normal.
(4) GLS is MV(L)UE: minimum variance among (linear) unbiased estimators.
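To make the definition concrete, here is a minimal sketch of the GLS formula in R. The data-generating process and the known diagonal $\Omega$ are illustrative assumptions, not part of the theorem.

# Minimal GLS sketch with a known (here diagonal) Omega; data are illustrative
set.seed(123)
T <- 100
X <- cbind(1, rnorm(T))                    # design matrix with an intercept
w <- runif(T, 0.5, 2)                      # known, unequal error variances
Omega <- diag(w)
y <- X %*% c(1, 2) + rnorm(T, sd = sqrt(w))
Omega.inv <- solve(Omega)
beta.gls <- solve(t(X) %*% Omega.inv %*% X) %*% t(X) %*% Omega.inv %*% y
beta.gls                                   # compare with coef(lm(y ~ X[, 2]))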
The variance/covariance matrix of the errors for a first-order autoregressive process is useful to derive.
The matrix is banded: observations separated by one period in time are correlated $\rho$, those separated by two periods are correlated $\rho^2$, and the corners are $\rho^{T-1}$. The diagonal is one.
What I have actually described is the correlation; the relevant autocovariances are defined by $\frac{\sigma^2 \rho^s}{1-\rho^2}$, where $s$ denotes the separation in time.
It is also straightforward (if tedious) to prove by induction that this matrix is invertible; it is square and its determinant is non-zero, having assumed that $|\rho| < 1$.
$$\Phi = \sigma^2 \Psi = \sigma^2_e \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\ \rho & 1 & \rho & \cdots & \rho^{T-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1 \end{pmatrix}$$
given that $e_t = \rho e_{t-1} + \nu_t$. A Toeplitz form.
If the variance is stationary, we can rewrite $\sigma^2_e = \frac{\sigma^2_\nu}{1-\rho^2}$.
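The banded structure is easy to construct and inspect numerically; a small sketch, with $\rho$ and $T$ chosen arbitrarily:

# Build the AR(1) Toeplitz correlation matrix Psi and covariance Phi
rho <- 0.7; T <- 5
Psi <- rho^abs(outer(1:T, 1:T, "-"))     # ones on the diagonal, rho^s off it
sigma2.nu <- 1
Phi <- (sigma2.nu / (1 - rho^2)) * Psi   # autocovariances sigma^2_nu rho^s / (1 - rho^2)
det(Psi)                                 # non-zero whenever |rho| < 1
round(solve(Psi), 3)                     # the inverse is tridiagonal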
A comment on characteristic roots....
We now have the two key elements needed to implement this, except that we do not know $\rho$; we will have to estimate it, and estimates carry uncertainty. It is important to note that this imposes exactly an AR(1): if the process is incorrectly specified, the optimal properties do not follow. Indeed, the optimal properties also depend on an additional important feature.
We need to estimate things to replace unknown covariance structures, and coverage will depend on properties of the estimators of these covariances. Consistent estimators will work, but there is, euphemistically, considerable variation in the class of consistent estimators.
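One member of that class, sketched here with simulated placeholder data, is a simple FGLS in the Prais-Winsten spirit: estimate $\rho$ from OLS residuals and plug the implied $\hat{\Omega}$ into the GLS formula.

# FGLS sketch: estimate rho from OLS residuals, then apply the GLS formula
set.seed(321)
T <- 200
x <- rnorm(T)
e <- as.numeric(arima.sim(list(ar = 0.6), n = T))  # AR(1) errors
y <- 1 + 2 * x + e
r <- residuals(lm(y ~ x))
rho.hat <- sum(r[-1] * r[-T]) / sum(r[-T]^2)       # AR(1) coefficient of the residuals
Psi.hat <- rho.hat^abs(outer(1:T, 1:T, "-"))
X <- cbind(1, x)
Oinv <- solve(Psi.hat)
beta.fgls <- solve(t(X) %*% Oinv %*% X) %*% t(X) %*% Oinv %*% y
beta.fgls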
Contrasting the Beck and Katz/White approach with the GLS approach reflects a genuine difference in philosophies.1 One takes advantage of OLS and Basu's theorem; the other goes full Aitken.
1 We will return to this when we look at Hausman because this is the essential issue.
$$y_t = a_1 y_{t-1} + \epsilon_t$$
is the simplest dynamic model, but it cannot, in general, be estimated consistently in the presence of serial correlation. Why? The key condition for unbiasedness is violated because $E(y_{t-1}\epsilon_t) \neq 0$; OLS will not generally work.
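A quick simulation, with illustrative values, makes the point. With $a_1 = \rho = 0.5$, the OLS probability limit is $(a_1 + \rho)/(1 + a_1\rho) = 0.8$ rather than 0.5.

# OLS on a lagged dependent variable with AR(1) errors is inconsistent
set.seed(99)
T <- 5000
a1 <- 0.5; rho <- 0.5
eps <- as.numeric(arima.sim(list(ar = rho), n = T))  # serially correlated errors
y <- numeric(T)
for (t in 2:T) y[t] <- a1 * y[t - 1] + eps[t]
coef(lm(y[-1] ~ y[-T]))   # slope near 0.8, not the true a1 = 0.5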
A note on dynamic interpretation.
$$y_t = a_1 y_{t-1} + \beta X_t + \epsilon_t$$
The problem is one of fitting, and the key diagnostic is white-noise residuals post-estimation. But we have to assume a structure and implement it.
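A sketch of that check, using simulated placeholder data: fit the assumed structure by OLS, then test the residuals for leftover autocorrelation with a Ljung-Box test.

# Fit the assumed dynamic structure, then check for white-noise residuals
set.seed(7)
T <- 300
x <- rnorm(T); nu <- rnorm(T)
y <- numeric(T)
for (t in 2:T) y[t] <- 0.5 * y[t - 1] + x[t] + nu[t]
fit <- lm(y[-1] ~ y[-T] + x[-1])
Box.test(residuals(fit), lag = 10, type = "Ljung-Box")  # a large p-value suggests white noise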
$$y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \dots + \epsilon_t$$
The impact of $X$ occurs over multiple periods. Choosing the lag length relies on theory, or perhaps on analysis using information criteria or F-tests [complicated by quasi-nesting and missing data]. OLS is a fine solution to this problem, but the search space of models is often large.
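To see the search problem concretely: finite distributed lags can be fit with lm() and compared by information criteria on a common estimation sample (which sidesteps the missing-data problem noted above). Everything here is hypothetical.

# Compare finite distributed-lag lengths by AIC on a common sample
set.seed(11)
T <- 200
x <- rnorm(T)
y <- 1 + 0.8 * x + 0.4 * c(0, x[-T]) + rnorm(T)  # true effects at lags 0 and 1
x1 <- c(NA, x[-T])                 # x lagged once
x2 <- c(NA, NA, x[-((T - 1):T)])   # x lagged twice
keep <- 3:T                        # common sample so the AICs are comparable
m0 <- lm(y[keep] ~ x[keep])
m1 <- lm(y[keep] ~ x[keep] + x1[keep])
m2 <- lm(y[keep] ~ x[keep] + x1[keep] + x2[keep])
AIC(m0, m1, m2)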
In response to this problem, we have structured distributed lag models; there are many such schemes.
$$y_t = \alpha + \gamma_1 y_{t-1} + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \dots + \epsilon_t$$
Data analysis can yield comparisons among competing dynamic structures, but the analyst must still divine the process: what is the relevant error process, and what are the structure and timing of effects, alongside the potential question of incremental adjustment? We need good theory for that.
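When theory does pin down the structure, an ADL like the one above is just OLS on constructed lags, and the implied long-run effect of $X$ is $(\beta_0 + \beta_1 + \beta_2)/(1 - \gamma_1)$. A sketch with placeholder data:

# Fit an ADL(1,2) by OLS and compute the implied long-run multiplier
set.seed(5)
T <- 200
x <- rnorm(T); nu <- rnorm(T)
y <- numeric(T)
for (t in 3:T) y[t] <- 0.3 * y[t - 1] + 0.5 * x[t] + 0.2 * x[t - 1] + 0.1 * x[t - 2] + nu[t]
d <- data.frame(y = y[3:T], ylag = y[2:(T - 1)],
                x0 = x[3:T], x1 = x[2:(T - 1)], x2 = x[1:(T - 2)])
adl <- lm(y ~ ylag + x0 + x1 + x2, data = d)
b <- coef(adl)
unname(sum(b[c("x0", "x1", "x2")]) / (1 - b["ylag"]))  # about (0.5+0.2+0.1)/(1-0.3)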
Given such theory, we can take a system-of-equations approach: measure the variables, derive reduced forms, and then recover parameter estimates by deploying simultaneous-equations methods. Very large such systems were a core part of early empirical macroeconomics; the failures of those systems led to the proposal of alternatives.
Chris Sims suggested a more flexible approach: the VAR.
1 A nice blog post with an extended example in R can be found on towardsdatascience. Kit Baum has a similar worked example in slides.
The key insight is that this VAR is the reduced form of some more complicated, as-yet-unspecified structural form.
But the goal here is to specify how the variables relate to one another and to use data to discover Granger causality and responses to impulses injected into the system.
library(forecast)
mdeaths
fdeaths
save(mdeaths, fdeaths, file = "./img/LungDeaths.RData")
library(hrbrthemes)
library(fpp3)  # loads tsibble, fable, and the pipe used below
load(url("https://github.com/robertwwalker/Essex-Data/raw/main/LungDeaths.RData"))
Males <- mdeaths; Females <- fdeaths
Lung.Deaths <- cbind(Males, Females) %>% as_tsibble()
Lung.Deaths %>% autoplot() + theme_ipsum_rc()
lung_deaths <- cbind(mdeaths, fdeaths) %>% as_tsibble(pivot_longer = FALSE)
fit <- lung_deaths %>% model(VAR(vars(mdeaths, fdeaths) ~ AR(3)))
report(fit)
Series: mdeaths, fdeaths
Model: VAR(3) w/ mean

Coefficients for mdeaths:
      lag(mdeaths,1)  lag(fdeaths,1)  lag(mdeaths,2)  lag(fdeaths,2)
              0.6675          0.8074          0.3677         -1.4540
s.e.          0.3550          0.8347          0.3525          0.8088
      lag(mdeaths,3)  lag(fdeaths,3)  constant
              0.2606         -1.1214  538.7817
s.e.          0.3424          0.8143  137.1047

Coefficients for fdeaths:
      lag(mdeaths,1)  lag(fdeaths,1)  lag(mdeaths,2)  lag(fdeaths,2)
              0.2138          0.4563          0.0937         -0.3984
s.e.          0.1460          0.3434          0.1450          0.3328
      lag(mdeaths,3)  lag(fdeaths,3)  constant
              0.0250          -0.315  202.0027
s.e.          0.1409           0.335   56.4065

Residual covariance matrix:
         mdeaths  fdeaths
mdeaths 58985.95 22747.94
fdeaths 22747.94  9983.95

log likelihood = -812.35
AIC = 1660.69   AICc = 1674.37  BIC = 1700.9
fit2 <- lung_deaths %>% model(VAR(vars(mdeaths, fdeaths) ~ AR(2)))
report(fit2)
Series: mdeaths, fdeaths
Model: VAR(2) w/ mean

Coefficients for mdeaths:
      lag(mdeaths,1)  lag(fdeaths,1)  lag(mdeaths,2)  lag(fdeaths,2)  constant
              0.9610          0.3340          0.1149         -1.3379  443.8492
s.e.          0.3409          0.8252          0.3410          0.7922  124.4608

Coefficients for fdeaths:
      lag(mdeaths,1)  lag(fdeaths,1)  lag(mdeaths,2)  lag(fdeaths,2)  constant
              0.3391          0.2617         -0.0601         -0.2691  145.0546
s.e.          0.1450          0.3510          0.1450          0.3369   52.9324

Residual covariance matrix:
         mdeaths  fdeaths
mdeaths 62599.51 24942.79
fdeaths 24942.79 11322.70

log likelihood = -833.17
AIC = 1694.35   AICc = 1701.98  BIC = 1725.83
fit %>% fabletools::forecast(h=12) %>% autoplot(lung_deaths)
lung_deaths %>%
  model(VAR(vars(mdeaths, fdeaths) ~ AR(3))) %>%
  residuals() %>%
  pivot_longer(., cols = c(mdeaths, fdeaths)) %>%
  filter(name == "fdeaths") %>%
  as_tsibble(index = index) %>%
  gg_tsdisplay(plot_type = "partial") +
  labs(title = "Female residuals")
lung_deaths %>%
  model(VAR(vars(mdeaths, fdeaths) ~ AR(3))) %>%
  residuals() %>%
  pivot_longer(., cols = c(mdeaths, fdeaths)) %>%
  filter(name == "mdeaths") %>%
  as_tsibble(index = index) %>%
  gg_tsdisplay(plot_type = "partial") +
  labs(title = "Male residuals")
What happens if I shock one of the series; how does it work through the system?
The idea behind an impulse-response is core to counterfactual analysis with time series. What does our future world look like and what predictions arise from it and the model we have deployed?
Whether for VARs, dynamic linear models, or ADL models, impulse-responses are key to interpreting a model in the real world.
VARMF <- cbind(Males, Females)
mod1 <- vars::VAR(VARMF, p = 3, type = "const")
plot(vars::irf(mod1, boot = TRUE, impulse = "Males"))
plot(vars::irf(mod1, boot=TRUE, impulse="Females"))