Day 3: Heterogeneity and Dynamics in Data

Dynamic Regression Models

Robert W. Walker

2025-07-23

Outline for Day 3

Dynamic Models
GLS vs. OLS and fixes

Day 3: Dynamic Models

The ARIMA approach is fundamentally inductive. The workflow involves the use of empirical values of ACFs and PACFs to engage in model selection. Dynamic models engage theory/structure to impose more stringent assumptions for producing estimates.

Time Series Linear Models/Dynamic Models

First, a result. Aitken Theorem

In a now-classic paper, Aitken generalized the Gauss-Markov theorem to the class of Generalized Least Squares estimators. It is important to note that these are GLS and not FGLS estimators. What is the difference? The two GLS estimators considered by Stimson are not strictly speaking GLS.

Definition \hat{\beta}_{GLS} = (\mathbf{X}^{\prime}\Omega^{-1}\mathbf{X})^{-1}\mathbf{X}^{\prime}\Omega^{-1}\mathbf{y} > Properties >
> (1) GLS is unbiased.
> (2) Consistent.
> (3) Asymptotically normal.
> (4) MV(L)UE

A Quick Example

The variance/covariance matrix of the errors for a first-order autoregressive process is useful to derive.

The matrix is banded; observations separated by one point in time are correlated \rho. Period two is \rho^2; the corners are \rho^{T-1}. The diagonal is one.
What I have actually described is the correlation; the relevant autocovariances are actually defined by \frac{\sigma^{2}\rho^{s}}{1 - \rho^2} where s denotes the time period separation.
It is also straightforward to prove (tediously through induction) that this is invertible; it is square and the determinant is non-zero having assumed that |\rho < 1|.
- The 2x2 determinant is \frac{1}{1-\rho^2}.
- The 3x3 is 1*(1-\rho^2) - \rho(\rho - \rho^3) + \rho^2(\rho^2 - \rho^2). The first term is positive and the second term is non-zero so long as \rho \neq 0. But even if \rho=0, we would have an identity matrix which is invertible.

\Phi = \sigma^{2}\Psi = \sigma^{2}_{e} \left(\begin{array}{ccccc}1 & \rho^{1} & \rho^{2} & \ldots & \rho^{T-1} \\ \rho^1 & 1 & \rho^1 & \ldots & \rho^{T-2} \\ \rho^{2} & \rho^1 & 1 & \ldots & \rho^{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \ldots & 1 \end{array}\right)

given that e_{t} = \rho e_{t-1} + \nu_{t}. A Toeplitz form….

If the variance is stationary, we can rewrite, \sigma^{2}_{e} = \frac{\sigma^{2}_{\nu}}{1 - \rho^{2}}

A comment on characteristic roots….

Cochrane-Orcutt

We have the two key elements to implement this except that we do not know \rho; we will have to estimate it and estimates have uncertainty. But it is important to note this imposes exactly an AR(1). If the process is incorrectly specified, then the optimal properties do not follow. Indeed, the optimal properties also depend on an additional important feature.

What does the feasible do?

We need to estimate things to replace unknown covariance structures and coverage will depend on properties of the estimators of these covariances.

Consistent estimators will work but there is euphemistically considerable variation in the class of consistent estimators.
Contrasting the Beck and Katz/White approach with the GLS approach is a valid difference in philosophies.¹ One takes advantage of OLS and Basus Theorem; one goes full Aitken.

Incremental Models

y_{t} = a_{1} y_{t-1} + \epsilon_{t}

is the simplest dynamic model but it cannot be estimated consistently, in general terms, in the presence of serial correlation. Why?

The key condition for unbiasedness is violated because \mathbb{E}(y_{t-1}\epsilon_{t}) \neq 0. OLS will not generally work.

A note on dynamic interpretation.

Incremental Models with Covariates

y_{t} = a_{1} y_{t-1} + \beta X_t + \epsilon_{t}

The problem is fitting and the key issue is white noise residuals post-estimation. But we have to assume a structure and implement it.

Distributed Lag Models

y_{t} = \alpha + \beta_{0} X_t + \beta_{1}x_{t-1} + \ldots + \epsilon_{t}

The impact of x occurs over multiple periods. It relies on theory, or perhaps analysis using information criteria/F [owing to quasi-nesting and missing data]. OLS is a fine solution to this problem but the search space of models is often large.

In response to this problem, we have structured distributed lag models; there are many such schemes.

Koyck/Geometric decay:
short run and long-run effects are parametrically identified y_t = \alpha + \beta(1-\lambda)\sum_{j=0}^{\infty}\lambda^{j}X_{t-j} + \epsilon
Almon (more arbitrary decay) y_{it} = \sum_{t_{A}=0}^{T_{F}} \rho_{t_{A}}x_{t - t_{A}} + \epsilon_{t} with coefficients that are ordinates of some general polynomial of degree T_{F} >> q. The \rho_{t_{A}} = \sum_{k=0}^{T_{F}} \gamma_{k}t^{k}.

Autoregressive Distributed Lag Models

y_{t} = \alpha + \gamma_{1}y_{t-1} + \beta_{0} X_t + \beta_{1}X_{t-1} + \beta_{2}X_{t-2} + \ldots + \epsilon_{t}

OLS is often used if iid; \epsilon_{t} is unrelated to y_{t-1} is common if nonsensical.
If not iid: GLS is needed.
The authors argue that the lagged dependent variable often yields white noise for free. As they also note, there is a deBoef and Keele paper showing the relationship between these models and a form of error correction models. More on that tomorrow.
There is substance to the timing of impacts.

A Cottage Industry of ADL’s

As recently as April of 2025, a paper appeared in the Journal of Politics advocating the use of ADL(2,2). The paper, by Kagalwala and Whitten, called The Answer was There All Along: Worry about the dynamics!. A previous argument was made for the ADL(1,1).

Structural vs. Non-structural

Data analysis can quite yield models comparisons among competing dynamic structures. The key issue is that the analyst need divine the process; what is the relevant error process and what is the structure and timing of effects alongside the potential question of incremental adjustment. We need good theory for that.

Given such theory, we can take an equations as analysis approach, measure the variables, and derive reduced forms, and then recover parameter estimates deploying simultaneous equations methods. Very large such systems were a core part of early empirical macroeconomics. The failures of such systems led to the proposal of alternatives.

Chris Sims suggested a more flexible approach: the VAR.

VAR: Vector AutoRegression

Choose a relevant set of lag lengths and write each variable in the system as a function of lags of itself and other variables to the chosen lengths.¹ For Stata and for R¹

The key insight is that this VAR is the reduced form to some more complicated as yet unspecified structural form.
But if the goal is to specify how variables related to one another and to use data to discover Granger causality and responses to impulse injected in the system.

A very simple example

library(forecast)
mdeaths
fdeaths
save(mdeaths, fdeaths, file = "./img/LungDeaths.RData")

library(hrbrthemes)
load(url("https://github.com/robertwwalker/Essex-Data/raw/main/LungDeaths.RData"))
Males <- mdeaths; Females <- fdeaths
Lung.Deaths <- cbind(Males, Females) %>% as_tsibble()
Lung.Deaths %>% autoplot() + theme_ipsum_rc()

a VAR

lung_deaths <- cbind(mdeaths, fdeaths) %>%
  as_tsibble(pivot_longer = FALSE)
fit <- lung_deaths %>%
  model(VAR(vars(mdeaths, fdeaths) ~ AR(3)))
report(fit)

Series: mdeaths, fdeaths 
Model: VAR(3) w/ mean 

Coefficients for mdeaths:
      lag(mdeaths,1)  lag(fdeaths,1)  lag(mdeaths,2)  lag(fdeaths,2)
              0.6675          0.8074          0.3677         -1.4540
s.e.          0.3550          0.8347          0.3525          0.8088
      lag(mdeaths,3)  lag(fdeaths,3)  constant
              0.2606         -1.1214  538.7817
s.e.          0.3424          0.8143  137.1047

Coefficients for fdeaths:
      lag(mdeaths,1)  lag(fdeaths,1)  lag(mdeaths,2)  lag(fdeaths,2)
              0.2138          0.4563          0.0937         -0.3984
s.e.          0.1460          0.3434          0.1450          0.3328
      lag(mdeaths,3)  lag(fdeaths,3)  constant
              0.0250          -0.315  202.0027
s.e.          0.1409           0.335   56.4065

Residual covariance matrix:
         mdeaths  fdeaths
mdeaths 58985.95 22747.94
fdeaths 22747.94  9983.95

log likelihood = -812.35
AIC = 1660.69   AICc = 1674.37  BIC = 1700.9

fit2 <- lung_deaths %>%
  model(VAR(vars(mdeaths, fdeaths) ~ AR(2)))
report(fit2)

Series: mdeaths, fdeaths 
Model: VAR(2) w/ mean 

Coefficients for mdeaths:
      lag(mdeaths,1)  lag(fdeaths,1)  lag(mdeaths,2)  lag(fdeaths,2)  constant
              0.9610          0.3340          0.1149         -1.3379  443.8492
s.e.          0.3409          0.8252          0.3410          0.7922  124.4608

Coefficients for fdeaths:
      lag(mdeaths,1)  lag(fdeaths,1)  lag(mdeaths,2)  lag(fdeaths,2)  constant
              0.3391          0.2617         -0.0601         -0.2691  145.0546
s.e.          0.1450          0.3510          0.1450          0.3369   52.9324

Residual covariance matrix:
         mdeaths  fdeaths
mdeaths 62599.51 24942.79
fdeaths 24942.79 11322.70

log likelihood = -833.17
AIC = 1694.35   AICc = 1701.98  BIC = 1725.83

fit %>%
  fabletools::forecast(h=12) %>%
  autoplot(lung_deaths)

Female

lung_deaths %>%
model(VAR(vars(mdeaths, fdeaths) ~ AR(3))) %>%
  residuals() %>% 
  pivot_longer(., cols = c(mdeaths,fdeaths)) %>% 
  filter(name=="fdeaths") %>% 
  as_tsibble(index=index) %>% 
  gg_tsdisplay(plot_type = "partial") + labs(title="Female residuals

Male

lung_deaths %>%
model(VAR(vars(mdeaths, fdeaths) ~ AR(3))) %>%
  residuals() %>% 
  pivot_longer(., cols = c(mdeaths,fdeaths)) %>% 
  filter(name=="mdeaths") %>% 
  as_tsibble(index=index) %>% 
  gg_tsdisplay(plot_type = "partial") + labs(title="Male residuals

Easy Impulse Response

What happens if I shock one of the series; how does it work through the system?

The idea behind an impulse-response is core to counterfactual analysis with time series. What does our future world look like and what predictions arise from it and the model we have deployed?

Whether VARs or dynamic linear models or ADL models, these are key to interpreting a model in the real world.

Males

VARMF <- cbind(Males,Females)
mod1 <- vars::VAR(VARMF, p=3, type="const")
plot(vars::irf(mod1, boot=TRUE, impulse="Males"))

Female

plot(vars::irf(mod1, boot=TRUE, impulse="Females"))