Day 10: Wrapping Up

Robert W. Walker

2025-08-01

Outline for Day 10:

  1. Implementation of DPD/GMM
  2. Conclusions of Panel GLM
  3. TWFE
  4. Causal Inference in Panel Data

On Diff in Diff

A very useful paper I just ran across on Andrew Baker’s website. The Box note cites a post, but he has since taken his Netlify site down. The paper is a really nice practical introduction.

The Data for Implementation of GMM

Contains data from abdata.dta
  obs:         1,031                          Layard & Nickell, Unemployment
                                                in Britain, Economica 53, 1986
 ------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
ind             int    %8.0g                  industry
year            int    %8.0g
emp             float  %9.0g                  employment
wage            float  %9.0g                  real wage
cap             float  %9.0g                  gross capital stock
indoutpt        float  %9.0g                  industry output
n               float  %9.0g                  log(employment)
w               float  %9.0g                  log(real wage)
k               float  %9.0g                  log(gross capital stock)
ys              float  %9.0g                  log(industry output)
yr1980          float  %9.0g
yr1981          float  %9.0g
yr1982          float  %9.0g
yr1983          float  %9.0g
yr1984          float  %9.0g
id              float  %9.0g                  firm ID
-------------------------------------------------------------------------------
Sorted by:  id  year

Implementation

  • \texttt{xtregar}: \texttt{re} and \texttt{fe} options

    • Fit a first order autoregressive structure to TSCS data.
    • Defaults to an iterative estimator but \texttt{twostep} is available.
    • \texttt{lbi} gives a test of the hypothesis that \rho is zero (not computed by default).
  • \texttt{xtabond}

    • \texttt{estat abond} gives a test for autocorrelation
    • \texttt{estat sargan} gives the overidentifying restrictions test
  • \texttt{xtlsdvc y x, initial(ah or ab or bb) vcov(1000 bs iter)} handles unbalanced panels: bias-corrected least-squares dummy variable (LSDV) estimators for the standard autoregressive panel-data model, using the bias approximations in Bruno (2005a).

  • \texttt{xtivreg}

  • \texttt{xtdpd} fits Arellano-Bond and Arellano-Bover/Blundell-Bond

    • \texttt{estat abond} gives a test for autocorrelation
    • \texttt{estat sargan} gives the overidentifying restrictions test (Rejection implies failure of assumptions)

More on DPD

  • David Roodman’s excellent and well-documented \texttt{xtabond2} extends the Stata command and incorporates the orthogonal deviations transformation, which assists with gapped panels. I personally think it is the best software for this.
  • Systems DPD is complicated but perhaps very useful.
  • As an aside, I laughed pretty hard at a post on econ job rumours where someone claimed that no one actually understands these models! [Not true; I am positive that Hansen does.]

                     firm       year     sector          emp        wage     capital       output
Grand mean        73.2037  1979.6508     5.1232       7.8917     23.9188      2.5074     103.8012
S.D.              41.2333     2.2161     2.6781      15.9349      5.6484      6.2487       9.9380
TSS          1751193.2260  5058.2968  7387.3560  261539.3894  32861.7647  40217.7902  101726.9240
Between S.D.      40.5586     0.6001     2.6774      16.1689      5.1840      6.1048       4.3649
BSS          1751193.2260   368.2968  7387.3560  256508.7790  28458.3312  39065.0725   19218.0111
Within S.D.        0.0000     2.1339     0.0000       2.2100      2.0677      1.0579       8.9502
WSS                0.0000  4690.0000     0.0000    5030.6104   4403.4335   1152.7176   82508.9129
% Within           0.0000     0.9272     0.0000       0.0192      0.1340      0.0287       0.8111

# To make it match the Stata data.
EmplUK$n <- log(EmplUK$emp)
EmplUK$w <- log(EmplUK$wage)
EmplUK$k <- log(EmplUK$capital)
EmplUK$ys <- log(EmplUK$output)

# Can just use log syntax to solve it.
# Arellano and Bond (1991), table 4(a1) 
Table4.a1 <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1)
           + lag(log(capital), 0:2) + lag(log(output), 0:2) | lag(log(emp), 2:99),
            data = EmplUK, effect = "twoways", model = "onestep")
summary(Table4.a1)

Twoways effects One-step model Difference GMM 

Call:
pgmm(formula = log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 
    0:1) + lag(log(capital), 0:2) + lag(log(output), 0:2) | lag(log(emp), 
    2:99), data = EmplUK, effect = "twoways", model = "onestep")

Unbalanced Panel: n = 140, T = 7-9, N = 1031

Number of Observations Used: 611
Residuals:
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-0.6006508 -0.0299498  0.0000000 -0.0001193  0.0311461  0.5693264 

Coefficients:
                         Estimate Std. Error z-value  Pr(>|z|)    
lag(log(emp), 1:2)1      0.686226   0.144594  4.7459 2.076e-06 ***
lag(log(emp), 1:2)2     -0.085358   0.056016 -1.5238 0.1275510    
lag(log(wage), 0:1)0    -0.607821   0.178205 -3.4108 0.0006478 ***
lag(log(wage), 0:1)1     0.392623   0.167993  2.3371 0.0194319 *  
lag(log(capital), 0:2)0  0.356846   0.059020  6.0462 1.483e-09 ***
lag(log(capital), 0:2)1 -0.058001   0.073180 -0.7926 0.4280206    
lag(log(capital), 0:2)2 -0.019948   0.032713 -0.6098 0.5420065    
lag(log(output), 0:2)0   0.608506   0.172531  3.5269 0.0004204 ***
lag(log(output), 0:2)1  -0.711164   0.231716 -3.0691 0.0021469 ** 
lag(log(output), 0:2)2   0.105798   0.141202  0.7493 0.4536974    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(25) = 48.74983 (p-value = 0.0030295)
Autocorrelation test (1): normal = -3.599593 (p-value = 0.00031872)
Autocorrelation test (2): normal = -0.5160282 (p-value = 0.60583)
Wald test for coefficients: chisq(10) = 408.2859 (p-value = < 2.22e-16)
Wald test for time dummies: chisq(6) = 11.57904 (p-value = 0.072046)

## Arellano and Bond (1991), table 4b 
Table4.b <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1)
           + log(capital) + lag(log(output), 0:1) | lag(log(emp), 2:99),
            data = EmplUK, effect = "twoways", model = "twosteps")
# To make it match Stata
summary(Table4.b, robust=FALSE)

Twoways effects Two-steps model Difference GMM 

Call:
pgmm(formula = log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 
    0:1) + log(capital) + lag(log(output), 0:1) | lag(log(emp), 
    2:99), data = EmplUK, effect = "twoways", model = "twosteps")

Unbalanced Panel: n = 140, T = 7-9, N = 1031

Number of Observations Used: 611
Residuals:
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-0.6190677 -0.0255683  0.0000000 -0.0001339  0.0332013  0.6410272 

Coefficients:
                        Estimate Std. Error  z-value  Pr(>|z|)    
lag(log(emp), 1:2)1     0.474151   0.085303   5.5584 2.722e-08 ***
lag(log(emp), 1:2)2    -0.052967   0.027284  -1.9413 0.0522200 .  
lag(log(wage), 0:1)0   -0.513205   0.049345 -10.4003 < 2.2e-16 ***
lag(log(wage), 0:1)1    0.224640   0.080063   2.8058 0.0050192 ** 
log(capital)            0.292723   0.039463   7.4177 1.191e-13 ***
lag(log(output), 0:1)0  0.609775   0.108524   5.6188 1.923e-08 ***
lag(log(output), 0:1)1 -0.446373   0.124815  -3.5763 0.0003485 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(25) = 30.11247 (p-value = 0.22011)
Autocorrelation test (1): normal = -2.427829 (p-value = 0.01519)
Autocorrelation test (2): normal = -0.3325401 (p-value = 0.73948)
Wald test for coefficients: chisq(7) = 371.9877 (p-value = < 2.22e-16)
Wald test for time dummies: chisq(6) = 26.9045 (p-value = 0.0001509)

# Or with Robust [Notice it is default]
summary(Table4.b)

Twoways effects Two-steps model Difference GMM 

Call:
pgmm(formula = log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 
    0:1) + log(capital) + lag(log(output), 0:1) | lag(log(emp), 
    2:99), data = EmplUK, effect = "twoways", model = "twosteps")

Unbalanced Panel: n = 140, T = 7-9, N = 1031

Number of Observations Used: 611
Residuals:
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-0.6190677 -0.0255683  0.0000000 -0.0001339  0.0332013  0.6410272 

Coefficients:
                        Estimate Std. Error z-value  Pr(>|z|)    
lag(log(emp), 1:2)1     0.474151   0.185398  2.5575 0.0105437 *  
lag(log(emp), 1:2)2    -0.052967   0.051749 -1.0235 0.3060506    
lag(log(wage), 0:1)0   -0.513205   0.145565 -3.5256 0.0004225 ***
lag(log(wage), 0:1)1    0.224640   0.141950  1.5825 0.1135279    
log(capital)            0.292723   0.062627  4.6741 2.953e-06 ***
lag(log(output), 0:1)0  0.609775   0.156263  3.9022 9.530e-05 ***
lag(log(output), 0:1)1 -0.446373   0.217302 -2.0542 0.0399605 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(25) = 30.11247 (p-value = 0.22011)
Autocorrelation test (1): normal = -1.53845 (p-value = 0.12394)
Autocorrelation test (2): normal = -0.2796829 (p-value = 0.77972)
Wald test for coefficients: chisq(7) = 142.0353 (p-value = < 2.22e-16)
Wald test for time dummies: chisq(6) = 16.97046 (p-value = 0.0093924)

## Blundell and Bond (1998) table 4
Table4.BB <- pgmm(log(emp) ~ lag(log(emp), 1)+ lag(log(wage), 0:1) +
           lag(log(capital), 0:1) | lag(log(emp), 2:99) +
           lag(log(wage), 2:99) + lag(log(capital), 2:99),        
           data = EmplUK, effect = "twoways", model = "onestep", 
           transformation = "ld") 
summary(Table4.BB, robust = TRUE)

Twoways effects One-step model System GMM 

Call:
pgmm(formula = log(emp) ~ lag(log(emp), 1) + lag(log(wage), 0:1) + 
    lag(log(capital), 0:1) | lag(log(emp), 2:99) + lag(log(wage), 
    2:99) + lag(log(capital), 2:99), data = EmplUK, effect = "twoways", 
    model = "onestep", transformation = "ld")

Unbalanced Panel: n = 140, T = 7-9, N = 1031

Number of Observations Used: 1642
Residuals:
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-0.7530341 -0.0369030  0.0000000  0.0002882  0.0466069  0.6001503 

Coefficients:
                         Estimate Std. Error z-value  Pr(>|z|)    
lag(log(emp), 1)         0.935605   0.026295 35.5810 < 2.2e-16 ***
lag(log(wage), 0:1)0    -0.630976   0.118054 -5.3448 9.050e-08 ***
lag(log(wage), 0:1)1     0.482620   0.136887  3.5257 0.0004224 ***
lag(log(capital), 0:1)0  0.483930   0.053867  8.9838 < 2.2e-16 ***
lag(log(capital), 0:1)1 -0.424393   0.058479 -7.2572 3.952e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(100) = 118.763 (p-value = 0.097096)
Autocorrelation test (1): normal = -4.808434 (p-value = 1.5212e-06)
Autocorrelation test (2): normal = -0.2800133 (p-value = 0.77947)
Wald test for coefficients: chisq(5) = 11174.82 (p-value = < 2.22e-16)
Wald test for time dummies: chisq(7) = 14.71138 (p-value = 0.039882)

References

The manual for the R package \texttt{plm} was published in the Journal of Statistical Software. It is nice and extensive, except for the treatment of dynamic panel data (DPD) models; only some are available. There is also \texttt{panelvar} for panel VAR models, which estimates even more DPD estimators. Kit Baum has a very nice discussion of this in Stata in a set of course slides on the web at Boston College [search Google for Baum Dynamic Panel Data Estimators].

A Brief Point on FEVD

Plumper and Troeger have designed a procedure to solve one of the principal problems that arises in fixed effects regressions: it is either impossible or suboptimal to estimate the effects of time-invariant or nearly time-invariant regressors. Their approach plays off of the generic consistency of the fixed effects estimator. In general, they begin by estimating an LSDV model,

y_{it} = \alpha_{i} + X_{it}\beta + \epsilon_{it}

They then model the unit effects as a function of (largely) time-invariant regressors that they denote as Z,

\alpha_{i} = Z_{i}\gamma + \psi_{i}

In a third stage, they construct the regression with an offset. In effect, they take the offset and add it to the regression, as in

y_{it} = \psi_{i} + X_{it}\beta + Z_{i}\gamma + \nu_{it}

and adjust the variance/covariance matrix of the errors accordingly.
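
The three stages are easy to mimic in a few lines. Here is a minimal Python sketch on simulated data (the course materials use Stata and R; all variable names and parameter values below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated balanced panel: N units, T periods, one time-varying regressor x,
# one time-invariant regressor z correlated with the unit effect.
N, T = 200, 10
z = rng.normal(size=N)                      # time-invariant regressor
alpha = 0.5 * z + rng.normal(size=N)        # unit effect, correlated with z
x = rng.normal(size=(N, T))
y = alpha[:, None] + 1.0 * x + rng.normal(size=(N, T))

# Stage 1: within (LSDV) estimate of beta, then recover the unit effects.
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
beta_hat = (xd * yd).sum() / (xd ** 2).sum()
alpha_hat = y.mean(axis=1) - beta_hat * x.mean(axis=1)

# Stage 2: regress the estimated unit effects on z.
Z = np.column_stack([np.ones(N), z])
gamma_hat = np.linalg.lstsq(Z, alpha_hat, rcond=None)[0]
psi_hat = alpha_hat - Z @ gamma_hat          # unexplained part of the effect

# Stage 3: pooled regression including z and the stage-2 residual as offset.
X3 = np.column_stack([np.ones(N * T), x.ravel(),
                      np.repeat(z, T), np.repeat(psi_hat, T)])
coef3 = np.linalg.lstsq(X3, y.ravel(), rcond=None)[0]
print(coef3)
```

By construction, the stage-3 coefficient on x reproduces the within estimate exactly, and the coefficient on the stage-2 residual is exactly one.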

Some Summary Remarks on Philosophy

  • Poolers begin with the assumption that the data can be pooled and the composite/population averaged inference is general.
  • Time series scholars begin with the belief that things should only be pooled with evidence in support of pooling.
  • These are beliefs and proclivities more than insights from general rules.
  • Substance and theory should drive model choices; the reverse is ridiculous given that we assume the right model.
  • I am surprised that GEE-type models do not get far more play, given the typical nature of social science theorizing these days.

On Dynamics in Panel GLM

\texttt{xtgee} syntax

\texttt{xtgee} operates off of the GLM family and link function ideas. For example, probits and logits are family (binomial) with a probit or logit link. The key issue becomes specifying a working correlation matrix (within-groups/units) from among the options of exchangeable, independent, unstructured, fixed (must be user specified), ar (of order), stationary (of order), and nonstationary (of order).

A Within-Between GEE

\texttt{panelr} estimates a within-between GEE regression that extends the logic of GEE to limited outcomes.

Binary Dynamics

There are four classes of discrete time series models that we might use for incorporating dynamics in binary observations varying across both time and space. These get some treatment in the paper by Beck et al.

  • Latent dependence (Dynamic Linear Models)
  • State dependence (Markov Processes)
  • Autoregressive disturbances
  • Duration (survival models and isomorphisms)

Latent Dependence

Carry on the setup from yesterday.

u_{i,t}^{*} = X\beta + \rho u^{*}_{i,t-1} + \epsilon_{it}

This is the analog of a lagged dependent variable regression fit in the latent space rather than the observed data. Such models are probably easiest to fit using Bayesian data augmentation.
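
A quick simulation makes the point. This numpy sketch (parameter values invented) generates the latent AR(1) process and shows that persistence in u^{*} produces runs in the observed binary series:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate the latent-dependence model: u*_t = x_t*beta + rho*u*_{t-1} + eps_t,
# with observed y_t = 1(u*_t > 0). beta and rho are illustrative values.
T, beta, rho = 500, 0.5, 0.8
x = rng.normal(size=T)
eps = rng.normal(size=T)
ustar = np.zeros(T)
for t in range(1, T):
    ustar[t] = beta * x[t] + rho * ustar[t - 1] + eps[t]
y = (ustar > 0).astype(int)

# Persistence in the latent series shows up as runs in the binary series:
# the fraction of adjacent periods with the same outcome is well above 1/2.
agree = np.mean(y[1:] == y[:-1])
print(agree)
```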

Autoregressive Errors and Serial Correlation

u_{i,t}^{*} = X\beta + \epsilon_{i,t}

\epsilon_{i,t} = \rho \epsilon_{i,t-1} + \nu_{i,t}

where the \nu are i.i.d. The model is odd in the sense that a shock to X dies immediately while a shock to an omitted factor has dynamic impacts. There are some suggested tests for serial correlation; we will implement one that employs the generalized residual. The idea is similar to what we have seen before. With two outcome values there are two possible generalized residuals: either the density over the CDF or the negative of the density over one minus the CDF. We then want the covariance over time of the generalized residuals, with variance given by

V(s) = \sum_{t=2}^{T} \frac{\phi_{t}^{2} \phi^{2}_{t-1}}{\Phi_{t}(1-\Phi_{t})\Phi_{t-1}(1-\Phi_{t-1})}

We could apply this to a single unit or collectively to the whole set, with N summations added to the mix. One can show that the covariance over the square root of V(s) has an asymptotic normal distribution.
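
A sketch of the test for a single unit, assuming a probit link; the simulated data and function names are mine, not from the source (the pooled version would add the sums over N):

```python
import numpy as np
from scipy.stats import norm

def gen_resid(y, xb):
    """Generalized residual for a probit: phi/Phi if y = 1, -phi/(1 - Phi) if y = 0."""
    phi, Phi = norm.pdf(xb), norm.cdf(xb)
    return np.where(y == 1, phi / Phi, -phi / (1 - Phi))

def serial_corr_stat(y, xb):
    """Sum of r_t * r_{t-1} scaled by sqrt(V(s)); asymptotically standard normal."""
    r = gen_resid(y, xb)
    phi, Phi = norm.pdf(xb), norm.cdf(xb)
    s = np.sum(r[1:] * r[:-1])
    V = np.sum(phi[1:] ** 2 * phi[:-1] ** 2 /
               (Phi[1:] * (1 - Phi[1:]) * Phi[:-1] * (1 - Phi[:-1])))
    return s / np.sqrt(V)

# Illustration on simulated serially independent data: no evidence expected.
rng = np.random.default_rng(3)
xb = 0.5 * rng.normal(size=400)
y = (xb + rng.normal(size=400) > 0).astype(int)
z = serial_corr_stat(y, xb)
print(z)
```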

BKT 1998

Beck, Katz, and Tucker (1998) point out that BTSCS data are grouped duration data. Indeed, a cloglog discrete choice model is a Cox proportional hazards model; they are not merely similar, they are isomorphic. One can leverage this to do something about the temporal evolution of binary processes. The details are in the lab on Box. As an addition to this, Carter and Signorino make a compelling argument that, instead of a completely saturated set of dummy variables for time since the last event, it is generically superior to use a cubic polynomial in time: time, time squared, and time cubed.
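
A hypothetical helper makes the bookkeeping concrete: build the time-since-event counter from a binary series, then enter its cubic rather than a full set of dummies (the function name and example series are invented):

```python
import numpy as np

def time_since_event(y):
    """Periods elapsed since the last event in a binary series (BKT-style counter)."""
    t = np.zeros(len(y), dtype=int)
    count = 0
    for i, yi in enumerate(y):
        t[i] = count
        count = 0 if yi == 1 else count + 1
    return t

y = np.array([0, 0, 1, 0, 0, 0, 1, 0])
t = time_since_event(y)
print(t)  # [0 1 2 0 1 2 3 0]

# Carter and Signorino: enter t, t^2, t^3 instead of the saturated dummies.
time_poly = np.column_stack([t, t ** 2, t ** 3])
```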

Markov Processes

Markov processes extend to a general class of discrete events observed through time and across units. While the reading discusses the binary case, extensions for ordered and multinomial events are straightforward. I will show two examples.

\mathbf{P} = \left(\begin{array}{cccc}\pi_{11} & \pi_{12} & \cdots & \pi_{1J} \\ \pi_{21} & \ddots & & \vdots \\ \vdots & & \ddots & \vdots \\ \pi_{J1} & \pi_{J2} & \cdots & \pi_{JJ}\end{array}\right)

  • Rows represent s^{t}: the state up to time t
  • Columns represent y^{t}
  • Rows sum to unity

The idea is that the current outcome depends on covariates and the prior state. We can do a lot with that.
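
The row normalization is easy to see in simulation. This numpy sketch (transition probabilities invented) generates a two-state chain and recovers \mathbf{P} from row-normalized transition counts:

```python
import numpy as np

rng = np.random.default_rng(5)

# A two-state Markov chain with known transition matrix P; rows sum to one.
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
T = 20000
s = np.zeros(T, dtype=int)
for t in range(1, T):
    s[t] = rng.choice(2, p=P[s[t - 1]])

# Estimate P from the observed transitions: row-normalized counts.
counts = np.zeros((2, 2))
for a, b in zip(s[:-1], s[1:]):
    counts[a, b] += 1
P_hat = counts / counts.sum(axis=1, keepdims=True)
print(P_hat)
```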

Implementation

It is trivial: a discrete choice model with interactions between the covariates [however structured in time] and the prior state. By construction, it treats heterogeneity as uniformly a function of the prior state, at least in the simplest case. It also enables two distinct classes of tests: do the effects of a covariate depend on the prior state, and what is the effect of some change in a covariate given an assumed state? The range of counterfactuals is not small.
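
A minimal sketch of such a Markov logit, fit by maximum likelihood with scipy (all coefficient values are invented): interact the covariate with the prior state and estimate by standard discrete-choice methods.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(9)

# Simulate a binary process where the effect of x depends on the prior state:
# Pr(y_t = 1) = logit^{-1}(b0 + b1*x_t + b2*y_{t-1} + b3*x_t*y_{t-1}).
b_true = np.array([-0.5, 0.8, 1.0, -0.6])
T = 5000
x = rng.normal(size=T)
y = np.zeros(T, dtype=int)
for t in range(1, T):
    eta = b_true @ np.array([1.0, x[t], y[t - 1], x[t] * y[t - 1]])
    y[t] = rng.binomial(1, 1 / (1 + np.exp(-eta)))

# The Markov logit: covariates interacted with the lagged state.
X = np.column_stack([np.ones(T - 1), x[1:], y[:-1], x[1:] * y[:-1]])
yy = y[1:]

def negloglik(b):
    eta = X @ b
    # Numerically stable negative log likelihood for the logit.
    return np.sum(np.logaddexp(0.0, eta) - yy * eta)

b_hat = minimize(negloglik, np.zeros(4), method="BFGS").x
print(b_hat)
```

The interaction coefficient answers the first question (does the effect of x depend on the prior state?); the sum of the main and interaction effects answers the second (the effect of x given the prior state).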

Some General Comments on Panel GLM

  • One has to be careful with these extensions of standard linear models, e.g., random effects probit and fixed effects logit.
  • The orthogonality of the random effects and the regressors is maintained.
  • In most cases, the real trouble is incidental parameters. That may not be as harsh as it initially seems. William Greene has an interesting argument about this in his paper, Estimating Econometric Models with Fixed Effects.

To My Examples

Questions that arise:

  • What do the state dependent parameters represent? Interpreting interaction terms.
  • Do the effects of a given variable depend on the prior state?
  • Is the effect given a prior state distinguishable from zero?
  • How do we calculate these things?

Two Way Fixed Effects and Causal Inference

Some DiD and TWFE

DiD as the double difference from Andrew C. Baker

TWFE

Equivalence

The Problem

Glynn and Blackwell

Repeated measurements of the same countries, people, or groups over time are vital to many fields of political science. These measurements, sometimes called time-series cross-sectional (TSCS) data, allow researchers to estimate a broad set of causal quantities, including contemporaneous and lagged treatment effects. Unfortunately, popular methods for TSCS data can only produce valid inferences for lagged effects under very strong assumptions. In this paper, we use potential outcomes to define causal quantities of interest in this setting and clarify how standard models like the autoregressive distributed lag model can produce biased estimates of these quantities due to post-treatment conditioning. We then describe two estimation strategies that avoid these post-treatment biases (inverse probability weighting and structural nested mean models) and show via simulations that they can outperform standard approaches in small sample settings.

Imai and Kim

Many researchers use unit fixed effects regression models as their default methods for causal inference with longitudinal data. We show that the ability of these models to adjust for unobserved time-invariant confounders comes at the expense of dynamic causal relationships, which are allowed to exist under an alternative selection-on-observables approach. Using the nonparametric directed acyclic graph, we highlight the two key causal identification assumptions of fixed effects models: past treatments do not directly influence current outcome, and past outcomes do not affect current treatment. Furthermore, we introduce a new nonparametric matching framework that elucidates how various fixed effects models implicitly compare treated and control observations to draw causal inference. By establishing the equivalence between matching and weighted fixed effects estimators, this framework enables a diverse set of identification strategies to adjust for unobservables provided that the treatment and outcome variables do not influence each other over time.

Wooldridge

Two-way fixed effects works quite well so long as we are careful about what we measure and what the fixed effects capture.

Let’s focus on 3.3.

A brief review of Mundlak

As typically written, the Mundlak estimator is presented [by Baltagi and Mundlak] as a random effects regression that must satisfy the random effects moment condition, e.g. \mathbb{E}(\alpha_{i}X_{it}) = 0 by including regressors capturing the time invariant unit averages. The regression is (with K regressors):

y_{it} = X_{it}\beta + \overline{x}_{i}\beta_{k} + (\alpha_{i} + \epsilon_{it})

where the parenthetical is the resultant error term consisting of the unit random effects and the IID error. The general idea is to simply include the between information that could be correlated with the fixed effects [in econometrics language]. Mundlak shows we recover the fixed effects or within estimator from \beta.
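
The equivalence is easy to verify numerically. In this numpy sketch (simulated data, invented values), pooled OLS of y on x and the unit means of x reproduces the within estimate of \beta exactly:

```python
import numpy as np

rng = np.random.default_rng(11)

# Balanced panel with a unit effect correlated with x.
N, T = 150, 8
alpha = rng.normal(size=N)
x = alpha[:, None] + rng.normal(size=(N, T))   # x correlated with the effect
y = 1.0 * x + alpha[:, None] + rng.normal(size=(N, T))

# Within (fixed effects) estimator.
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
beta_within = (xd * yd).sum() / (xd ** 2).sum()

# Mundlak regression: pooled OLS of y on x and the unit means of x.
xbar = np.repeat(x.mean(axis=1), T)
X = np.column_stack([np.ones(N * T), x.ravel(), xbar])
coefs = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]
beta_mundlak = coefs[1]
print(beta_within, beta_mundlak)  # identical
```

The equality is exact, not approximate: by Frisch-Waugh-Lovell, partialling the unit means out of x leaves exactly the within deviations.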

Two Way Mundlak

Baltagi’s equation 8 [or Wooldridge 2021]

y_{it} = X_{it}\beta + \overline{x}_{i}\beta_{i} + \overline{x}_{t}\beta_{t} + (\alpha_{i} + \alpha_{t} + \epsilon_{it})

Baltagi’s discussion on page 8 shows that F tests of the unit averages, the time averages, or both can be used to examine whether it is the unit averages, the time averages, or both that violate the random effects moment conditions we impose.

The major contribution of Wooldridge and Baltagi is to show that OLS applied to this problem is equivalent to GLS estimation.
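
The same numerical check works for the two-way version. On a balanced simulated panel (invented values), OLS with both the unit means and the time means of x reproduces the two-way within (TWFE) estimate exactly:

```python
import numpy as np

rng = np.random.default_rng(13)

# Balanced panel with both unit and time effects.
N, T = 120, 10
a_i = rng.normal(size=N)
a_t = rng.normal(size=T)
x = a_i[:, None] + a_t[None, :] + rng.normal(size=(N, T))
y = 1.0 * x + a_i[:, None] + a_t[None, :] + rng.normal(size=(N, T))

# Two-way within transformation: subtract unit and time means, add grand mean.
xw = x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + x.mean()
yw = y - y.mean(axis=1, keepdims=True) - y.mean(axis=0, keepdims=True) + y.mean()
beta_twfe = (xw * yw).sum() / (xw ** 2).sum()

# Two-way Mundlak: pooled OLS of y on x, unit means of x, and time means of x.
xbar_i = np.repeat(x.mean(axis=1), T)
xbar_t = np.tile(x.mean(axis=0), N)
X = np.column_stack([np.ones(N * T), x.ravel(), xbar_i, xbar_t])
coefs = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]
print(beta_twfe, coefs[1])  # identical in a balanced panel
```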

Some Concluding Remarks

Plumper et al. (2005) point out that specification issues matter, a lot.

  • Absorbing cross-sectional variance with unit dummies.
  • Absorbing time-series variance with a lagged DV.
  • Lag structure matters.
  • Slope heterogeneity is a relevant consideration.

Findings may not be at all robust.

My Own View

My own view of this, to borrow a phrase but use it a bit differently than the original authors, is to think of models as treatments with our data as the subject. We make one set of assumptions and we treat our subject. Change that around a bit and treat again. Do it a third time and so on and so on. In the end, we have sets of models related by subtle differences in assumptions about the process that generated the data and estimates obtained across models toward this end. Our inferential process should be inherently Bayesian in the sense that we update the strength of conclusions on the basis of findings differing in predictable ways given these differing sets of assumptions.

There is no single right model or magic bullet for diagnosing an unknown data generating process.