Claude on Models

Author

Robert W. Walker

Published

March 12, 2026

Overview

This document illustrates how ordinary least squares (OLS) regression applies to cost accounting. We simulate 52 weeks of production data, estimate a cost function, and explore the regression output interactively.

The data-generating process (DGP) is:

\[\text{Cost}_t = \underbrace{16{,}750}_{\text{fixed overhead}} + \underbrace{150}_{\beta_1} \times \text{Units}_t + \underbrace{\varepsilon_t}_{\text{error}}\]

where \(\varepsilon_t \sim \mathcal{N}(16{,}750,\; 4{,}000^2)\) and \(\text{Units}_t\) follows a Poisson distribution with arrival rate (mean) \(\lambda = 2{,}000\) units per week, truncated below at \(1{,}500\) units.
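As a minimal sketch of this DGP, the snippet below draws one 52-week sample with NumPy. The function name `simulate_costs`, its argument names, the seed, and the use of rejection sampling for the truncation are assumptions made for illustration; the source does not say how the widget generates its data.

```python
import numpy as np

def simulate_costs(n_weeks=52, lam=2000, floor=1500, fixed=16_750.0,
                   unit_cost=150.0, err_mean=16_750.0, err_sd=4_000.0, seed=0):
    """Draw one sample from the DGP above (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Truncated Poisson by rejection: redraw any week below the floor.
    # With lam = 2,000 the Poisson std dev is about 45 units, so this
    # loop should almost never fire.
    units = rng.poisson(lam, size=n_weeks)
    while (units < floor).any():
        low = units < floor
        units[low] = rng.poisson(lam, size=low.sum())
    # Normal error with the non-zero mean stated in the text.
    eps = rng.normal(err_mean, err_sd, size=n_weeks)
    cost = fixed + unit_cost * units + eps
    return units.astype(float), cost

units, cost = simulate_costs()
print(units[:5], cost[:5].round(0))
```

Changing `seed` plays the same role as the New sample button: the parameters stay fixed while the realised weeks change.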

Note: Intercept interpretation

The error term has a non-zero mean of $16,750, so the effective DGP intercept is \(16{,}750 + 16{,}750 = \$33{,}500\). The OLS estimator targets this combined value. Managers should distinguish the accounting fixed cost ($16,750) from the statistical intercept estimated by regression.
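To make the re-centring explicit, the error can be split into its mean plus a zero-mean disturbance. The symbol \(u_t\) is introduced here purely for illustration and does not appear elsewhere in the document:

\[\varepsilon_t = 16{,}750 + u_t, \qquad u_t \sim \mathcal{N}(0,\; 4{,}000^2)\]

\[\text{Cost}_t = 16{,}750 + 150 \times \text{Units}_t + (16{,}750 + u_t) = 33{,}500 + 150 \times \text{Units}_t + u_t\]

Written this way, the model has a zero-mean error, and the constant that OLS can recover is \(33{,}500\). The data alone cannot separate the two \(16{,}750\) components.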


Parameters

Simulation parameters

| Parameter | Symbol | Value |
|---|---|---|
| Poisson arrival rate | \(\lambda\) | 2,000 units |
| Truncation lower bound | | 1,500 units |
| Variable cost per unit | \(\beta_1\) | $150 |
| Fixed overhead | \(\beta_0^{\text{acct}}\) | $16,750 |
| Error mean | \(\mu_\varepsilon\) | $16,750 |
| Error std dev | \(\sigma_\varepsilon\) | $4,000 |
| Effective DGP intercept | \(\beta_0^{\text{DGP}}\) | $33,500 |
| Sample size | \(n\) | 52 weeks |

Interactive explorer

The widget below generates a fresh 52-week sample and provides four analytical views. Use the New sample button to resample and observe how estimates vary.

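[Interactive widget, not reproduced here. Summary readouts: observations (52), slope (β₁), intercept (β₀), residual mean, residual std dev, and RMSE. Controls: a Show toggle and an X-axis selector. Forecast panel: drag the slider to set production volume (default 2,000 units) and read off the OLS prediction, the 95% PI lower and upper bounds, and a decomposition.]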


Conceptual notes

Scatter plot

Each point is one week. The OLS line minimises the sum of squared vertical distances from each point to the line. Key comparisons:

  • OLS fit vs True DGP — in practice the true line is never observed; this toggle shows how well estimation recovers it from one year of data.
  • 95% CI band — confidence in the mean cost at a given volume level, narrowest near \(\bar{x}\) and widening at the extremes.
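The least-squares line and the CI band described above can be sketched in closed form. This is an illustrative reconstruction, not the widget's code: the variable names and the seed are assumptions, and `t_crit = 2.0086` is the tabulated two-sided 95% critical value for 50 degrees of freedom, hard-coded to avoid a SciPy dependency.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 52

# One simulated year from the DGP in the Overview.
units = rng.poisson(2000, size=n)
while (units < 1500).any():  # redraw weeks below the truncation floor
    units = np.where(units < 1500, rng.poisson(2000, size=n), units)
x = units.astype(float)
y = 16_750 + 150 * x + rng.normal(16_750, 4_000, size=n)

# Closed-form OLS: minimise the sum of squared vertical distances.
x_bar, y_bar = x.mean(), y.mean()
sxx = np.sum((x - x_bar) ** 2)
b1 = np.sum((x - x_bar) * (y - y_bar)) / sxx
b0 = y_bar - b1 * x_bar
resid = y - (b0 + b1 * x)
sigma_hat = np.sqrt(np.sum(resid ** 2) / (n - 2))

# Half-width of the 95% CI for mean cost at three volumes.
t_crit = 2.0086  # t quantile for n - 2 = 50 degrees of freedom
grid = np.array([x.min(), x_bar, x.max()])
half_width = t_crit * sigma_hat * np.sqrt(1 / n + (grid - x_bar) ** 2 / sxx)

print(f"OLS fit:  {b0:,.0f} + {b1:.1f} * units")
print("True DGP: 33,500 + 150.0 * units")
print("CI half-width at min, mean, max volume:", half_width.round(0))
```

The half-width is smallest at \(\bar{x}\) because the \((x - \bar{x})^2\) term vanishes there, which is why the band pinches in the middle of the scatter.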

Residuals diagnostic

Tip: What to look for

A well-specified model shows residuals randomly scattered around zero with constant spread. Systematic patterns suggest:

  • Fan shape — heteroskedasticity (variance grows with output)
  • Curve — non-linearity (a quadratic term may help)
  • Drift over time — an omitted seasonal or trend variable
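The three patterns above also have rough numeric counterparts. The sketch below is an informal screen, not a formal test: the helper `residual_checks` and its three correlation measures are assumptions chosen for illustration, and values near zero are simply what a well-specified model tends to produce.

```python
import numpy as np

def residual_checks(x, y):
    """Crude numeric counterparts to the three visual patterns (hypothetical helper)."""
    b1, b0 = np.polyfit(x, y, 1)      # slope, intercept of the OLS line
    resid = y - (b0 + b1 * x)
    week = np.arange(len(x))          # assumes rows are in week order
    return {
        # Fan shape: does residual size grow with output?
        "fan": np.corrcoef(np.abs(resid), x)[0, 1],
        # Curve: do residuals track squared distance from mean volume?
        "curve": np.corrcoef(resid, (x - x.mean()) ** 2)[0, 1],
        # Drift: do residuals trend over the year?
        "drift": np.corrcoef(resid, week)[0, 1],
    }

rng = np.random.default_rng(2)
units = rng.poisson(2000, size=52)
while (units < 1500).any():  # redraw weeks below the truncation floor
    units = np.where(units < 1500, rng.poisson(2000, size=52), units)
x = units.astype(float)
y = 16_750 + 150 * x + rng.normal(16_750, 4_000, size=52)

checks = residual_checks(x, y)
print({name: round(float(value), 3) for name, value in checks.items()})
```

Because the simulated DGP is linear with constant error variance, all three correlations should be small here. A large value on real data is a prompt to inspect the plot, not a verdict.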

Distributions

Distribution panel guide

| Panel | What it shows |
|---|---|
| Units | Left boundary at 1,500 visible; Poisson shape above |
| Total costs | Driven by both volume variation and the error term |
| Residuals | Should look approximately bell-shaped, centred near zero |

Prediction intervals

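The 95% prediction interval (PI) is wider than the confidence interval (CI) at the same volume because it must cover the cost of a single future week, not the mean cost. That adds the week's own error variance, the leading 1 under the square root, to the uncertainty in the estimated line:

\[\hat{y} \pm t_{n-2,\,0.025} \cdot \hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum(x_i - \bar{x})^2}}\]

where \(x^*\) is the production volume being forecast, \(\hat{y}\) is the OLS prediction at \(x^*\), and \(\hat{\sigma}\) is the residual standard error with \(n - 2\) degrees of freedom.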
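The formula translates directly into code. The sketch below is one possible implementation under stated assumptions: the function name `prediction_interval` and its arguments are hypothetical, and the critical value 2.0086 (two-sided 95%, 50 degrees of freedom) is hard-coded from a t table.

```python
import numpy as np

def prediction_interval(x, y, x_star, t_crit):
    """Return (y_hat, lower, upper) for one new week at volume x_star."""
    n = len(x)
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    b1 = np.sum((x - x_bar) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x_bar
    # Residual standard error with n - 2 degrees of freedom.
    sigma_hat = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))
    y_hat = b0 + b1 * x_star
    # The leading 1 is the variance of the new week's own error.
    half = t_crit * sigma_hat * np.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)
    return y_hat, y_hat - half, y_hat + half

rng = np.random.default_rng(4)
units = rng.poisson(2000, size=52)
while (units < 1500).any():  # redraw weeks below the truncation floor
    units = np.where(units < 1500, rng.poisson(2000, size=52), units)
x = units.astype(float)
y = 16_750 + 150 * x + rng.normal(16_750, 4_000, size=52)

y_hat, lower, upper = prediction_interval(x, y, x_star=2000.0, t_crit=2.0086)
print(f"Forecast at 2,000 units: {y_hat:,.0f}  (95% PI {lower:,.0f} to {upper:,.0f})")
```

Where SciPy is available, `scipy.stats.t.ppf(0.975, n - 2)` gives the critical value for other sample sizes.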


Key formula reference

Parameter truth vs. estimation

| Component | True value | OLS estimate |
|---|---|---|
| Intercept \(\beta_0\) | $33,500 | Varies by sample |
| Slope \(\beta_1\) | $150 / unit | Varies by sample |
| Error std dev \(\sigma\) | $4,000 | RMSE \(\approx\) $4,000 |

Warning: Sampling variability

Resample repeatedly to observe how much \(\hat{\beta}_0\), \(\hat{\beta}_1\), and \(R^2\) fluctuate. Regression coefficients are themselves random variables.
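Resampling can also be automated. The sketch below repeats the simulation 2,000 times and summarises the spread of the estimates. The repetition count, seed, and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 52, 2000
b0s, b1s, r2s = np.empty(reps), np.empty(reps), np.empty(reps)

for r in range(reps):
    # Fresh 52-week sample from the DGP in the Overview.
    units = rng.poisson(2000, size=n)
    while (units < 1500).any():  # redraw weeks below the truncation floor
        units = np.where(units < 1500, rng.poisson(2000, size=n), units)
    x = units.astype(float)
    y = 16_750 + 150 * x + rng.normal(16_750, 4_000, size=n)

    b1, b0 = np.polyfit(x, y, 1)  # slope, intercept
    resid = y - (b0 + b1 * x)
    b0s[r], b1s[r] = b0, b1
    r2s[r] = 1 - resid.var() / y.var()  # R^2 = 1 - SSE / SST

for name, draws in [("intercept", b0s), ("slope", b1s), ("R^2", r2s)]:
    print(f"{name:>9}: mean {draws.mean():,.2f}, std dev {draws.std():,.2f}")
```

Under these parameters the slope's standard error is roughly \(4{,}000 / \sqrt{52 \times 2{,}000} \approx 12\), and the intercept's is on the order of 2,000 times that. Volumes cluster near 2,000 units, far from zero, so the intercept is an extrapolation and swings widely from sample to sample.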


Simulated data — not real company figures.