Claude on Models

Author

Robert W. Walker

Published

March 12, 2026

Overview

This document illustrates how ordinary least squares (OLS) regression applies to cost accounting. We simulate 52 weeks of production data, estimate a cost function, and explore the regression output interactively.

The data-generating process (DGP) is:

\[\text{Cost}_t = \underbrace{16{,}750}_{\text{fixed overhead}} + \underbrace{150}_{\beta_1} \times \text{Units}_t + \underbrace{\varepsilon_t}_{\text{error}}\]

where $\varepsilon_t \sim \mathcal{N}(16{,}750,\; 4{,}000^2)$ and $\text{Units}_t$ follows a Poisson distribution with arrival rate $\lambda = 2{,}000$ units per week, truncated below at $1{,}500$ units.
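The DGP above can be sketched with the Python standard library alone. This is a minimal illustration, not the widget's actual code: the function names, the seed, the inclusive treatment of the 1,500 floor, and the choice to draw Poisson counts by accumulating exponential inter-arrival gaps are all assumptions made here.

```python
import random

FIXED_OVERHEAD = 16_750  # accounting fixed cost, $ per week
UNIT_COST = 150          # variable cost, $ per unit
ERROR_MEAN = 16_750      # non-zero mean of the error term, $
ERROR_SD = 4_000         # standard deviation of the error term, $
RATE = 2_000             # Poisson arrival rate, units per week
FLOOR = 1_500            # truncation lower bound (assumed inclusive), units
N_WEEKS = 52


def poisson_draw(rng, rate):
    """Count arrivals in one time unit when gaps are Exponential(rate)."""
    count, elapsed = 0, rng.expovariate(rate)
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(rate)
    return count


def truncated_poisson(rng, rate, floor):
    """Rejection sampling: redraw until the count is at or above the floor."""
    while True:
        units = poisson_draw(rng, rate)
        if units >= floor:
            return units


def simulate(seed=0, n=N_WEEKS):
    """Return a list of (units, cost) pairs, one per simulated week."""
    rng = random.Random(seed)
    weeks = []
    for _ in range(n):
        units = truncated_poisson(rng, RATE, FLOOR)
        error = rng.gauss(ERROR_MEAN, ERROR_SD)
        weeks.append((units, FIXED_OVERHEAD + UNIT_COST * units + error))
    return weeks


if __name__ == "__main__":
    for units, cost in simulate()[:3]:
        print(f"{units:,} units -> ${cost:,.0f}")
```

Changing the seed plays the role of the New sample button: each seed yields a different 52-week sample from the same process.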

Intercept interpretation

The error term has a non-zero mean of $\$16{,}750$, and OLS cannot separate a constant error mean from the intercept, so the effective DGP intercept is $16{,}750 + 16{,}750 = \$33{,}500$. The OLS estimator targets this combined value, not the accounting figure. Managers should therefore distinguish the accounting fixed cost ($\$16{,}750$) from the statistical intercept estimated by regression.
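The combined intercept can be written out by splitting the error into its mean and a zero-mean remainder. The symbol $u_t$ is introduced here only for illustration and is not part of the original notation:

\[
\begin{aligned}
\text{Cost}_t &= 16{,}750 + 150 \times \text{Units}_t + \varepsilon_t, \qquad \varepsilon_t = 16{,}750 + u_t, \quad u_t \sim \mathcal{N}(0,\; 4{,}000^2) \\
&= 33{,}500 + 150 \times \text{Units}_t + u_t
\end{aligned}
\]

Written this way, the model has a zero-mean error, and the constant that regression can identify is $33{,}500$.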

Parameters

Simulation parameters
Parameter	Symbol	Value
Poisson arrival rate	$\lambda$	2,000 units
Truncation lower bound	—	1,500 units
Variable cost per unit	$\beta_1$	$150
Fixed overhead	$\beta_0^{\text{acct}}$	$16,750
Error mean	$\mu_\varepsilon$	$16,750
Error std dev	$\sigma_\varepsilon$	$4,000
Effective DGP intercept	$\beta_0^{\text{DGP}}$	$33,500
Sample size	$n$	52 weeks

Interactive explorer

The widget below generates a fresh 52-week sample and provides four analytical views. Use the New sample button to resample and observe how estimates vary.

[Interactive widget. Summary statistics: observations, R², slope (β₁), intercept (β₀). Display toggles: OLS fit, true DGP, 95% CI band. X-axis selector. Residual statistics: residual mean, residual std dev, RMSE. Prediction slider for production volume (default 2,000 units): OLS prediction, 95% PI lower and upper bounds, decomposition.]

Conceptual notes

Scatter plot

Each point is one week. The OLS line minimises the sum of squared vertical distances from each point to the line. Key comparisons:

OLS fit vs True DGP — in practice the true line is never observed; this toggle shows how well estimation recovers it from one year of data.
95% CI band — confidence in the mean cost at a given volume level, narrowest near $\bar{x}$ and widening at the extremes.
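The least-squares criterion described above can be sketched in standard-library Python. The five weekly observations and the noise values are hypothetical, chosen only for illustration, and the function names `ols_fit` and `sse` are assumptions rather than anything taken from the widget.

```python
from statistics import fmean


def ols_fit(units, costs):
    """Return (intercept, slope, r_squared) for a one-variable OLS fit."""
    x_bar, y_bar = fmean(units), fmean(costs)
    sxx = sum((x - x_bar) ** 2 for x in units)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, costs))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sst = sum((y - y_bar) ** 2 for y in costs)
    r_squared = 1.0 - sse(units, costs, intercept, slope) / sst
    return intercept, slope, r_squared


def sse(units, costs, intercept, slope):
    """Sum of squared vertical distances from the points to a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(units, costs))


# Hypothetical weeks: points on the effective DGP line plus made-up noise.
units = [1_950, 1_980, 2_000, 2_030, 2_060]
noise = [3_000, -2_500, 500, -4_000, 3_000]
costs = [33_500 + 150 * x + e for x, e in zip(units, noise)]

b0, b1, r2 = ols_fit(units, costs)
sse_ols = sse(units, costs, b0, b1)
sse_true = sse(units, costs, 33_500, 150)
print(f"OLS line:  {b0:,.0f} + {b1:.2f} * units, R^2 = {r2:.3f}")
print(f"SSE at OLS line {sse_ols:,.0f} vs SSE at true line {sse_true:,.0f}")
```

Because OLS minimises the sum of squared residuals over all lines, the fitted line never has a larger SSE on the sample than the true DGP line, even though the true line is the one that generated the data.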

Residuals diagnostic

What to look for

A well-specified model shows residuals randomly scattered around zero with constant spread. Systematic patterns suggest:

Fan shape — heteroskedasticity (variance grows with output)
Curve — non-linearity (a quadratic term may help)
Drift over time — an omitted seasonal or trend variable
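The three patterns listed above can be screened with crude correlation checks on the residuals. The sketch below is a rough heuristic under stated assumptions, not a formal test: the data are hypothetical (noise deliberately built to grow with volume), and the names `pearson` and `residual_checks` are invented for illustration.

```python
from math import sqrt
from statistics import fmean


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    a_bar, b_bar = fmean(a), fmean(b)
    saa = sum((v - a_bar) ** 2 for v in a)
    sbb = sum((v - b_bar) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return 0.0
    sab = sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    return sab / sqrt(saa * sbb)


def residual_checks(units, costs):
    """Fit OLS, then summarise the residuals against each warning sign."""
    x_bar, y_bar = fmean(units), fmean(costs)
    sxx = sum((x - x_bar) ** 2 for x in units)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, costs)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [y - (intercept + slope * x) for x, y in zip(units, costs)]
    return {
        "mean": fmean(resid),                                         # ~0 by construction
        "fan": pearson([abs(r) for r in resid], units),               # spread vs volume
        "curve": pearson(resid, [(x - x_bar) ** 2 for x in units]),   # curvature
        "drift": pearson(resid, list(range(len(resid)))),             # trend over weeks
    }


# Hypothetical data: alternating-sign noise whose size grows with volume.
units = [1_900 + 4 * week for week in range(52)]
noise = [(-1) ** week * 10 * (u - 1_800) for week, u in enumerate(units)]
costs = [33_500 + 150 * u + e for u, e in zip(units, noise)]

checks = residual_checks(units, costs)
for name, value in checks.items():
    print(f"{name:>5}: {value: .3f}")
```

A "fan" value near zero is what constant spread would look like; a value well above zero, as this constructed example produces, points toward heteroskedasticity. In practice a residual plot remains the primary diagnostic, and these numbers only summarise what the plot shows.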

Distributions

Distribution panel guide
Panel	What it shows
Units	Roughly symmetric Poisson shape centred near 2,000; the 1,500 lower bound is rarely reached at this rate
Total costs	Driven by both volume variation and the error term
Residuals	Should look approximately bell-shaped, centred near zero
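A quick arithmetic check shows where the 1,500 floor sits relative to the spread of the Units distribution. It relies only on the stated parameters and on the fact that a Poisson variable's variance equals its mean; the variable names are illustrative.

```python
from math import sqrt

RATE = 2_000   # Poisson arrival rate from the parameter table
FLOOR = 1_500  # truncation lower bound

poisson_sd = sqrt(RATE)                # Poisson variance equals its mean
floor_z = (FLOOR - RATE) / poisson_sd  # floor position in standard deviations

print(f"sd = {poisson_sd:.1f} units, floor at {floor_z:.1f} sd")
# prints: sd = 44.7 units, floor at -11.2 sd
```

With the floor about 11 standard deviations below the mean, weekly volumes under these parameters should cluster within roughly 150 units of 2,000, and the truncation should almost never bind.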

Prediction intervals

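The 95% prediction interval (PI) is wider than the confidence interval (CI) at the same volume because it covers the cost of a single future week rather than the mean cost. The leading 1 under the square root adds that week's own error variance to the uncertainty in the fitted line: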

\[\hat{y} \pm t_{n-2,\,0.025} \cdot \hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum(x_i - \bar{x})^2}}\]
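The formula above can be evaluated directly. The sketch below computes both the CI half-width (without the leading 1) and the PI half-width at a chosen volume. The data are hypothetical (evenly spaced volumes with seeded Gaussian noise), the function name `intervals` is an assumption, and the critical value is hard-coded as an approximation for 50 degrees of freedom rather than computed.

```python
import random
from math import sqrt
from statistics import fmean

T_CRIT_DF50 = 2.0086  # approximate t quantile at 0.975 for n - 2 = 50 df


def intervals(units, costs, x_star, t_crit):
    """Return (y_hat, ci_half_width, pi_half_width) at volume x_star."""
    n = len(units)
    x_bar, y_bar = fmean(units), fmean(costs)
    sxx = sum((x - x_bar) ** 2 for x in units)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, costs)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(units, costs))
    sigma_hat = sqrt(sse / (n - 2))                   # residual standard error
    leverage = 1 / n + (x_star - x_bar) ** 2 / sxx    # uncertainty in the line
    y_hat = intercept + slope * x_star
    ci_half = t_crit * sigma_hat * sqrt(leverage)      # mean cost at x_star
    pi_half = t_crit * sigma_hat * sqrt(1 + leverage)  # one future week
    return y_hat, ci_half, pi_half


# Hypothetical 52-week sample on the effective DGP line with seeded noise.
rng = random.Random(0)
units = [1_900 + 4 * week for week in range(52)]
costs = [33_500 + 150 * u + rng.gauss(0, 4_000) for u in units]

for x_star in (2_000, 2_400):
    y_hat, ci_half, pi_half = intervals(units, costs, x_star, T_CRIT_DF50)
    print(f"{x_star:,} units: {y_hat:,.0f} +/- {ci_half:,.0f} (CI), "
          f"+/- {pi_half:,.0f} (PI)")
```

Both half-widths grow as the chosen volume moves away from $\bar{x}$, but the PI stays wider at every volume because of the extra variance term.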

Key formula reference

Parameter truth vs. estimation
Component	True value	OLS estimate
Intercept $\beta_0^{\text{DGP}}$	$33,500	Varies by sample
Slope $\beta_1$	$150 / unit	Varies by sample
Error std dev $\sigma_\varepsilon$	$4,000	RMSE $\approx \$4{,}000$

Sampling variability

Resample repeatedly to observe how much $\hat{\beta}_0$, $\hat{\beta}_1$, and $R^2$ fluctuate. Regression coefficients are themselves random variables.
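Repeated resampling can be imitated offline with a small Monte Carlo loop. To keep the sketch short, it holds a hypothetical set of weekly volumes fixed and redraws only the error term, which is a simplification of the widget's behaviour; the names `fit` and `resample_estimates`, the 500 replications, and the seed are assumptions.

```python
import random
from statistics import fmean, stdev

# Hypothetical fixed volumes, spread over a range similar to the simulated weeks.
UNITS = [1_900 + 4 * week for week in range(52)]


def fit(units, costs):
    """Return (intercept, slope) from a one-variable OLS fit."""
    x_bar, y_bar = fmean(units), fmean(costs)
    sxx = sum((x - x_bar) ** 2 for x in units)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, costs)) / sxx
    return y_bar - slope * x_bar, slope


def resample_estimates(replications=500, seed=0):
    """Redraw the error term many times and collect the OLS estimates."""
    rng = random.Random(seed)
    intercepts, slopes = [], []
    for _ in range(replications):
        costs = [16_750 + 150 * u + rng.gauss(16_750, 4_000) for u in UNITS]
        b0, b1 = fit(UNITS, costs)
        intercepts.append(b0)
        slopes.append(b1)
    return intercepts, slopes


intercepts, slopes = resample_estimates()
print(f"slope:     mean {fmean(slopes):.1f}, sd {stdev(slopes):.1f}")
print(f"intercept: mean {fmean(intercepts):,.0f}, sd {stdev(intercepts):,.0f}")
```

The estimates centre on the effective DGP values, but the intercept should scatter far more than the slope. Zero units lies well outside the observed volume range, so the intercept is an extrapolation and inherits the slope's uncertainty multiplied over roughly 2,000 units.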

Simulated data — not real company figures.