Claude on Models
Overview
This document illustrates how ordinary least squares (OLS) regression applies to cost accounting. We simulate 52 weeks of production data, estimate a cost function, and explore the regression output interactively.
The data-generating process (DGP) is:
\[\text{Cost}_t = \underbrace{16{,}750}_{\text{fixed overhead}} + \underbrace{150}_{\beta_1} \times \text{Units}_t + \underbrace{\varepsilon_t}_{\text{error}}\]
where \(\varepsilon_t \sim \mathcal{N}(16{,}750,\; 4{,}000^2)\) and \(\text{Units}_t\) follows a Poisson distribution with arrival rate \(\lambda = 2{,}000\) truncated below at \(1{,}500\).
The error term has a non-zero mean of $16,750, so the effective DGP intercept is \(16{,}750 + 16{,}750 = \$33{,}500\). OLS cannot separate a constant error mean from the intercept, so the estimated intercept targets this combined value. Managers should therefore distinguish the accounting fixed cost ($16,750) from the statistical intercept estimated by regression.
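The sketch below simulates the DGP and makes the intercept arithmetic concrete. It is a minimal sketch using only Python's standard library, and it rests on a few assumptions that do not come from the source. The Poisson count is drawn by counting rate-λ exponential inter-arrival times in one unit of time, and truncation is applied by redrawing. The names `draw_units` and `simulate_weeks` are hypothetical. Subtracting the variable cost from each week's total leaves fixed overhead plus error, whose average should sit near $33,500 rather than $16,750.

```python
import random
import statistics

FIXED_OVERHEAD = 16_750   # accounting fixed cost (beta_0^acct)
BETA_1 = 150              # variable cost per unit
ERR_MEAN = 16_750         # non-zero error mean (mu_epsilon)
ERR_SD = 4_000            # error standard deviation (sigma_epsilon)
LAM = 2_000               # Poisson rate lambda
LOWER = 1_500             # truncation lower bound


def draw_units(rng: random.Random) -> int:
    """One truncated-Poisson draw: count rate-LAM arrivals in [0, 1); redraw if below LOWER."""
    while True:
        count, t = 0, rng.expovariate(LAM)
        while t < 1.0:
            count += 1
            t += rng.expovariate(LAM)
        if count >= LOWER:
            return count


def simulate_weeks(n: int = 52, seed: int = 0) -> tuple[list[int], list[float]]:
    """Return (units, costs) for n weeks from Cost = 16,750 + 150 * Units + eps."""
    rng = random.Random(seed)
    units = [draw_units(rng) for _ in range(n)]
    costs = [FIXED_OVERHEAD + BETA_1 * u + rng.gauss(ERR_MEAN, ERR_SD) for u in units]
    return units, costs


units, costs = simulate_weeks()
# Remove the variable cost: what remains is fixed overhead plus the error term.
residual_constant = statistics.fmean(c - BETA_1 * u for u, c in zip(units, costs))
print(f"Mean units: {statistics.fmean(units):,.0f}")
print(f"Mean of Cost - 150*Units: ${residual_constant:,.0f}  (DGP value: $33,500)")
```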
Parameters
| Parameter | Symbol | Value |
|---|---|---|
| Poisson arrival rate | \(\lambda\) | 2,000 units |
| Truncation lower bound | — | 1,500 units |
| Variable cost per unit | \(\beta_1\) | $150 |
| Fixed overhead | \(\beta_0^{\text{acct}}\) | $16,750 |
| Error mean | \(\mu_\varepsilon\) | $16,750 |
| Error std dev | \(\sigma_\varepsilon\) | $4,000 |
| Effective DGP intercept | \(\beta_0^{\text{DGP}}\) | $33,500 |
| Sample size | \(n\) | 52 weeks |
Interactive explorer
The widget below generates a fresh 52-week sample and provides four analytical views. Use the New sample button to resample and observe how estimates vary.
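For each sample the widget reports the number of observations (52), R², the slope (β₁), the intercept (β₀), the residual mean, the residual std dev, and the RMSE. A forecast panel has a production-volume slider and shows the OLS prediction, the lower and upper bounds of the 95% prediction interval (PI), and a cost decomposition.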
Conceptual notes
Scatter plot
Each point is one week. The OLS line minimises the sum of squared vertical distances from each point to the line. Key comparisons:
- OLS fit vs True DGP — in practice the true line is never observed; this toggle shows how well estimation recovers it from one year of data.
- 95% CI band — an interval estimate for the mean cost at a given volume, narrowest near \(\bar{x}\) and widening toward the extremes.
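The fitted line uses the closed-form OLS estimates. The width of the CI band is controlled by the factor \(\sqrt{1/n + (x^* - \bar{x})^2 / \sum(x_i - \bar{x})^2}\), which multiplies \(t_{n-2,\,0.025}\,\hat{\sigma}\). Both can be sketched as follows. The helpers `ols_fit` and `ci_width_factor` are hypothetical names. The check uses a made-up noiseless design, with points placed exactly on the effective DGP line, so the recovered coefficients are known in advance.

```python
import math


def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Closed-form simple OLS: slope = Sxy / Sxx, intercept = y_bar - slope * x_bar."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def ci_width_factor(x: list[float], x_star: float) -> float:
    """sqrt(1/n + (x* - x_bar)^2 / Sxx): scales the CI half-width for the mean cost at x*."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)


# Hypothetical noiseless check: points lying exactly on Cost = 33,500 + 150 * Units.
units = [1_900, 1_950, 2_000, 2_050, 2_100]
costs = [33_500 + 150 * u for u in units]
b0, b1 = ols_fit(units, costs)
print(f"intercept = {b0:,.2f}, slope = {b1:.2f}")

# The band is narrowest at the mean volume and widens toward the extremes.
for x_star in (1_900, 2_000, 2_100):
    print(x_star, round(ci_width_factor(units, x_star), 3))
```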
Residuals diagnostic
A well-specified model shows residuals randomly scattered around zero with constant spread. Systematic patterns suggest:
- Fan shape — heteroskedasticity (variance grows with output)
- Curve — non-linearity (a quadratic term may help)
- Drift over time — an omitted seasonal or trend variable
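One rough way to look for the fan shape is to sort weeks by output and compare the residual variance in the high-output half with that in the low-output half. A ratio well above 1 hints at heteroskedasticity. This is a heuristic in the spirit of the Goldfeld-Quandt test, not a formal test.

The sketch makes several assumptions. Units are drawn uniformly between 1,500 and 2,500, a hypothetical design wider than the Poisson DGP, so that any fan has room to show. The heteroskedastic case uses an error standard deviation that grows with output, purely for contrast. The names `ols_residuals`, `variance_ratio` and `make_data` are hypothetical.

```python
import random
import statistics


def ols_residuals(x: list[float], y: list[float]) -> list[float]:
    """Residuals y - (b0 + b1 * x) from a closed-form simple OLS fit."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def variance_ratio(x: list[float], resid: list[float]) -> float:
    """Residual variance in the high-output half divided by that in the low-output half."""
    ordered = [r for _, r in sorted(zip(x, resid))]
    half = len(ordered) // 2
    return statistics.variance(ordered[half:]) / statistics.variance(ordered[:half])


def make_data(rng: random.Random, fan: bool, n: int = 52) -> tuple[list[float], list[float]]:
    """Hypothetical design: uniform units; error sd constant, or growing with output if fan=True."""
    x = [rng.uniform(1_500, 2_500) for _ in range(n)]
    sd = [4_000 * ((xi - 1_450) / 500) ** 2 if fan else 4_000 for xi in x]
    y = [33_500 + 150 * xi + rng.gauss(0, s) for xi, s in zip(x, sd)]
    return x, y


rng = random.Random(1)
ratios = {}
for fan in (False, True):
    x, y = make_data(rng, fan)
    ratios[fan] = variance_ratio(x, ols_residuals(x, y))
    print(f"fan={fan}: upper/lower residual variance ratio = {ratios[fan]:.2f}")
```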
Distributions
| Panel | What it shows |
|---|---|
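| Units | Tightly concentrated around 2,000 (Poisson sd ≈ 45 units); the truncation bound at 1,500 lies about 11 sd below the mean, so it is essentially never visible |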
| Total costs | Driven by both volume variation and the error term |
| Residuals | Should look approximately bell-shaped, centred near zero |
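The Units panel depends on how often the 1,500 truncation actually binds. Under the stated Poisson(λ = 2,000), the standard deviation is \(\sqrt{2{,}000} \approx 45\) units, so 1,500 sits roughly 11 standard deviations below the mean. The sketch below sums the Poisson probability mass below 1,500 in log space, which avoids overflow in \(\lambda^k\) and \(k!\). The helper name `poisson_prob_below` is hypothetical.

```python
import math

LAM = 2_000    # Poisson rate from the parameter table
LOWER = 1_500  # truncation lower bound


def poisson_prob_below(threshold: int, lam: float) -> float:
    """P(X < threshold) for X ~ Poisson(lam), summing the pmf in log space to avoid overflow."""
    total = 0.0
    for k in range(threshold):
        log_pmf = k * math.log(lam) - lam - math.lgamma(k + 1)
        total += math.exp(log_pmf)  # underflows harmlessly to 0.0 for small k
    return total


sd = math.sqrt(LAM)
z = (LOWER - LAM) / sd
p_below = poisson_prob_below(LOWER, LAM)
print(f"sd = {sd:.1f} units, bound is {z:.1f} sd from the mean, P(Units < 1,500) = {p_below:.1e}")
```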
Prediction intervals
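The 95% prediction interval (PI) is wider than a confidence interval (CI) because it covers the cost of a single future week rather than the mean cost. It must also allow for that week's own error term, which contributes the extra 1 under the square root: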
\[\hat{y} \pm t_{n-2,\,0.025} \cdot \hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum(x_i - \bar{x})^2}}\]
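The formula above can be computed directly, alongside the matching CI that drops the leading 1. This is an illustrative sketch, and it makes several assumptions:

- The critical value \(t_{50,\,0.025} \approx 2.0086\) is hard-coded from standard t tables for \(n - 2 = 50\), because Python's standard library has no t quantile function.
- Units are drawn from a normal approximation to Poisson(2,000) for brevity, and truncation is omitted because it essentially never binds at these parameters.
- `intervals` is a hypothetical helper name.

At \(x^* = \bar{x}\) the ratio of PI to CI half-widths is exactly \(\sqrt{n + 1}\).

```python
import math
import random
import statistics

T_CRIT_50 = 2.0086  # t_{50, 0.025} upper critical value from standard t tables (n - 2 = 50)


def intervals(x: list[float], y: list[float], x_star: float, t_crit: float = T_CRIT_50):
    """Return (y_hat, CI half-width, PI half-width) at x_star for a simple OLS fit."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(sse / (n - 2))               # residual standard error
    leverage = 1 / n + (x_star - x_bar) ** 2 / sxx     # leverage of the point x_star
    ci_half = t_crit * sigma_hat * math.sqrt(leverage)      # mean cost at x_star
    pi_half = t_crit * sigma_hat * math.sqrt(1 + leverage)  # a single future week
    return b0 + b1 * x_star, ci_half, pi_half


# Simulated 52 weeks; units use a normal approximation to Poisson(2,000) (assumption).
rng = random.Random(7)
units = [round(rng.gauss(2_000, math.sqrt(2_000))) for _ in range(52)]
costs = [16_750 + 150 * u + rng.gauss(16_750, 4_000) for u in units]

x_bar = statistics.fmean(units)
for x_star in (x_bar, 2_100):
    y_hat, ci, pi = intervals(units, costs, x_star)
    print(f"x*={x_star:,.0f}: y_hat={y_hat:,.0f}, 95% CI ±{ci:,.0f}, 95% PI ±{pi:,.0f}")
```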
Key formula reference
| Component | True value | OLS estimate |
|---|---|---|
| Intercept \(\beta_0\) | $33,500 | Varies by sample |
| Slope \(\beta_1\) | $150 / unit | Varies by sample |
| Error std dev \(\sigma\) | $4,000 | RMSE \(\approx\) $4,000 |
Resample repeatedly to observe how much \(\hat{\beta}_0\), \(\hat{\beta}_1\), and \(R^2\) fluctuate. The estimated coefficients are themselves random variables, taking a different value in every sample.
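Repeating the resampling many times shows the sampling distributions of the estimates directly. Their means should land near the DGP values ($150 per unit and $33,500). The intercept estimates should spread far more widely than the slopes, because the observed volumes cluster near 2,000 units, far from zero. For the slope, a back-of-envelope standard error is \(\sigma/\sqrt{\sum(x_i - \bar{x})^2} \approx 4{,}000/\sqrt{51 \times 2{,}000} \approx 12.5\).

This is a Monte Carlo sketch with stated assumptions. Units come from a normal approximation to Poisson(2,000) for speed, truncation is omitted, and the helper names `fit` and `one_sample` are hypothetical.

```python
import math
import random
import statistics


def fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Closed-form simple OLS; returns (intercept, slope)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1


def one_sample(rng: random.Random, n: int = 52) -> tuple[float, float]:
    """One simulated year; units via a normal approximation to Poisson(2,000) (assumption)."""
    units = [round(rng.gauss(2_000, math.sqrt(2_000))) for _ in range(n)]
    costs = [16_750 + 150 * u + rng.gauss(16_750, 4_000) for u in units]
    return fit(units, costs)


rng = random.Random(2024)
draws = [one_sample(rng) for _ in range(2_000)]  # 2,000 simulated years
b0s, b1s = zip(*draws)
print(f"slope:     mean {statistics.fmean(b1s):7.1f}, sd {statistics.stdev(b1s):6.1f}  (true 150)")
print(f"intercept: mean {statistics.fmean(b0s):7,.0f}, sd {statistics.stdev(b0s):6,.0f}  (DGP 33,500)")
```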
Simulated data — not real company figures.