Linking Probability and Data
2026-03-17
The paired t-test is a one-sample t-test applied to the measured differences. The key is knowing who is matched with whom and believing that at least some of the variation in both outcomes depends on the unit in question.
There is a linguistic distinction: this is an average difference rather than the earlier difference in averages. The language describes the order of operations. For the paired comparison, I must know what to subtract from what.
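A minimal sketch of why the order of operations matters, using made-up numbers. The vectors `control` and `treated` are illustrative assumptions, not course data. The point estimate comes out the same either way; what changes is the standard error attached to it.

```r
# Hypothetical matched measurements: element i of each vector is the same unit.
control <- c(4100, 4550, 3900, 5000, 4700, 4300)
treated <- c(4230, 4620, 4080, 5090, 4880, 4370)

d <- treated - control   # subtract first: one difference per unit
n <- length(d)

# Average difference versus difference in averages: identical point estimates.
avg_difference     <- mean(d)
difference_in_avgs <- mean(treated) - mean(control)

# The standard errors differ because pairing removes unit-to-unit variation.
se_paired      <- sd(d) / sqrt(n)
se_independent <- sqrt(var(treated) / n + var(control) / n)

c(avg_difference = avg_difference, difference_in_avgs = difference_in_avgs)
c(se_paired = se_paired, se_independent = se_independent)
```

In this made-up example the units differ a lot from one another but respond similarly, so the paired standard error is much smaller than the independent-samples one.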
Take the example of Concrete.
Twelve distinct batches of concrete are subjected to an additive. Does the additive strengthen concrete?
For each batch, we can calculate the difference in strength with and without the additive. To do so, we need mutate to create a Difference column. Let’s plot it.
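A sketch of that step under loud assumptions: the column names `NoAdditive` and `Additive` and every value below are invented for illustration and are not the course's Concrete data. Base R's `transform()` stands in for `dplyr::mutate()` so the block runs without extra packages.

```r
# Hypothetical break weights (pounds) for 12 batches; illustrative values only.
Concrete <- data.frame(
  Batch      = 1:12,
  NoAdditive = c(4550, 4950, 6250, 5700, 5350, 5300,
                 5150, 5800, 4900, 6050, 5550, 5750),
  Additive   = c(4600, 5100, 6300, 5950, 5300, 5650,
                 5200, 6100, 5050, 6000, 5900, 5800)
)

# Same idea as dplyr::mutate(Concrete, Difference = Additive - NoAdditive).
Concrete <- transform(Concrete, Difference = Additive - NoAdditive)

# Plot the twelve differences, with zero marked as "no effect".
stripchart(Concrete$Difference, method = "stack", pch = 19,
           xlab = "Additive minus no additive (pounds)")
abline(v = 0, lty = 2)

# A one-sample t-test on the differences ...
one_sample <- t.test(Concrete$Difference, mu = 0)
# ... is the same calculation as the paired two-sample call.
paired <- t.test(Concrete$Additive, Concrete$NoAdditive, paired = TRUE)

one_sample
```

Because the data here are invented, the printed numbers will not match the course output; only the mechanics carry over.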
One Sample t-test
data: Concrete$Difference
t = 2.8793, df = 11, p-value = 0.01499
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
37.29936 279.36730
sample estimates:
mean of x
158.3333
With 95% confidence, the additive increases the average break-weight of the concrete by 37.3 to 279.4 pounds.
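As a check on where the interval comes from, the sketch below rebuilds it from the three summary numbers printed in the output above. The inputs are the rounded printed values, so agreement with the output is only to rounding.

```r
# Summary numbers copied from the two-sided output above.
xbar   <- 158.3333   # mean of the twelve differences
t_stat <- 2.8793     # reported t statistic
df     <- 11         # 12 batches minus 1

# Because t = (xbar - 0) / se, the standard error can be backed out.
se <- xbar / t_stat

# Two-sided 95% interval: 2.5% of the probability in each tail.
t_crit <- qt(0.975, df)
ci <- xbar + c(-1, 1) * t_crit * se

# Two-sided p-value: probability in both tails beyond |t|.
p_two_sided <- 2 * pt(-abs(t_stat), df)

round(ci, 1)
round(p_two_sided, 5)
```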
Could the true average difference be zero? Probably not: the interval excludes zero and the p-value is 0.015.
One Sample t-test
data: Concrete$Difference
t = 2.8793, df = 11, p-value = 0.007495
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
59.57616 Inf
sample estimates:
mean of x
158.3333
With 95% confidence, the additive increases the average break-weight of the concrete by at least 59.6 pounds.
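The one-sided output puts all of the 5% error probability in a single tail, which is why its lower bound sits above the two-sided one. A short sketch using the printed summary numbers (rounded, so the match is approximate):

```r
# Summary numbers copied from the one-sided output above.
xbar   <- 158.3333
t_stat <- 2.8793
df     <- 11
se     <- xbar / t_stat

# One-sided 95% bound: the critical value is the 0.95 quantile, not 0.975.
lower_bound <- xbar - qt(0.95, df) * se

# One-sided p-value: only the upper tail counts for "greater than 0".
p_one_sided <- pt(t_stat, df, lower.tail = FALSE)

round(lower_bound, 1)
round(p_one_sided, 6)
```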
With proportions: if a hypothesis claims a value of \pi, use it for the standard error; if not, use the sample proportion.
With quantities: when comparing two of them, are the samples independent or dependent? All of the relevant quantities are known [or assumed] functions of the data.
For differences, the magic number is nearly always zero – no difference.
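For the proportion case, a minimal sketch of the two standard errors under the normal approximation. The counts (20 successes in 50 trials) and the claimed value 0.5 are illustrative assumptions.

```r
# Hypothetical sample: 20 successes in 50 trials.
x <- 20
n <- 50
p_hat <- x / n

# Hypothesis test: the claimed value pi_0 supplies the standard error.
pi_0    <- 0.5
se_null <- sqrt(pi_0 * (1 - pi_0) / n)
z       <- (p_hat - pi_0) / se_null

# Confidence interval: nothing is claimed, so the data supply it.
se_data <- sqrt(p_hat * (1 - p_hat) / n)
ci_wald <- p_hat + c(-1, 1) * qnorm(0.975) * se_data

# Cross-check: without continuity correction, prop.test reports z squared.
chisq <- unname(prop.test(x, n, p = pi_0, correct = FALSE)$statistic)

c(z = z, z_squared = z^2, chisq = chisq)
ci_wald
```

The interval here is the simple normal-approximation one; `prop.test` reports a different (score-based) interval, so the two will not agree exactly.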
For one proportion:
ResampleProps::ResampleProp(vec.data, k=1000, tab.col=1)
binom.test(x, n, alternative="?", p=0.5, conf.level=0.95)
prop.test(x, n, alternative="?", p=0.5, conf.level=0.95, correct=TRUE)
For two proportions:
prop.test(x=c(10,20), n=c(50, 50), alternative="?", conf.level=0.95, correct=TRUE)
For one mean:
ResampleMeans::ResampleMean(vec.data, k=1000)
t.test(x, alternative="?", mu=0, conf.level=0.95)
For two means:
ResampleDiffMeans(vec.1, vec.2, k=1000)
Two types, independent samples and paired samples:
t.test(x, y, alternative="?", mu=0, conf.level=0.95, paired=FALSE, var.equal=FALSE)
t.test(x, y, alternative="?", mu=0, conf.level=0.95, paired=TRUE)
In each call, replace "?" with "two.sided", "less", or "greater".
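A hedged sketch of the base-R templates with the placeholders filled in. The vectors `x` and `y` and the counts for the one-proportion call are made up for illustration; the two-proportion counts are the ones in the template. The resampling functions are left out because they depend on packages outside base R.

```r
# Hypothetical paired measurements (illustrative values only).
x <- c(12.1, 14.3, 11.8, 15.2, 13.7, 12.9)
y <- c(11.4, 13.9, 11.9, 14.1, 13.0, 12.2)

# alternative must be one of "two.sided", "less", or "greater".
one_prop <- binom.test(20, 50, alternative = "two.sided",
                       p = 0.5, conf.level = 0.95)

# Two proportions: the null is equal proportions, so no single p is given.
two_props <- prop.test(x = c(10, 20), n = c(50, 50), alternative = "less",
                       conf.level = 0.95, correct = TRUE)

# Two means, independent samples (Welch) versus paired samples.
welch  <- t.test(x, y, alternative = "greater", mu = 0, conf.level = 0.95,
                 paired = FALSE, var.equal = FALSE)
paired <- t.test(x, y, alternative = "greater", mu = 0, conf.level = 0.95,
                 paired = TRUE)

# Degrees of freedom show which test was run: n - 1 for the paired version.
c(welch_df = unname(welch$parameter), paired_df = unname(paired$parameter))
```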
Do we want a hypothesis test or a confidence interval?
Choose a level of confidence/significance/probability.
If a hypothesis test, derive the relevant critical value by combining the reference distribution with the level of confidence/significance/probability: what value of z or t would be required to reject the hypothesis? If a confidence interval, how many standard errors from the mean are covered by the given level of probability?
If a hypothesis test, calculate the test statistic.
Compute the confidence interval or compare the test statistic to the relevant critical value.
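The steps above can be sketched by hand for a single mean and then checked against `t.test`. The vector `d` is an invented sample of paired differences, and the 95% two-sided setup is just one possible choice.

```r
# Hypothetical sample of paired differences (illustrative values only).
d    <- c(50, 150, 50, 250, -50, 350, 50, 300, 150, -50, 350, 50)
mu_0 <- 0   # hypothesized mean difference: "no difference"

# Choose what we want and the level: two-sided, 95%.
conf_level <- 0.95
alpha      <- 1 - conf_level

# Critical value: the t reference distribution with n - 1 degrees of freedom.
n      <- length(d)
t_crit <- qt(1 - alpha / 2, df = n - 1)

# Test statistic: how many standard errors the sample mean is from mu_0.
se     <- sd(d) / sqrt(n)
t_stat <- (mean(d) - mu_0) / se

# Compare the statistic to the critical value, and build the interval.
reject <- abs(t_stat) > t_crit
ci     <- mean(d) + c(-1, 1) * t_crit * se

# Cross-check against the built-in routine.
check <- t.test(d, mu = mu_0, conf.level = conf_level)

list(t_stat = t_stat, t_crit = t_crit, reject = reject, ci = ci)
```

The hypothesis test and the interval use the same critical value and standard error, which is why a two-sided test at level alpha rejects exactly when the matching interval excludes the hypothesized value.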

BUS 1301-SP26