Comparing Quantitative Data and Sampling

Linking Probability and Data

Author

Robert W. Walker

Published

March 17, 2026

library(tidyverse)  # provides %>%, mutate(), and ggplot() used below
load(url("https://github.com/robertwwalker/DADMStuff/raw/master/Big-Week-6-RSpace.RData"))

Paired t-test

Twinsberg

The paired t-test is a one-sample t-test applied to the measured differences. The key is knowing who is matched with whom and believing that at least some of the variation in both outcomes depends on the unit in question.

There is a linguistic distinction: this is an average difference rather than the previous difference in averages. The language describes the order of operations: here we subtract within each pair first and then average. For the paired comparison, I must know what to subtract from what.
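A minimal sketch of the distinction, using made-up numbers (the `before` and `after` vectors are hypothetical, not the Concrete data). The two point estimates coincide, but only the paired analysis uses the unit-level matching, so the standard errors and t statistics differ.

```r
# Hypothetical paired measurements on six units (illustrative values only)
before <- c(10, 12, 9, 14, 11, 13)
after  <- c(12, 15, 10, 17, 12, 16)

# Average difference: subtract within each unit first, then average
avg_diff <- mean(after - before)

# Difference in averages: average each column first, then subtract
diff_avg <- mean(after) - mean(before)

# The two point estimates are numerically identical
all.equal(avg_diff, diff_avg)

# The paired test is a one-sample t-test on the differences
paired_test     <- t.test(after, before, paired = TRUE)
one_sample_test <- t.test(after - before)

# The independent-samples (Welch) test ignores the matching
welch_test <- t.test(after, before, paired = FALSE, var.equal = FALSE)

c(paired     = unname(paired_test$statistic),
  one_sample = unname(one_sample_test$statistic),
  welch      = unname(welch_test$statistic))
```

In this made-up example the paired statistic is larger than the Welch statistic because the unit-to-unit variation cancels in the subtraction; that is not guaranteed for every data set.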

Take the example of Concrete.

Twelve distinct batches of concrete are subjected to an additive. Does the additive strengthen concrete?

library(DT)  # provides datatable()
Concrete %>% datatable()

The Average Difference

For each batch, we can calculate the difference between the measurements with and without the additive. To do so, we need mutate. Let’s plot the result.

library(ResampleProps)
Concrete <- Concrete %>% mutate(Difference = Additive - No.Add)
hist(Concrete$Difference)

Resampling

Concrete.Diff <- data.frame(Sampled.Mean.Difference = ResampleMean(Concrete$Difference))
quantile(Concrete.Diff$Sampled.Mean.Difference, probs=c(0.025, 0.975))

     2.5%     97.5% 
 54.16667 254.16667
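The percentile interval above comes from resampling. The sketch below shows the general idea in base R. It is an assumption about what `ResampleMean` does internally (its source is not shown here), and the `differences` vector is hypothetical rather than the Concrete data.

```r
set.seed(42)  # make the resampling reproducible

# Hypothetical paired differences (illustrative values only)
differences <- c(3, -1, 5, 2, 4, 0, 6, 1, 2, 3)

# Draw k bootstrap samples: resample the differences with replacement
# and record the mean of each resample
k <- 1000
boot_means <- replicate(k, mean(sample(differences, replace = TRUE)))

# Percentile interval: the middle 95% of the resampled means
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))
boot_ci
```

The interval is read directly off the resampled means, so no reference distribution such as t is needed.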

Graphic

library(hrbrthemes)  # provides theme_ipsum_rc()
Concrete.Diff %>% ggplot() + aes(x=Sampled.Mean.Difference) + geom_density() + theme_ipsum_rc()

t-test

t.test(Concrete$Difference)


    One Sample t-test

data:  Concrete$Difference
t = 2.8793, df = 11, p-value = 0.01499
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
  37.29936 279.36730
sample estimates:
mean of x 
 158.3333

With 95% confidence, the additive increases the average break-weight of the concrete by 37.3 to 279.4 pounds.
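To see where the interval comes from, it can be rebuilt by hand from the summary values printed in the output above. This is a sketch that uses only those reported numbers (mean, t, and df); the variable names are illustrative, and the result matches the printed interval only up to the rounding of the reported t statistic.

```r
# Values reported in the t.test() output above
xbar <- 158.3333   # mean of the differences
tval <- 2.8793     # reported t statistic
df   <- 11         # degrees of freedom (12 batches - 1)

# Since t = xbar / se when the null value is 0, back out the standard error
se <- xbar / tval

# Two-sided 95% interval: estimate +/- critical t times the standard error
t_crit <- qt(0.975, df)
ci <- c(lower = xbar - t_crit * se, upper = xbar + t_crit * se)
round(ci, 1)
```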

Is this what Marketing Wants?

Probably not.

t.test(Concrete$Difference, alternative="greater")


    One Sample t-test

data:  Concrete$Difference
t = 2.8793, df = 11, p-value = 0.007495
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
 59.57616      Inf
sample estimates:
mean of x 
 158.3333

With 95% confidence, the additive increases the average break-weight of the concrete by at least 59.6 pounds.
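The one-sided p-value and lower bound can be rebuilt from the summary values printed in the output above. A sketch, assuming only those reported numbers; the variable names are illustrative, and agreement with the printed output is up to the rounding of the reported t statistic.

```r
# Values reported in the one-sided t.test() output above
xbar <- 158.3333   # mean of the differences
tval <- 2.8793     # reported t statistic
df   <- 11         # degrees of freedom

# Standard error implied by t = xbar / se (null value of 0)
se <- xbar / tval

# One-sided p-value: upper-tail area beyond the observed t
p_one_sided <- pt(tval, df, lower.tail = FALSE)

# One-sided 95% lower bound: all 5% of the error probability sits in one tail,
# so the critical value is the 0.95 quantile rather than the 0.975 quantile
lower_bound <- xbar - qt(0.95, df) * se

c(p_value = p_one_sided, lower_bound = lower_bound)
```

Because the whole error probability sits in one tail, the one-sided p-value is half the two-sided one and the lower bound sits closer to the estimate than the two-sided lower limit.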

An Inference Map

With proportions: If we claim a value of \(\pi\), then use it for the standard error. If not, use the data.
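A small sketch of the two standard errors for a proportion, using the normal approximation. The counts and the claimed value `pi_0` are hypothetical, and this is not what `binom.test` computes (it uses the exact binomial distribution).

```r
# Hypothetical data: 42 successes in 100 trials (illustrative values only)
x <- 42
n <- 100
p_hat <- x / n

# Hypothesis test: the claimed value of pi determines the standard error
pi_0 <- 0.5
se_null <- sqrt(pi_0 * (1 - pi_0) / n)
z <- (p_hat - pi_0) / se_null

# Confidence interval: no claimed value, so estimate the standard error from the data
se_data <- sqrt(p_hat * (1 - p_hat) / n)
ci <- p_hat + c(-1, 1) * qnorm(0.975) * se_data

list(z = z, se_null = se_null, se_data = se_data, ci = ci)
```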

With quantities: If comparing them, are samples independent or dependent? All the relevant quantities are known [or assumed] functions of the data.

For differences, the magic number is nearly always zero – no difference.

Code Summary

Discrete

For one proportion:

ResampleProps::ResampleProp(vec.data, k=1000, tab.col=1)
binom.test(x, n, alternative="?", p=0.5, conf.level=0.95)
prop.test(x, n, alternative="?", p=0.5, conf.level=0.95, correct=TRUE)

For two proportions:

prop.test(x=c(10,20), n=c(50, 50), alternative="?", conf.level=0.95, correct=TRUE)
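A runnable version of the two-proportion call, using the counts from the template above (10 of 50 versus 20 of 50). The choice of `alternative = "two.sided"` is only for illustration. With two groups, `p` is either omitted, in which case the null hypothesis is equal proportions, or supplied with one entry per group.

```r
# Counts taken from the template call above: 10 of 50 versus 20 of 50
successes <- c(10, 20)
trials    <- c(50, 50)

# With two groups and no p argument, the null hypothesis is equal proportions
result <- prop.test(x = successes, n = trials,
                    alternative = "two.sided",
                    conf.level = 0.95, correct = TRUE)

result$estimate   # the two sample proportions
result$conf.int   # interval for the difference in proportions
result$p.value    # p-value for the test of equal proportions
```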

Quantities

For one mean:

ResampleProps::ResampleMean(vec.data, k=1000)
t.test(x, alternative="?", mu=0, conf.level=0.95)

For two means:

ResampleDiffMeans(vec.1, vec.2, k=1000)
Two types:
- Independent: t.test(x, y, alternative="?", mu=0, conf.level=0.95, paired=FALSE, var.equal=FALSE)
- Paired: t.test(x, y, alternative="?", mu=0, conf.level=0.95, paired=TRUE)

A Workflow

Do we want a hypothesis test or a confidence interval?
Choose a level of confidence/significance/probability.
If a hypothesis test, derive the relevant critical value by combining the reference distribution with the level of confidence/significance/probability: what value of \(z\) or \(t\) would be required to reject the hypothesis? If a confidence interval, determine how many standard errors from the mean are covered by the given level of probability.
If a hypothesis test, calculate the test statistic.
Compute the confidence interval or compare the test statistic to the relevant critical value.
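The steps above can be sketched for a one-sample t-test as follows. The sample `x`, the null value `mu_0`, and the choice of a two-sided test at the 5% level are all hypothetical choices made for illustration.

```r
# Hypothetical sample (illustrative values only) and null value
x <- c(3, -1, 5, 2, 4, 0, 6, 1, 2, 3)
mu_0 <- 0

# Choose a level of significance
alpha <- 0.05

# Critical value from the reference distribution (t with n - 1 df)
n <- length(x)
t_crit <- qt(1 - alpha / 2, df = n - 1)

# Test statistic: distance from the null value in standard errors
se <- sd(x) / sqrt(n)
t_stat <- (mean(x) - mu_0) / se

# Hypothesis test: compare the statistic to the critical value
reject <- abs(t_stat) > t_crit

# Confidence interval: estimate +/- critical value times standard error
ci <- mean(x) + c(-1, 1) * t_crit * se

list(t_stat = t_stat, t_crit = t_crit, reject = reject, ci = ci)
```

Calling `t.test(x, mu = mu_0)` on the same vector should return the same statistic and interval, which is a quick check on the hand calculation.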
