Hypothesis Testing and Relationships

Author

Robert W. Walker

Published

March 10, 2026

Today, we will spend extensive time revisiting the logic of hypothesis testing and applying it to relationships between two variables, along with a backward-looking exploration of existing tests.

Two key bits of data

Warning: Hypotheses must be complete

With binary data, all we have is \(\pi\) or \(\hat{p}\). Whichever we have, we use.

Slides

Link is here

The class plan:

  1. Squares, correlation and regression
  2. a Claude app

Your post-class exercise:

  1. The midterm.
  2. Use an LLM and the suggested materials to grapple with Pearl’s argument.

Hypothesis Testing

How’s that done?
Concrete <- read.csv("data/Concrete.csv")
Concrete
   Batch No.Add Additive Diff
1      1   4550     4600  -50
2      2   4950     4900   50
3      3   6250     6650 -400
4      4   5700     5950 -250
5      5   5350     5700 -350
6      6   5300     5400 -100
7      7   5150     5400 -250
8      8   5800     5850  -50
9      9   4900     4850   50
10    10   6050     6450 -400
11    11   5550     5850 -300
12    12   5750     5600  150
mean(Concrete$Diff)                       # mean of the paired differences
[1] -158.3333
sd(Concrete$Diff)                         # standard deviation of the differences
[1] 190.4938
sd(Concrete$Diff)/sqrt(12)                # standard error with n = 12
[1] 54.99082
(mean(Concrete$Diff) - 0)/(sd(Concrete$Diff)/sqrt(12))  # t-statistic under H0: mean difference = 0
[1] -2.879269
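The four steps above can be collapsed into a single call to R's built-in `t.test()`, which reports the same t-statistic along with its degrees of freedom and p-value. A minimal sketch, with the `Diff` column transcribed from the table above so the snippet is self-contained:

```r
# Diff column transcribed from the Concrete data printed above
Diff <- c(-50, 50, -400, -250, -350, -100, -250, -50, 50, -400, -300, 150)

# One-sample t-test of H0: true mean difference = 0
t.test(Diff, mu = 0)
```

The reported statistic, t = -2.8793 on 11 degrees of freedom, matches the hand computation above.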

On Causality

Causation is at the heart of the highest-order human reasoning. Reasoning about causes with data is an objective, if not yet an end result, of the modern fascination with machine learning. Yet these are age-old philosophical questions, and modern work at the intersection of data and causation is perhaps best exemplified by Judea Pearl. His most recent book, The Book of Why, details a lifetime of investigating causes and causal models at the intersection of computing, philosophy, and statistics. Though wide-ranging, his podcast conversation with Lex Fridman is worth listening to; the excerpt on correlation and causation is especially useful.

He develops a ladder of causation. This is quite well explained in this two page primer.

  1. Associational

  2. Interventional

  3. Counterfactual

We want to understand precisely how these various levels influence what we learn from data and deploy data to accomplish.

Judea Pearl’s website

The book on statistics and causal inference

A lecture on the Book of Why

Sections 2.1 to 2.10 of the Causal Mixtape are a very succinct read.

Illustrating a Hypothesis Test with the Normal

Let’s return to the Berkeley admissions example. We will first test the hypothesis that \(\pi=0.5\), examining it with 99% confidence.

I will use the class tool to find that percentage.


With 99% confidence, anything within 2.576 standard errors above or below the hypothesized mean is consistent with the hypothesis.

The standard error in this case is

\[\sqrt{\frac{\pi(1-\pi)}{n}}\]

This gives us \(0.5 \pm z^{*} \times 0.0074321\), where \(z^{*} = 2.576\).
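As a sketch of the arithmetic, the reported standard error of 0.0074321 corresponds to a sample size of \(n = 4526\); that value of \(n\) is an inference from the reported standard error (it matches the total number of applicants in R's built-in `UCBAdmissions` data), not something stated in the text:

```r
n  <- 4526                        # assumed sample size, inferred from the SE above
se <- sqrt(0.5 * (1 - 0.5) / n)   # standard error under H0: pi = 0.5
z  <- qnorm(0.995)                # two-sided 99% critical value, about 2.576
c(lower = 0.5 - z * se, upper = 0.5 + z * se)
```

Under these assumptions the 99% interval is roughly (0.481, 0.519); a sample proportion outside that range would lead us to reject \(\pi = 0.5\) at 99% confidence.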