Squares Scale

Covariance Structures

Author

Robert W. Walker

Published

November 11, 2025

A link to the slides for the day.

Slides

gvlma

The most important piece of code.

How’s that done?
load(url("https://github.com/robertwwalker/DADMStuff/raw/master/RegressionExamples.RData"))
library(radiant)
result <- regress(
  EPL, 
  rvar = "Points", 
  evar = "Wage.Bill.milGBP"
)
summary(result)
Linear regression (OLS)
Data     : EPL 
Response variable    : Points 
Explanatory variables: Wage.Bill.milGBP 
Null hyp.: the effect of Wage.Bill.milGBP on Points is zero
Alt. hyp.: the effect of Wage.Bill.milGBP on Points is not zero

                  coefficient std.error t.value p.value    
 (Intercept)           32.116     2.749  11.683  < .001 ***
 Wage.Bill.milGBP       0.240     0.030   8.042  < .001 ***

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-squared: 0.782,  Adjusted R-squared: 0.77 
F-statistic: 64.675 df(1,18), p.value < .001
Nr obs: 20 
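The same fit can be reproduced without radiant. A minimal sketch in base R, assuming the same EPL data frame with columns Points and Wage.Bill.milGBP:

```r
# Load the same workspace and fit the identical OLS model with lm().
load(url("https://github.com/robertwwalker/DADMStuff/raw/master/RegressionExamples.RData"))
ols <- lm(Points ~ Wage.Bill.milGBP, data = EPL)
summary(ols)   # coefficients, standard errors, R-squared, F-statistic
confint(ols)   # 95% confidence intervals for the intercept and slope
```

radiant's regress() wraps exactly this lm() call, which is why gvlma can be applied to result$model directly.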
How’s that done?
library(gvlma)
gvlma(result$model)

Call:
lm(formula = form_upper, data = dataset)

Coefficients:
     (Intercept)  Wage.Bill.milGBP  
         32.1159            0.2402  


ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance =  0.05 

Call:
 gvlma(x = result$model) 

                    Value p-value                Decision
Global Stat        4.8001 0.30843 Assumptions acceptable.
Skewness           0.2378 0.62583 Assumptions acceptable.
Kurtosis           0.1547 0.69412 Assumptions acceptable.
Link Function      1.5649 0.21096 Assumptions acceptable.
Heteroscedasticity 2.8428 0.09178 Assumptions acceptable.
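gvlma bundles its checks into one global test (Peña and Slate's procedure). For diagnosing a failure one piece at a time, a sketch of common stand-alone analogues, assuming the lm fit from above; these are standard tests from stats and lmtest, not the exact statistics gvlma computes:

```r
load(url("https://github.com/robertwwalker/DADMStuff/raw/master/RegressionExamples.RData"))
ols <- lm(Points ~ Wage.Bill.milGBP, data = EPL)

library(lmtest)
bptest(ols)                   # Breusch-Pagan test for heteroscedasticity
resettest(ols)                # Ramsey RESET test of functional form / link
shapiro.test(residuals(ols))  # normality of residuals (skewness/kurtosis)

par(mfrow = c(2, 2))
plot(ols)                     # the four standard lm diagnostic plots
```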

The Death of Online Surveys

Measuring AI and Water

Returns to AI?

The squares criterion is broadly applied.

  • Principal components derives the linear combination of the variables under study, the component, that accounts for the maximum variation in the collection.

  • Regression trees split on predictor variables to form groupings with minimal variation in the outcome.

  • Clustering identifies variance-reducing groups in multidimensional data.

Each of these falls under the general umbrella of machine learning, along with the principle of training and testing on data.
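The three applications above can be sketched with base R plus rpart; the mtcars data is an assumption used only for illustration:

```r
# Principal components: PC1 is the linear combination with maximum variance.
pca <- prcomp(scale(mtcars))
summary(pca)                  # proportion of variance by component

# Regression tree: splits chosen to minimize residual sum of squares.
library(rpart)
tree <- rpart(mpg ~ wt + hp, data = mtcars, method = "anova")

# k-means clustering: minimizes the within-cluster sum of squares.
km <- kmeans(scale(mtcars), centers = 3, nstart = 25)
km$tot.withinss               # the squares criterion being minimized
```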