Squares and Black Boxes

Covariance Structures

Author

Robert W. Walker

Published

November 12, 2025

Returns to AI?

The squares criterion, minimizing sums of squared deviations, is broadly applied.

  • Principal components analysis derives the linear combination of variables that accounts for the maximum variation in the collection of variables under study.

  • Regression trees split on predictor variables to form groupings with minimal variation in the outcome.

  • Clustering identifies variance-reducing groups in multidimensional data.

Each of these falls under the general heading of machine learning, together with the practice of training and testing on data.
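Each criterion can be seen in a few lines of R. The sketch below is illustrative only: it uses the built-in mtcars data and the rpart package rather than the course data, and the variable choices are assumptions made for the example.

# Illustrative sketch on the built-in mtcars data (not the course data).
library(rpart)

# Principal components: the first component is the linear combination
# of the (scaled) variables with maximum variance.
pc <- prcomp(mtcars, scale. = TRUE)
summary(pc)

# Regression tree: splits are chosen to minimize within-node
# variation in the outcome (here mpg).
tree <- rpart(mpg ~ ., data = mtcars)
printcp(tree)

# k-means clustering: groups are chosen to minimize the
# within-cluster sum of squares.
set.seed(42)
clusters <- kmeans(scale(mtcars), centers = 3)
clusters$tot.withinss

# Training and testing: fit on a random subset, assess on the holdout.
train <- sample(nrow(mtcars), 24)
fit <- lm(mpg ~ wt, data = mtcars[train, ])
mean((mtcars$mpg[-train] - predict(fit, mtcars[-train, ]))^2)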

The most important piece of code.

How’s that done?
# Load a saved workspace of example data frames (including EPL) from GitHub
load(url("https://github.com/robertwwalker/DADMStuff/raw/master/RegressionExamples.RData"))
# radiant attaches its component packages: radiant.data, radiant.design,
# radiant.basics, radiant.model, and radiant.multivariate
library(radiant)
How’s that done?
# OLS regression of Points on the wage bill via radiant's regress()
result <- regress(
  EPL, 
  rvar = "Points", 
  evar = "Wage.Bill.milGBP"
)
summary(result)
Linear regression (OLS)
Data     : EPL 
Response variable    : Points 
Explanatory variables: Wage.Bill.milGBP 
Null hyp.: the effect of Wage.Bill.milGBP on Points is zero
Alt. hyp.: the effect of Wage.Bill.milGBP on Points is not zero

                  coefficient std.error t.value p.value    
 (Intercept)           32.116     2.749  11.683  < .001 ***
 Wage.Bill.milGBP       0.240     0.030   8.042  < .001 ***

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-squared: 0.782,  Adjusted R-squared: 0.77 
F-statistic: 64.675 df(1,18), p.value < .001
Nr obs: 20 
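For readers working outside radiant, the identical fit is available from base R's lm(). A minimal sketch, assuming the EPL data frame from the workspace loaded above:

# Equivalent fit in base R; EPL comes from the RData file loaded earlier.
ols <- lm(Points ~ Wage.Bill.milGBP, data = EPL)
summary(ols)
confint(ols)  # 95% confidence intervals for the coefficients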
How’s that done?
# Global validation of the linear model's assumptions (Pena and Slate's test)
library(gvlma)
gvlma(result$model)

Call:
lm(formula = form_upper, data = dataset)

Coefficients:
     (Intercept)  Wage.Bill.milGBP  
         32.1159            0.2402  


ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance =  0.05 

Call:
 gvlma(x = result$model) 

                    Value p-value                Decision
Global Stat        4.8001 0.30843 Assumptions acceptable.
Skewness           0.2378 0.62583 Assumptions acceptable.
Kurtosis           0.1547 0.69412 Assumptions acceptable.
Link Function      1.5649 0.21096 Assumptions acceptable.
Heteroscedasticity 2.8428 0.09178 Assumptions acceptable.
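gvlma gives an omnibus verdict; the usual visual check is base R's plot method for lm objects, applied here to the fitted model stored inside the radiant result. A minimal sketch:

# Residual, Q-Q, scale-location, and leverage plots for the fitted model
par(mfrow = c(2, 2))
plot(result$model)
par(mfrow = c(1, 1))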