Squares and Black Boxes

Covariance Structures

Author

Robert W. Walker

Published

November 12, 2025

Returns to AI?

The squares criterion, minimizing sums of squared deviations, is broadly applied.

  • Principal components analysis derives the linear combination of variables that accounts for the maximum variation in the collection of variables under study.

  • Regression trees split on predictor variables to form groupings with minimal variation in the outcome.

  • Clustering identifies variance-reducing groups in multidimensional data.

Each of these falls under the general heading of machine learning, together with the practice of training and testing on data.
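Each criterion can be seen in a few lines of R. The sketch below is illustrative only: it uses the built-in mtcars data and the rpart package rather than the course data, and the variable choices are assumptions made for the example.

# Illustrative sketch on the built-in mtcars data (not the course data).
library(rpart)

# Principal components: the first component is the linear combination
# of the (scaled) variables with maximum variance.
pc <- prcomp(mtcars, scale. = TRUE)
summary(pc)

# Regression tree: splits are chosen to minimize within-node
# variation in the outcome (here mpg).
tree <- rpart(mpg ~ ., data = mtcars)
printcp(tree)

# k-means clustering: groups are chosen to minimize the
# within-cluster sum of squares.
set.seed(42)
clusters <- kmeans(scale(mtcars), centers = 3)
clusters$tot.withinss

# Training and testing: fit on a random subset, assess on the holdout.
train <- sample(nrow(mtcars), 24)
fit <- lm(mpg ~ wt, data = mtcars[train, ])
mean((mtcars$mpg[-train] - predict(fit, mtcars[-train, ]))^2)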

The most important piece of code.

How’s that done?
# Load a saved workspace of example data frames (including EPL) from GitHub
load(url("https://github.com/robertwwalker/DADMStuff/raw/master/RegressionExamples.RData"))
# radiant attaches its component packages: radiant.data, radiant.design,
# radiant.basics, radiant.model, and radiant.multivariate
library(radiant)
How’s that done?
# OLS regression of Points on the wage bill via radiant's regress()
result <- regress(
  EPL, 
  rvar = "Points", 
  evar = "Wage.Bill.milGBP"
)
summary(result)
Linear regression (OLS)
Data     : EPL 
Response variable    : Points 
Explanatory variables: Wage.Bill.milGBP 
Null hyp.: the effect of Wage.Bill.milGBP on Points is zero
Alt. hyp.: the effect of Wage.Bill.milGBP on Points is not zero

                  coefficient std.error t.value p.value    
 (Intercept)           32.116     2.749  11.683  < .001 ***
 Wage.Bill.milGBP       0.240     0.030   8.042  < .001 ***

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-squared: 0.782,  Adjusted R-squared: 0.77 
F-statistic: 64.675 df(1,18), p.value < .001
Nr obs: 20 
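For readers working outside radiant, the identical fit is available from base R's lm(). A minimal sketch, assuming the EPL data frame from the workspace loaded above:

# Equivalent fit in base R; EPL comes from the RData file loaded earlier.
ols <- lm(Points ~ Wage.Bill.milGBP, data = EPL)
summary(ols)
confint(ols)  # 95% confidence intervals for the coefficients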
How’s that done?
# Global validation of the linear model's assumptions (Pena and Slate's test)
library(gvlma)
gvlma(result$model)

Call:
lm(formula = form_upper, data = dataset)

Coefficients:
     (Intercept)  Wage.Bill.milGBP  
         32.1159            0.2402  


ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance =  0.05 

Call:
 gvlma(x = result$model) 

                    Value p-value                Decision
Global Stat        4.8001 0.30843 Assumptions acceptable.
Skewness           0.2378 0.62583 Assumptions acceptable.
Kurtosis           0.1547 0.69412 Assumptions acceptable.
Link Function      1.5649 0.21096 Assumptions acceptable.
Heteroscedasticity 2.8428 0.09178 Assumptions acceptable.
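gvlma gives an omnibus verdict; the usual visual check is base R's plot method for lm objects, applied here to the fitted model stored inside the radiant result. A minimal sketch:

# Residual, Q-Q, scale-location, and leverage plots for the fitted model
par(mfrow = c(2, 2))
plot(result$model)
par(mfrow = c(1, 1))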