LLM Distributions Letter: Results

There is commented code for loading in each of the four data files. I have combined them and mutated them because it is a long operation.

library(magrittr)
load("Complete.Data.01.2026.RData")

Fresh.Outcome

I have recreated the outcomes below in a variable called Fresh.Outcome. We should be careful about how these are recoded to be both transparent about what it does and for our clarity in reporting. Here is that table.

Complete.Data.01.2026$Fresh.Outcome <- c(1:196000) %>% map_chr(., function(x) {Complete.Data.01.2026$response$body$choices[[x]]$message$content})
Complete.Data.01.2026$model <- Complete.Data.01.2026$body$model
Complete.Data.01.2026 <- Complete.Data.01.2026 |> mutate(modelF = factor(model, levels = c('gpt-4-turbo-2024-04-09', 'gpt-4-0613', 'gpt-4o-2024-08-06', 'gpt-5.2-2025-12-11')))
Complete.Data.01.2026 |> group_by(Fresh.Outcome, model) |> summarise(Count = n()) |>  ungroup() |> pivot_wider(names_from = model, values_from = Count) |> gt()

Fresh.Outcome	gpt-5.2-2025-12-11	gpt-4-0613	gpt-4-turbo-2024-04-09	gpt-4o-2024-08-06
	15	NA	NA	NA
Arcsine	36	NA	NA	NA
Bernoulli	744	1071	690	835
Beta	4165	14	8885	2141
Betabinomial	33	NA	NA	NA
Bimodal	NA	49	NA	NA
Binomial	15525	84	1969	1028
Categorical	86	NA	NA	NA
Cauchy	35	NA	6	NA
Chisquare	9	NA	NA	NA
Degenerate	71	71	71	71
ExGaussian	3	NA	NA	NA
Exponential	196	156	930	5266
Gamma	2714	NA	517	17
Geometric	583	NA	NA	NA
Gumbel	76	NA	NA	NA
Laplace	3	NA	454	NA
Log-Normal	NA	NA	NA	3
Log-normal	NA	NA	NA	938
Logistic	NA	NA	7	NA
Lognormal	8127	3979	7974	468
Mixture	1	NA	1	NA
Multinomial	24	NA	NA	NA
NegBinomial	5	NA	NA	NA
Negative Binomial	NA	5	NA	NA
NegativeBinomial	224	4	NA	NA
Negativebinomial	197	NA	NA	NA
Negbinomial	1	NA	NA	NA
Normal	5201	29210	3230	12090
Pareto	83	NA	126	142
Pascal	77	NA	NA	NA
Poisson	8553	12981	22764	17042
Skew-normal	NA	NA	10	NA
Skew-right	NA	3	NA	NA
SkewNormal	NA	NA	160	NA
Skewed	NA	NA	NA	1
Skewness	NA	245	5	NA
Skewnormal	15	NA	3	NA
Student	29	NA	NA	NA
Studentt	1	NA	NA	NA
Triangular	16	NA	1	NA
Uniform	1250	1128	1141	8557
Weibull	78	NA	56	400
ZIP	1	NA	NA	NA
Zero-inflated	NA	NA	NA	1
ZeroInflatedPoisson	4	NA	NA	NA
ZeroinflatedPoisson	8	NA	NA	NA
Zipf	2	NA	NA	NA
arcsine	1	NA	NA	NA
beta	4	NA	NA	NA
binomial	1	NA	NA	NA
categorical	3	NA	NA	NA
exgaussian	1	NA	NA	NA
exponential	3	NA	NA	NA
gamma	33	NA	NA	NA
geometric	3	NA	NA	NA
lognormal	723	NA	NA	NA
negativebinomial	8	NA	NA	NA
normal	11	NA	NA	NA
skewnormal	3	NA	NA	NA
t	13	NA	NA	NA
uniform	2	NA	NA	NA

I am going to recreate the Outcome variable as follows.

Complete.Data.01.2026 <- Complete.Data.01.2026 |> 
  mutate(Outcome = case_match(Fresh.Outcome, 
                              c("arcsine","Arcsine") ~ "Arcsine",
                              c("categorical","Categorical") ~ "Categorical",
                              c("exponential","Exponential") ~ "Exponential",
                              c("beta","Beta") ~ "Beta",
                              c("Chisquare","Chi-Square") ~ "Chi-square",
                              c("exgaussian","ExGaussian") ~ "ExGaussian",
                              c("Normal","normal") ~ "Normal",
                              c("binomial","Binomial") ~ "Binomial",
                              c("geometric","Geometric") ~ "Geometric",
                              c("gamma","Gamma") ~ "Gamma",                              c("Lognormal","lognormal","Log-Normal","Log-normal") ~ "Lognormal",
                              c("NegativeBinomial","Negative Binomial","NegBinomial","Negbinomial","negativebinomial","Negativebinomial") ~ "Negative Binomial",
                              c("Skewed","Skewnormal","SkewNormal","Skew-normal","Skew-right","Skewness","skewnormal")~ "Skewed",
                              c("Student","Studentt","Studentt","t")~ "Student's t", c("Uniform","uniform")~ "Uniform",
                              c("ZeroInflatedPoisson","ZeroinflatedPoisson","Zero-inflated","ZIP")~ "ZIP",
                              .default = Fresh.Outcome))
Complete.Data.01.2026 |> group_by(Outcome, model) |> summarise(Count = n()) |>  ungroup() |> pivot_wider(names_from = model, values_from = Count) |> gt()

Outcome	gpt-5.2-2025-12-11	gpt-4-0613	gpt-4-turbo-2024-04-09	gpt-4o-2024-08-06
	15	NA	NA	NA
Arcsine	37	NA	NA	NA
Bernoulli	744	1071	690	835
Beta	4169	14	8885	2141
Betabinomial	33	NA	NA	NA
Bimodal	NA	49	NA	NA
Binomial	15526	84	1969	1028
Categorical	89	NA	NA	NA
Cauchy	35	NA	6	NA
Chi-square	9	NA	NA	NA
Degenerate	71	71	71	71
ExGaussian	4	NA	NA	NA
Exponential	199	156	930	5266
Gamma	2747	NA	517	17
Geometric	586	NA	NA	NA
Gumbel	76	NA	NA	NA
Laplace	3	NA	454	NA
Logistic	NA	NA	7	NA
Lognormal	8850	3979	7974	1409
Mixture	1	NA	1	NA
Multinomial	24	NA	NA	NA
Negative Binomial	435	9	NA	NA
Normal	5212	29210	3230	12090
Pareto	83	NA	126	142
Pascal	77	NA	NA	NA
Poisson	8553	12981	22764	17042
Skewed	18	248	178	1
Student's t	43	NA	NA	NA
Triangular	16	NA	1	NA
Uniform	1252	1128	1141	8557
Weibull	78	NA	56	400
ZIP	13	NA	NA	1
Zipf	2	NA	NA	NA

The `discrete` tag

Complete.Data.01.2026 <- Complete.Data.01.2026 |> mutate(Type = case_match(name, c("Binomial","Geometric","Poisson") ~ "Discrete", .default = "not Discrete"))
Complete.Data.01.2026 <- Complete.Data.01.2026 |> mutate(Correct = as.character(name==Outcome)) |> mutate(Correct = case_match(Correct, "TRUE" ~ "Correct", .default = "Incorrect"))

Basic Percent Correctly Predicted

MNTotals <- Complete.Data.01.2026 |> group_by(model, name) |> summarise(Total = n()) |> ungroup()
PCP <- Complete.Data.01.2026 |> group_by(model, name, Correct) |> summarise(count = n()) |> ungroup()
PCPTab <- left_join(PCP, MNTotals) |> filter(Correct=="Correct") |> mutate(PCP = count / Total) |> select(model, name, PCP)
PCPTab

# A tibble: 29 × 3
   model                  name          PCP
   <chr>                  <chr>       <dbl>
 1 gpt-4-0613             Beta      0.0028 
 2 gpt-4-0613             Binomial  0.00394
 3 gpt-4-0613             Lognormal 0.880  
 4 gpt-4-0613             Normal    1      
 5 gpt-4-0613             Poisson   0.554  
 6 gpt-4-0613             Uniform   0.0002 
 7 gpt-4-turbo-2024-04-09 Beta      0.972  
 8 gpt-4-turbo-2024-04-09 Binomial  0.104  
 9 gpt-4-turbo-2024-04-09 Gamma     0.044  
10 gpt-4-turbo-2024-04-09 Lognormal 0.857  
# ℹ 19 more rows

The Plot

Tab.All <- Complete.Data.01.2026 |> group_by(model, name, Outcome) |> summarise(Count = n()) |> ungroup() |> group_by(model,name) |> mutate(Total = sum(Count)) |> ungroup() |> mutate(PCP = round(Count / Total, digits=3)) |> ungroup()

Tab.All <- Tab.All |> mutate(PCPa = round(PCP, digits=2))
Tab.All$PCPa1 <- as.character(Tab.All$PCPa)
Tab.All <- Tab.All |> mutate(PCPa1 = case_when(PCPa1 == "0" ~ "<0.01", .default = PCPa1))
FF3 <- ggplot(Tab.All) + 
  aes(y=fct_rev(Outcome), x=name, fill=PCP, facet=modelF) + 
  geom_tile(aes(color=Dist.Match), lwd=0.2, width=0.95, height=0.95) + 
  scale_color_manual(values = c("white","black")) + 
  guides(color="none") +
  scale_fill_gradientn(colors = my_greys) + 
  labs(x="Known Distribution", y="Model Outcome") +
  geom_text(aes(label=PCPa1), size=2, color="black") +
  labs(caption="Values are row proportions \n Outlined cells isolate row/column matches", title="Known Distributions and Model Responses", fill="Row. Pct.") + 
  theme_minimal() + 
  theme(legend.location = "plot", 
        panel.spacing = unit(0.5, "lines"), 
        plot.label= element_text(size=5), 
        plot.title=element_text(size=10, face = "bold", hjust=0),
        plot.title.position = "plot", 
        axis.text.x = element_text(size=8, angle=30), 
        axis.text.y = element_text(size=5), 
        legend.direction = "horizontal",
        legend.text = element_text(size = 4), 
        legend.title = element_text(size = 6),
        legend.key.width = unit(0.1, "in"),
        legend.position = c(0.0, -0.1)
) +
    facet_wrap(vars(modelF), scales="free_y")
ggsave(FF3, filename="FinalFigureFreeY-v2.jpeg", width=6.5, height=6.5, dpi=900, units="in")

References

knitr::write_bib(names(sessionInfo()$otherPkgs), file="bibliography.bib")

References

Bache, Stefan Milton, and Hadley Wickham. 2025. Magrittr: A Forward-Pipe Operator for r. https://magrittr.tidyverse.org.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. https://www.jstatsoft.org/v40/i03/.

Iannone, Richard, Joe Cheng, Barret Schloerke, Shannon Haughton, Ellis Hughes, Alexandra Lauer, Romain François, JooYoung Seo, Ken Brevoort, and Olivier Roy. 2025. Gt: Easily Create Presentation-Ready Display Tables. https://gt.rstudio.com.

Mock, Thomas. 2025. gtExtras: Extending Gt for Beautiful HTML Tables. https://github.com/jthomasmock/gtExtras.

Müller, Kirill, and Hadley Wickham. 2026. Tibble: Simple Data Frames. https://tibble.tidyverse.org/.

Scheinin, Ilari. 2019. Rasterpdf: Plot Raster Graphics in PDF Files. https://ilarischeinin.github.io/rasterpdf.

Spinu, Vitalie, Garrett Grolemund, and Hadley Wickham. 2024. Lubridate: Make Dealing with Dates a Little Easier. https://lubridate.tidyverse.org.

Urbanek, Simon, and Jeffrey Horner. 2025. Cairo: R Graphics Device Using Cairo Graphics Library for Creating High-Quality Bitmap (PNG, JPEG, TIFF), Vector (PDF, SVG, PostScript) and Display (X11 and Win32) Output. http://www.rforge.net/Cairo/.

Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.

———. 2023. Tidyverse: Easily Install and Load the Tidyverse. https://tidyverse.tidyverse.org.

———. 2025a. Forcats: Tools for Working with Categorical Variables (Factors). https://forcats.tidyverse.org/.

———. 2025b. Stringr: Simple, Consistent Wrappers for Common String Operations. https://stringr.tidyverse.org.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, Dewey Dunnington, and Teun van den Brand. 2025. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://ggplot2.tidyverse.org.

Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://dplyr.tidyverse.org.

Wickham, Hadley, and Lionel Henry. 2026. Purrr: Functional Programming Tools. https://purrr.tidyverse.org/.

Wickham, Hadley, Lionel Henry, Thomas Lin Pedersen, T Jake Luciani, Matthieu Decorde, and Vaudor Lise. 2025. Svglite: An SVG Graphics Device. https://svglite.r-lib.org.

Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2025. Readr: Read Rectangular Text Data. https://readr.tidyverse.org.

Wickham, Hadley, Davis Vaughan, and Maximilian Girlich. 2025. Tidyr: Tidy Messy Data. https://tidyr.tidyverse.org.

Fresh.Outcome

The discrete tag

Basic Percent Correctly Predicted

The Plot

References

References

The `discrete` tag