library(magrittr)
load("Complete.Data.01.2026.RData")LLM Distributions Letter: Results
There is commented code for loading in each of the four data files. I have combined them and mutated them because it is a long operation.
Fresh.Outcome
I have recreated the outcomes below in a variable called Fresh.Outcome. We should be careful about how these are recoded to be both transparent about what it does and for our clarity in reporting. Here is that table.
Complete.Data.01.2026$Fresh.Outcome <- c(1:196000) %>% map_chr(., function(x) {Complete.Data.01.2026$response$body$choices[[x]]$message$content})
Complete.Data.01.2026$model <- Complete.Data.01.2026$body$model
Complete.Data.01.2026 <- Complete.Data.01.2026 |> mutate(modelF = factor(model, levels = c('gpt-4-turbo-2024-04-09', 'gpt-4-0613', 'gpt-4o-2024-08-06', 'gpt-5.2-2025-12-11')))
Complete.Data.01.2026 |> group_by(Fresh.Outcome, model) |> summarise(Count = n()) |> ungroup() |> pivot_wider(names_from = model, values_from = Count) |> gt()| Fresh.Outcome | gpt-5.2-2025-12-11 | gpt-4-0613 | gpt-4-turbo-2024-04-09 | gpt-4o-2024-08-06 |
|---|---|---|---|---|
| 15 | NA | NA | NA | |
| Arcsine | 36 | NA | NA | NA |
| Bernoulli | 744 | 1071 | 690 | 835 |
| Beta | 4165 | 14 | 8885 | 2141 |
| Betabinomial | 33 | NA | NA | NA |
| Bimodal | NA | 49 | NA | NA |
| Binomial | 15525 | 84 | 1969 | 1028 |
| Categorical | 86 | NA | NA | NA |
| Cauchy | 35 | NA | 6 | NA |
| Chisquare | 9 | NA | NA | NA |
| Degenerate | 71 | 71 | 71 | 71 |
| ExGaussian | 3 | NA | NA | NA |
| Exponential | 196 | 156 | 930 | 5266 |
| Gamma | 2714 | NA | 517 | 17 |
| Geometric | 583 | NA | NA | NA |
| Gumbel | 76 | NA | NA | NA |
| Laplace | 3 | NA | 454 | NA |
| Log-Normal | NA | NA | NA | 3 |
| Log-normal | NA | NA | NA | 938 |
| Logistic | NA | NA | 7 | NA |
| Lognormal | 8127 | 3979 | 7974 | 468 |
| Mixture | 1 | NA | 1 | NA |
| Multinomial | 24 | NA | NA | NA |
| NegBinomial | 5 | NA | NA | NA |
| Negative Binomial | NA | 5 | NA | NA |
| NegativeBinomial | 224 | 4 | NA | NA |
| Negativebinomial | 197 | NA | NA | NA |
| Negbinomial | 1 | NA | NA | NA |
| Normal | 5201 | 29210 | 3230 | 12090 |
| Pareto | 83 | NA | 126 | 142 |
| Pascal | 77 | NA | NA | NA |
| Poisson | 8553 | 12981 | 22764 | 17042 |
| Skew-normal | NA | NA | 10 | NA |
| Skew-right | NA | 3 | NA | NA |
| SkewNormal | NA | NA | 160 | NA |
| Skewed | NA | NA | NA | 1 |
| Skewness | NA | 245 | 5 | NA |
| Skewnormal | 15 | NA | 3 | NA |
| Student | 29 | NA | NA | NA |
| Studentt | 1 | NA | NA | NA |
| Triangular | 16 | NA | 1 | NA |
| Uniform | 1250 | 1128 | 1141 | 8557 |
| Weibull | 78 | NA | 56 | 400 |
| ZIP | 1 | NA | NA | NA |
| Zero-inflated | NA | NA | NA | 1 |
| ZeroInflatedPoisson | 4 | NA | NA | NA |
| ZeroinflatedPoisson | 8 | NA | NA | NA |
| Zipf | 2 | NA | NA | NA |
| arcsine | 1 | NA | NA | NA |
| beta | 4 | NA | NA | NA |
| binomial | 1 | NA | NA | NA |
| categorical | 3 | NA | NA | NA |
| exgaussian | 1 | NA | NA | NA |
| exponential | 3 | NA | NA | NA |
| gamma | 33 | NA | NA | NA |
| geometric | 3 | NA | NA | NA |
| lognormal | 723 | NA | NA | NA |
| negativebinomial | 8 | NA | NA | NA |
| normal | 11 | NA | NA | NA |
| skewnormal | 3 | NA | NA | NA |
| t | 13 | NA | NA | NA |
| uniform | 2 | NA | NA | NA |
I am going to recreate the Outcome variable as follows.
Complete.Data.01.2026 <- Complete.Data.01.2026 |>
mutate(Outcome = case_match(Fresh.Outcome,
c("arcsine","Arcsine") ~ "Arcsine",
c("categorical","Categorical") ~ "Categorical",
c("exponential","Exponential") ~ "Exponential",
c("beta","Beta") ~ "Beta",
c("Chisquare","Chi-Square") ~ "Chi-square",
c("exgaussian","ExGaussian") ~ "ExGaussian",
c("Normal","normal") ~ "Normal",
c("binomial","Binomial") ~ "Binomial",
c("geometric","Geometric") ~ "Geometric",
c("gamma","Gamma") ~ "Gamma", c("Lognormal","lognormal","Log-Normal","Log-normal") ~ "Lognormal",
c("NegativeBinomial","Negative Binomial","NegBinomial","Negbinomial","negativebinomial","Negativebinomial") ~ "Negative Binomial",
c("Skewed","Skewnormal","SkewNormal","Skew-normal","Skew-right","Skewness","skewnormal")~ "Skewed",
c("Student","Studentt","Studentt","t")~ "Student's t", c("Uniform","uniform")~ "Uniform",
c("ZeroInflatedPoisson","ZeroinflatedPoisson","Zero-inflated","ZIP")~ "ZIP",
.default = Fresh.Outcome))
Complete.Data.01.2026 |> group_by(Outcome, model) |> summarise(Count = n()) |> ungroup() |> pivot_wider(names_from = model, values_from = Count) |> gt()| Outcome | gpt-5.2-2025-12-11 | gpt-4-0613 | gpt-4-turbo-2024-04-09 | gpt-4o-2024-08-06 |
|---|---|---|---|---|
| 15 | NA | NA | NA | |
| Arcsine | 37 | NA | NA | NA |
| Bernoulli | 744 | 1071 | 690 | 835 |
| Beta | 4169 | 14 | 8885 | 2141 |
| Betabinomial | 33 | NA | NA | NA |
| Bimodal | NA | 49 | NA | NA |
| Binomial | 15526 | 84 | 1969 | 1028 |
| Categorical | 89 | NA | NA | NA |
| Cauchy | 35 | NA | 6 | NA |
| Chi-square | 9 | NA | NA | NA |
| Degenerate | 71 | 71 | 71 | 71 |
| ExGaussian | 4 | NA | NA | NA |
| Exponential | 199 | 156 | 930 | 5266 |
| Gamma | 2747 | NA | 517 | 17 |
| Geometric | 586 | NA | NA | NA |
| Gumbel | 76 | NA | NA | NA |
| Laplace | 3 | NA | 454 | NA |
| Logistic | NA | NA | 7 | NA |
| Lognormal | 8850 | 3979 | 7974 | 1409 |
| Mixture | 1 | NA | 1 | NA |
| Multinomial | 24 | NA | NA | NA |
| Negative Binomial | 435 | 9 | NA | NA |
| Normal | 5212 | 29210 | 3230 | 12090 |
| Pareto | 83 | NA | 126 | 142 |
| Pascal | 77 | NA | NA | NA |
| Poisson | 8553 | 12981 | 22764 | 17042 |
| Skewed | 18 | 248 | 178 | 1 |
| Student's t | 43 | NA | NA | NA |
| Triangular | 16 | NA | 1 | NA |
| Uniform | 1252 | 1128 | 1141 | 8557 |
| Weibull | 78 | NA | 56 | 400 |
| ZIP | 13 | NA | NA | 1 |
| Zipf | 2 | NA | NA | NA |
The discrete tag
Complete.Data.01.2026 <- Complete.Data.01.2026 |> mutate(Type = case_match(name, c("Binomial","Geometric","Poisson") ~ "Discrete", .default = "not Discrete"))
Complete.Data.01.2026 <- Complete.Data.01.2026 |> mutate(Correct = as.character(name==Outcome)) |> mutate(Correct = case_match(Correct, "TRUE" ~ "Correct", .default = "Incorrect"))Basic Percent Correctly Predicted
MNTotals <- Complete.Data.01.2026 |> group_by(model, name) |> summarise(Total = n()) |> ungroup()
PCP <- Complete.Data.01.2026 |> group_by(model, name, Correct) |> summarise(count = n()) |> ungroup()
PCPTab <- left_join(PCP, MNTotals) |> filter(Correct=="Correct") |> mutate(PCP = count / Total) |> select(model, name, PCP)
PCPTab# A tibble: 29 × 3
model name PCP
<chr> <chr> <dbl>
1 gpt-4-0613 Beta 0.0028
2 gpt-4-0613 Binomial 0.00394
3 gpt-4-0613 Lognormal 0.880
4 gpt-4-0613 Normal 1
5 gpt-4-0613 Poisson 0.554
6 gpt-4-0613 Uniform 0.0002
7 gpt-4-turbo-2024-04-09 Beta 0.972
8 gpt-4-turbo-2024-04-09 Binomial 0.104
9 gpt-4-turbo-2024-04-09 Gamma 0.044
10 gpt-4-turbo-2024-04-09 Lognormal 0.857
# ℹ 19 more rows
The Plot
Tab.All <- Complete.Data.01.2026 |> group_by(model, name, Outcome) |> summarise(Count = n()) |> ungroup() |> group_by(model,name) |> mutate(Total = sum(Count)) |> ungroup() |> mutate(PCP = round(Count / Total, digits=3)) |> ungroup()Tab.All <- Tab.All |> mutate(PCPa = round(PCP, digits=2))
Tab.All$PCPa1 <- as.character(Tab.All$PCPa)
Tab.All <- Tab.All |> mutate(PCPa1 = case_when(PCPa1 == "0" ~ "<0.01", .default = PCPa1))
FF3 <- ggplot(Tab.All) +
aes(y=fct_rev(Outcome), x=name, fill=PCP, facet=modelF) +
geom_tile(aes(color=Dist.Match), lwd=0.2, width=0.95, height=0.95) +
scale_color_manual(values = c("white","black")) +
guides(color="none") +
scale_fill_gradientn(colors = my_greys) +
labs(x="Known Distribution", y="Model Outcome") +
geom_text(aes(label=PCPa1), size=2, color="black") +
labs(caption="Values are row proportions \n Outlined cells isolate row/column matches", title="Known Distributions and Model Responses", fill="Row. Pct.") +
theme_minimal() +
theme(legend.location = "plot",
panel.spacing = unit(0.5, "lines"),
plot.label= element_text(size=5),
plot.title=element_text(size=10, face = "bold", hjust=0),
plot.title.position = "plot",
axis.text.x = element_text(size=8, angle=30),
axis.text.y = element_text(size=5),
legend.direction = "horizontal",
legend.text = element_text(size = 4),
legend.title = element_text(size = 6),
legend.key.width = unit(0.1, "in"),
legend.position = c(0.0, -0.1)
) +
facet_wrap(vars(modelF), scales="free_y")
ggsave(FF3, filename="FinalFigureFreeY-v2.jpeg", width=6.5, height=6.5, dpi=900, units="in")
References
knitr::write_bib(names(sessionInfo()$otherPkgs), file="bibliography.bib")References
Bache, Stefan Milton, and Hadley Wickham. 2025. Magrittr: A Forward-Pipe Operator for r. https://magrittr.tidyverse.org.
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. https://www.jstatsoft.org/v40/i03/.
Iannone, Richard, Joe Cheng, Barret Schloerke, Shannon Haughton, Ellis Hughes, Alexandra Lauer, Romain François, JooYoung Seo, Ken Brevoort, and Olivier Roy. 2025. Gt: Easily Create Presentation-Ready Display Tables. https://gt.rstudio.com.
Mock, Thomas. 2025. gtExtras: Extending Gt for Beautiful HTML Tables. https://github.com/jthomasmock/gtExtras.
Müller, Kirill, and Hadley Wickham. 2026. Tibble: Simple Data Frames. https://tibble.tidyverse.org/.
Scheinin, Ilari. 2019. Rasterpdf: Plot Raster Graphics in PDF Files. https://ilarischeinin.github.io/rasterpdf.
Spinu, Vitalie, Garrett Grolemund, and Hadley Wickham. 2024. Lubridate: Make Dealing with Dates a Little Easier. https://lubridate.tidyverse.org.
Urbanek, Simon, and Jeffrey Horner. 2025. Cairo: R Graphics Device Using Cairo Graphics Library for Creating High-Quality Bitmap (PNG, JPEG, TIFF), Vector (PDF, SVG, PostScript) and Display (X11 and Win32) Output. http://www.rforge.net/Cairo/.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
———. 2023. Tidyverse: Easily Install and Load the Tidyverse. https://tidyverse.tidyverse.org.
———. 2025a. Forcats: Tools for Working with Categorical Variables (Factors). https://forcats.tidyverse.org/.
———. 2025b. Stringr: Simple, Consistent Wrappers for Common String Operations. https://stringr.tidyverse.org.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, Dewey Dunnington, and Teun van den Brand. 2025. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://ggplot2.tidyverse.org.
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://dplyr.tidyverse.org.
Wickham, Hadley, and Lionel Henry. 2026. Purrr: Functional Programming Tools. https://purrr.tidyverse.org/.
Wickham, Hadley, Lionel Henry, Thomas Lin Pedersen, T Jake Luciani, Matthieu Decorde, and Vaudor Lise. 2025. Svglite: An SVG Graphics Device. https://svglite.r-lib.org.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2025. Readr: Read Rectangular Text Data. https://readr.tidyverse.org.
Wickham, Hadley, Davis Vaughan, and Maximilian Girlich. 2025. Tidyr: Tidy Messy Data. https://tidyr.tidyverse.org.