The performance of widely used language models tasked with identifying the distribution responsible for generating simulated data

Author

JE, TJ, RWW

The Organization of the Website

Eaglesmith, Justus, Tim Johnson, and Robert W. Walker. 2025. The performance of widely used language models tasked with identifying the distribution responsible for generating simulated data.

The support document for the letter.

TL;DR

The directory Final-Prompt-Results contains all the results. They are organized hierarchically by model [0409-turbo, 0806, 0613, 1211] and then by distribution.

A sample prompt

In each subdirectory, there are:

  • a series of .qmd files whose names contain Combiner. These combine the input and output files [in jsonl format] with the original .RData files.
  • a series of .RData files whose names contain Full. These hold the combined input, output, and RData files.
  • a series of .RData files whose names contain an R random generation function. These store the original calls and data.
  • a .R file whose name contains Combiner. It binds the rows of the Full data files.
  • two sets of .jsonl files: those with output in the name contain the OpenAI responses, and those ending in -Call-Revised.jsonl contain the batch files sent to OpenAI.
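The row-binding step described in the list above can be sketched as follows. This is a minimal illustration in base R, not the project's actual Combiner script: the helper name `bind_full_files`, the toy file names, and the assumption that each Full file stores exactly one data frame are all hypothetical.

```r
# Hypothetical helper: load every .RData file whose name contains "Full"
# and stack the data frames they hold into one data frame.
bind_full_files <- function(dir) {
  files <- list.files(dir, pattern = "Full.*\\.RData$", full.names = TRUE)
  frames <- lapply(files, function(f) {
    env <- new.env()
    objs <- load(f, envir = env)   # load() returns the names of the restored objects
    # Assumption: each file stores a single data frame, so take the first object.
    get(objs[1], envir = env)
  })
  do.call(rbind, frames)
}

# Toy demonstration in a temporary directory (file names are made up).
tmp <- file.path(tempdir(), "full-demo")
dir.create(tmp, showWarnings = FALSE)
part1 <- data.frame(id = c("a-1", "a-2"), answer = c("beta", "gamma"))
part2 <- data.frame(id = "b-1", answer = "beta")
save(part1, file = file.path(tmp, "Demo-Full-1.RData"))
save(part2, file = file.path(tmp, "Demo-Full-2.RData"))

combined <- bind_full_files(tmp)
print(combined)
```

Loading each file into its own environment keeps objects with identical names in different files from overwriting one another before they are bound.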

Details

Taking the example of the beta distribution, there are three sets of results.

For gpt-4o-2024-08-06:

The results are combined in:

For gpt-4-0613:

The results are combined in:

For gpt-4-turbo-2024-04-09:

The results are combined in:

Other results follow an identical pattern, substituting the names of the distributions and parameters relevant to the given case; e.g., Final-Prompt-Results/0409-turbo/beta-turbo/Beta-Combiner52-0409.html becomes Final-Prompt-Results/0409-turbo/poisson-turbo/Poisson-Combined-5-0409.html.

As the files make clear, the keys for merging the batch results with the batch queries are the unique IDs that we assign to each distribution/parameter/iteration combination.
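A keyed merge of this kind can be sketched in base R as follows. The data frames, the ID format, and the column names are assumptions made for illustration; the key column is called `custom_id` here because that is the field the OpenAI Batch API uses to pair requests with responses, but the project's own files may label it differently.

```r
# Toy stand-ins for the parsed batch queries and batch responses.
# The ID format (distribution-parameter-iteration) is hypothetical.
queries <- data.frame(
  custom_id = c("beta-2-5-1", "beta-2-5-2", "poisson-5-1"),
  prompt    = c("prompt A", "prompt B", "prompt C"),
  stringsAsFactors = FALSE
)
responses <- data.frame(
  custom_id = c("poisson-5-1", "beta-2-5-1", "beta-2-5-2"),  # arrival order may differ
  reply     = c("Poisson", "Beta", "Gamma"),
  stringsAsFactors = FALSE
)

# Join on the unique ID; all.x = TRUE keeps queries that received no response.
merged <- merge(queries, responses, by = "custom_id", all.x = TRUE)
print(merged)
```

Merging on the ID rather than on row position matters because batch output is not guaranteed to come back in the order the requests were submitted.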

The Software Bibliography

References

library(knitr)
library(quarto)
# Write BibTeX entries for every package attached in this session.
write_bib(names(sessionInfo()$otherPkgs), file = "bibliography.bib")

Allaire, JJ, and Christophe Dervieux. 2025. Quarto: R Interface to Quarto Markdown Publishing System. https://github.com/quarto-dev/quarto-r.
Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman and Hall/CRC.
———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. https://yihui.org/knitr/.
———. 2025. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.