The performance of widely used language models tasked with identifying the distribution responsible for generating simulated data

Author

JE, TJ, RWW

The Organization of the Website

Eaglesmith, Justus, Tim Johnson, and Robert W. Walker. 2025. The performance of widely used language models tasked with identifying the distribution responsible for generating simulated data.

The support document for the letter.

TLDR;

The directory Final-Prompt-Results contains all the results. They are organized hierarchically by model [0409-turbo, 0806, 0613, 1211] and then by distribution.

A sample prompt

In each subdirectory, there are:

a series of .qmd files containing Combiner. These combine the input and output files [in jsonl format] with the original .RData files.
a series of .RData files with Full that represent the combined input, output and RData files.
a series of .RData files with an R random generation function that store the original calls and data.
a .R file containining Combiner that binds the rows of the Full data files.
Two sets of .jsonl files: output contains the OpenAI responses and -Call-Revised.jsonl contains the batch files sent to OpenAI.

Details

Taking the example of the beta distribution, there are three sets of results.

For gpt-4o-2024-08-06:

The results are combined in:

For gpt-4-0613:

The results are combined in:

For gpt-4-turbo-2024-04-09:

The results are combined in:

Other results follow an identical pattern substituting the names of the distributions and parameters relevant in the given case, e.g. Final-Prompt-Results/0409-turbo/beta-turbo/Beta-Combiner52-0409.html becomes Final-Prompt-Results/0409-turbo/poisson-turbo/Poisson-Combined-5-0409.html

The performance of widely used language models tasked with identifying the distribution responsible for generating simulated data

The Organization of the Website

Details

The Software Bibliography

References

References