Gemini Multi-Agent Binomial Analysis

Author

Robert W. Walker

Published

March 3, 2026

Prompt

I want three subagent statisticians. A Bayesian, a frequentist, and a computational simulation specialist. I would like each to estimate the probability of Admit from the attached csv file overall, separately for Female and Male applicants, and the difference in the probability of Admit between Male and Female. Finally, show the difference in probability estimated separately for each Department (Dept) A, B, C, D, E, and F. Please check calculations along the way and use the attached data. In each case, provide a 90% confidence or credible interval

Markdown Response

UC Berkeley Admission Analysis

This report presents an analysis of the UCB-Admit dataset from three statistical perspectives: Frequentist, Bayesian, and Computational Simulation. The goal is to estimate admission probabilities and investigate potential gender differences.

1. Overall Admission Probabilities (90% Confidence/Credible Intervals)

| Approach | Group | Estimate | 90% Interval |
|---|---|---|---|
| Frequentist | Overall | 0.3878 | [0.3758, 0.3997] |
| Frequentist | Male | 0.4452 | [0.4294, 0.4609] |
| Frequentist | Female | 0.3035 | [0.2859, 0.3212] |
| Bayesian | Overall | 0.3878 | [0.3759, 0.3997] |
| Bayesian | Male | 0.4452 | [0.4295, 0.4610] |
| Bayesian | Female | 0.3038 | [0.2862, 0.3215] |
| Simulation | Overall | 0.3877 | [0.3760, 0.3997] |
| Simulation | Male | 0.4450 | [0.4292, 0.4609] |
| Simulation | Female | 0.3036 | [0.2864, 0.3215] |

2. Difference in Probability (Male - Female)

A positive value indicates a higher admission rate for males.

| Approach | Estimate (M-F) | 90% Interval |
|---|---|---|
| Frequentist | +0.1416 | [0.1180, 0.1653] |
| Bayesian | +0.1415 | [0.1178, 0.1647] |
| Simulation | +0.1414 | [0.1174, 0.1656] |

3. Difference in Probability by Department (Male - Female)

The department-level analysis reveals Simpson’s Paradox: while the aggregate data suggest a bias toward males, no individual department shows a significant male advantage. Department A shows a significant advantage for females, and the remaining departments show no significant difference.

| Dept | Frequentist (90% CI) | Bayesian (90% CrI) | Simulation (90% CI) |
|---|---|---|---|
| A | -0.2035 [-0.2698, -0.1371] | -0.1979 [-0.2615, -0.1286] | -0.2040 [-0.2675, -0.1374] |
| B | -0.0496 [-0.2067, +0.1074] | -0.0377 [-0.1807, +0.1192] | -0.0495 [-0.2034, +0.1139] |
| C | +0.0286 [-0.0258, +0.0830] | +0.0290 [-0.0255, +0.0839] | +0.0286 [-0.0261, +0.0827] |
| D | -0.0184 [-0.0739, +0.0371] | -0.0182 [-0.0734, +0.0375] | -0.0188 [-0.0764, +0.0374] |
| E | +0.0383 [-0.0257, +0.1023] | +0.0399 [-0.0225, +0.1046] | +0.0378 [-0.0291, +0.1030] |
| F | -0.0114 [-0.0418, +0.0190] | -0.0118 [-0.0425, +0.0181] | -0.0115 [-0.0421, +0.0189] |

Summary of Findings

  1. Frequentist Approach: Uses standard normal approximations for proportions. It identifies a statistically significant overall gap favoring males, but a significant gap favoring females in Department A.
  2. Bayesian Approach: Uses Beta-Binomial conjugacy with a Uniform prior. Given the large sample size, the credible intervals are consistent with the frequentist results.
  3. Computational Simulation: Based on 5,000 bootstrap resamples. This non-parametric approach confirms the stability of the findings across models.

Key Insight: The overall male advantage is a result of weighted averages (Simpson’s Paradox). Males applied more frequently to departments with higher admission rates (A and B), while females applied more frequently to departments with lower admission rates (C-F).
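The weighting argument can be made concrete with a short decomposition. The counts below are an assumption: they are the classic Berkeley 1973 (UCBAdmissions) totals, which reproduce the per-department and overall estimates reported above:

```python
# (applicants, admits) per department -- assumed classic UCBAdmissions counts
male   = {'A': (825, 512), 'B': (560, 353), 'C': (325, 120),
          'D': (417, 138), 'E': (191, 53),  'F': (373, 22)}
female = {'A': (108, 89),  'B': (25, 17),   'C': (593, 202),
          'D': (375, 131), 'E': (393, 94),  'F': (341, 24)}

def overall_rate(counts):
    # Overall rate = sum over departments of (applicant share) * (dept admit rate)
    n_total = sum(n for n, _ in counts.values())
    return sum((n / n_total) * (k / n) for n, k in counts.values())

for dept in 'ABCDEF':
    (nm, km), (nf, kf) = male[dept], female[dept]
    print(f"Dept {dept}: male {km/nm:.3f} vs female {kf/nf:.3f}")
print(f"Overall: male {overall_rate(male):.4f} vs female {overall_rate(female):.4f}")
```

Roughly half of male applications go to A and B (admit rates above 0.6), while most female applications go to C-F (rates below 0.36), so the male weighted average lands higher even though no department significantly favors males.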

Code for support

import pandas as pd

df = pd.read_csv('./data/UCB-Admit.csv')
print(df.head())
    M.F Admit Dept
0  Male   Yes    A
1  Male   Yes    A
2  Male   Yes    A
3  Male   Yes    A
4  Male   Yes    A
print(df.info())
<class 'pandas.DataFrame'>
RangeIndex: 4526 entries, 0 to 4525
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   M.F     4526 non-null   str  
 1   Admit   4526 non-null   str  
 2   Dept    4526 non-null   str  
dtypes: str(3)
memory usage: 106.2 KB
None
print(df['M.F'].unique())
<StringArray>
['Male', 'Female']
Length: 2, dtype: str
print(df['Admit'].unique())
<StringArray>
['Yes', 'No']
Length: 2, dtype: str
print(df['Dept'].unique())
<StringArray>
['A', 'B', 'C', 'D', 'E', 'F']
Length: 6, dtype: str

Part 1: Interval Estimates for Admission Probabilities

First, we calculate 90% intervals for the probability of Admission: normal-approximation (Wald) confidence intervals for the frequentist analysis, Beta-posterior quantiles under a Uniform (Beta(1,1)) prior for the Bayesian analysis, and bootstrap percentile intervals for the simulation.
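For reference, an exact (Clopper-Pearson) interval comes directly from Beta quantiles. A minimal sketch, using assumed overall counts of 1755 admits out of 4526 applicants (implied by the 0.3878 estimate):

```python
from scipy import stats

def clopper_pearson(k, n, conf=0.90):
    """Exact (Clopper-Pearson) binomial interval via Beta quantiles."""
    alpha = 1 - conf
    lower = stats.beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    upper = stats.beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lower, upper

# Assumed overall counts: 0.3878 * 4526 ≈ 1755 admits
lo, hi = clopper_pearson(1755, 4526)
print(f"Exact 90% CI: ({lo:.4f}, {hi:.4f})")
```

With samples this large, the exact interval is essentially indistinguishable from the normal-approximation interval [0.3758, 0.3997].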

import numpy as np
import scipy.stats as stats

# Convert Admit to numeric: Yes=1, No=0
df['Admit_num'] = df['Admit'].apply(lambda x: 1 if x == 'Yes' else 0)

def frequentist_ci_prop(k, n, confidence=0.90):
    p_hat = k / n
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    se = np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (p_hat - z * se, p_hat + z * se)

def frequentist_ci_diff(k1, n1, k2, n2, confidence=0.90):
    p1 = k1 / n1
    p2 = k2 / n2
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, (diff - z * se, diff + z * se)

def bayesian_beta(k, n, confidence=0.90, prior_a=1, prior_b=1):
    alpha_post = prior_a + k
    beta_post = prior_b + (n - k)
    mean = alpha_post / (alpha_post + beta_post)
    ci = stats.beta.ppf([(1 - confidence) / 2, 1 - (1 - confidence) / 2], alpha_post, beta_post)
    return mean, ci

def bayesian_diff(k1, n1, k2, n2, confidence=0.90, n_samples=10000, prior_a=1, prior_b=1):
    post1 = stats.beta.rvs(prior_a + k1, prior_b + n1 - k1, size=n_samples)
    post2 = stats.beta.rvs(prior_a + k2, prior_b + n2 - k2, size=n_samples)
    diffs = post1 - post2
    mean_diff = np.mean(diffs)
    ci = np.percentile(diffs, [(1 - confidence) / 2 * 100, (1 - (1 - confidence) / 2) * 100])
    return mean_diff, ci

def bootstrap_stats(data_frame, n_boots=5000, confidence=0.90):
    def get_stats(d):
        overall = d['Admit_num'].mean()
        male = d[d['M.F'] == 'Male']['Admit_num']
        female = d[d['M.F'] == 'Female']['Admit_num']
        p_m = male.mean()
        p_f = female.mean()
        diff = p_m - p_f
        
        dept_diffs = {}
        for dept in ['A', 'B', 'C', 'D', 'E', 'F']:
            dm = d[(d['M.F'] == 'Male') & (d['Dept'] == dept)]['Admit_num']
            dfem = d[(d['M.F'] == 'Female') & (d['Dept'] == dept)]['Admit_num']
            dept_diffs[dept] = dm.mean() - dfem.mean()
            
        return overall, p_m, p_f, diff, dept_diffs

    results = []
    for _ in range(n_boots):
        boot_df = data_frame.sample(frac=1, replace=True)
        results.append(get_stats(boot_df))
    
    # Process results
    overalls = [r[0] for r in results]
    p_ms = [r[1] for r in results]
    p_fs = [r[2] for r in results]
    diffs = [r[3] for r in results]
    dept_diffs = {dept: [r[4][dept] for r in results] for dept in ['A', 'B', 'C', 'D', 'E', 'F']}
    
    alpha = (1 - confidence) / 2
    
    summary = {
        'overall': (np.mean(overalls), np.percentile(overalls, [alpha*100, (1-alpha)*100])),
        'p_m': (np.mean(p_ms), np.percentile(p_ms, [alpha*100, (1-alpha)*100])),
        'p_f': (np.mean(p_fs), np.percentile(p_fs, [alpha*100, (1-alpha)*100])),
        'diff': (np.mean(diffs), np.percentile(diffs, [alpha*100, (1-alpha)*100])),
    }
    summary['dept_diffs'] = {dept: (np.mean(dept_diffs[dept]), np.percentile(dept_diffs[dept], [alpha*100, (1-alpha)*100])) for dept in ['A', 'B', 'C', 'D', 'E', 'F']}
    return summary

# Data counts
n_total = len(df)
k_total = df['Admit_num'].sum()

df_m = df[df['M.F'] == 'Male']
n_m = len(df_m)
k_m = df_m['Admit_num'].sum()

df_f = df[df['M.F'] == 'Female']
n_f = len(df_f)
k_f = df_f['Admit_num'].sum()

# Frequentist Results
freq_overall = frequentist_ci_prop(k_total, n_total)
freq_m = frequentist_ci_prop(k_m, n_m)
freq_f = frequentist_ci_prop(k_f, n_f)
freq_diff = frequentist_ci_diff(k_m, n_m, k_f, n_f)

freq_dept_diffs = {}
for dept in ['A', 'B', 'C', 'D', 'E', 'F']:
    dm = df[(df['M.F'] == 'Male') & (df['Dept'] == dept)]
    dfem = df[(df['M.F'] == 'Female') & (df['Dept'] == dept)]
    freq_dept_diffs[dept] = frequentist_ci_diff(len(dm[dm['Admit_num']==1]), len(dm), 
                                               len(dfem[dfem['Admit_num']==1]), len(dfem))

# Bayesian Results
bayes_overall = bayesian_beta(k_total, n_total)
bayes_m = bayesian_beta(k_m, n_m)
bayes_f = bayesian_beta(k_f, n_f)
bayes_diff_res = bayesian_diff(k_m, n_m, k_f, n_f)

bayes_dept_diffs = {}
for dept in ['A', 'B', 'C', 'D', 'E', 'F']:
    dm = df[(df['M.F'] == 'Male') & (df['Dept'] == dept)]
    dfem = df[(df['M.F'] == 'Female') & (df['Dept'] == dept)]
    bayes_dept_diffs[dept] = bayesian_diff(len(dm[dm['Admit_num']==1]), len(dm), 
                                          len(dfem[dfem['Admit_num']==1]), len(dfem))

# Simulation Results
sim_res = bootstrap_stats(df)

# Print formatting
print("--- FREQUENTIST ---")
--- FREQUENTIST ---
print(f"Overall Admit: {freq_overall[0]:.4f} 90% CI: {freq_overall[1]}")
Overall Admit: 0.3878 90% CI: (np.float64(0.3758468548858), np.float64(0.3996723673855213))
print(f"Male Admit: {freq_m[0]:.4f} 90% CI: {freq_m[1]}")
Male Admit: 0.4452 90% CI: (np.float64(0.4294291572455693), np.float64(0.46094616791236454))
print(f"Female Admit: {freq_f[0]:.4f} 90% CI: {freq_f[1]}")
Female Admit: 0.3035 90% CI: (np.float64(0.2858872970784858), np.float64(0.3211971715863643))
print(f"Difference (M-F): {freq_diff[0]:.4f} 90% CI: {freq_diff[1]}")
Difference (M-F): 0.1416 90% CI: (np.float64(0.11798053230797384), np.float64(0.16531032418510988))
for dept, res in freq_dept_diffs.items():
    print(f"Dept {dept} Diff: {res[0]:.4f} 90% CI: {res[1]}")
Dept A Diff: -0.2035 90% CI: (np.float64(-0.26983065836437714), np.float64(-0.1371053685716498))
Dept B Diff: -0.0496 90% CI: (np.float64(-0.20672494857181703), np.float64(0.10743923428610261))
Dept C Diff: 0.0286 90% CI: (np.float64(-0.02584881896001337), np.float64(0.08302873853453666))
Dept D Diff: -0.0184 90% CI: (np.float64(-0.07386425391360701), np.float64(0.03706809084406266))
Dept E Diff: 0.0383 90% CI: (np.float64(-0.025673244530764167), np.float64(0.10227556524802836))
Dept F Diff: -0.0114 90% CI: (np.float64(-0.04175948097266083), np.float64(0.01895948411748796))
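The Department A figure can be spot-checked by hand. A sketch assuming the classic Berkeley 1973 counts for Department A (512 admits of 825 male applicants, 89 of 108 female applicants):

```python
import numpy as np
from scipy import stats

# Assumed Department A counts (classic Berkeley 1973 data)
k_m, n_m = 512, 825   # male admits / applicants
k_f, n_f = 89, 108    # female admits / applicants

p_m, p_f = k_m / n_m, k_f / n_f
diff = p_m - p_f
se = np.sqrt(p_m * (1 - p_m) / n_m + p_f * (1 - p_f) / n_f)
z = stats.norm.ppf(0.95)  # two-sided 90%
lo, hi = diff - z * se, diff + z * se
print(f"Dept A Diff: {diff:.4f} 90% CI: ({lo:.4f}, {hi:.4f})")
```

These counts reproduce the Dept A line above (-0.2035, [-0.2698, -0.1371]), a useful check that the per-department subsetting is correct.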
print("\n--- BAYESIAN ---")

--- BAYESIAN ---
print(f"Overall Admit: {bayes_overall[0]:.4f} 90% CrI: {bayes_overall[1]}")
Overall Admit: 0.3878 90% CrI: [0.37592815 0.39974658]
print(f"Male Admit: {bayes_m[0]:.4f} 90% CrI: {bayes_m[1]}")
Male Admit: 0.4452 90% CrI: [0.42950123 0.46100177]
print(f"Female Admit: {bayes_f[0]:.4f} 90% CrI: {bayes_f[1]}")
Female Admit: 0.3038 90% CrI: [0.28623373 0.32152152]
print(f"Difference (M-F): {bayes_diff_res[0]:.4f} 90% CrI: {bayes_diff_res[1]}")
Difference (M-F): 0.1412 90% CrI: [0.11819391 0.1646385 ]
for dept, res in bayes_dept_diffs.items():
    print(f"Dept {dept} Diff: {res[0]:.4f} 90% CrI: {res[1]}")
Dept A Diff: -0.1978 90% CrI: [-0.26185713 -0.12932448]
Dept B Diff: -0.0357 90% CrI: [-0.17912809  0.12065017]
Dept C Diff: 0.0289 90% CrI: [-0.02559814  0.0829712 ]
Dept D Diff: -0.0180 90% CrI: [-0.07366351  0.03707448]
Dept E Diff: 0.0391 90% CrI: [-0.02448831  0.10466703]
Dept F Diff: -0.0115 90% CrI: [-0.04283984  0.01941818]
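The credible intervals can also be turned into a direct posterior statement. A sketch, again assuming the classic Department A counts (512/825 male, 89/108 female, consistent with the -0.1978 reported above), estimating the posterior probability that the female rate exceeds the male rate:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Assumed Department A counts (classic Berkeley 1973 data)
k_m, n_m, k_f, n_f = 512, 825, 89, 108

# Beta(1,1) prior -> Beta(1 + k, 1 + n - k) posteriors, compared by Monte Carlo
post_m = stats.beta.rvs(1 + k_m, 1 + n_m - k_m, size=100_000, random_state=rng)
post_f = stats.beta.rvs(1 + k_f, 1 + n_f - k_f, size=100_000, random_state=rng)
print(f"P(p_f > p_m | data) = {np.mean(post_f > post_m):.4f}")
```

The posterior probability is essentially 1, the Bayesian counterpart of the all-negative 90% CrI for Department A.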
print("\n--- COMPUTATIONAL SIMULATION ---")

--- COMPUTATIONAL SIMULATION ---
print(f"Overall Admit: {sim_res['overall'][0]:.4f} 90% CI: {sim_res['overall'][1]}")
Overall Admit: 0.3877 90% CI: [0.37604949 0.39969068]
print(f"Male Admit: {sim_res['p_m'][0]:.4f} 90% CI: {sim_res['p_m'][1]}")
Male Admit: 0.4451 90% CI: [0.42924671 0.4606289 ]
print(f"Female Admit: {sim_res['p_f'][0]:.4f} 90% CI: {sim_res['p_f'][1]}")
Female Admit: 0.3036 90% CI: [0.28579084 0.32152457]
print(f"Difference (M-F): {sim_res['diff'][0]:.4f} 90% CI: {sim_res['diff'][1]}")
Difference (M-F): 0.1415 90% CI: [0.11737895 0.16528944]
for dept, res in sim_res['dept_diffs'].items():
    print(f"Dept {dept} Diff: {res[0]:.4f} 90% CI: {res[1]}")
Dept A Diff: -0.2033 90% CI: [-0.2708652  -0.13347945]
Dept B Diff: -0.0482 90% CI: [-0.20402083  0.11603879]
Dept C Diff: 0.0283 90% CI: [-0.02540717  0.08298603]
Dept D Diff: -0.0181 90% CI: [-0.07454795  0.03755378]
Dept E Diff: 0.0382 90% CI: [-0.02309469  0.10177894]
Dept F Diff: -0.0115 90% CI: [-0.04130332  0.01939095]
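For a single proportion, resampling rows with replacement is equivalent to drawing the bootstrap admit count from a Binomial distribution, which is far faster and easy to seed for reproducibility. A sketch with the assumed overall counts (1755 of 4526):

```python
import numpy as np

rng = np.random.default_rng(2026)   # seeded so the bootstrap is reproducible
k, n, n_boots = 1755, 4526, 5000    # assumed overall counts: 0.3878 * 4526 ≈ 1755

# Resampling a 0/1 column with replacement is equivalent to drawing the
# bootstrap admit count from Binomial(n, k/n) -- no DataFrame needed
boots = rng.binomial(n, k / n, size=n_boots) / n
ci = np.percentile(boots, [5, 95])
print(f"Overall Admit: {boots.mean():.4f} 90% CI: [{ci[0]:.4f}, {ci[1]:.4f}]")
```

In the full analysis, passing random_state (or a seeded Generator) to DataFrame.sample inside the bootstrap loop gives the same reproducibility for the row-resampling version.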

Interpretation of the Intervals

Overall, Males had an admission rate of 44.52% (90% CI: 42.94% to 46.09%), while Females had an admission rate of 30.35% (90% CI: 28.59% to 32.12%).


Part 2: Normal Approximation (Wald Method) and Differences

Next, we re-analyze the data utilizing the normal approximation (Wald method) for binomial confidence intervals at 95%. This allows us to easily calculate the confidence intervals for the differences in probability of admission (Male minus Female).

import numpy as np
from scipy.stats import norm

z = norm.ppf(0.975) # 1.96 for 95% CI

def wald_ci(k, n):
    if n == 0: return np.nan, np.nan, np.nan
    p = k / n
    se = np.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se

def wald_diff_ci(k1, n1, k2, n2):
    if n1 == 0 or n2 == 0: return np.nan, np.nan, np.nan
    p1 = k1 / n1
    p2 = k2 / n2
    diff = p1 - p2
    se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

# Overall Stats Calculation
pm, pm_low, pm_high = wald_ci(k_m, n_m)
pf, pf_low, pf_high = wald_ci(k_f, n_f)
diff_o, diff_o_low, diff_o_high = wald_diff_ci(k_m, n_m, k_f, n_f)

print("--- Overall Admission Differences ---")
--- Overall Admission Differences ---
print(f"Overall Males: p={pm:.4f}, CI=({pm_low:.4f}, {pm_high:.4f})")
Overall Males: p=0.4452, CI=(0.4264, 0.4640)
print(f"Overall Females: p={pf:.4f}, CI=({pf_low:.4f}, {pf_high:.4f})")
Overall Females: p=0.3035, CI=(0.2825, 0.3246)
print(f"Overall Diff (M-F): diff={diff_o:.4f}, CI=({diff_o_low:.4f}, {diff_o_high:.4f})\n")
Overall Diff (M-F): diff=0.1416, CI=(0.1134, 0.1698)
# Department Level Calculation
dept_results = []
for dept in sorted(df['Dept'].unique()):
    m_sub = df[(df['M.F'] == 'Male') & (df['Dept'] == dept)]
    f_sub = df[(df['M.F'] == 'Female') & (df['Dept'] == dept)]
    
    k_m_dept = (m_sub['Admit'] == 'Yes').sum()
    n_m_dept = len(m_sub)
    k_f_dept = (f_sub['Admit'] == 'Yes').sum()
    n_f_dept = len(f_sub)
    
    p_m, pm_l, pm_h = wald_ci(k_m_dept, n_m_dept)
    p_f, pf_l, pf_h = wald_ci(k_f_dept, n_f_dept)
    diff, diff_l, diff_h = wald_diff_ci(k_m_dept, n_m_dept, k_f_dept, n_f_dept)
    
    dept_results.append({
        'Dept': dept,
        'Male_p': p_m, 'Male_CI_L': pm_l, 'Male_CI_U': pm_h,
        'Female_p': p_f, 'Female_CI_L': pf_l, 'Female_CI_U': pf_h,
        'Diff': diff, 'Diff_CI_L': diff_l, 'Diff_CI_U': diff_h
    })

res_diff_df = pd.DataFrame(dept_results)
res_diff_df.style.format({
    'Male_p': '{:.2%}', 'Male_CI_L': '{:.2%}', 'Male_CI_U': '{:.2%}',
    'Female_p': '{:.2%}', 'Female_CI_L': '{:.2%}', 'Female_CI_U': '{:.2%}',
    'Diff': '{:.2%}', 'Diff_CI_L': '{:.2%}', 'Diff_CI_U': '{:.2%}'
})
Normal Approximation Confidence Intervals and Differences

| Dept | Male_p | Male_CI_L | Male_CI_U | Female_p | Female_CI_L | Female_CI_U | Diff | Diff_CI_L | Diff_CI_U |
|---|---|---|---|---|---|---|---|---|---|
| A | 62.06% | 58.75% | 65.37% | 82.41% | 75.23% | 89.59% | -20.35% | -28.25% | -12.44% |
| B | 63.04% | 59.04% | 67.03% | 68.00% | 49.71% | 86.29% | -4.96% | -23.68% | 13.75% |
| C | 36.92% | 31.68% | 42.17% | 34.06% | 30.25% | 37.88% | 2.86% | -3.63% | 9.35% |
| D | 33.09% | 28.58% | 37.61% | 34.93% | 30.11% | 39.76% | -1.84% | -8.45% | 4.77% |
| E | 27.75% | 21.40% | 34.10% | 23.92% | 19.70% | 28.14% | 3.83% | -3.79% | 11.45% |
| F | 5.90% | 3.51% | 8.29% | 7.04% | 4.32% | 9.75% | -1.14% | -4.76% | 2.48% |

Analysis of the Differences & Simpson’s Paradox

  1. Overall: Males had a higher probability of being admitted compared to Females. Based on the normal approximation, the overall difference in probability is statistically significant because the 95% CI for the difference (+11.34% to +16.98%) does not cross zero.
  2. By Department (Simpson’s Paradox): When we re-examine the data by looking at each department separately, the trend reverses or disappears:
    • Department A: Females were admitted at a significantly higher rate than Males (the 95% CI for the Male-Female difference is entirely negative: -28.25% to -12.44%).
    • Departments B, C, D, E, F: The 95% confidence intervals for the differences all cross zero, meaning there is no statistically significant difference between Male and Female admission probabilities in any of these individual departments.

This illustrates a classic example of Simpson’s Paradox: a trend appears in different groups of data but disappears or reverses when these groups are combined.