This dataset represents Admissions decisions by two traits: M.F. captures Male/Female and Dept captures the department, labeled A through F. Please provide exact binomial 95% confidence intervals for the probability of Admission for Males, for Females, and then for each of Males and Females by department.
This is great, now please reanalyse the data using the normal approximation and provide differences overall and for each department.
Could you collect your responses and code into a quarto markdown document with python code chunks?
Can you create the text file? The chunks are hampered by your rendering.
Introduction
This document analyzes a dataset representing admissions decisions. The dataset includes gender (M.F), department (Dept, labeled A through F), and the admission decision (Admit).
Below, we compute 95% confidence intervals for the probability of admission for Males and Females, both overall and broken down by department, using two different methods.
Part 1: Exact Binomial Confidence Intervals
First, we calculate the exact binomial (Clopper-Pearson) confidence intervals for the probability of Admission using the Beta distribution method.
Overall, Males had an admission rate of 44.52% (95% CI: 42.63% - 46.42%), while Females had an admission rate of 30.35% (95% CI: 28.26% - 32.52%).
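As a cross-check on how such an interval is defined, the sketch below computes Clopper-Pearson bounds by inverting the binomial tail probabilities with bisection, using only the standard library. This is the tail-inversion form of the interval rather than the Beta-quantile call described above; the two are mathematically equivalent. The counts (1198 of 2691 males and 557 of 1835 females admitted) are assumptions: they are not listed in this document, but they reproduce the reported rates of 44.52% and 30.35%. The function names are illustrative.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid underflow."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)

def bisect_root(f, lo=0.0, hi=1.0, iters=60):
    """Root of a function that is increasing on (lo, hi), found by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha/2.
    lower = 0.0 if k == 0 else bisect_root(
        lambda p: (1.0 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha/2.
    upper = 1.0 if k == n else bisect_root(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return k / n, lower, upper

# Assumed counts (admitted, applicants); not given in this document.
for label, k, n in [("Male", 1198, 2691), ("Female", 557, 1835)]:
    p, lo, hi = clopper_pearson(k, n)
    print(f"{label}: {p:.2%} (95% CI: {lo:.2%} - {hi:.2%})")
```

With SciPy available, `scipy.stats.beta.ppf(alpha / 2, k, n - k + 1)` and `scipy.stats.beta.ppf(1 - alpha / 2, k + 1, n - k)` return the same lower and upper bounds.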
Part 2: Normal Approximation (Wald Method) and Differences
Next, we re-analyze the data using the normal approximation (Wald method) for 95% binomial confidence intervals. Because the variances of two independent sample proportions add, the same approach also gives a simple confidence interval for the difference in probability of admission (Male minus Female).
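For reference, the Wald interval for a single proportion and for the difference of two independent proportions can be written as follows. The notation is introduced here for illustration: $k$ admissions out of $n$ applications give $\hat{p} = k/n$, the subscripts $M$ and $F$ denote Males and Females, and $z_{0.975} \approx 1.96$.

$$
\hat{p} \pm z_{0.975}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},
\qquad
(\hat{p}_M - \hat{p}_F) \pm z_{0.975}\sqrt{\frac{\hat{p}_M(1-\hat{p}_M)}{n_M} + \frac{\hat{p}_F(1-\hat{p}_F)}{n_F}}
$$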
    import numpy as np
    from scipy.stats import norm

    z = norm.ppf(0.975)  # 1.96 for 95% CI

    def wald_ci(k, n):
        if n == 0:
            return np.nan, np.nan, np.nan
        p = k / n
        se = np.sqrt(p * (1 - p) / n)
        return p, p - z * se, p + z * se

    def wald_diff_ci(k1, n1, k2, n2):
        if n1 == 0 or n2 == 0:
            return np.nan, np.nan, np.nan
        p1 = k1 / n1
        p2 = k2 / n2
        diff = p1 - p2
        se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
        return diff, diff - z * se, diff + z * se

    # Overall Stats Calculation
    pm, pm_low, pm_high = wald_ci(k_m, n_m)
    pf, pf_low, pf_high = wald_ci(k_f, n_f)
    diff_o, diff_o_low, diff_o_high = wald_diff_ci(k_m, n_m, k_f, n_f)
    print("--- Overall Admission Differences ---")
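The code above relies on counts (`k_m`, `n_m`, `k_f`, `n_f`) defined elsewhere. To see the overall difference computed end to end, here is a self-contained sketch using only the standard library, with `statistics.NormalDist` standing in for `scipy.stats.norm`. The counts are assumptions, not given in this document; they were chosen because they reproduce the reported overall rates of 44.52% and 30.35%.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% CI

def wald_diff_ci(k1, n1, k2, n2):
    """Difference of two independent proportions with its Wald 95% CI."""
    p1, p2 = k1 / n1, k2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - Z * se, diff + Z * se

# Assumed overall counts (admitted, applicants); not from this document.
k_m, n_m = 1198, 2691
k_f, n_f = 557, 1835

diff, low, high = wald_diff_ci(k_m, n_m, k_f, n_f)
print(f"Male - Female: {diff:+.2%} (95% CI: {low:+.2%} to {high:+.2%})")
```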
Normal Approximation Confidence Intervals and Differences

| Dept | Male_p | Male_CI_L | Male_CI_U | Female_p | Female_CI_L | Female_CI_U | Diff | Diff_CI_L | Diff_CI_U |
|------|--------|-----------|-----------|----------|-------------|-------------|------|-----------|-----------|
| A | 62.06% | 58.75% | 65.37% | 82.41% | 75.23% | 89.59% | -20.35% | -28.25% | -12.44% |
| B | 63.04% | 59.04% | 67.03% | 68.00% | 49.71% | 86.29% | -4.96% | -23.68% | 13.75% |
| C | 36.92% | 31.68% | 42.17% | 34.06% | 30.25% | 37.88% | 2.86% | -3.63% | 9.35% |
| D | 33.09% | 28.58% | 37.61% | 34.93% | 30.11% | 39.76% | -1.84% | -8.45% | 4.77% |
| E | 27.75% | 21.40% | 34.10% | 23.92% | 19.70% | 28.14% | 3.83% | -3.79% | 11.45% |
| F | 5.90% | 3.51% | 8.29% | 7.04% | 4.32% | 9.75% | -1.14% | -4.76% | 2.48% |
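A table like the one above can be regenerated from per-department counts. The sketch below shows one way to do it with the standard library. The counts are assumptions (they are not listed in this document) chosen because they reproduce the admission rates shown, and `dept_rows` is an illustrative helper rather than part of the original analysis.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% CI

# Assumed counts per department, not given in this document:
# (male admitted, male applicants, female admitted, female applicants)
COUNTS = {
    "A": (512, 825, 89, 108),
    "B": (353, 560, 17, 25),
    "C": (120, 325, 202, 593),
    "D": (138, 417, 131, 375),
    "E": (53, 191, 94, 393),
    "F": (22, 373, 24, 341),
}

def wald_ci(k, n):
    """Proportion k/n with its Wald 95% CI."""
    p = k / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - Z * se, p + Z * se

def wald_diff_ci(k1, n1, k2, n2):
    """Difference of two independent proportions with its Wald 95% CI."""
    p1, p2 = k1 / n1, k2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - Z * se, diff + Z * se

def dept_rows(counts):
    """Map each department to (male p, L, U, female p, L, U, diff, L, U)."""
    rows = {}
    for dept, (km, nm, kf, nf) in counts.items():
        rows[dept] = (wald_ci(km, nm) + wald_ci(kf, nf)
                      + wald_diff_ci(km, nm, kf, nf))
    return rows

for dept, row in dept_rows(COUNTS).items():
    print(dept, "  ".join(f"{v:8.2%}" for v in row))
```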
Analysis of the Differences & Simpson’s Paradox
Overall: Males had a higher probability of admission than Females (44.52% vs. 30.35%, a gap of about 14 percentage points). Under the normal approximation, this overall difference is statistically significant at the 5% level because its 95% CI (+11.34% to +16.98%) excludes zero.
By Department (Simpson’s Paradox): When we re-examine the data by looking at each department separately, the trend reverses or disappears:
Department A: Females were admitted at a significantly higher rate than Males (the 95% CI for the Male-Female difference is entirely negative: -28.25% to -12.44%).
Departments B, C, D, E, F: The 95% confidence intervals for the differences all cross zero, meaning there is no statistically significant difference between Male and Female admission probabilities in any of these individual departments.
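This is a classic example of Simpson’s Paradox: a trend that appears in the combined data disappears or reverses once the data are split into groups. It can arise here because the departments differ widely in selectivity (from about 6% admitted in F to over 60% in A and B), so the pooled rates reflect not only how each group fared within a department but also which departments each group applied to.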
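To make the mechanism concrete, the sketch below compares each group's pooled admission rate with the share of its applications that went to departments A and B, the two with the highest admission rates. The per-department counts are assumptions, not given in this document; they reproduce the rates reported above. The helper names and the choice to single out A and B are illustrative.

```python
# Assumed counts per department, not given in this document:
# (male admitted, male applicants, female admitted, female applicants)
COUNTS = {
    "A": (512, 825, 89, 108),
    "B": (353, 560, 17, 25),
    "C": (120, 325, 202, 593),
    "D": (138, 417, 131, 375),
    "E": (53, 191, 94, 393),
    "F": (22, 373, 24, 341),
}

def pooled_rate(counts, sex):
    """Overall admission rate for 'M' or 'F' across all departments."""
    i = 0 if sex == "M" else 2
    admitted = sum(c[i] for c in counts.values())
    applied = sum(c[i + 1] for c in counts.values())
    return admitted / applied

def share_applying(counts, sex, depts):
    """Fraction of a group's applications that went to the given departments."""
    i = 1 if sex == "M" else 3
    total = sum(c[i] for c in counts.values())
    return sum(counts[d][i] for d in depts) / total

for sex in ("M", "F"):
    rate = pooled_rate(COUNTS, sex)
    share = share_applying(COUNTS, sex, ("A", "B"))
    print(f"{sex}: pooled rate {rate:.2%}, share applying to A or B {share:.1%}")
```

Under these assumed counts, about half of male applications but well under a tenth of female applications went to departments A and B. That difference in application mix is enough to produce a pooled gap in favor of Males even though no individual department shows a significant Male advantage.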