| M.F | No | Yes |
|---|---|---|
| Female | 0.2823685 | 0.1230667 |
| Male | 0.3298719 | 0.2646929 |
OIS Chapter 3 and Jaynes
2026-02-25
Ethics: Dehumanization for Aggregation
The joint probabilities in the table sum to one: every applicant falls in exactly one cell.
For Berkeley:
For the marginal probabilities, we collapse the table to a single margin, and each margin sums to one. Here, two margins can be identified: the probability of Admit and the probability of M.F.
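As a minimal sketch of collapsing to a margin, the snippet below sums the Berkeley joint table from the top of these notes over each variable in turn. It assumes the No/Yes columns of that table are the Admit outcome; the `marginal` helper is a hypothetical name introduced only for illustration.

```python
# Joint probabilities from the Berkeley table: (M.F, Admit) -> probability.
# Assumption: the No/Yes columns are the Admit outcome.
joint = {
    ("Female", "No"): 0.2823685,
    ("Female", "Yes"): 0.1230667,
    ("Male", "No"): 0.3298719,
    ("Male", "Yes"): 0.2646929,
}


def marginal(joint, axis):
    """Sum the joint table over the other variable (axis 0 = M.F, axis 1 = Admit)."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out


pr_mf = marginal(joint, 0)     # Pr(M.F)
pr_admit = marginal(joint, 1)  # Pr(Admit)

print({k: round(v, 4) for k, v in pr_mf.items()})
print({k: round(v, 4) for k, v in pr_admit.items()})
```

Each of the two dictionaries sums to one, which is the sense in which a margin is a probability distribution of its own.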
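Conditional probability asks how one margin of the table breaks down given values of another. Depending on which variable we condition on, each row or each column sums to one.
Four conditional distributions can be identified: the probability of admission or rejection for males and for females, and the probability of male or female among admits and among rejects.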
For Berkeley:
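One way to sketch the conditional breakdown is to divide each cell of the joint table by the margin of the variable being conditioned on. The joint values come from the table at the top of these notes; the `conditional` helper and the reading of No/Yes as the Admit outcome are assumptions made for illustration.

```python
# Joint probabilities: (M.F, Admit) -> probability (Admit columns assumed).
joint = {
    ("Female", "No"): 0.2823685,
    ("Female", "Yes"): 0.1230667,
    ("Male", "No"): 0.3298719,
    ("Male", "Yes"): 0.2646929,
}


def conditional(joint, given_axis):
    """Divide each cell by the margin of the conditioning variable."""
    margin = {}
    for key, p in joint.items():
        margin[key[given_axis]] = margin.get(key[given_axis], 0.0) + p
    return {key: p / margin[key[given_axis]] for key, p in joint.items()}


admit_given_mf = conditional(joint, 0)  # Pr(Admit | M.F): each row sums to one
mf_given_admit = conditional(joint, 1)  # Pr(M.F | Admit): each column sums to one

for key, p in admit_given_mf.items():
    print("Pr(Admit=%s | %s) = %.4f" % (key[1], key[0], p))
```

The two results answer different questions: `admit_given_mf` compares admission chances across Male and Female, while `mf_given_admit` describes who makes up the admits and the rejects.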
Is a combination of the distributive property of multiplication and the fact that probabilities sum to one.
For example, the probability of Admitted and Male is the probability of admission for males times the probability of male.
$$\Pr(x, y) = \Pr(y \mid x) \Pr(x)$$
Or it is the probability of being admitted times the probability of being male among admits.
$$\Pr(x, y) = \Pr(x \mid y) \Pr(y)$$
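As a worked check with the Berkeley numbers, both factorizations recover the same joint cell. The rounded conditional and marginal values below are not stated in the notes; they are derived here from the joint table at the top, so treat them as approximate.

```latex
\begin{align*}
\Pr(\text{Admit}, \text{Male})
  &= \Pr(\text{Admit} \mid \text{Male})\,\Pr(\text{Male})
   \approx 0.4452 \times 0.5946 \approx 0.2647 \\
  &= \Pr(\text{Male} \mid \text{Admit})\,\Pr(\text{Admit})
   \approx 0.6826 \times 0.3878 \approx 0.2647
\end{align*}
```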
To find the joint probability [the intersection] of x and y, we can use either of the aforementioned methods. To turn this into a conditional probability, we simply take it as a proportion of the relevant margin.
$$\Pr(x \mid y) = \frac{\Pr(y \mid x) \Pr(x)}{\Pr(y)}$$
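The step from the two factorizations to this formula can be written out as a short derivation. The last line, which expands the margin over x and its complement, is a standard identity added here for completeness rather than something stated in the notes.

```latex
\begin{align*}
\Pr(x \mid y)\,\Pr(y) &= \Pr(x, y) = \Pr(y \mid x)\,\Pr(x) \\
\Pr(x \mid y) &= \frac{\Pr(y \mid x)\,\Pr(x)}{\Pr(y)}, \qquad \Pr(y) > 0 \\
\Pr(y) &= \Pr(y \mid x)\,\Pr(x) + \Pr(y \mid \overline{x})\,\Pr(\overline{x})
\end{align*}
```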
Prior customers and current customers; engagement metrics, etc. The rows are prior state: customer/user and not. The columns are current state: customer/user and not.
The churn rate is the rate at which prior customers become current non-customers: a conditional probability.
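A small sketch of the churn calculation, using made-up counts: none of the numbers below come from the notes, and the `churn_rate` helper is a hypothetical name. The point is only that churn divides by the prior-customer row, not by the whole table.

```python
# Hypothetical counts (not from the source): (prior state, current state) -> people.
counts = {
    ("customer", "customer"): 800,
    ("customer", "not"): 200,
    ("not", "customer"): 100,
    ("not", "not"): 900,
}


def churn_rate(counts):
    """Pr(current = not | prior = customer): condition on the prior-customer row."""
    prior_customers = counts[("customer", "customer")] + counts[("customer", "not")]
    return counts[("customer", "not")] / prior_customers


total = sum(counts.values())
joint_lost = counts[("customer", "not")] / total  # joint probability, for contrast

print("churn rate (conditional):", churn_rate(counts))
print("prior customer and now not (joint):", joint_lost)
```

With these illustrative counts the conditional rate is 0.2 while the joint probability is 0.1; the two differ because they use different denominators.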
Three nodes: guilty and not at each, convict at the third.
To find the joint probability [the intersection] of x and y, we can use either of the aforementioned methods. To turn this into a conditional probability, we simply take it as a proportion of the relevant margin.
$$\Pr(x \mid y) = \frac{\Pr(y \mid x) \Pr(x)}{\Pr(y)}$$
By itself, this is algebra. It is magic in an application.
$$\Pr(\text{User} \mid +) = \frac{\Pr(+ \mid \text{User}) \Pr(\text{User})}{\Pr(+)}$$
This poses the question: what does a positive test mean?
Suppose a test is 99% accurate for Users and 95% accurate for non-Users; that is, it returns a positive for 99% of Users and a negative for 95% of non-Users. Moreover, suppose that Users make up 10% of the population. So given some population to which this applies, we have:
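$$\Pr(\text{User}, +) = \Pr(+ \mid \text{User}) \Pr(\text{User})$$
$$\Pr(\text{User}, -) = \Pr(- \mid \text{User}) \Pr(\text{User})$$
$$\Pr(\overline{\text{User}}, +) = \Pr(+ \mid \overline{\text{User}}) \Pr(\overline{\text{User}})$$
and
$$\Pr(\overline{\text{User}}, -) = \Pr(- \mid \overline{\text{User}}) \Pr(\overline{\text{User}})$$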
| Status | Positive | Negative | Total |
|---|---|---|---|
| User | 0.099 | 0.001 | 0.1 |
| non-User | 0.045 | 0.855 | 0.9 |
| Total | 0.144 | 0.856 | 1.0 |
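
The table can be reproduced from the three stated inputs (99%, 95%, and 10%) with the multiplication rule. This is a minimal sketch; the variable names are chosen here for illustration and are not from the notes.

```python
# Inputs stated in the notes.
sensitivity = 0.99  # Pr(+ | User)
specificity = 0.95  # Pr(- | non-User)
pr_user = 0.10      # Pr(User)

# Multiplication rule: each joint cell is a conditional times a margin.
joint = {
    ("User", "+"): sensitivity * pr_user,
    ("User", "-"): (1 - sensitivity) * pr_user,
    ("non-User", "+"): (1 - specificity) * (1 - pr_user),
    ("non-User", "-"): specificity * (1 - pr_user),
}

# Column totals: the margins for the test result.
pr_positive = joint[("User", "+")] + joint[("non-User", "+")]
pr_negative = joint[("User", "-")] + joint[("non-User", "-")]

for (status, result), p in joint.items():
    print(status, result, round(p, 3))
print("Total +", round(pr_positive, 3), "Total -", round(pr_negative, 3))
```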
$$\Pr(\text{User} \mid +) = \frac{\Pr(+ \mid \text{User}) \Pr(\text{User})}{\Pr(+)}$$
yields:
$$\Pr(\text{User} \mid +) = \frac{0.99 \times 0.1}{0.99 \times 0.1 + 0.05 \times 0.9} = \frac{0.099}{0.144} = 0.6875$$
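The same calculation can be wrapped in a small function so the effect of the base rate is easy to see. The `posterior` name is hypothetical, and the 1% prevalence in the second call is an illustrative what-if value, not a number from the notes.

```python
def posterior(prior, sensitivity, specificity):
    """Pr(User | +) by Bayes' rule, with Pr(+) summed over Users and non-Users."""
    true_pos = sensitivity * prior                 # Pr(+ | User) Pr(User)
    false_pos = (1 - specificity) * (1 - prior)    # Pr(+ | non-User) Pr(non-User)
    return true_pos / (true_pos + false_pos)


# Values from the notes: 10% Users, 99% and 95% accuracy.
print(round(posterior(0.10, 0.99, 0.95), 4))  # 0.6875

# Hypothetical what-if (not from the notes): the same test with 1% Users.
print(round(posterior(0.01, 0.99, 0.95), 4))
```

Holding the test fixed, a lower share of Users makes a positive result less informative, which is one way to read the question of what a positive test means.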
Sensitivity refers to the ability of a test to designate an individual with a disease as positive. Specificity refers to the ability of a test to designate an individual without a disease as negative.
The false positive rate is then the complement of specificity (1 − specificity), and the false negative rate is the complement of sensitivity (1 − sensitivity).
| Truth | Positive Test | Negative Test |
|---|---|---|
| Positive | Sensitivity | False Negative |
| Negative | False Positive | Specificity |
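
As a sketch of how the four cells turn into rates, the counts below scale the User/test joint table to a population of 1,000 people. The population size is an assumption chosen to give whole numbers; the proportions are those of the example in these notes.

```python
# Counts per 1,000 people implied by the joint table (population size assumed).
true_pos = 99    # User, positive test
false_neg = 1    # User, negative test
false_pos = 45   # non-User, positive test
true_neg = 855   # non-User, negative test

# Each rate conditions on the truth (a row of the table), not on the test result.
sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)
false_negative_rate = false_neg / (true_pos + false_neg)  # 1 - sensitivity
false_positive_rate = false_pos / (true_neg + false_pos)  # 1 - specificity

print(sensitivity, specificity)
print(false_negative_rate, false_positive_rate)
```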
When we get to hypothesis testing in inference, this comes up again with null and alternative hypotheses and the related decision.
| Truth | Reject Null | Fail to Reject Null |
|---|---|---|
| Alternative | Correct | Type II error |
| Null | Type I error | Correct |
What does it mean to say something is independent of something else? The simplest way to think about it is to ask, “do I learn anything more about x by knowing y than by not knowing it?” If two things are independent, knowing y tells me nothing new about x, so $\Pr(x \mid y) = \Pr(x)$, and I don’t need to care about y if x is my objective.
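A quick way to try this idea on the Berkeley joint table from the top of these notes is to compare the probability of admission with the probability of admission given Male. The reading of No/Yes as the Admit outcome is an assumption, and this only describes the aggregated table.

```python
# Joint probabilities: (M.F, Admit) -> probability (Admit columns assumed).
joint = {
    ("Female", "No"): 0.2823685,
    ("Female", "Yes"): 0.1230667,
    ("Male", "No"): 0.3298719,
    ("Male", "Yes"): 0.2646929,
}

pr_male = joint[("Male", "No")] + joint[("Male", "Yes")]
pr_yes = joint[("Female", "Yes")] + joint[("Male", "Yes")]
pr_yes_given_male = joint[("Male", "Yes")] / pr_male

# Under independence the joint cell would equal the product of the margins.
gap = joint[("Male", "Yes")] - pr_male * pr_yes

print("Pr(Yes)        =", round(pr_yes, 4))
print("Pr(Yes | Male) =", round(pr_yes_given_male, 4))
print("joint minus product of margins =", round(gap, 4))
```

Here knowing M.F changes the probability of admission, so in this aggregated table the two variables are not independent.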
I do not love the book definition of a random variable. Technically, it is a variable whose values are generated according to some random process; your book implies that these values are limited to quantities. It is really a measurable function defined on a probability space that maps from the sample space [the set of possible outcomes] to the real numbers.
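To make the function view concrete, here is a minimal sketch with a made-up example: two coin tosses, with each outcome assumed equally likely. Neither the coin example nor the `num_heads` name appears in the notes. The outcomes are not numbers; the random variable is the rule that assigns a number to each one.

```python
from fractions import Fraction
from itertools import product

# Sample space: every outcome of two coin tosses (assumed equally likely).
sample_space = list(product("HT", repeat=2))


def num_heads(outcome):
    """The random variable X: maps an outcome to a real number."""
    return outcome.count("H")


# The distribution of X is induced by the probabilities on the sample space.
dist = {}
for outcome in sample_space:
    x = num_heads(outcome)
    dist[x] = dist.get(x, Fraction(0)) + Fraction(1, len(sample_space))

print(sample_space)
print(dist)
```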
$$\Pr(\text{Decision} \mid \text{data}) = \frac{\Pr(\text{data} \mid \text{Decision}) \Pr(\text{Decision})}{\Pr(\text{data})}$$

BUS 1301-SP26