# Bayes’ Theorem
## What is Bayes’ Theorem?
Bayes’ Theorem is a rule for updating what you believe when you get new information.
- You start with an initial belief about how likely something is.
- You then observe new evidence.
- Bayes’ Theorem tells you how to revise your belief in light of that evidence.
An example: Imagine a medical test for a disease.
- The disease is rare.
- The test is pretty accurate, but not perfect.
- You test positive.
Your immediate reaction might be:
“I tested positive, so I probably have the disease.”
Bayes’ Theorem helps answer the real question:
“Given that I tested positive, how likely is it that I actually have the disease?”
It forces you to consider:
- How common the disease is
- How accurate the test is
- How often false positives occur
## Concepts and Terminology
Bayes’ Theorem combines four concepts:
1. **Prior probability**
What you believed before seeing the evidence.
2. **Likelihood**
How probable the evidence is if your hypothesis were true.
3. **Evidence** (normalization)
How common the evidence is overall.
4. **Posterior probability**
What you believe after seeing the evidence.
Let’s define two events for our example:
- A = a hypothesis (e.g., “the patient has the disease”)
- B = observed evidence (e.g., “the test is positive”)
## Conditional Probability
We want to know:
What is the probability of A, given that B occurred?
This is written as:
$P(A \mid B)$: “probability of A given B”
This takes us into the field of conditional probability.
By definition:
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
Out of all situations where B happens, what fraction also have A happening?
- $P(A \cap B)$: probability that both A and B happen
- $P(B)$: probability that B happens at all
Similarly:
$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$
“If my belief were true, how likely is it that I would see this evidence?”
Both expressions involve the same joint probability $P(A \cap B)$.
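To make the definition concrete, here is a minimal Python sketch that estimates $P(A \mid B)$ by counting. The die-rolling events are an assumed illustration (not from the example above): A = "the roll is even", B = "the roll is greater than 3".

```python
import random

# Illustrative simulation (assumed example, not from the text above):
# A = "a fair six-sided die shows an even number", B = "the roll is greater than 3".
random.seed(0)
rolls = [random.randint(1, 6) for _ in range(100_000)]

count_a_and_b = sum(1 for r in rolls if r % 2 == 0 and r > 3)  # A and B both happen
count_b = sum(1 for r in rolls if r > 3)                       # B happens

# P(A | B) = P(A ∩ B) / P(B); dividing counts by counts cancels the sample size
p_a_given_b = count_a_and_b / count_b
print(f"P(A | B) ≈ {p_a_given_b:.3f}")  # exact answer is 2/3
```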
## Deriving Bayes’ Theorem
From the two definitions above:
$P(A \cap B) = P(A \mid B)P(B)$
$P(A \cap B) = P(B \mid A)P(A)$
Set them equal:
$P(A \mid B)P(B) = P(B \mid A)P(A)$
Now solve for $P(A \mid B)$ to obtain Bayes’ Theorem:
$\boxed{ P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} }$
Where:
- $P(A)$ = prior probability
- $P(B \mid A)$ = likelihood
- $P(B)$ = evidence
- $P(A \mid B)$ = posterior probability
Bayes’ Theorem provides a formal, rational way to update a prior belief $P(A)$ into a posterior belief $P(A \mid B)$ when new evidence B becomes available.
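As a quick illustration, here is a minimal Python sketch that applies the formula to the medical-test scenario from the introduction. The prevalence, sensitivity, and false-positive rate below are assumed for illustration; they are not given in the text.

```python
# Bayes' Theorem for the medical-test scenario.
# These numbers are illustrative assumptions, not values given in the text:
p_disease = 0.01              # P(A): prior (prevalence of the disease)
p_pos_given_disease = 0.95    # P(B | A): sensitivity (true-positive rate)
p_pos_given_healthy = 0.05    # P(B | not A): false-positive rate

# P(B): overall probability of a positive test (law of total probability)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# P(A | B): probability of disease given a positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive) ≈ {p_disease_given_pos:.3f}")  # ≈ 0.161
```

With these assumed numbers, even a fairly accurate test yields a posterior of only about 16%, because the disease is rare and the prior dominates.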
## What is the likelihood?
Likelihood answers this question:
“If my assumption were true, how plausible is the data I am seeing?”
It does not ask whether the assumption is true.
It asks how well the assumption explains the observed evidence.
For example:
- Assumption (hypothesis): “This coin is fair”
- Observation (data): “I flipped it 10 times and got 8 heads”
This is a critical distinction:
- Probability: “How likely is the hypothesis?”
- Likelihood: “How compatible is the data with the hypothesis?”
The likelihood is **measured or assumed, not derived from Bayes’ Theorem**.
## Counting the likelihood
If you can directly observe cases where the hypothesis is true:
$P(B \mid A) = \frac{\text{Number of times B occurs when A is true}}{\text{Total number of times A is true}}$
Example: You observe 950 positives out of 1,000 diseased patients.
$P(B \mid A) = \frac{950}{1000} = 0.95$
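In code, counting the likelihood is just a ratio. The toy data below is an assumed stand-in for the 1,000 diseased patients in the example.

```python
# Counting the likelihood from cases where A (the disease) is known to be true.
# Toy stand-in for the 1,000 diseased patients in the example above.
test_results_for_diseased = [True] * 950 + [False] * 50  # True = tested positive

likelihood = sum(test_results_for_diseased) / len(test_results_for_diseased)
print(likelihood)  # 0.95, i.e. P(B | A)
```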
## Modeling the likelihood
When direct counting is impossible, you use a model.
Example: coin flips.
- Hypothesis: coin has probability p of heads
- Data: 8 heads out of 10 flips
Likelihood (via the binomial model):
$P(\text{data} \mid p) = \binom{10}{8} p^8 (1-p)^2$
For a fair coin $(p = 0.5)$:
$P(\text{data} \mid p=0.5) \approx 0.044$
So the probability of observing exactly 8 heads in 10 flips of a fair coin is only about 4.4%.
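Here is a small Python sketch of the binomial likelihood (using `math.comb`, which assumes Python 3.8 or later):

```python
from math import comb  # requires Python 3.8+

def binomial_likelihood(p, heads=8, flips=10):
    """P(data | p): probability of exactly `heads` heads in `flips` flips
    of a coin whose heads probability is `p` (binomial model)."""
    return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)

print(round(binomial_likelihood(0.5), 4))  # 0.0439 for the fair coin
print(round(binomial_likelihood(0.8), 4))  # 0.302  for a coin with p = 0.8
```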
If there is only one hypothesis under consideration (e.g., “fair coin” with prior probability 1), then $P(B)$ and $P(B \mid A)$ **are the same**; in real Bayesian inference, $P(B)$ aggregates the likelihoods of all competing hypotheses, weighted by their priors.
## Hypothesis testing
### Setup: define hypotheses and data
Observed data B:
Exactly 8 heads out of 10 flips
Competing hypotheses:
- $A_1$: The coin is fair, with heads probability $p = 0.5$
- $A_2$: The coin is biased toward heads, with $p = 0.8$
### Step 1: Assign prior beliefs
Suppose before seeing any data you believed:
- $P(A_1) = 0.7$ (fair coin)
- $P(A_2) = 0.3$ (biased coin)
### Step 2: Compute likelihoods $P(B \mid A_i)$
Use the binomial distribution:
$P(B \mid p) = \binom{10}{8} p^8 (1-p)^2$
Fair coin (p = 0.5):
$P(B \mid A_1) = 45 \cdot (0.5)^{10} = \frac{45}{1024} \approx 0.04395$
Biased coin (p = 0.8):
$P(B \mid A_2) = 45 \cdot (0.8)^8 (0.2)^2 \approx 0.302$
### Step 3: Compute P(B) (the evidence)
$P(B) = P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2)$
$P(B) = (0.04395 \times 0.7) + (0.302 \times 0.3)$
$P(B) = 0.0308 + 0.0906 = 0.1214$
### Step 4: Update beliefs (posterior probabilities)
Apply Bayes’ Theorem.
**Posterior for the fair coin:**
$P(A_1 \mid B) = \frac{0.04395 \times 0.7}{0.1214} \approx 0.254$
**Posterior for the biased coin:**
$P(A_2 \mid B) = \frac{0.302 \times 0.3}{0.1214} \approx 0.746$
### Step 5: Interpret the update
- You started believing the coin was probably fair (70%)
- Seeing 8 heads is much more compatible with a biased coin
- After updating your belief is:
- Fair coin: 25.4%
- Biased coin: 74.6%
The data shifted belief toward the hypothesis that better explains it.
When multiple hypotheses exist, P(B) becomes a weighted average of likelihoods, and Bayes’ Theorem redistributes belief toward the hypotheses that best explain the observed data.
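The whole workflow fits in a short Python sketch. The priors, hypotheses, and data match the worked example above; the same structure generalizes to any finite set of hypotheses.

```python
from math import comb

# Steps 1-4 of the worked example: two hypotheses about a coin,
# updated after observing 8 heads in 10 flips.
heads, flips = 8, 10
priors     = {"fair (p = 0.5)": 0.7, "biased (p = 0.8)": 0.3}  # Step 1
heads_prob = {"fair (p = 0.5)": 0.5, "biased (p = 0.8)": 0.8}

# Step 2: likelihoods P(B | A_i) from the binomial model
likelihoods = {
    name: comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)
    for name, p in heads_prob.items()
}

# Step 3: evidence P(B), a prior-weighted average of the likelihoods
evidence = sum(likelihoods[name] * priors[name] for name in priors)

# Step 4: posteriors P(A_i | B) via Bayes' Theorem
posteriors = {name: likelihoods[name] * priors[name] / evidence for name in priors}

for name, post in posteriors.items():
    print(f"{name}: {post:.3f}")  # fair ≈ 0.253, biased ≈ 0.747
```

The printed posteriors agree with the hand calculation above up to rounding.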