# Bayes’ Theorem

## What is Bayes’ Theorem?

Bayes’ Theorem is a rule for updating what you believe when you get new information.

- You start with an initial belief about how likely something is.
- You then observe new evidence.
- Bayes’ Theorem tells you how to revise your belief in light of that evidence.

An example: imagine a medical test for a disease.

- The disease is rare.
- The test is pretty accurate, but not perfect.
- You test positive.

Your immediate reaction might be: “I tested positive, so I probably have the disease.”

Bayes’ Theorem helps answer the real question: “Given that I tested positive, how likely is it that I actually have the disease?”

It forces you to consider:

- How common the disease is
- How accurate the test is
- How often false positives occur

## Concepts and Terminology

Bayes’ Theorem combines four concepts:

1. **Prior probability**: what you believed before seeing the evidence.
2. **Likelihood**: how probable the evidence is if your hypothesis were true.
3. **Evidence** (normalization): how common the evidence is overall.
4. **Posterior probability**: what you believe after seeing the evidence.

Let’s define two events for our example:

- A = a hypothesis (e.g., “the patient has the disease”)
- B = observed evidence (e.g., “the test is positive”)

## Conditional Probability

We want to know: what is the probability of A, given that B occurred?

This is written as $P(A \mid B)$: “the probability of A given B.” This takes us into the field of conditional probability. By definition:

$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

In words: out of all situations where B happens, what fraction also have A happening?

- $P(A \cap B)$: probability that both A and B happen
- $P(B)$: probability that B happens at all

Similarly:

$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$

“If my belief were true, how likely is it that I would see this evidence?”

Both expressions involve the same joint probability $P(A \cap B)$.

## Deriving Bayes’ Theorem

From the two definitions above:

$P(A \cap B) = P(A \mid B)P(B)$

$P(A \cap B) = P(B \mid A)P(A)$

Set them equal:

$P(A \mid B)P(B) = P(B \mid A)P(A)$

Now solve for $P(A \mid B)$ to obtain Bayes’ Theorem:

$\boxed{ P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} }$

Where:

- $P(A)$ = prior probability
- $P(B \mid A)$ = likelihood
- $P(B)$ = evidence
- $P(A \mid B)$ = posterior probability

Bayes’ Theorem provides a formal, rational way to update a prior belief $P(A)$ when new evidence B becomes available.

## What is the likelihood?

Likelihood answers this question: “If my assumption were true, how plausible is the data I am seeing?”

It does not ask whether the assumption is true. It asks how well the assumption explains the observed evidence.

For example:

- Assumption (hypothesis): “This coin is fair”
- Observation (data): “I flipped it 10 times and got 8 heads”

Here the likelihood is the probability of seeing 8 heads in 10 flips *if* the coin really were fair; we compute it below.

This is a critical distinction:

- Probability: “How likely is the hypothesis?”
- Likelihood: “How compatible is the data with the hypothesis?”

This number is **measured or assumed, not derived from Bayes’ Theorem**.

## Counting the likelihood

If you can directly observe cases where the hypothesis is true:

$P(B \mid A) = \frac{\text{Number of times B occurs when A is true}}{\text{Total number of times A is true}}$

Example: you observe 950 positives out of 1,000 diseased patients.

$P(B \mid A) = \frac{950}{1000} = 0.95$
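To make the counting approach concrete, here is a minimal Python sketch that estimates $P(B \mid A)$ from a set of hypothetical patient records. The 950-out-of-1,000 figure mirrors the example above; the counts for healthy patients are invented for illustration, not data from the text.

```python
# Estimate a likelihood by counting over hypothetical patient records.
# Each record is (has_disease, tested_positive).
records = (
    [(True, True)] * 950       # diseased patients who test positive (from the text)
    + [(True, False)] * 50     # diseased patients who test negative
    + [(False, True)] * 200    # healthy patients with false positives (assumed)
    + [(False, False)] * 8800  # healthy patients who test negative (assumed)
)

n_A = sum(1 for has_disease, _ in records if has_disease)   # times A is true
n_A_and_B = sum(1 for d, pos in records if d and pos)       # times A and B co-occur
n_B = sum(1 for _, pos in records if pos)                   # times B is true

likelihood = n_A_and_B / n_A   # P(B | A) = 950 / 1000 = 0.95
posterior = n_A_and_B / n_B    # P(A | B) = P(A and B) / P(B) = 950 / 1150

print(f"P(B | A) = {likelihood:.3f}")   # 0.950
print(f"P(A | B) = {posterior:.3f}")    # 0.826
```

Counting $P(A \mid B)$ directly and computing it via Bayes’ Theorem give the same answer here, because both reduce to the same joint counts.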
## Modeling the likelihood

When direct counting is impossible, you use a model. Example: coin flips.

- Hypothesis: the coin has probability p of heads
- Data: 8 heads out of 10 flips

Likelihood (via the binomial model):

$P(\text{data} \mid p) = \binom{10}{8} p^8 (1-p)^2$

For a fair coin $(p = 0.5)$:

$P(\text{data} \mid p = 0.5) \approx 0.044$

So the likelihood of getting exactly 8 heads out of 10 flips with a fair coin is below 5%.

Note that $P(B)$ and $P(B \mid A)$ coincide only when there is a single hypothesis A with prior probability 1 (e.g., “the coin is fair” is taken for granted). In real Bayesian inference, $P(B)$ aggregates the likelihoods across all competing hypotheses.

## Hypothesis testing

### Setup: define hypotheses and data

Observed data B: exactly 8 heads out of 10 flips.

Competing hypotheses:

- $A_1$: the coin is fair, heads probability $p = 0.5$
- $A_2$: the coin is biased toward heads, $p = 0.8$

### Step 1: Assign prior beliefs

Suppose before seeing any data you believed:

- $P(A_1) = 0.7$ (fair coin)
- $P(A_2) = 0.3$ (biased coin)

### Step 2: Compute likelihoods $P(B \mid A_i)$

Use the binomial distribution:

$P(B \mid p) = \binom{10}{8} p^8 (1-p)^2$

Fair coin ($p = 0.5$):

$P(B \mid A_1) = 45 \cdot (0.5)^{10} = \frac{45}{1024} \approx 0.04395$

Biased coin ($p = 0.8$):

$P(B \mid A_2) = 45 \cdot (0.8)^8 (0.2)^2 \approx 0.302$

### Step 3: Compute $P(B)$ (the evidence)

$P(B) = P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2)$

$P(B) = (0.04395 \times 0.7) + (0.302 \times 0.3)$

$P(B) = 0.0308 + 0.0906 = 0.1214$

### Step 4: Update beliefs (posterior probabilities)

Apply Bayes’ Theorem.

Posterior for the fair coin:

$P(A_1 \mid B) = \frac{0.04395 \times 0.7}{0.1214} \approx 0.253$

Posterior for the biased coin:

$P(A_2 \mid B) = \frac{0.302 \times 0.3}{0.1214} \approx 0.747$

### Step 5: Interpret the update

- You started out believing the coin was probably fair (70%).
- Seeing 8 heads is much more compatible with a biased coin.
- After updating, your beliefs are:
  - Fair coin: 25.3%
  - Biased coin: 74.7%

The data shifted belief toward the hypothesis that better explains it. When multiple hypotheses exist, $P(B)$ becomes a weighted average of their likelihoods, and Bayes’ Theorem redistributes belief toward the hypotheses that best explain the observed data.
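The whole update can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the priors, hypotheses, and observed data are exactly the ones from the worked example above.

```python
from math import comb

# Bayesian update for the coin example: observe 8 heads in 10 flips,
# then redistribute belief between a fair coin and a biased coin.
n, k = 10, 8                                        # observed data B
hypotheses = {"fair (p=0.5)": 0.5, "biased (p=0.8)": 0.8}
priors = {"fair (p=0.5)": 0.7, "biased (p=0.8)": 0.3}

# Step 2: binomial likelihoods P(B | A_i) = C(10, 8) * p^8 * (1 - p)^2
likelihoods = {
    name: comb(n, k) * p**k * (1 - p) ** (n - k)
    for name, p in hypotheses.items()
}

# Step 3: evidence P(B) = sum_i P(B | A_i) * P(A_i)
evidence = sum(likelihoods[name] * priors[name] for name in hypotheses)

# Step 4: posteriors P(A_i | B) = P(B | A_i) * P(A_i) / P(B)
posteriors = {
    name: likelihoods[name] * priors[name] / evidence for name in hypotheses
}

for name in hypotheses:
    print(f"{name}: likelihood={likelihoods[name]:.5f}, "
          f"posterior={posteriors[name]:.3f}")
# fair (p=0.5): likelihood=0.04395, posterior=0.253
# biased (p=0.8): likelihood=0.30199, posterior=0.747
```

Adding a third hypothesis (say, a coin with $p = 0.6$) only requires another entry in the two dictionaries; the evidence and the posteriors adjust automatically, which is exactly the weighted-average behaviour described above.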