Bayesian Inference: A Modern Approach to Uncertainty


In the world of data science, dealing with uncertainty is a constant challenge. Predicting outcomes, modeling trends, or making decisions based on incomplete data all boil down to one thing: how confident are we in the predictions we make? This is where Bayesian Inference comes into play. It provides a framework for updating our beliefs about a hypothesis as we gather new evidence, allowing us to model uncertainty in a probabilistic and flexible manner. Instead of thinking in absolutes, it allows us to work with degrees of belief, which proves incredibly powerful in fields ranging from medical diagnoses to spam detection.

Understanding Bayes’ Theorem

At the core of Bayesian Inference is Bayes’ Theorem, which helps us calculate the probability of a hypothesis given some observed data. The theorem is named after the Reverend Thomas Bayes, an 18th-century statistician and theologian who first formulated it. The theorem is expressed as:

$$ P(H|D) = \frac{P(D|H) \cdot P(H)}{P(D)} $$

This formula may look intimidating at first, but it's quite intuitive once you break it down! Essentially, \(P(H|D)\), or the posterior probability, represents the probability of a hypothesis \(H\) being true after observing some data \(D\). Meanwhile, \(P(D|H)\), or the likelihood, represents the probability of observing the data \(D\) if the hypothesis \(H\) were true. Then we have \(P(H)\), the prior probability, which is our initial belief about the hypothesis before observing the data. Finally, \(P(D)\), known as the evidence, is the total probability of observing the data, considering all possible hypotheses.

To put this in simple terms, Bayes’ Theorem allows us to update our beliefs (the prior) in light of new evidence (the likelihood) to arrive at a more accurate probability (the posterior). This makes it a valuable tool for continually refining our predictions in the face of uncertainty.

A Real-World Application: Medical Diagnoses

To better understand Bayesian Inference, let’s consider a classic real-world scenario: medical diagnosis. Suppose a patient is being tested for a rare disease, and the test itself has an accuracy of 99% (for both detecting the disease and returning a negative result when the disease is absent). However, this disease is rare, affecting only 1 in 10,000 people. With a positive test result, how do we calculate the actual probability that the patient has the disease?

This is where Bayes' Theorem comes into play. In this case, our prior probability \(P(H)\), the probability that the patient has the disease before knowing the test result, is 0.0001, given how rare the disease is. The likelihood \(P(D|H)\), the probability of getting a positive result if the patient has the disease, is 0.99, reflecting the test's accuracy. The evidence \(P(D)\), the overall probability of getting a positive result, is calculated by considering both true positives and false positives.

By plugging these values into Bayes' Theorem, we can calculate the posterior probability \(P(H|D)\), which tells us the likelihood that the patient has the disease given a positive test result. This is a powerful way to refine our initial beliefs based on new information, making Bayesian Inference a valuable tool in medical diagnostics and beyond.
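Plugging in the numbers (and noting that a test that is 99% accurate on people without the disease implies a 1% false-positive rate among the 0.9999 who are disease-free), the posterior works out to:

$$ P(H|D) = \frac{0.99 \cdot 0.0001}{0.99 \cdot 0.0001 + 0.01 \cdot 0.9999} \approx 0.0098 $$

In other words, even after a positive result, there is only about a 1% chance the patient actually has the disease, simply because the disease is so rare to begin with. The prior carries far more weight than most people intuitively expect.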

Bayesian Inference in Machine Learning

In data science, Bayesian inference proves particularly useful in building models that can update their predictions as new data comes in. This is especially relevant when working with small datasets or in situations where data arrives sequentially over time. One of the most well-known machine learning algorithms that utilizes Bayesian principles is the Naive Bayes classifier.

The Naive Bayes classifier is deceptively simple, yet widely used for tasks like spam detection, sentiment analysis, and even medical diagnosis. True to its name, the algorithm naively assumes that the features (such as words in an email for spam detection) are independent of each other, given the class (spam or not spam). While this assumption rarely holds perfectly in practice, Naive Bayes performs surprisingly well.

I’ve created a Python example to demonstrate how the Naive Bayes classifier works in practice using scikit-learn, which is a very popular ML library in Python.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the iris dataset
data = load_iris()
X = data['data']
y = data['target']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the Gaussian Naive Bayes model
model = GaussianNB()

# Train the model
model.fit(X_train, y_train)

# Predict the test set
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

In this example, we use the famous Iris dataset to build a simple classifier that predicts the species of a flower based on features like petal length and width. Despite its simplicity, Naive Bayes often achieves a high accuracy, making it a popular choice for many classification problems.

(.venv) selsayed@elsayed-laptop:~/Programs/codects/bayesian$ python naive_bayes.py
Accuracy: 97.78%

Prior and Posterior Distributions

A key aspect of Bayesian inference is the way we model prior and posterior distributions. When we start with a prior, we often use probability distributions to represent our initial belief about a hypothesis. As more data becomes available, this prior is updated to form the posterior distribution, which provides a more accurate representation of the hypothesis given the new evidence.

Take A/B testing as an example. Imagine you’re testing two versions of a webpage (A and B) to see which version generates more conversions. Using Bayesian inference, you can use a Beta distribution to represent your initial belief (the prior) about the conversion rates of the two versions. As you observe actual conversions over time, the prior gets updated to form a posterior, which becomes increasingly accurate as more data is collected.

Ideally, a code example here would show how to model Bayesian A/B testing using PyMC3.

Spoiler alert: The code snippet is not included in this post because I didn’t get it to work, but I’ll update this post once I figure it out!
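Until then, here is a minimal sketch of the same idea using the conjugate Beta-Binomial update with SciPy rather than PyMC3. The visitor and conversion counts below are made up purely to illustrate how the prior turns into a posterior:

import numpy as np
from scipy import stats

# Hypothetical data: conversions out of total visitors for each version
conversions_A, visitors_A = 520, 5000
conversions_B, visitors_B = 570, 5000

# Start with a uniform Beta(1, 1) prior on each conversion rate.
# With a Beta prior and binomial data, the posterior is also a Beta:
# Beta(alpha + conversions, beta + non-conversions).
posterior_A = stats.beta(1 + conversions_A, 1 + visitors_A - conversions_A)
posterior_B = stats.beta(1 + conversions_B, 1 + visitors_B - conversions_B)

# Estimate P(rate_B > rate_A) by drawing samples from both posteriors
samples_A = posterior_A.rvs(100_000, random_state=42)
samples_B = posterior_B.rvs(100_000, random_state=43)
prob_B_better = np.mean(samples_B > samples_A)

print(f"Posterior mean for A: {posterior_A.mean():.4f}")
print(f"Posterior mean for B: {posterior_B.mean():.4f}")
print(f"P(B is better than A): {prob_B_better:.3f}")

Because the Beta distribution is conjugate to the binomial likelihood, updating the prior is just a matter of adding counts, which is why this sketch stays so short; a PyMC3 model would arrive at the same posterior by sampling with MCMC.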

Advantages of Bayesian Inference

Bayesian inference offers plenty of advantages! One of the major ones is its ability to incorporate prior knowledge into models. This is particularly useful in situations where domain expertise or historical data can provide valuable context. For instance, in medical research, where datasets are often small, being able to include prior knowledge can lead to more accurate predictions.
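As a small illustration of that, scikit-learn's GaussianNB accepts class priors directly; the prior values below are invented just to show the mechanics:

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Encode a (hypothetical) prior belief that the three species occur
# with probabilities 0.5, 0.3, and 0.2, instead of letting the model
# estimate them from the (possibly small) training sample.
model_with_prior = GaussianNB(priors=[0.5, 0.3, 0.2])
model_with_prior.fit(X, y)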

The probabilistic output of Bayesian models is another benefit. Instead of just providing a point estimate (as is common in many machine learning algorithms), Bayesian methods give us a distribution over possible outcomes. This is incredibly valuable because it allows us to quantify the uncertainty in our predictions, providing a more nuanced understanding of the results.
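To see this with the Iris classifier from earlier (assuming the fitted model and X_test from that snippet are still around), predict_proba returns a probability for every class instead of a single label:

# A full distribution over the three species for each test flower,
# rather than just the single most likely species
probs = model.predict_proba(X_test[:3])
print(probs.round(3))  # each row sums to 1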

Lastly, Bayesian inference handles small datasets very well. While traditional frequentist methods often require large amounts of data to make accurate predictions, Bayesian models can work effectively even with limited data, thanks to the use of priors. This makes Bayesian methods highly adaptable to fields like medical research, where data is scarce but prior knowledge is abundant.


Sami Elsayed is a Senior at TJHSST, and the current Lead Sysadmin at the tjCSL. He’s the Co-Founder of the Cardinal Development Organization, and the current Head Writer of “The Techbook.”