Q12: Explain Naïve Bayes Classification Algorithm with a suitable example.

Introduction

Naïve Bayes is a simple yet powerful probabilistic classification algorithm based on Bayes’ Theorem. It assumes that the features are conditionally independent given the class label. Despite its ‘naïve’ assumption, it performs well in various real-world scenarios like spam detection, sentiment analysis, and document classification.

Bayes’ Theorem

Bayes’ Theorem is the foundation of the Naïve Bayes algorithm:

P(C|X) = [P(X|C) * P(C)] / P(X)

Where:

  • P(C|X) is the posterior probability of class C given predictor X
  • P(X|C) is the likelihood of predictor X given class C
  • P(C) is the prior probability of class C
  • P(X) is the prior probability of predictor X (the evidence)
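As a quick numeric illustration (all numbers made up purely for this sketch), the evidence P(X) can be obtained from the law of total probability and then used to compute the posterior:

```python
# Made-up numbers to illustrate Bayes' Theorem; C is the class, X the evidence.
p_c = 0.3            # prior P(C)
p_x_given_c = 0.8    # likelihood P(X|C)
p_x_given_not_c = 0.2

# Evidence P(X) via the law of total probability:
p_x = p_x_given_c * p_c + p_x_given_not_c * (1 - p_c)

posterior = p_x_given_c * p_c / p_x   # P(C|X) by Bayes' Theorem
print(round(posterior, 3))            # ≈ 0.632
```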

Assumption

It assumes all features are independent given the class label. That is:

P(X1, X2, ..., Xn | C) = P(X1 | C) * P(X2 | C) * ... * P(Xn | C)

Types of Naïve Bayes Models

  • Gaussian: Assumes features follow a normal distribution (used for continuous data)
  • Multinomial: Used for discrete count features (e.g., text classification)
  • Bernoulli: Used for binary/boolean features
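All three variants are available in scikit-learn's `sklearn.naive_bayes` module. The sketch below picks the Bernoulli variant because the toy features are binary; the tiny dataset is made up purely for illustration:

```python
# A minimal sketch using scikit-learn's Bernoulli Naive Bayes variant
# (GaussianNB and MultinomialNB have the same fit/predict interface).
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Two binary features per email (1 = yes, 0 = no); labels 1 = Spam.
X = np.array([[1, 1], [0, 1], [1, 0], [0, 0]])
y = np.array([1, 1, 0, 0])

model = BernoulliNB()   # binary features -> Bernoulli variant
model.fit(X, y)
print(model.predict([[1, 1]]))  # classify a new email
```

Note that scikit-learn applies Laplace smoothing by default (`alpha=1.0`), so its probability estimates will differ slightly from hand-counted ones.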

Example

Suppose we want to classify whether an email is “Spam” or “Not Spam” based on two features: “Contains ‘Free’” and “Contains ‘Buy now’”.

Email | Free | Buy Now | Spam
------|------|---------|-----
1     | Yes  | Yes     | Yes
2     | No   | Yes     | Yes
3     | Yes  | No      | No
4     | No   | No      | No

Step 1: Calculate prior probabilities

P(Spam=Yes) = 2/4 = 0.5
P(Spam=No) = 2/4 = 0.5

Step 2: Calculate likelihoods

P(Free=Yes | Spam=Yes) = 1/2 = 0.5
P(Buy Now=Yes | Spam=Yes) = 2/2 = 1
P(Free=Yes | Spam=No) = 1/2 = 0.5
P(Buy Now=Yes | Spam=No) = 0/2 = 0

Step 3: Predict for new email: Free=Yes, Buy Now=Yes

P(Spam=Yes | X) ∝ P(Spam=Yes) * P(Free=Yes | Spam=Yes) * P(Buy Now=Yes | Spam=Yes) = 0.5 * 0.5 * 1 = 0.25
P(Spam=No | X) ∝ P(Spam=No) * P(Free=Yes | Spam=No) * P(Buy Now=Yes | Spam=No) = 0.5 * 0.5 * 0 = 0

(The common evidence term P(X) is dropped, since it does not affect which class scores higher.)

Prediction: Email is classified as Spam
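The three steps above can be reproduced in a few lines of plain Python, counting directly from the four-email table:

```python
# Step 1: prior probabilities from the 4-email table.
p_spam = 2 / 4
p_not_spam = 2 / 4

# Step 2: likelihoods counted from the table.
p_free_given_spam = 1 / 2
p_buy_given_spam = 2 / 2
p_free_given_not = 1 / 2
p_buy_given_not = 0 / 2

# Step 3: score the new email (Free=Yes, Buy Now=Yes).
# These are proportional to the posteriors; P(X) is dropped.
score_spam = p_spam * p_free_given_spam * p_buy_given_spam  # 0.25
score_not = p_not_spam * p_free_given_not * p_buy_given_not  # 0.0

prediction = "Spam" if score_spam > score_not else "Not Spam"
print(prediction)  # -> Spam
```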

Advantages

  • Fast and simple
  • Works well with high-dimensional data
  • Requires less training data

Limitations

  • Assumes feature independence, which rarely holds exactly in practice
  • Assigns zero probability if a feature value never appears with a class in the training data (mitigated by Laplace smoothing)
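The zero-probability problem appears in Step 2 above, where P(Buy Now=Yes | Spam=No) = 0/2 = 0 wipes out the whole product. The standard fix is Laplace (add-one) smoothing; a minimal sketch, with the helper function name chosen here for illustration:

```python
# Laplace (add-one) smoothing: add alpha to every count and
# alpha * (number of feature values, here 2: Yes/No) to the
# denominator, so no likelihood is ever exactly zero.
def smoothed_likelihood(count, class_total, n_values=2, alpha=1):
    return (count + alpha) / (class_total + alpha * n_values)

# Unsmoothed: P(Buy Now=Yes | Spam=No) = 0/2 = 0
# Smoothed:   (0 + 1) / (2 + 2) = 0.25
print(smoothed_likelihood(0, 2))  # -> 0.25
```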

Conclusion

Naïve Bayes is a powerful classification algorithm that works well for many tasks, especially with text data. It is computationally efficient, easy to implement, and surprisingly effective despite its simplifying assumptions.
