Q13: Explain K-Nearest Neighbors classification Algorithm with a suitable example.

Introduction

K-Nearest Neighbors (KNN) is a simple and intuitive supervised machine learning algorithm used for classification and regression tasks. It classifies new data points based on the majority label of the ‘k’ closest training examples in the feature space.

How KNN Works

  1. Choose the number of neighbors ‘k’
  2. Calculate the distance (e.g., Euclidean) between the new data point and all other points in the training dataset
  3. Select the ‘k’ nearest neighbors
  4. Assign the most frequent label among those neighbors to the new data point (for classification)
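The four steps above can be sketched as a minimal pure-Python classifier (the function name and data layout here are illustrative, not part of any standard API):

```python
import math
from collections import Counter

def knn_classify(training_data, new_point, k):
    """Classify new_point by majority vote among its k nearest training examples.

    training_data: list of (features, label) pairs, where features is a tuple of numbers.
    """
    # Step 2: compute the distance from new_point to every training example
    distances = []
    for features, label in training_data:
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(features, new_point)))
        distances.append((dist, label))

    # Step 3: keep the k nearest neighbors
    distances.sort(key=lambda pair: pair[0])
    k_nearest = distances[:k]

    # Step 4: majority vote among their labels
    labels = [label for _, label in k_nearest]
    return Counter(labels).most_common(1)[0][0]
```

Note that all the work happens at prediction time; there is no separate training step.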

Example

Let’s consider a dataset where we want to predict whether a fruit is an Apple (A) or an Orange (O) based on its weight and texture.

Fruit   Weight (g)   Texture (1 = smooth, 0 = bumpy)
A       150          1
O       170          0
A       140          1
O       160          0

Now we want to classify a new fruit with weight = 155g and texture = 0. We’ll compute Euclidean distance between the new point and existing ones.

Euclidean Distance:

Distance = √((weight1 – weight2)² + (texture1 – texture2)²)

  • To A (150,1): √((155−150)² + (0−1)²) = √(25 + 1) = √26 ≈ 5.1
  • To O (170,0): √((155−170)² + (0−0)²) = √(225) = 15
  • To A (140,1): √((155−140)² + (0−1)²) = √(225 + 1) = √226 ≈ 15.03
  • To O (160,0): √((155−160)² + (0−0)²) = √25 = 5

For k=3, the nearest neighbors (in order of distance) are: O (160,0) at 5, A (150,1) at ≈5.1, and O (170,0) at 15

Majority = O → Predicted: Orange
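The distance calculations and the majority vote above can be reproduced with a short script (the variable names are just for illustration):

```python
import math

# The four training fruits from the table and the new, unlabelled fruit
training = [((150, 1), "A"), ((170, 0), "O"), ((140, 1), "A"), ((160, 0), "O")]
new_point = (155, 0)

distances = []
for (weight, texture), label in training:
    # Euclidean distance between the new fruit and each training fruit
    d = math.sqrt((new_point[0] - weight) ** 2 + (new_point[1] - texture) ** 2)
    distances.append((d, label, (weight, texture)))
    print(f"To {label} ({weight},{texture}): {d:.2f}")

# The three smallest distances form the k=3 neighborhood
distances.sort()
print("3 nearest:", [(label, point) for _, label, point in distances[:3]])
```

Running this prints the same four distances as above, and the three nearest labels (O, A, O) give Orange by majority vote.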

Distance Metrics

  • Euclidean Distance (default)
  • Manhattan Distance
  • Minkowski Distance
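These three metrics are closely related; a quick sketch of each (Minkowski generalizes the other two):

```python
def euclidean(p, q):
    """Straight-line distance: sqrt of the sum of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def manhattan(p, q):
    """Sum of absolute differences along each axis."""
    return sum(abs(a - b) for a, b in zip(p, q))

def minkowski(p, q, r):
    """Generalized metric: r = 1 gives Manhattan, r = 2 gives Euclidean."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)
```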

Choosing k

If ‘k’ is too small, the model becomes sensitive to noise in individual points and may overfit. If ‘k’ is too large, the neighborhood smooths over local structure and the model may underfit. Cross-validation is the standard way to choose a good value of ‘k’.
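One simple form of cross-validation that works for very small datasets is leave-one-out: predict each point using all the others, and measure accuracy for each candidate ‘k’. A minimal pure-Python sketch (helper names are illustrative):

```python
import math
from collections import Counter

def knn_predict(train, point, k):
    """Majority label among the k training points nearest to `point`."""
    dists = sorted((math.dist(features, point), label) for features, label in train)
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]

def loocv_accuracy(data, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, features, k) == label:
            hits += 1
    return hits / len(data)
```

In practice, one would evaluate `loocv_accuracy` for several values of ‘k’ and keep the best; libraries such as scikit-learn provide this machinery out of the box.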

Advantages

  • Simple to understand and implement
  • No training phase (lazy learner)
  • Effective for small datasets

Disadvantages

  • Computationally expensive for large datasets
  • Performance degrades with high-dimensional data
  • Needs proper feature scaling

Conclusion

KNN is a foundational algorithm that works well for many problems where interpretability and simplicity are priorities. It’s a great baseline model to compare with more complex models.
