4 Jan 2026

K-Means Clustering

Mateo Lafalce - Blog

K-means clustering is one of the most fundamental algorithms in unsupervised machine learning. Its goal is deceptively simple: take a messy, unlabeled dataset and organize it into tidy, distinct groups based on similarity.

Unlike supervised learning, where the training data comes labeled with the correct answers, K-means must find the hidden structure on its own.

The algorithm finds these groups through an iterative process. The K is the number of groups you want to find, which you choose in advance, and the Means are the averages of the points in each group, which serve as the group centers. The process has four steps:

Initialize: The algorithm picks a starting center point, or centroid, for each of the K groups, often at random.
Assign: Every data point is assigned to the group of its nearest centroid.
Update: The algorithm calculates the mean of the points in each newly formed group and moves that group's centroid there.
Repeat: The assign and update steps repeat until the assignments stop changing.
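
The four steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the names kmeans, squared_distance, and mean_point, the max_iters and seed parameters, and the choice to start from randomly sampled data points are all made up for this example, and "nearest" is taken to mean smallest squared Euclidean distance.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iters=100, seed=0):
    """Return (centroids, labels) for a list of equal-length tuples."""
    rng = random.Random(seed)

    # Initialize: use k randomly chosen data points as starting centroids
    # (an assumption of this sketch; other starting strategies exist).
    centroids = rng.sample(points, k)
    labels = None

    for _ in range(max_iters):
        # Assign: each point joins the group of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]

        # Repeat: stop once no point changes group.
        if new_labels == labels:
            break
        labels = new_labels

        # Update: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty group keeps its previous centroid
                centroids[j] = mean_point(members)

    return centroids, labels


# Two small, well-separated groups of 2D points (invented for the demo).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

The result can depend on the starting centroids, so a common precaution is to run the algorithm several times with different seeds and keep the run whose points end up closest to their centroids.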

The T-Shirt Problem

To understand why this is useful, imagine a clothing brand that needs to manufacture T-shirts. They have the height and weight data of 10,000 customers, but they can't make custom shirts for everyone. They need to define three standard sizes: Small, Medium, and Large.

Instead of guessing the measurements, they use K-means with K=3.

The algorithm treats each of the 10,000 customers as a point whose coordinates are height and weight. It forms three clusters based on where the data points naturally congregate. The centers of the three final clusters are the average height and weight of each group, and these become the reference measurements for the Small, Medium, and Large shirts.
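
To make the T-shirt scenario concrete, here is a small self-contained sketch in plain Python. Everything in it is an assumption for illustration: the customer data is synthetic (600 random points drawn around three invented height and weight averages, standing in for the 10,000 real customers), and the helper names nearest and kmeans are hypothetical. It clusters the points with K=3 and labels the resulting centers by increasing height.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
    )


def kmeans(points, k, max_iters=100, seed=0):
    """Basic K-means: random data points as starting centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # groups stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty group keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Synthetic stand-in for the customer data: (height in cm, weight in kg).
# The three group averages below are invented, not real measurements.
rng = random.Random(42)
made_up_groups = [(160, 55), (172, 70), (185, 88)]
customers = [
    (rng.gauss(h, 4), rng.gauss(w, 5))
    for h, w in made_up_groups
    for _ in range(200)
]

centroids, labels = kmeans(customers, k=3)

# Sort the centers by height so the size names line up with them.
sizes = dict(zip(["Small", "Medium", "Large"], sorted(centroids)))
for name, (height, weight) in sizes.items():
    print(f"{name}: about {height:.0f} cm, {weight:.0f} kg")
```

Because height and weight are measured in different units, the distance calculation mixes centimeters and kilograms. With real data it is often worth rescaling both features to a comparable range before clustering.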

K-means is useful wherever a large dataset needs to be summarized by a handful of representative groups. Whether it is segmenting customers by purchasing behavior, compressing the colors in an image, or organizing documents by topic, K-means turns raw, unlabeled data into a small set of groups that are easier to reason about and act on.
