4 Jan 2026
Mateo Lafalce - Blog
K-means clustering is one of the most fundamental algorithms in unsupervised machine learning. Its goal is deceptively simple: take a messy, unlabeled dataset and organize it into tidy, distinct groups based on similarity.
Unlike supervised learning, where the computer is given the answers, K-means must find the hidden structure on its own.
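The algorithm finds order in chaos iteratively: it starts from K initial centers, assigns each data point to its nearest center, moves each center to the average of the points assigned to it, and repeats until the assignments stop changing. The K represents the number of groups you want to find, and the Means refers to the average centers (centroids) of those groups.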
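To make the iterative process concrete, here is a minimal sketch in plain Python. The function name kmeans, its parameters, and the choice to start from randomly picked data points are assumptions made for illustration, not a reference implementation.

```python
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Sketch of K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: start from k data points picked at random.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest center
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

# Two obvious groups of 2-D points, so K=2.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(centers)
print(labels)
```

The result can depend on the starting centers, which is why the sketch takes a seed. On data this cleanly separated, the two groups are recovered either way.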
The T-Shirt Problem
To understand why this is useful, imagine a clothing brand that needs to manufacture T-shirts. They have the height and weight data of 10,000 customers, but they can't make custom shirts for everyone. They need to define three standard sizes: Small, Medium, and Large.
Instead of guessing the measurements, they use K-means with K=3.
The algorithm treats each of the 10,000 customers as a point on a height-weight graph and forms three clusters where the data points naturally congregate. The mathematical centers of these three final clusters, each an average height and weight, become the reference measurements for the Small, Medium, and Large shirts.
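As a rough sketch of that last step, suppose K-means has already produced three centers. The numbers below are hypothetical, chosen only for illustration, and the names size_specs and nearest_size are made up for this example.

```python
import math

# Hypothetical cluster centers as (height_cm, weight_kg); illustrative values only.
centers = [(178.0, 82.0), (162.0, 58.0), (170.0, 69.0)]

# Name the sizes by sorting the centers from shortest to tallest.
size_specs = dict(zip(["Small", "Medium", "Large"], sorted(centers)))

def nearest_size(customer):
    """Return the size whose center is closest to (height_cm, weight_kg)."""
    return min(size_specs, key=lambda s: math.dist(customer, size_specs[s]))

print(size_specs)
print(nearest_size((165.0, 60.0)))
```

Each center doubles as the spec for one size, and any customer can be matched to a shirt by finding the closest center.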
K-means is a tool for efficiency. Whether it is segmenting customers by purchasing behavior, compressing the colors in an image, or organizing documents by topic, K-means reduces a large unlabeled dataset to a handful of representative groups you can act on.