7 Jan 2026

The Engine of Computer Vision

Mateo Lafalce - Blog

Convolutional Neural Networks (CNNs) are a standard choice for image recognition. But what actually makes them work? The secret lies in a small but powerful component: the filter, also called a kernel.

What is a Filter?

Concretely, a filter is just a small matrix of learnable weights, usually a small square grid such as 3×3 or 5×5. You can think of it as a specialized scanner.

The filter performs a mathematical operation called convolution. It slides across the input image pixel by pixel. At every stop, it multiplies its own values element-wise by the pixel values underneath it and sums the products into a single number. This process creates a Feature Map (or Activation Map), a new grid that highlights where specific patterns appear in the image.
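To make the slide, multiply, and sum steps concrete, here is a minimal sketch in plain Python. It assumes a stride of 1 and no padding, and the function name `convolve2d`, the toy image, and the toy kernel are all made up for illustration. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):          # every vertical stop
        row = []
        for j in range(out_w):      # every horizontal stop
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    # multiply each weight by the pixel underneath it
                    total += kernel[di][dj] * image[i + di][j + dj]
            row.append(total)       # one number per stop
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 image and 2x2 kernel, chosen only to keep the arithmetic easy.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 0],
    [7, 8, 9, 0],
    [0, 0, 0, 0],
]
kernel = [
    [1, 0],
    [0, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[6, 8, 3], [12, 14, 6], [7, 8, 9]]
```

Note that the feature map is smaller than the input: a 2×2 kernel on a 4×4 image has only 3×3 valid stops.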

Why Are Filters Used?

Filters are the primary tools for Feature Extraction. Instead of programmers manually defining what a curve looks like, the network learns the filter values during training, so it finds these patterns automatically.

Low-level filters: In the early layers, these detect simple geometric structures like vertical lines, horizontal edges, or corners.
High-level filters: In deeper layers, the network combines these simple features to recognize complex objects, such as eyes, wheels, or feathers.

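Because the same weights are reused at every position, convolution is translation equivariant (often loosely called Translation Invariance): the same filter can identify a feature whether it appears in the top-left corner or the center of the photo, and the response in the feature map simply shifts along with it.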
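A small experiment can illustrate this. The sketch below is an assumption-laden toy, not a real network: the helper names `convolve2d`, `place`, and `peak`, the 7×7 image size, and the plus-shaped pattern are all hypothetical. It places the same pattern at two locations and checks where one fixed filter responds most strongly.

```python
def convolve2d(image, kernel):
    """Stride-1, no-padding sliding window of multiply-and-sum."""
    k_h, k_w = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[di][dj] * image[i + di][j + dj]
                for di in range(k_h) for dj in range(k_w))
            for j in range(len(image[0]) - k_w + 1)
        ]
        for i in range(len(image) - k_h + 1)
    ]


def place(pattern, size, top, left):
    """Return a size x size image of zeros with `pattern` pasted at (top, left)."""
    image = [[0] * size for _ in range(size)]
    for di, row in enumerate(pattern):
        for dj, value in enumerate(row):
            image[top + di][left + dj] = value
    return image


def peak(feature_map):
    """Return ((row, col), value) of the first strongest response."""
    best = max(max(row) for row in feature_map)
    for i, row in enumerate(feature_map):
        for j, value in enumerate(row):
            if value == best:
                return (i, j), best


# Hypothetical plus-shaped pattern, also used as the filter that looks for it.
plus = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]

top_left = convolve2d(place(plus, 7, 0, 0), plus)
center = convolve2d(place(plus, 7, 2, 2), plus)

print(peak(top_left))  # ((0, 0), 5)
print(peak(center))    # ((2, 2), 5)
```

The filter weights never change between the two runs. The strongest response has the same value in both cases and only its position in the feature map moves.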

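Imagine a simple filter designed to detect vertical edges, for example a 3×3 grid whose left column is all -1, whose middle column is all 0, and whose right column is all 1.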

If this filter slides over a solid color, the positive and negative weights cancel out to 0. However, if it slides over a transition from dark to light, the result is a high number. This activates that specific position on the feature map, effectively telling the computer: "There is a vertical edge here!"
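The behavior described above can be checked with a short sketch. The kernel values here (columns of -1, 0, and 1) are an assumed, commonly used choice for a vertical edge detector, and the `convolve2d` helper and both toy images are hypothetical, written only for this illustration.

```python
def convolve2d(image, kernel):
    """Stride-1, no-padding sliding window of multiply-and-sum."""
    k_h, k_w = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[di][dj] * image[i + di][j + dj]
                for di in range(k_h) for dj in range(k_w))
            for j in range(len(image[0]) - k_w + 1)
        ]
        for i in range(len(image) - k_h + 1)
    ]


# Assumed vertical edge kernel: negative on the left, positive on the right.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# A solid-color image: every pixel has the same brightness.
solid = [[7] * 6 for _ in range(4)]

# A dark-to-light transition: left half dark (0), right half light (10).
dark_to_light = [[0, 0, 0, 10, 10, 10] for _ in range(4)]

print(convolve2d(solid, vertical_edge))
# [[0, 0, 0, 0], [0, 0, 0, 0]]

print(convolve2d(dark_to_light, vertical_edge))
# [[0, 30, 30, 0], [0, 30, 30, 0]]
```

The two middle columns of the second feature map are the stops where the filter straddles the boundary, which is why they light up while the flat regions on either side stay at 0.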
