9 Jan 2026

The Impact of Kernel Size on CNN Training

Mateo Lafalce - Blog

In CNNs, the kernel size is a critical hyperparameter that determines how the model interprets images. While 3×3 kernels are the standard, increasing the size to 5×5 or 7×7 drastically changes how the network learns.

The most immediate benefit of a larger kernel is an increased receptive field: each output activation is computed from a larger patch of the input, so the layer captures broader spatial context in a single pass.
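This is easy to check numerically. The snippet below is a minimal sketch, assuming PyTorch (the post does not name a framework) and an arbitrary 32×32 single-channel input: it backpropagates from one output pixel and looks at which input pixels received a non-zero gradient, i.e. the patch that output value actually "sees".

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32, requires_grad=True)

for k in (3, 7):
    conv = nn.Conv2d(1, 1, kernel_size=k, padding=k // 2, bias=False)
    y = conv(x)
    # Gradient of one central output activation with respect to the input:
    # it is non-zero exactly on the k x k patch that this output depends on.
    (grad,) = torch.autograd.grad(y[0, 0, 16, 16], x)
    rows, cols = (grad[0, 0] != 0).nonzero(as_tuple=True)
    h = rows.max().item() - rows.min().item() + 1
    w = cols.max().item() - cols.min().item() + 1
    print(f"{k}x{k} kernel -> one output pixel depends on a {h}x{w} input patch")
```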

The primary cost of increasing kernel size is computational: the number of parameters grows quadratically with the kernel side, not linearly. A k × k kernel over C_in input channels holds k² · C_in weights per filter, so moving from 3×3 to 5×5 raises that count from 9 · C_in to 25 · C_in.

This nearly triples the parameter count per filter. Consequently, training becomes slower due to higher FLOPs, and the model becomes more prone to overfitting because it has to learn significantly more parameters from the same amount of data.
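As a rough sketch of that scaling, assuming PyTorch and an illustrative 64-in, 64-out convolution (neither choice comes from the post), counting the weights at a few kernel sizes shows the quadratic growth directly:

```python
import torch.nn as nn

in_channels = out_channels = 64  # illustrative channel counts

for k in (3, 5, 7):
    conv = nn.Conv2d(in_channels, out_channels, kernel_size=k, bias=False)
    total = sum(p.numel() for p in conv.parameters())
    per_filter = total // out_channels  # equals k * k * in_channels
    print(f"{k}x{k}: {total:>7,} weights total, {per_filter:>5,} per filter")
```

With these numbers, going from 3×3 to 5×5 takes each filter from 576 to 1,600 weights, which is the "nearly triples" figure above.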

Modern deep learning architectures generally prefer stacking multiple small kernels rather than using one large kernel.

Two stacked 3×3 layers provide the same effective receptive field as a single 5×5 layer, but with two major advantages:

  1. Fewer Parameters: You use 18 weights (9+9) instead of 25.
  2. More Non-Linearity: You can insert an activation function between the two layers, allowing the network to learn more complex features than a single linear pass of a 5×5 kernel (as the sketch below shows).
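Here is a minimal sketch of that comparison, again assuming PyTorch and arbitrary 64-channel layers:

```python
import torch.nn as nn

channels = 64  # illustrative

# Two 3x3 convolutions with a ReLU in between: 5x5 effective receptive field.
stacked = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
    nn.ReLU(inplace=True),
    nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
)

# One 5x5 convolution: same receptive field, no intermediate non-linearity.
single = nn.Conv2d(channels, channels, kernel_size=5, padding=2, bias=False)

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

print(f"two stacked 3x3: {n_params(stacked):,} weights")  # 73,728
print(f"one single 5x5:  {n_params(single):,} weights")   # 102,400
```

Per input channel and filter that is the 18-versus-25 trade-off from the list, and the ReLU between the two 3×3 layers is what the single 5×5 layer cannot offer.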

This blog is open source. See an error? Go ahead and propose a change.