9 Jan 2026

The Impact of Kernel Size on CNN Training

Mateo Lafalce - Blog

In CNNs, the kernel size is a critical hyperparameter that determines how the model interprets images. While 3×3 kernels are the standard, increasing the size to 5×5 or 7×7 drastically changes how the network learns.

The most immediate benefit of a larger kernel is an increased receptive field: each output activation is computed from a larger patch of the input, so the layer captures broader spatial context in a single pass.
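This is easy to check numerically. The snippet below is a minimal sketch, assuming PyTorch (the post does not name a framework) and an arbitrary 32×32 single-channel input: it backpropagates from one output pixel and looks at which input pixels received a non-zero gradient, i.e. the patch that output value actually "sees".

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32, requires_grad=True)

for k in (3, 7):
    conv = nn.Conv2d(1, 1, kernel_size=k, padding=k // 2, bias=False)
    y = conv(x)
    # Gradient of one central output activation with respect to the input:
    # it is non-zero exactly on the k x k patch that this output depends on.
    (grad,) = torch.autograd.grad(y[0, 0, 16, 16], x)
    rows, cols = (grad[0, 0] != 0).nonzero(as_tuple=True)
    h = rows.max().item() - rows.min().item() + 1
    w = cols.max().item() - cols.min().item() + 1
    print(f"{k}x{k} kernel -> one output pixel depends on a {h}x{w} input patch")
```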

The primary cost of increasing kernel size is computational: the number of parameters grows quadratically with the kernel side, not linearly. A k × k kernel over C_in input channels holds k² · C_in weights per filter, so moving from 3×3 to 5×5 raises that count from 9 · C_in to 25 · C_in.

This nearly triples the parameter count per filter. Consequently, training becomes slower due to higher FLOPs, and the model becomes more prone to overfitting because it has to learn significantly more parameters from the same amount of data.
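As a rough sketch of that scaling, assuming PyTorch and an illustrative 64-in, 64-out convolution (neither choice comes from the post), counting the weights at a few kernel sizes shows the quadratic growth directly:

```python
import torch.nn as nn

in_channels = out_channels = 64  # illustrative channel counts

for k in (3, 5, 7):
    conv = nn.Conv2d(in_channels, out_channels, kernel_size=k, bias=False)
    total = sum(p.numel() for p in conv.parameters())
    per_filter = total // out_channels  # equals k * k * in_channels
    print(f"{k}x{k}: {total:>7,} weights total, {per_filter:>5,} per filter")
```

With these numbers, going from 3×3 to 5×5 takes each filter from 576 to 1,600 weights, which is the "nearly triples" figure above.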

Modern deep learning architectures generally prefer stacking multiple small kernels rather than using one large kernel.

Two stacked 3×3 layers provide the same effective receptive field as a single 5×5 layer, but with two major advantages:

  1. Fewer Parameters: You use 18 weights (9+9) instead of 25.
  2. More Non-Linearity: You can insert an activation function between the two layers, allowing the network to learn more complex features than a single linear pass of a 5×5 kernel (as the sketch below shows).
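Here is a minimal sketch of that comparison, again assuming PyTorch and arbitrary 64-channel layers:

```python
import torch.nn as nn

channels = 64  # illustrative

# Two 3x3 convolutions with a ReLU in between: 5x5 effective receptive field.
stacked = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
    nn.ReLU(inplace=True),
    nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
)

# One 5x5 convolution: same receptive field, no intermediate non-linearity.
single = nn.Conv2d(channels, channels, kernel_size=5, padding=2, bias=False)

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

print(f"two stacked 3x3: {n_params(stacked):,} weights")  # 73,728
print(f"one single 5x5:  {n_params(single):,} weights")   # 102,400
```

Per input channel and filter that is the 18-versus-25 trade-off from the list, and the ReLU between the two 3×3 layers is what the single 5×5 layer cannot offer.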

This blog is open source. See an error? Go ahead and propose a change.