
Convolutional Layer — Step-by-Step Visualization

medium · AI · DL · CNN · Computer Vision

Step through a 2D convolution: watch a 2×2 filter slide across a 4×4 input with stride 1, computing each cell of the 3×3 feature map as a dot product.

Algorithm Pattern

Sliding Window Dot Product

Key Idea

A convolutional layer learns spatial features by sliding a small filter (kernel) over the input and computing a dot product at each position.

Step-by-Step Approach

  1. Place the filter at the top-left of the input (position 0,0).
  2. Compute the element-wise product of the filter and the overlapping input region.
  3. Sum all products — this is the output feature map value at that position.
  4. Slide the filter right by the stride amount; at the end of a row, return to the left edge and move down by the stride.
  5. Repeat for all valid positions to produce the complete feature map.
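
The steps above can be sketched in plain Python as follows. This is a minimal illustration rather than an optimized implementation: the function name conv2d_valid and the sample input and filter values are assumptions chosen for this example, and the filter is applied without flipping, exactly as the steps describe.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` (both lists of lists) with no padding."""
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])

    # Number of valid filter positions along each axis.
    out_h = (in_h - k_h) // stride + 1
    out_w = (in_w - k_w) // stride + 1

    feature_map = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Top-left corner of the input region under the filter.
            top, left = i * stride, j * stride
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    # Element-wise product, accumulated into one sum.
                    total += image[top + di][left + dj] * kernel[di][dj]
            feature_map[i][j] = total
    return feature_map


# Hypothetical 4x4 input and 2x2 filter, chosen only for illustration.
image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [1, 0],
    [0, 1],
]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
# [2, 4, 6]
# [0, 2, 4]
# [6, 0, 2]
```

With this filter each output cell is the sum of the top-left and bottom-right values of the 2×2 region it covers, which makes the result easy to check by hand.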

Common Gotchas

  • Output size = floor((input_size - kernel_size) / stride) + 1 when no padding is used. A 4×4 input with a 2×2 kernel and stride 1 gives a 3×3 output.
  • Padding ('same') adds zeros around the input so that, at stride 1, the output keeps the input's spatial dimensions.
  • Shared weights: the same filter is used at every position — this is what makes CNNs parameter-efficient.
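
The arithmetic behind these gotchas can be checked with a short sketch. The helper name conv_output_size and its padding parameter (zeros added on each side) are assumptions made for this example, and the dense-layer comparison is only an illustration of why weight sharing saves parameters.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension.

    `padding` is the number of zeros added on each side of the input.
    """
    return (input_size + 2 * padding - kernel_size) // stride + 1


# 4x4 input, 2x2 kernel, stride 1, no padding: 3 positions per axis.
print(conv_output_size(4, 2))                 # 3

# Same input and kernel with stride 2: the filter lands on 2 positions per axis.
print(conv_output_size(4, 2, stride=2))       # 2

# A 3x3 kernel with one zero on each side keeps a 5-wide input at 5 ('same' at stride 1).
print(conv_output_size(5, 3, padding=1))      # 5

# Shared weights: the 2x2 filter has 4 weights no matter how large the input is.
# A fully connected layer from 16 inputs to 9 outputs would need 16 * 9 weights.
filter_weights = 2 * 2
dense_weights = (4 * 4) * (3 * 3)
print(filter_weights, dense_weights)          # 4 144
```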

Related Problems