- convolution output dimension = $\left\lfloor\frac{n + 2p - f}{s} + 1\right\rfloor \times \left\lfloor\frac{n + 2p - f}{s} + 1\right\rfloor$
- $n =$ input image side length (square image)
- $p =$ [[Padding]]
- $s =$ [[Stride]]
- $f =$ filter/kernel side length (square filter)
- notice that padding adds $p$ pixels to each edge, so each side length increases by $2p$, not $p$ (hence the $2p$ term; see the sketch below)
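A minimal sketch of the formula above in Python; the specific sizes are made up for illustration:

```python
import math

def conv_output_dim(n: int, f: int, p: int = 0, s: int = 1) -> int:
    """Side length of the (square) output of a convolution."""
    return math.floor((n + 2 * p - f) / s + 1)

# e.g. a 6x6 image, 3x3 filter, no padding, stride 1 -> 4x4 output
print(conv_output_dim(n=6, f=3, p=0, s=1))  # 4
# "same" padding for f=3, s=1: p=1 keeps the 6x6 size
print(conv_output_dim(n=6, f=3, p=1, s=1))  # 6
```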
![[CleanShot 2024-06-26 at [email protected]]]
- in the $Z = Wx + b$ analogy, the filter plays the role of $W$ and the input image plays the role of $x$
	- note we also add a bias and then apply [[ReLU Activation Function|ReLU]]
- each filter has a unique weight for every entry in its matrix, but the filter as a whole is analogous to the $W$ in vanilla [[Deep Learning|Neural Networks]]
- $Wx$ is the element-wise multiplication of the filter with the current patch of the input (first sketch after this list)
	- for [[3D Convolutional Neural Networks]], it's still a full element-wise multiplication, but between two 3D tensors instead of two matrices, with the filter's depth matching the input's channel count (second sketch after this list)
- then sum every element of that result to get a single scalar
- $Z =$ that scalar plus a scalar bias parameter $b$ (each filter has its own $b$)
- the final output of the convolution layer comes from applying an [[Activation Functions|Activation Function]] to the scalar $Z$
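A minimal sketch of one 2D convolution step, assuming numpy; the patch, weights, and bias values are hypothetical, and in practice $W$ and $b$ are learned:

```python
import numpy as np

def conv_step_2d(patch: np.ndarray, W: np.ndarray, b: float) -> float:
    """One convolution step: element-wise multiply, sum, add bias, ReLU."""
    z = np.sum(patch * W) + b      # Z = sum of element-wise products + b, a scalar
    return max(z, 0.0)             # ReLU activation on the scalar Z

# hypothetical 3x3 input patch and 3x3 filter
patch = np.array([[1., 2., 0.],
                  [0., 1., 3.],
                  [2., 0., 1.]])
W = np.full((3, 3), 0.1)           # filter weights (made up here)
b = -0.5                           # per-filter scalar bias
print(conv_step_2d(patch, W, b))   # ~0.5
```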
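And the same step for the 3D case from the list above: the patch and filter are now 3D tensors whose depths match (e.g. 3 channels for RGB). Again a sketch with made-up values:

```python
import numpy as np

def conv_step_3d(patch: np.ndarray, W: np.ndarray, b: float) -> float:
    """Same step for volumes: filter depth matches the input channel count."""
    assert patch.shape == W.shape            # e.g. both (3, 3, 3) for an RGB patch
    return max(np.sum(patch * W) + b, 0.0)   # still one scalar per position

# hypothetical 3x3x3 RGB patch and matching 3x3x3 filter
rgb_patch = np.ones((3, 3, 3))
W3 = np.full((3, 3, 3), 0.05)
print(conv_step_3d(rgb_patch, W3, b=0.0))    # 27 * 0.05 ~ 1.35
```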