- convolution output dimension = $\left\lfloor\frac{n + 2p - f}{s} + 1\right\rfloor \times \left\lfloor\frac{n + 2p - f}{s} + 1\right\rfloor$
- $n =$ input image side length (square image)
- $p =$ [[Padding]]
- $s =$ [[Stride]]
- $f =$ filter/kernel side length (square filter)
- notice that padding adds $p$ pixels to each edge, so each side length increases by $2p$, not $p$ (hence the $2p$ term; see the sketch below)
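A minimal sketch of the formula above in Python; the specific sizes are made up for illustration:

```python
import math

def conv_output_dim(n: int, f: int, p: int = 0, s: int = 1) -> int:
    """Side length of the (square) output of a convolution."""
    return math.floor((n + 2 * p - f) / s + 1)

# e.g. a 6x6 image, 3x3 filter, no padding, stride 1 -> 4x4 output
print(conv_output_dim(n=6, f=3, p=0, s=1))  # 4
# "same" padding for f=3, s=1: p=1 keeps the 6x6 size
print(conv_output_dim(n=6, f=3, p=1, s=1))  # 6
```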
![[CleanShot 2024-06-26 at [email protected]]]
- in the $Z = Wx + b$ analogy, the filter plays the role of $W$ and the input image plays the role of $x$
	- note we also add a bias and then apply [[ReLU Activation Function|ReLU]]
- each filter has a unique weight for every entry in its matrix, but the filter as a whole is analogous to the $W$ in vanilla [[Deep Learning|Neural Networks]]
- $Wx$ is the element-wise multiplication of the filter with the current patch of the input (first sketch after this list)
	- for [[3D Convolutional Neural Networks]], it's still a full element-wise multiplication, but between two 3D tensors instead of two matrices, with the filter's depth matching the input's channel count (second sketch after this list)
- then sum every element of that result to get a single scalar
- $Z =$ that scalar plus a scalar bias parameter $b$ (each filter has its own $b$)
- the final output of the convolution layer comes from applying an [[Activation Functions|Activation Function]] to the scalar $Z$
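A minimal sketch of one 2D convolution step, assuming numpy; the patch, weights, and bias values are hypothetical, and in practice $W$ and $b$ are learned:

```python
import numpy as np

def conv_step_2d(patch: np.ndarray, W: np.ndarray, b: float) -> float:
    """One convolution step: element-wise multiply, sum, add bias, ReLU."""
    z = np.sum(patch * W) + b      # Z = sum of element-wise products + b, a scalar
    return max(z, 0.0)             # ReLU activation on the scalar Z

# hypothetical 3x3 input patch and 3x3 filter
patch = np.array([[1., 2., 0.],
                  [0., 1., 3.],
                  [2., 0., 1.]])
W = np.full((3, 3), 0.1)           # filter weights (made up here)
b = -0.5                           # per-filter scalar bias
print(conv_step_2d(patch, W, b))   # ~0.5
```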
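And the same step for the 3D case from the list above: the patch and filter are now 3D tensors whose depths match (e.g. 3 channels for RGB). Again a sketch with made-up values:

```python
import numpy as np

def conv_step_3d(patch: np.ndarray, W: np.ndarray, b: float) -> float:
    """Same step for volumes: filter depth matches the input channel count."""
    assert patch.shape == W.shape            # e.g. both (3, 3, 3) for an RGB patch
    return max(np.sum(patch * W) + b, 0.0)   # still one scalar per position

# hypothetical 3x3x3 RGB patch and matching 3x3x3 filter
rgb_patch = np.ones((3, 3, 3))
W3 = np.full((3, 3, 3), 0.05)
print(conv_step_3d(rgb_patch, W3, b=0.0))    # 27 * 0.05 ~ 1.35
```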