- [[Deep Learning Models]]
- [[Deep Learning Model Components]]
- [[Deep Learning Training]]
- [[Data Preparation]]

## Algorithm
---
1. Define the neural network structure (# of input units, # of hidden units, etc.)
2. Initialize the model's parameters
3. Loop:
	- Implement forward propagation
	- Compute the loss
	- Implement backward propagation to get the gradients
	- Update the parameters

(see the minimal NumPy sketch of this loop at the end of this note)

## General
---
- applied deep learning is a very empirical, evidence-based process
	- Anant Sahai says no one really knows why it works, like ancient alchemy
- when starting a new application, it is almost impossible to guess the right values for the hyperparameters
	- layers, hidden units, learning rates, activation functions, etc.
- being big alone is not enough for a network to perform well, we need it to be **deep**
	- from circuit theory:
		- there are functions you can compute with a "small" but deep neural network that shallower networks require exponentially more hidden units to compute

![[CleanShot 2024-06-11 at [email protected]|350]]
![[CleanShot 2024-06-11 at [email protected]|350]]
![[CleanShot 2024-06-10 at [email protected]|400]]

- 2 computations in each node: the linear combination $z$ on the left, then the activation $a = g(z)$ on the right

![[CleanShot 2024-06-10 at [email protected]|300]]

## Andrew Ng's Standardized Notation/Setup
---
![[deep-learning-notation.pdf]]
![[ACCE4963-52DB-4618-97F9-FF9A2570C02A_1_105_c.jpeg]]

Notation (Andrew Ng Coursera)
- note that by stacking the training examples as columns, the code is much easier to vectorize (see the shape sketch below)
- superscripts $x^{(i)}, y^{(i)}, z^{(i)}$ refer to the $i$th training example out of the $m$ total samples

![[CleanShot 2024-06-05 at [email protected]|500]]

- note that square superscript brackets refer to the layer, while the (earlier) round parentheses refer to the index among the $m$ training examples

![[CleanShot 2024-06-10 at [email protected]|300]]
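
A minimal sketch of the column-stacked layout, assuming toy sizes (`n_x = 3` features, `m = 5` examples, `n_h = 4` hidden units are illustrative, not values from the course):

```python
import numpy as np

# Each training example x^(i) is a COLUMN, so X has shape (n_x, m), not (m, n_x).
n_x, m = 3, 5                      # assumed toy sizes: 3 input features, 5 examples
X = np.random.randn(n_x, m)        # column i is x^(i)
Y = np.random.randn(1, m)          # column i is y^(i)

# With examples as columns, one matrix product handles a whole layer for all m examples:
n_h = 4                            # hidden units in layer [1] (arbitrary choice)
W1 = np.random.randn(n_h, n_x) * 0.01
b1 = np.zeros((n_h, 1))
Z1 = W1 @ X + b1                   # shape (n_h, m): z^[1](i) for every example i at once
A1 = np.tanh(Z1)                   # a^[1](i) for every example, no Python loop over i
```

Because each $x^{(i)}$ sits in its own column, the layer computation is a single matrix product instead of a loop over the $m$ examples.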
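
And a minimal sketch of the generic algorithm from the top of this note (define structure → initialize parameters → loop over forward prop, loss, backprop, update), assuming a one-hidden-layer binary classifier with a tanh hidden layer and sigmoid output; `n_h`, `learning_rate`, and `iterations` are illustrative defaults, not values from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, n_h=4, learning_rate=0.01, iterations=1000):
    """Sketch of the generic loop for a 1-hidden-layer binary classifier.
    X: (n_x, m) inputs stacked as columns; Y: (1, m) labels in {0, 1}."""
    n_x, m = X.shape
    # 1. structure: n_x inputs, n_h hidden units, 1 output unit
    # 2. initialize parameters (small random weights, zero biases)
    W1 = np.random.randn(n_h, n_x) * 0.01; b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(1, n_h) * 0.01;   b2 = np.zeros((1, 1))
    # 3. loop
    for _ in range(iterations):
        # forward propagation
        Z1 = W1 @ X + b1; A1 = np.tanh(Z1)
        Z2 = W2 @ A1 + b2; A2 = sigmoid(Z2)
        # compute the loss (cross-entropy averaged over the m examples)
        loss = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
        # backward propagation to get the gradients
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m; db2 = dZ2.sum(axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)   # tanh'(z) = 1 - tanh(z)^2
        dW1 = dZ1 @ X.T / m;  db1 = dZ1.sum(axis=1, keepdims=True) / m
        # update parameters (gradient descent)
        W1 -= learning_rate * dW1; b1 -= learning_rate * db1
        W2 -= learning_rate * dW2; b2 -= learning_rate * db2
    return W1, b1, W2, b2, loss
```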