- [[Bidirectional RNN]]
- [[LSTM]]
- performs well on temporal data
- a huge limitation of a standard RNN is that each output/prediction can only use inputs earlier in the sequence; it cannot be informed by future inputs (a [[Bidirectional RNN]] addresses this; see the bidirectional sketch after this list)
- note that the $W_{aa}$ & $W_{ax}$ weights written below are shared across all timesteps of an RNN (see the forward-pass sketch after this list) ![[CleanShot 2024-07-06 at [email protected]|500]]
- below shows forward prop in blue arrows, & backprop in red arrows ![[CleanShot 2024-07-06 at [email protected]|500]]
- we use [[Binary Cross Entropy Loss]] at each timestep ![[CleanShot 2024-07-06 at [email protected]]]
- but it's the sum of the per-timestep losses that gives the loss for a single training example (see the loss sketch after this list) ![[CleanShot 2024-07-06 at [email protected]|500]]
- below are all the types of RNNs: ![[CleanShot 2024-07-08 at [email protected]|400]]
- note that for an RNN [[Language Models|Language Model]], the input at each timestep is the output from the previous timestep
- also note that during training, you plug in the actual labelled previous output when computing the next timestep & the loss (teacher forcing; see the language-model sketch after this list) ![[CleanShot 2024-07-09 at [email protected]|400]]
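
To make the weight sharing concrete, here is a minimal numpy sketch of the unrolled forward pass (the function name and shapes are illustrative assumptions; $W_{aa}$, $W_{ax}$, $W_{ya}$ follow the slide notation). The same weight matrices are applied at every timestep of the loop:

```python
import numpy as np

def rnn_forward(x_seq, a0, W_aa, W_ax, W_ya, b_a, b_y):
    """Unrolled RNN forward pass. The same W_aa, W_ax, W_ya
    (and biases) are reused at every timestep."""
    a = a0
    y_hats = []
    for x_t in x_seq:                                # one iteration per timestep
        a = np.tanh(W_aa @ a + W_ax @ x_t + b_a)     # new hidden state a<t>
        y_hat = 1 / (1 + np.exp(-(W_ya @ a + b_y)))  # sigmoid output y_hat<t>
        y_hats.append(y_hat)
    return y_hats
```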
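
And a rough sketch of how a [[Bidirectional RNN]] fixes the look-ahead limitation: one pass runs left-to-right, a second runs right-to-left, and each prediction sees both. Separate forward/backward weights, the shared initial state, and the sigmoid output are assumptions for illustration:

```python
import numpy as np

def brnn_forward(x_seq, a0, W_aa_f, W_ax_f, b_a_f, W_aa_b, W_ax_b, b_a_b, W_y, b_y):
    """Bidirectional RNN sketch: each output uses activations from
    both directions, so it is informed by the whole sequence."""
    # left-to-right activations
    a_f, fwd = a0, []
    for x_t in x_seq:
        a_f = np.tanh(W_aa_f @ a_f + W_ax_f @ x_t + b_a_f)
        fwd.append(a_f)
    # right-to-left activations
    a_b, bwd = a0, []
    for x_t in reversed(x_seq):
        a_b = np.tanh(W_aa_b @ a_b + W_ax_b @ x_t + b_a_b)
        bwd.append(a_b)
    bwd.reverse()  # re-align with the forward direction
    # each prediction conditions on the concatenation of both directions
    return [1 / (1 + np.exp(-(W_y @ np.concatenate([f, b]) + b_y)))
            for f, b in zip(fwd, bwd)]
```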
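
A sketch of the loss, assuming one scalar binary label per timestep: per-timestep [[Binary Cross Entropy Loss]], summed over the sequence to get the loss for a single training example:

```python
import numpy as np

def sequence_loss(y_hats, ys, eps=1e-12):
    """Sum of per-timestep binary cross entropy losses."""
    loss = 0.0
    for y_hat, y in zip(y_hats, ys):  # one BCE term per timestep
        loss += -(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
    return loss
```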
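
Finally, a sketch of teacher-forced training for an RNN language model, assuming one-hot word inputs, a zero vector as the first input, and a softmax over the vocabulary (function names are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # numerically stabilised softmax
    return e / e.sum()

def lm_loss_teacher_forced(tokens, vocab_size, a0, W_aa, W_ax, W_ya, b_a, b_y):
    """RNN language-model loss with teacher forcing: the input at
    each step is the ground-truth previous word, not the model's
    own prediction."""
    a = a0
    loss = 0.0
    x = np.zeros(vocab_size)                     # first input: zero vector (no previous word)
    for target in tokens:                        # tokens = integer word indices
        a = np.tanh(W_aa @ a + W_ax @ x + b_a)
        y_hat = softmax(W_ya @ a + b_y)          # predicted distribution over the next word
        loss += -np.log(y_hat[target] + 1e-12)   # cross entropy against the true word
        x = np.zeros(vocab_size)                 # teacher forcing: feed the labelled word
        x[target] = 1.0
    return loss
```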