- [[Bidirectional RNN]]
- [[LSTM]]
- RNNs perform well on temporal / sequential data
- a huge limitation of a standard RNN is that each output/prediction can only use inputs from earlier in the sequence & cannot be informed by future inputs (this is what the [[Bidirectional RNN]] addresses)
- note that the $W_{aa}$ & $W_{ax}$ weights written below are shared, i.e. reused at every timestep of the RNN (see the sketch below the figure)
![[CleanShot 2024-07-06 at [email protected]|500]]
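- a minimal sketch of the recurrence, just to make the weight sharing concrete — the same $W_{aa}$ & $W_{ax}$ are applied at every timestep (dimensions here are hypothetical toy values, not from the figure):

```python
import numpy as np

# a<t> = tanh(W_aa @ a<t-1> + W_ax @ x<t> + b_a): the SAME W_aa & W_ax
# are reused at every timestep (toy dimensions, chosen for illustration)
rng = np.random.default_rng(0)
n_a, n_x, T = 4, 3, 5
W_aa = rng.normal(size=(n_a, n_a))   # hidden-to-hidden, shared across t
W_ax = rng.normal(size=(n_a, n_x))   # input-to-hidden, shared across t
b_a = np.zeros(n_a)

a = np.zeros(n_a)                    # a<0>
for x_t in rng.normal(size=(T, n_x)):
    a = np.tanh(W_aa @ a + W_ax @ x_t + b_a)
```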
- below shows forward prop (blue arrows) & backprop through time (red arrows); a BPTT sketch follows the figure
![[CleanShot 2024-07-06 at [email protected]|500]]
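- a minimal sketch of backprop through time for this vanilla RNN, assuming a tanh hidden activation, a sigmoid output at every timestep, & binary cross entropy loss (all names/dimensions are toy values, not from the figure):

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, n_x, T = 4, 3, 5
W_aa, W_ax = rng.normal(size=(n_a, n_a)), rng.normal(size=(n_a, n_x))
W_ya, b_a, b_y = rng.normal(size=n_a), np.zeros(n_a), 0.0
xs = rng.normal(size=(T, n_x))
ys = rng.integers(0, 2, size=T).astype(float)   # fake binary labels

# forward pass (blue arrows): cache a<t-1>, a<t>, y_hat<t> for every t
a_prev, caches = np.zeros(n_a), []
for x_t in xs:
    a = np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)
    y_hat = 1.0 / (1.0 + np.exp(-(W_ya @ a + b_y)))
    caches.append((a_prev, a, y_hat))
    a_prev = a

# backward pass (red arrows): walk t = T..1, accumulating gradients
dW_aa, dW_ax, dW_ya = np.zeros_like(W_aa), np.zeros_like(W_ax), np.zeros_like(W_ya)
da_next = np.zeros(n_a)
for (a_prev, a, y_hat), x_t, y_t in zip(reversed(caches), xs[::-1], ys[::-1]):
    dz_y = y_hat - y_t                    # dL/dz for sigmoid + BCE
    dW_ya += dz_y * a
    da = dz_y * W_ya + da_next            # gradient flowing into a<t>
    dz = (1.0 - a**2) * da                # through the tanh
    dW_aa += np.outer(dz, a_prev)
    dW_ax += np.outer(dz, x_t)
    da_next = W_aa.T @ dz                 # gradient flowing to a<t-1>
```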
- we use [[Binary Cross Entropy Loss]] at each timestep
![[CleanShot 2024-07-06 at [email protected]]]
- but it's the sum of the per-timestep losses that gives the loss value for a single training example (written out below the figure)
![[CleanShot 2024-07-06 at [email protected]|500]]
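- written out (standard notation, in case the screenshots don't render), the per-timestep loss & the total loss for one training example are:
$$\mathcal{L}^{\langle t \rangle}(\hat{y}^{\langle t \rangle}, y^{\langle t \rangle}) = -\,y^{\langle t \rangle}\log \hat{y}^{\langle t \rangle} - \left(1 - y^{\langle t \rangle}\right)\log\left(1 - \hat{y}^{\langle t \rangle}\right)$$
$$\mathcal{L}(\hat{y}, y) = \sum_{t=1}^{T_y} \mathcal{L}^{\langle t \rangle}(\hat{y}^{\langle t \rangle}, y^{\langle t \rangle})$$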
- below are the main types of RNN architectures (a sketch contrasting many-to-one & many-to-many follows the figure):
![[CleanShot 2024-07-08 at [email protected]|400]]
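- a minimal sketch (toy dimensions, hypothetical names) of how many-to-one & many-to-many differ only in which timesteps emit an output:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_states(xs, W_aa, W_ax, b_a):
    """Run the shared-weight recurrence & return the hidden state at each t."""
    a, states = np.zeros(W_aa.shape[0]), []
    for x_t in xs:
        a = np.tanh(W_aa @ a + W_ax @ x_t + b_a)
        states.append(a)
    return states

rng = np.random.default_rng(0)
n_a, n_x, T = 4, 3, 5                     # hypothetical toy dimensions
W_aa, W_ax = rng.normal(size=(n_a, n_a)), rng.normal(size=(n_a, n_x))
W_ya, b_a = rng.normal(size=(1, n_a)), np.zeros(n_a)

states = rnn_states(rng.normal(size=(T, n_x)), W_aa, W_ax, b_a)
y_many_to_one = sigmoid(W_ya @ states[-1])            # one output at the end
y_many_to_many = [sigmoid(W_ya @ a) for a in states]  # one output per timestep
```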
- note that for an RNN [[Language Models|Language Model]], the input at each timestep is the output (the predicted/sampled token) from the previous timestep
- also note that during training, you plug in the actual labelled previous token (not the model's own prediction) when computing the next timestep & the loss; this is known as teacher forcing (see the sketch below the figure)
![[CleanShot 2024-07-09 at [email protected]|400]]
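- a minimal sketch of that train-vs-sample difference, assuming a generic `rnn_step` (previous token in, next-token distribution & new state out; all names & dimensions are hypothetical):

```python
import numpy as np

def rnn_step(token_id, a_prev, params):
    """One hypothetical RNN-LM step: previous token in, next-token
    distribution & new hidden state out (E is an embedding matrix)."""
    W_aa, W_ax, W_ya, E = params
    a = np.tanh(W_aa @ a_prev + W_ax @ E[token_id])
    logits = W_ya @ a
    probs = np.exp(logits - logits.max())
    return probs / probs.sum(), a

# training (teacher forcing): feed the ACTUAL previous token from the
# labelled sequence, regardless of what the model would have predicted
def lm_loss(tokens, a0, params):
    a, loss = a0, 0.0
    for prev_tok, next_tok in zip(tokens[:-1], tokens[1:]):
        probs, a = rnn_step(prev_tok, a, params)
        loss -= np.log(probs[next_tok])     # cross entropy at this step
    return loss

# sampling / inference: feed the model's OWN previous output back in
def sample(start_tok, a0, params, n_steps, rng):
    tok, a, out = start_tok, a0, []
    for _ in range(n_steps):
        probs, a = rnn_step(tok, a, params)
        tok = rng.choice(len(probs), p=probs)
        out.append(tok)
    return out

# toy usage with made-up sizes
vocab, n_a, n_x = 10, 4, 3
rng = np.random.default_rng(0)
params = (rng.normal(size=(n_a, n_a)), rng.normal(size=(n_a, n_x)),
          rng.normal(size=(vocab, n_a)), rng.normal(size=(vocab, n_x)))
print(lm_loss([3, 1, 4, 1, 5], np.zeros(n_a), params))
print(sample(3, np.zeros(n_a), params, 5, rng))
```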