Training RNNs
Fundamentally, there are two types of RNN architecture: Many-to-One and Many-to-Many. In a Many-to-One RNN, the input is a sequence and the output is a single element. A classic example is sentiment analysis, where the goal is to classify a sentence as positive or negative: the RNN takes in a sequence of words and produces a single output representing the sentiment of the whole sentence.
Training a Many-to-One RNN is similar to training a conventional neural network: for each input sequence, the loss is computed once, after the entire sequence has been consumed. This approach is suitable for tasks where the ground truth depends on the entire input sequence, as in the sketch below.
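Here is a minimal sketch of this training step in PyTorch. The model name `SentimentRNN`, the vocabulary size, and the layer dimensions are illustrative assumptions, not fixed by the text; the point is that the loss is computed exactly once per sequence, from the final hidden state.

```python
import torch
import torch.nn as nn

class SentimentRNN(nn.Module):
    """Many-to-one RNN: a sequence of token ids in, one sentiment logit out."""
    def __init__(self, vocab_size=10_000, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embed(token_ids)        # (batch, seq_len, embed_dim)
        _, last_hidden = self.rnn(embedded)     # final hidden state: (1, batch, hidden_dim)
        return self.classifier(last_hidden.squeeze(0))  # (batch, num_classes)

model = SentimentRNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, 10_000, (32, 20))  # dummy batch: 32 sequences of 20 tokens
labels = torch.randint(0, 2, (32,))          # one sentiment label per sequence

logits = model(tokens)           # the whole sequence is consumed first...
loss = loss_fn(logits, labels)   # ...then the loss is computed once per sequence
loss.backward()
optimizer.step()
```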
The second type is the Many-to-Many RNN architecture. In this configuration, both the input and the output are sequences. Unlike a Many-to-One RNN, which generates a single output for the entire sequence, a Many-to-Many RNN produces an output at every time step.
This means the loss must be computed at every time step rather than once at the end: each per-step output is compared against its own target, and the per-step losses are accumulated (typically summed or averaged) over the sequence before gradients are propagated via backpropagation through time. This per-step feedback allows the network to learn from every position in the sequence, not just from the final output.
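A minimal sketch of this per-step loss, again in PyTorch. The tagging setup (one label per token, as in part-of-speech tagging), the model name `TaggerRNN`, and the dimensions are illustrative assumptions; the key difference from the Many-to-One case is that the loss covers all time steps of the output sequence.

```python
import torch
import torch.nn as nn

class TaggerRNN(nn.Module):
    """Many-to-many RNN: one output (e.g. a tag) per input token."""
    def __init__(self, vocab_size=10_000, embed_dim=64, hidden_dim=128, num_tags=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_tags)

    def forward(self, token_ids):
        outputs, _ = self.rnn(self.embed(token_ids))  # (batch, seq_len, hidden_dim): one activation per step
        return self.head(outputs)                     # (batch, seq_len, num_tags)

model = TaggerRNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, 10_000, (32, 20))  # dummy batch: 32 sequences of 20 tokens
tags = torch.randint(0, 10, (32, 20))        # a target at every time step

logits = model(tokens)  # (32, 20, 10): one prediction per time step
# Flatten so every time step contributes its own term; the per-step
# losses are averaged over the whole sequence before backpropagation.
loss = loss_fn(logits.reshape(-1, 10), tags.reshape(-1))
loss.backward()
optimizer.step()
```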