Today I was given the task of trying incremental learning on the CUB200 dataset in four different ways:
1) With no special regularization
2) With L2SP regularization where the parameters of the first batch become the initialized parameters of the second batch
3) With full rehearsal
4) With partial rehearsal by implementing the herding method for finding a fixed buffer of k samples after training on the first batch
Using default values for the ResNet18, I first evaluated the offline learning accuracy to be around 73%. Then, I used an adjusted dataloader function to split the dataset into two batches to be trained incrementally. After setting the regularization term weights to 0, I trained the new incremental model with no special regularization and got an accuracy of 39.72%. This drop in accuracy demonstrates the phenomenon of catastrophic forgetting associated with incremental learning, as this accuracy yields an Omega value of 0.5418 using the metric definition below.
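To make the number concrete: reading Omega as the incremental model's accuracy divided by the offline baseline reproduces the value above. This is only a sketch under that assumption; the 73.3% baseline used here is an assumed value consistent with the "around 73%" figure, not an exact logged number.

```python
# Hedged sketch: Omega taken as the ratio of incremental accuracy to the
# offline baseline. The 73.3 figure is an assumption consistent with the
# "around 73%" baseline above, not an exact logged value.
offline_acc = 73.3       # assumed offline ResNet18 accuracy on CUB200 (%)
incremental_acc = 39.72  # two-batch incremental accuracy, no regularization (%)

omega = incremental_acc / offline_acc
print(f"Omega ~= {omega:.3f}")  # ~0.542, in line with the 0.5418 reported above
```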
Next, I implemented L2SP regularization (which essentially penalizes the model when its updated parameters deviate significantly from the initialized parameters), making sure the weights learned after the first incremental batch were used as the starting point for the second batch.
(Both this model and the full rehearsal model are still training as I write this so I will update with the accuracy and Omega values next week)
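For reference, here is a minimal sketch of an L2SP-style penalty term, assuming PyTorch and placeholder hyperparameter names (`alpha`, `beta`); it is not the exact code from my training script.

```python
import torch

# Hedged sketch of an L2SP-style penalty, not the exact code used here.
# `first_batch_model` is a frozen copy of the network after training on batch one;
# `model` is the network being trained on batch two.
def l2sp_penalty(model, first_batch_model, alpha=0.01, beta=0.01):
    # Frozen "starting point": the weights learned on the first batch.
    start = {name: p.detach() for name, p in first_batch_model.named_parameters()}
    sp_term, new_term = 0.0, 0.0
    for name, p in model.named_parameters():
        if name in start and p.shape == start[name].shape:
            # Penalize drift away from the first-batch weights.
            sp_term = sp_term + torch.sum((p - start[name]) ** 2)
        else:
            # Plain L2 on parameters that are new for batch two (e.g., an expanded head).
            new_term = new_term + torch.sum(p ** 2)
    return 0.5 * alpha * sp_term + 0.5 * beta * new_term

# Inside the batch-two training loop (criterion, images, labels as usual):
#   loss = criterion(model(images), labels) + l2sp_penalty(model, first_batch_model)
```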
Implementing full rehearsal was a little challenging at first because I needed to figure out how to randomly incorporate all of the data from the first batch into the second batch as well (so the model could 'rehearse' what it had already learned on the first batch while also learning the new classes of the second batch). However, in practice, this technique is not ideal because it requires all of the data from the first batch to be stored in memory and retrained on in the second batch. When working with larger datasets or attempting to incrementally learn more batches, this type of learning fails due to computational constraints and limited real-world efficacy.
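The combination itself can be done with a standard ConcatDataset; a minimal sketch, assuming the two CUB200 splits from the adjusted dataloader are available as ordinary PyTorch datasets (placeholder names below):

```python
from torch.utils.data import ConcatDataset, DataLoader

# Hedged sketch, not the exact code from this post: `first_batch_ds` and
# `second_batch_ds` are placeholder names for the two CUB200 splits.
def full_rehearsal_loader(first_batch_ds, second_batch_ds, batch_size=32):
    # Mix every sample from batch one back in with batch two; shuffling
    # interleaves old and new classes so the model "rehearses" as it learns.
    combined = ConcatDataset([first_batch_ds, second_batch_ds])
    return DataLoader(combined, batch_size=batch_size, shuffle=True, num_workers=4)
```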
A more realistic option is pseudo-rehearsal or partial rehearsal, where only some of the samples (or some compressed form of them) are stored. The challenge with this method is deciding which samples to keep and which to discard. The strategy I hope to implement will use the mean of the samples of a given class, order the samples by how important they are to approximating that mean, and keep only a fixed number of the most important ones (sketched below). Although I didn't finish this model today, I am looking forward to figuring out the rest of it next week.
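This is roughly the herding-style selection I have in mind for a single class; it is only a sketch, assuming the features are the ResNet18 penultimate-layer outputs collected as an (N, D) tensor. Running it per class after training on the first batch would give the fixed buffer of k samples per class to mix into the second batch.

```python
import torch

# Hedged sketch of herding exemplar selection for one class; `features` is an
# assumed (N, D) tensor of feature vectors for that class, and k is the buffer
# size per class. Not finished code from this post.
def herding_select(features, k):
    """Pick k exemplar indices whose running mean best tracks the class mean."""
    features = torch.nn.functional.normalize(features, dim=1)
    class_mean = features.mean(dim=0)
    selected = []
    running_sum = torch.zeros_like(class_mean)
    for t in range(1, min(k, len(features)) + 1):
        # Candidate exemplar means if each sample were added next.
        candidate_means = (running_sum + features) / t            # (N, D)
        dists = torch.norm(class_mean - candidate_means, dim=1)   # (N,)
        if selected:
            dists[selected] = float("inf")   # never re-pick a chosen sample
        best = int(torch.argmin(dists))
        selected.append(best)
        running_sum = running_sum + features[best]
    return selected
```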