
Day 10 (7/19/19): 2-Batch Learning with Regularization and Rehearsal

Today I was given the task of trying incremental learning on the CUB200 dataset in four different ways:

1) With no special regularization 
2) With L2SP regularization where the parameters of the first batch become the initialized parameters of the second batch
3) With full rehearsal 
4) With partial rehearsal by implementing the herding method for finding a fixed buffer of k samples after training on the first batch

Using default hyperparameter values for the ResNet-18, I first evaluated the offline learning accuracy to be around 73%. Then, I used an adjusted dataloader function to split the dataset into two batches to be trained incrementally. After setting the regularization term weights to 0, I trained the new incremental model with no special regularization and got an accuracy of 39.72%. This drop in accuracy demonstrates the catastrophic forgetting associated with incremental learning; it corresponds to an Omega value of .5418 using the metric definition below.
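As far as I can tell from the numbers, the Omega value is just the incremental model's accuracy normalized by the offline baseline, which is where the .5418 comes from (.3972 / ~.733). A common general form averages this ratio over the T incremental batches:

    \Omega = \frac{1}{T} \sum_{t=1}^{T} \frac{\alpha_t}{\alpha_{\text{offline}}}

where \alpha_t is the test accuracy after training on batch t and \alpha_{\text{offline}} is the accuracy of the model trained offline on all of the data at once.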


Next, I implemented the L2SP regularization (which essentially penalizes the model when its updated parameters deviate significantly from their initialized values), making sure the model entering the second batch was initialized with the weights learned on the first incremental batch.
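As a rough sketch of what that penalty looks like in PyTorch (the alpha/beta weights and the 'fc' check for the new classifier head are placeholders for illustration, not the exact values or names from my training script):

    import torch
    import torchvision

    # ResNet-18 with a 200-way output for CUB200 (simplified setup)
    model = torchvision.models.resnet18(num_classes=200)

    # Snapshot the weights learned on the first batch; L2SP anchors
    # the second batch's training to these starting-point weights.
    start_params = {n: p.clone().detach() for n, p in model.named_parameters()}

    def l2sp_penalty(model, start_params, alpha=0.01, beta=0.01):
        sp, l2 = 0.0, 0.0
        for name, p in model.named_parameters():
            if 'fc' in name:
                # new classifier head: plain L2 decay
                l2 = l2 + (p ** 2).sum()
            else:
                # shared backbone: penalize drift from the batch-1 weights
                sp = sp + ((p - start_params[name]) ** 2).sum()
        return alpha / 2 * sp + beta / 2 * l2

    # During training on the second batch:
    # loss = torch.nn.functional.cross_entropy(model(x), y) + l2sp_penalty(model, start_params)

Setting the two regularization weights to 0 recovers the "no special regularization" baseline from the first experiment.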

(Both this model and the full rehearsal model are still training as I write this, so I will update with the accuracy and Omega values next week.)

Implementing full rehearsal was a little challenging at first because I needed to figure out how to randomly incorporate all of the data from the first batch into the second batch as well (so the model could 'rehearse' what it had already learned on the first batch while also learning the new classes of the second batch). However, in practice, this technique is not ideal because it requires all of the data from the first batch to be stored in memory and retrained during the second batch. When working with larger datasets or attempting to incrementally learn more batches, this type of learning breaks down due to memory and computational constraints, which limits its real-world usefulness.
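In code, the idea is roughly the following (the dataset path and the 100/100 class split are assumptions for illustration, not my exact setup):

    from torch.utils.data import ConcatDataset, DataLoader, Subset
    from torchvision import datasets, transforms

    # CUB200 laid out as one folder per class (hypothetical path)
    dataset = datasets.ImageFolder('CUB_200_2011/images', transform=transforms.ToTensor())

    # First batch = classes 0-99, second batch = classes 100-199
    batch1_idx = [i for i, (_, c) in enumerate(dataset.samples) if c < 100]
    batch2_idx = [i for i, (_, c) in enumerate(dataset.samples) if c >= 100]

    # Full rehearsal: the second training stage sees ALL of batch 1 again,
    # shuffled in with batch 2 so old and new classes are mixed in every epoch.
    rehearsal_set = ConcatDataset([Subset(dataset, batch1_idx), Subset(dataset, batch2_idx)])
    loader = DataLoader(rehearsal_set, batch_size=32, shuffle=True)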



A more realistic option is pseudo-rehearsal or partial rehearsal, where only some of the samples (or some compressed form of them) are stored. The challenge with this method is deciding which samples to keep and which to discard. The strategy I hope to implement (herding) will compute the mean of the samples of a given class, order the samples by how well they approximate that mean, and keep only a fixed number of the most representative samples, as sketched below. Although I didn't finish this model today, I am looking forward to figuring out the rest of it next week.
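Here is a first sketch of that selection step over per-class feature vectors (the feature extraction and the buffer size of k=20 are assumptions for illustration):

    import numpy as np

    def herding_selection(features, k):
        # features: (n, d) array of feature vectors for ONE class
        # returns indices of the k samples whose running mean best tracks the class mean
        mu = features.mean(axis=0)
        selected = []
        running_sum = np.zeros_like(mu)
        for t in range(1, k + 1):
            # distance to the class mean if each candidate were added next
            dists = np.linalg.norm(mu - (running_sum + features) / t, axis=1)
            dists[selected] = np.inf      # never pick the same sample twice
            best = int(np.argmin(dists))
            selected.append(best)
            running_sum += features[best]
        return selected

    # e.g. keep a fixed buffer of k samples per class from the first batch:
    # buffer = {c: herding_selection(feats_by_class[c], k=20) for c in batch1_classes}

Because the samples come out ordered by how much they contribute to approximating the class mean, the buffer can be truncated to any size k without re-ranking.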
