
Day 10 (7/19/19): 2-Batch Learning with Regularization and Rehearsal

Today I was given the task of trying incremental learning on the CUB200 dataset in four different ways:

1) With no special regularization 
2) With L2SP regularization where the parameters of the first batch become the initialized parameters of the second batch
3) With full rehearsal 
4) With partial rehearsal, using the herding method to select a fixed buffer of k samples after training on the first batch

Using default hyperparameters for ResNet18, I first evaluated the offline learning accuracy to be around 73%. Then, I used an adjusted dataloader function to split the dataset into two batches to be trained incrementally. After setting the regularization term weights to 0, I trained the new incremental model with no special regularization and got an accuracy of 39.72%. This drop in accuracy demonstrates the catastrophic forgetting associated with incremental learning, and it corresponds to an Omega value of 0.5418 using the metric definition below.
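As a reminder of that metric, here is a sketch of the definition I'm working from, assuming the normalized Omega from Kemker et al. (2018): the incremental model's test accuracy after each batch, divided by the offline accuracy and averaged. With only two batches it reduces to the second-batch accuracy over the offline accuracy (roughly 39.72 / 73).

```latex
% Sketch of the Omega metric, assuming the normalized form from Kemker et al. (2018):
% alpha_i is the incremental model's accuracy after batch i,
% alpha_offline is the accuracy of the offline (non-incremental) model.
\Omega = \frac{1}{T-1} \sum_{i=2}^{T} \frac{\alpha_i}{\alpha_{\text{offline}}}
```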


Next, I implemented L2SP regularization (which essentially penalizes the model when the updated parameters deviate significantly from their initial values), making sure that the starting weights for the second batch were the weights learned on the first incremental batch.
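As a sketch of what that penalty looks like in PyTorch (the helper name and the alpha value here are placeholders, not my exact training code):

```python
import torch

def l2sp_penalty(model, anchor_params, alpha=0.01):
    """L2-SP term: pull the current weights back toward the batch-1 weights."""
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in anchor_params:  # layers that existed after batch 1
            penalty = penalty + ((param - anchor_params[name]) ** 2).sum()
    return alpha * penalty

# Snapshot taken right after training on the first batch:
# anchor_params = {n: p.detach().clone() for n, p in model.named_parameters()}
# Then, while training on the second batch:
# loss = criterion(model(images), labels) + l2sp_penalty(model, anchor_params)
```

In the original L2-SP formulation, the newly added classifier layer (which has no starting point to anchor to) gets a separate plain L2 term with its own weight.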

(Both this model and the full rehearsal model are still training as I write this, so I will update with their accuracy and Omega values next week.)

Implementing full rehearsal was a little challenging at first because I needed to figure out how to randomly mix all of the data from the first batch back into the second batch (so the model could 'rehearse' what it had already learned on the first batch while also learning the new classes of the second batch). In practice, however, this technique is not ideal because it requires all of the data from the first batch to be stored in memory and retrained during the second batch. When working with larger datasets or attempting to incrementally learn more batches, this type of learning fails due to memory and computational constraints, and it isn't realistic for real-world deployment.
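The mixing itself only takes a few lines in PyTorch. Something like the sketch below is what I mean, assuming batch1_dataset and batch2_dataset are the two CUB200 splits (the function name and loader settings are placeholders):

```python
from torch.utils.data import ConcatDataset, DataLoader

def make_full_rehearsal_loader(batch1_dataset, batch2_dataset, batch_size=64):
    """Replay all of batch 1 alongside batch 2 so the old classes get rehearsed."""
    combined = ConcatDataset([batch1_dataset, batch2_dataset])
    # Shuffling interleaves old and new classes within every mini-batch.
    return DataLoader(combined, batch_size=batch_size, shuffle=True, num_workers=4)

# The second training stage then iterates over this loader instead of a
# loader built from batch2_dataset alone.
```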



A more realistic option is pseudo-rehearsal or partial rehearsal, where only some of the samples (or some compressed form of them) are stored. The challenge with this method is deciding which samples to keep and which to discard. The strategy I hope to implement uses the mean of a given class's samples, orders the samples by how important they are to approximating that overall mean, and keeps only a fixed number of the most important ones. Although I didn't finish this model today, I am looking forward to figuring out the rest of it next week.
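For my own reference, this is roughly the herding selection step I have in mind, assuming the per-class feature embeddings (e.g., from the ResNet18 penultimate layer) have already been extracted; the function and variable names are placeholders:

```python
import numpy as np

def herding_select(features, k):
    """Pick k exemplar indices whose running mean best tracks the class mean.

    features: (n_samples, d) array of embeddings for a single class.
    """
    class_mean = features.mean(axis=0)
    selected = []
    running_sum = np.zeros_like(class_mean)
    for step in range(1, k + 1):
        # Candidate mean if each remaining sample were added next.
        candidate_means = (running_sum + features) / step
        distances = np.linalg.norm(candidate_means - class_mean, axis=1)
        distances[selected] = np.inf  # never pick the same sample twice
        best = int(np.argmin(distances))
        selected.append(best)
        running_sum += features[best]
    return selected
```

Repeating this for each class would give a fixed buffer of k exemplars per class that can be replayed while training on the second batch.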
