
Day 23 (8/7/19): Averaging SLDA Results

Today I worked a lot on my presentation and ran a few more experiments to include. So far we have averaged data for the offline, full rehearsal, and SLDA models. We were hoping to test the EWC model today but didn't get the chance due to some bugs in our code. Hopefully, that will be ready for my presentation next week. Here is a preview of the title of the presentation.

Day 22 (8/6/19): Streaming Linear Discriminant Analysis

Today I tested the previously trained models using the Stanford Dogs dataset as the inter-dataset evaluation set for OOD instead of the Oxford Flowers dataset. However, as expected, the omega values for performance were essentially the same as before; varying the dataset made little difference. I also implemented a streaming linear discriminant analysis (SLDA) model, which differs from the previous incrementally trained models. This model didn't perform as well in terms of accuracy, however, since only the last layer of the model was trained and streaming is a more difficult task. Nevertheless, we did show that Mahalanobis can be used in a streaming paradigm to recover some OOD performance in an online setting. This is likely to be a large focus of my presentation, as it has never been discussed prior. Tomorrow, I plan to implement an L2SP model with elastic weight consolidation as well as iCaRL to serve as two more baselines to compare our experiments against.
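To make the SLDA idea concrete, here is a minimal sketch of the technique: the backbone is frozen and only per-class running means plus a shared streaming covariance are updated, one sample at a time, with the same statistics reused for a Mahalanobis OOD score. The class and parameter names are my own illustration, not the actual experiment code.

```python
import numpy as np

class SLDA:
    """Streaming LDA sketch: per-class running means and a shared
    streaming covariance are the only learned state, mirroring the
    'only the last layer is trained' setup described above."""

    def __init__(self, dim, shrinkage=1e-2):
        self.dim = dim
        self.shrinkage = shrinkage  # regularizes the covariance inverse
        self.means = {}             # class label -> running mean vector
        self.counts = {}            # class label -> samples seen
        self.cov = np.zeros((dim, dim))
        self.n = 0

    def fit_one(self, x, y):
        """Update statistics with a single (feature, label) pair."""
        x = np.asarray(x, dtype=float)
        if y not in self.means:
            self.means[y] = np.zeros(self.dim)
            self.counts[y] = 0
        d = x - self.means[y]
        # streaming shared covariance from deviations to the class mean
        self.cov = (self.n * self.cov + np.outer(d, d)) / (self.n + 1)
        self.n += 1
        # Welford-style running mean update for this class
        self.counts[y] += 1
        self.means[y] += d / self.counts[y]

    def _precision(self):
        # shrink toward the identity so the inverse is well-conditioned
        return np.linalg.inv((1 - self.shrinkage) * self.cov
                             + self.shrinkage * np.eye(self.dim))

    def predict(self, x):
        """Standard LDA decision rule from the running statistics."""
        P = self._precision()
        scores = {y: m @ P @ x - 0.5 * m @ P @ m
                  for y, m in self.means.items()}
        return max(scores, key=scores.get)

    def mahalanobis_score(self, x):
        """Min squared Mahalanobis distance to any class mean;
        larger values suggest the input is out-of-distribution."""
        P = self._precision()
        return min((x - m) @ P @ (x - m) for m in self.means.values())
```

Because prediction and the OOD score share the same means and covariance, the Mahalanobis detector comes almost for free in the streaming setting.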

Day 21 (8/5/19): Averaging Experiments

This morning I was at a basketball camp, so I came into the lab around noon. Much of the day was spent waiting for some models to finish training, so I worked on adding some slides to my presentation document. In the afternoon, I got back results from models that I could average together to get more accurate results. The general trend remained the same, however, indicating that the Mahalanobis Intra Dataset OOD actually performed better when the model was trained incrementally (albeit with full rehearsal) than when it was trained offline. I am not sure yet what the reason for this is, but I will continue to look into it. The green line denotes the intra dataset Mahalanobis OOD omega for full rehearsal. Note how it is consistently above 1 even as more batches are learned incrementally.
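The averaging step described above can be sketched in a few lines: average each metric across repeated runs, then divide by the offline baseline to produce the omega curve that gets plotted. The function name and input layout are illustrative assumptions, not the lab's actual analysis code.

```python
import numpy as np

def average_omega(runs, offline_baseline):
    """Average a per-batch metric over repeated runs (one array per
    seed), then normalize by the offline baseline. An omega above 1
    means the incrementally trained model beat the offline model."""
    mean_curve = np.mean(np.asarray(runs, dtype=float), axis=0)
    return mean_curve / offline_baseline
```

For example, two runs with per-batch AUROCs of [0.80, 0.90] and [0.90, 1.00] against an offline baseline of 0.85 yield an omega curve that crosses above 1 on the second batch.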

Day 20 (8/2/19): Fixing OOD Evaluation with Equal Sample Distribution

Today I reran the experiments from yesterday with the dataloaders for the OOD performance evaluation having equal in- and out-loader sample sizes. Theoretically, this should lead to a more accurate AUROC metric. However, just glancing at a visualization of the new results, it appears that we are achieving the same interesting results as yesterday. Unsure of the underlying reason why, I hope to plot the metrics we calculated on the same set of axes to get a better representation of the results.
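A minimal sketch of the balanced evaluation, assuming OOD scores are higher for out-of-distribution inputs and ignoring ties: subsample the larger set so both sides contribute equally, then compute AUROC via the Mann-Whitney U statistic. This is my own illustration of the idea, not the evaluation script itself.

```python
import numpy as np

def balanced_auroc(in_scores, out_scores, seed=0):
    """AUROC for OOD detection with equal in- and out-of-distribution
    sample counts: the larger set is randomly subsampled so class
    imbalance cannot skew the metric. Assumes higher score = more OOD
    and no tied scores."""
    rng = np.random.default_rng(seed)
    n = min(len(in_scores), len(out_scores))
    in_s = rng.choice(np.asarray(in_scores, dtype=float), n, replace=False)
    out_s = rng.choice(np.asarray(out_scores, dtype=float), n, replace=False)
    scores = np.concatenate([in_s, out_s])
    ranks = scores.argsort().argsort() + 1      # 1-based ranks
    u = ranks[n:].sum() - n * (n + 1) / 2       # U statistic for OOD half
    return u / (n * n)
```

With an unbalanced split, AUROC itself is unchanged in expectation, but finite-sample noise and any downstream thresholded metrics can shift, which is why equalizing the loaders is the safer comparison.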

Day 19 (8/1/19): Analyzing the Results of Early Experiments

Today I reviewed the results of the earlier experiments I ran with my mentor and other students in the lab. The most interesting result (which we most likely will have to repeat to ensure accuracy) was that the intra dataset OOD performance for the full rehearsal model was actually higher than that of the offline model. The y-axis represents the omega value for intra dataset OOD with Mahalanobis. The x-axis represents the number of classes learned. Today was also the RIT Undergraduate Research Symposium, which was very fun to attend. Along with a few other interns, I listened to three presentations, which covered political biases affecting article credibility, fingerprinting as a means of cybersecurity defense, and laughter detection and classification using deep learning, respectively. Each talk was interesting in its own way, and I enjoyed learning about the other research being performed in a similar field to the one I am working in. Tomorrow I hope to run more experiments...

Day 18 (7/31/19): Obtaining Results from Early Experiments

Today I reviewed some of the first true results for the early rounds of experiments I performed. For the offline model (intended to be used as the baseline for calculating the omega values of incrementally learned models), the final batch of 20 classes yielded an accuracy of 81.20%, an AUROC for Gaussian Noise of .99, an AUROC for Inter dataset OOD of .82, and an AUROC for Intra dataset OOD of .80. It is also important to note that I switched the learning rate scheduler to be exponentially defined rather than decaying the learning rate by steps once training reaches 2/3 of the batch iterations. The full rehearsal model, as expected, performed almost as well as the offline model, achieving an accuracy of 78.12%, an AUROC for Gaussian Noise OOD Omega of .89, an AUROC for Inter dataset OOD Omega of .92, and an AUROC for Intra dataset OOD Omega of .98. It will be interesting to see how these results compare to future models. Most likely, these less memory-intensive models will per...
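The scheduler change can be sketched as follows, assuming the decay constants shown here (the actual hyperparameter values aren't recorded in this entry): the old schedule dropped the learning rate in one step at 2/3 of the iterations, while the new one decays it smoothly every iteration.

```python
def step_lr(base_lr, it, total_its, drop=0.1):
    """Previous schedule: hold base_lr, then multiply by `drop`
    once training passes 2/3 of the batch iterations."""
    return base_lr * (drop if it >= 2 * total_its // 3 else 1.0)

def exp_lr(base_lr, it, gamma=0.97):
    """New schedule: exponentially defined, decaying smoothly by a
    factor of gamma at every iteration."""
    return base_lr * gamma ** it
```

The smooth decay avoids the sudden loss-landscape jolt at the 2/3 mark, which can matter when each incremental batch trains for relatively few iterations.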

Day 17 (7/30/19): Adding Bounding Boxes in Experiments

Today was very successful, as we were finally able to finish debugging the script we were using for training our experiment models. Previously, I was seeing the loss function diverge to nan during the second batch, and the problem turned out to be that I was forgetting to shuffle the data with each new batch. After cleaning up much of the script, I was able to organize the code to be more general purpose for future experiments. I ran the new code to train an offline model so I could generate some baseline numbers to use for omega values for the experiments. However, I am still adjusting a few hyperparameters (namely the patience counter for early stopping to prevent overfitting) to see what will yield the highest offline accuracy with bounding boxes implemented. The full rehearsal model was fairly simple to implement after the offline model trained, so I plan to start evaluating Mahalanobis OOD and more complex models tomorrow.
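The patience counter being tuned here works roughly as follows; this is a generic sketch of early stopping, not the lab's training script, and the attribute names are my own.

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for
    `patience` consecutive epochs. `min_delta` sets how large an
    improvement must be to reset the counter."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0   # improvement resets the counter
        else:
            self.bad_epochs += 1  # stagnant epoch
        return self.bad_epochs >= self.patience
```

A larger patience lets the model ride out noisy validation plateaus (potentially squeezing out more offline accuracy), at the cost of more epochs spent possibly overfitting.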