Day 20 (8/2/19): Fixing OOD Evaluation with Equal Sample Distribution

Today I reran yesterday's experiments with the dataloaders for the OOD performance evaluation set up so that the in- and out-loaders have equal sample sizes. In theory, this should give a more accurate AUROC metric. However, glancing at a visualization of the new results, we appear to be getting the same interesting results as yesterday. I'm not sure of the underlying reason yet, so I plan to plot all of the metrics we calculated on the same set of axes to get a clearer picture of the results.
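To make the idea of equal in- and out-sample sizes concrete, here is a minimal sketch in plain Python. It is not the code from my experiments: the helper names `balance` and `auroc` are made up for illustration, and it assumes each sample has already been reduced to a single OOD score where a higher score means "more likely out-of-distribution".

```python
import random


def balance(in_scores, out_scores, seed=0):
    """Subsample the larger set so both sets hold the same number of scores.

    Hypothetical helper: a stand-in for giving the in- and out-loaders
    equal sample sizes before evaluation.
    """
    n = min(len(in_scores), len(out_scores))
    rng = random.Random(seed)
    return rng.sample(list(in_scores), n), rng.sample(list(out_scores), n)


def auroc(in_scores, out_scores):
    """AUROC as the chance that a random OOD score beats a random in-distribution score.

    Assumes higher score = more OOD. Ties count as half a win.
    This pairwise version is O(n * m), which is fine for a sketch.
    """
    wins = 0.0
    for o in out_scores:
        for i in in_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(in_scores) * len(out_scores))


# Toy scores, made up purely for illustration.
in_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
out_scores = [0.55, 0.7, 0.9]

balanced_in, balanced_out = balance(in_scores, out_scores)
print(len(balanced_in), len(balanced_out))
print(auroc(balanced_in, balanced_out))
```

With real models the scores would come from the detector being evaluated, and a library routine for AUROC would likely replace the pairwise loop.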


Popular posts from this blog

Day 22 (8/6/19): Streaming Linear Discriminant Analysis

Today I tested the previously trained models using the Stanford Dogs dataset as the inter-dataset evaluator for OOD instead of the Oxford Flowers dataset. As expected, the omega values for performance were pretty much the same as before, so swapping the dataset didn't make much of a difference. I also implemented a streaming linear discriminant analysis (SLDA) model, which differs from the previous incrementally trained models. This model didn't perform as well in terms of accuracy, since only the last layer of the model is trained and streaming is a more difficult task. Nevertheless, we did show that Mahalanobis can be used in a streaming paradigm to recover some OOD performance in an online setting. This is likely to be a large focus of my presentation, as it has not been discussed before. Tomorrow, I plan to implement an L2SP model with elastic weight consolidation as well as iCaRL to serve as two more baselines to compare our experiments against.

Day 27 (8/13/19): Improving Presentation Plots

Today I practiced my presentation some more and improved the plots so that my results are easier to understand. The line graphs now show the results after each batch of training, so you can see the trend in accuracy and OOD detection over time. Lastly, I added a bar chart at the end of the presentation to summarize my overall results, in addition to the spider chart.