
Day 29 (8/15/19): Final Day Before Presentations

Most of today was also spent practicing and editing my presentation to make it as professional as I can. I'm really looking forward to the opportunity to present my work to faculty and friends tomorrow. Here is a link to the slides for my final presentation: Novelty Detection in Streaming Learning using Neural Networks

Day 28 (8/14/19): Presentation Dry Run

In the morning, all of us interns got the chance to practice our presentations in front of each other in the auditorium. I was pretty happy with how mine went overall, but the experience was definitely valuable for identifying typos and slight adjustments that needed to be made. Throughout the rest of the day, I tried to implement these changes and clean up a few plots that I want to include for Friday.

Day 27 (8/13/19): Improving Presentation Plots

Today I practiced my presentation more and added clearer graphs to help the audience understand my results. Now the line graphs show the results after each batch of training, so you can see the trend in accuracy and OOD detection over time. Lastly, I added a bar chart at the end of the presentation to summarize my overall results, in addition to the spider chart.

Day 26 (8/12/19): Presentation Revisions

Today was very useful for making revisions and edits to my presentation. I ran through it in front of my lab this morning and got lots of helpful feedback on how to make it more accessible to a general audience (eliminating jargon). Every day I am becoming more and more confident with the talk, and I'm looking forward to presenting on Friday! I also learned today that my RIT computer account/email will stay active for a few months after the internship ends. This will allow me to continue communicating with the lab via Slack and to help review and write a research paper that includes some of the work I have pursued over the past six weeks. We hope to submit this paper for publication at a conference in the fall (possibly AAAI).

Day 25 (8/9/19): Finishing Presentation

Today I made a lot of progress finishing up my presentation. I feel like we developed an interesting story to tell around the data we collected from the experiments, and I am excited to get a chance to share my results. Much of the beginning of the presentation is spent explaining high-level concepts such as deep learning and machine learning, so I will have a better idea of what I need to include after my meeting with Joe and Amy on Monday. I will keep practicing my presentation over the weekend and possibly include more results from the iCaRL and MLP w/ EWC models if I can get them trained. Below I have included a visualization of one of the most important results from my project. Notice how the SLDA w/ Mahalanobis model outperforms the other models in accuracy and OOD recognition combined (the more area a model covers in the spider plot, the better it performed overall).

Day 24 (8/8/19): Multilayer Perceptron Experiment

I continued gathering more results for my presentation today, and the data table is coming along nicely. We are able to see a significant trend: using Mahalanobis instead of Baseline Thresholding recovers much of the OOD recognition that is lost with streaming or incremental models. The SLDA model appears to be a lightweight, accurate streaming model which, paired with Mahalanobis, could be useful as an embedded agent in the real world. To demonstrate catastrophic forgetting, I ran five experiments and averaged the results for a simple incrementally trained MLP. Obviously, the model failed miserably, achieving only about 1% of the accuracy of the offline model. I included this only to show why other forms of streaming and incremental models are necessary to develop lifelong learning agents.

A diagram of a simple multilayer perceptron.
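To make the setup concrete, here is a minimal sketch of this kind of no-rehearsal incremental training, assuming PyTorch and precomputed feature vectors; the layer sizes, optimizer, and function names are my own choices, not the exact experimental settings.

```python
import torch
import torch.nn as nn

class SimpleMLP(nn.Module):
    """A small two-layer perceptron operating on fixed feature vectors."""
    def __init__(self, in_dim=2048, hidden_dim=512, num_classes=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        return self.net(x)

def train_increment(model, loader, epochs=1, lr=0.01):
    """Fine-tune on one incremental batch of classes with no rehearsal.

    Because no samples from earlier batches are replayed, accuracy on
    previously learned classes collapses (catastrophic forgetting).
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
```

Calling train_increment once per batch of new classes, with each loader containing only those new classes, reproduces the forgetting behavior described above.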

Day 23 (8/7/19): Averaging SLDA Results

Today I worked a lot on my presentation and ran a few more experiments to include. So far we have averaged data for offline, full rehearsal, and SLDA models. We were hoping to test the EWC model today but didn't get the chance due to some bugs in our code. Hopefully, that will be ready for my presentation next week. Here is a preview of the title of the presentation:

Day 22 (8/6/19): Streaming Linear Discriminant Analysis

Today I tested the previously trained models using the Stanford Dogs dataset as the inter-dataset evaluation set for OOD instead of the Oxford Flowers dataset. As expected, however, the omega values for performance were pretty much the same as before; varying the datasets didn't make much of a difference. I also implemented a streaming linear discriminant analysis (SLDA) model, which differs from the previous incrementally trained models. This model didn't perform as well in terms of accuracy, however, since only the last layer of the model is trained and streaming is a more difficult task. Nevertheless, we did show that Mahalanobis can be used in a streaming paradigm to recover some OOD performance in an online setting. This is likely to be a large focus of my presentation, as it has not been explored before. Tomorrow, I plan to implement an L2SP model with elastic weight consolidation as well as iCaRL to serve as two more baselines to compare our experiments against.
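For anyone curious what SLDA actually computes, here is a minimal NumPy sketch of the core idea: a running mean per class plus one shared covariance, both updated one sample at a time, with classification done by the standard LDA linear discriminant. The update rules and the shrinkage term follow the usual streaming-LDA recipe and are my assumptions, not necessarily the exact code we ran.

```python
import numpy as np

class SLDA:
    def __init__(self, dim, num_classes, shrink=1e-4):
        self.means = np.zeros((num_classes, dim))  # running per-class means
        self.counts = np.zeros(num_classes)        # samples seen per class
        self.cov = np.zeros((dim, dim))            # shared covariance
        self.total = 0
        self.shrink = shrink

    def fit_sample(self, x, y):
        # Update the shared covariance with the deviation from the current
        # class mean, then update that class's running mean and count.
        if self.total > 0:
            delta = x - self.means[y]
            self.cov = (self.total * self.cov +
                        (self.total / (self.total + 1)) * np.outer(delta, delta)
                        ) / (self.total + 1)
        self.means[y] = (self.counts[y] * self.means[y] + x) / (self.counts[y] + 1)
        self.counts[y] += 1
        self.total += 1

    def predict(self, x):
        # Standard LDA discriminant: argmax_k of w_k . x + b_k.
        prec = np.linalg.inv(self.cov + self.shrink * np.eye(self.cov.shape[0]))
        w = self.means @ prec                      # (num_classes, dim)
        b = -0.5 * np.sum(w * self.means, axis=1)
        return int(np.argmax(w @ x + b))
```

Because only these statistics are updated, training costs a handful of vector operations per sample, which is why SLDA is so lightweight compared to backprop-based models.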

Day 21 (8/5/19): Averaging Experiments

This morning I was at a basketball camp, so I came into the lab around noon. Much of the day was spent waiting for some models to finish training, so I worked on adding some slides to my presentation document. In the afternoon, I got back results from models that I could average together for more reliable numbers. The general trend remained the same, however: the Mahalanobis intra-dataset OOD actually performed better when the model was trained incrementally (albeit with full rehearsal) than when it was trained offline. I am not sure yet what the reason for this is, but I will continue to look into it.

The green line denotes the intra-dataset Mahalanobis OOD omega for full rehearsal. Note how it is consistently above 1 even as more batches are learned incrementally.
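For context, my understanding is that the omega values plotted here normalize the incremental model's score at each batch by the offline model's score on the same evaluation, so a value above 1 means the incremental model beat the offline baseline at that point. A minimal sketch, with names of my own choosing:

```python
import numpy as np

def omega_per_batch(incremental_scores, offline_score):
    """Per-batch ratio of the incremental model's score (accuracy or
    OOD AUROC) to the offline model's score on the same evaluation."""
    return np.asarray(incremental_scores, dtype=float) / offline_score

def omega_overall(incremental_scores, offline_score):
    """Single summary number: the mean omega across all training batches."""
    return float(np.mean(omega_per_batch(incremental_scores, offline_score)))

# Example: values above 1 (like the green full-rehearsal line) beat the
# offline baseline at that point in training.
print(omega_per_batch([0.62, 0.66, 0.71], 0.65))  # [0.954 1.015 1.092]
```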

Day 20 (8/2/19): Fixing OOD Evaluation with Equal Sample Distribution

Today I reran the experiments from yesterday with the dataloaders for the OOD performance evaluation having equal in- and out-loader sample sizes. In theory, this should lead to a more accurate AUROC metric. However, just glancing at a visualization of the new results, it appears that we are achieving the same interesting results as yesterday. Since I am unsure of the underlying reason why, I hope to plot the metrics we calculated on the same set of axes to get a better representation of the results.
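Here is a minimal sketch of what the equal-sample evaluation amounts to, assuming scikit-learn and per-sample OOD scores; the subsampling details and function name are my assumptions:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def balanced_auroc(in_scores, out_scores, seed=0):
    """AUROC with equal numbers of in- and out-of-distribution samples."""
    rng = np.random.default_rng(seed)
    in_scores = np.asarray(in_scores)
    out_scores = np.asarray(out_scores)
    n = min(len(in_scores), len(out_scores))
    # Subsample the larger pool so both sides contribute n samples.
    in_s = rng.choice(in_scores, size=n, replace=False)
    out_s = rng.choice(out_scores, size=n, replace=False)
    # Label in-distribution as 1; scores are assumed higher for in-dist.
    labels = np.concatenate([np.ones(n), np.zeros(n)])
    return roc_auc_score(labels, np.concatenate([in_s, out_s]))
```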

Day 19 (8/1/19): Analyzing the Results of Early Experiments

Today I reviewed the results of the earlier experiments I ran with my mentor and other students in the lab. The most interesting result (which we will most likely have to repeat to ensure accuracy) was that the intra-dataset OOD performance for the full rehearsal model was actually higher than that of the offline model.

The y-axis represents the omega value for intra-dataset OOD with Mahalanobis. The x-axis represents the number of classes learned.

Today was also the RIT Undergraduate Research Symposium, which was very fun to attend. Along with a few other interns, I listened to three presentations, which covered political biases affecting article credibility, fingerprinting as a means of cybersecurity defense, and laughter detection and classification using deep learning, respectively. Each talk was interesting in its own way, and I enjoyed learning about other research being performed in fields similar to mine. Tomorrow I hope to run more experiments.
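Since Mahalanobis-based OOD scoring comes up throughout these posts, here is a minimal sketch of the core computation on feature vectors: score each sample by its distance to the nearest class mean under a shared covariance, so that far-away samples can be flagged as OOD. The function name and shrinkage term are mine, and the class statistics could come from any of the models discussed above.

```python
import numpy as np

def mahalanobis_ood_score(x, class_means, cov, shrink=1e-4):
    """Negative squared Mahalanobis distance to the closest class mean.

    Lower scores (larger distances) suggest the sample is out of
    distribution; thresholding this score gives an OOD detector.
    """
    prec = np.linalg.inv(cov + shrink * np.eye(cov.shape[0]))
    deltas = class_means - x                       # (num_classes, dim)
    dists = np.einsum('kd,de,ke->k', deltas, prec, deltas)
    return -float(np.min(dists))
```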