
Day 6 (7/15/19): Transfer Learning with CUB-200

Today I started practicing transfer learning with CNNs. This process involves pretraining a CNN on a very large dataset (e.g. ImageNet, which contains about 1.2 million images in 1000 categories) and then using that pretrained model either as an initialization or as a fixed feature extractor for a new task (e.g. classifying images from a new dataset). This technique is much more widely used in practice than training a CNN from scratch, because it is rare to have a large enough dataset for your own task, and training a model on a dataset like ImageNet can take weeks.


The dataset I started working with is called Caltech-UCSD Birds 200 (CUB-200), which contains images of 200 different species of birds. First, I loaded the dataset and split the data into train and test sets. Using the transform function, I implemented data augmentation, a strategy that increases the diversity of the training data without actually collecting new data. Then, using PyTorch, I loaded a pretrained ResNet-18 model (a residual network with 18 layers) and trained only the weights of the fully connected layer and the linear classifier (with a LogSoftmax output), leaving the rest of the pretrained weights untouched. Using a cross-entropy loss function and the Adam optimizer, this new fine-tuned network achieved ~60% accuracy classifying the CUB-200 images.



After training, I took a sample of the test data to visualize how well the model's predictions matched the true labels. While the network was not quite as accurate as I had hoped, its wrong guesses were often very close to the right species. For example, the model repeatedly predicted a Sooty Albatross when the image was actually a Black-footed Albatross (try to guess which is which from the two images below). Given that not all of the images were very clear or detailed (and the data augmentation transforms sometimes cropped out parts of the bird), it is not surprising that many of the images were misclassified.
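One simple way to find pairs like this is to count how often each (true species, predicted species) combination shows up among the mistakes. The sketch below is illustrative only: `most_confused` is a hypothetical helper, and the labels in the example are made-up toy data, not my actual results.

```python
from collections import Counter


def most_confused(true_labels, pred_labels, class_names, top_k=5):
    """Return the most frequent (true, predicted) pairs among the errors."""
    pairs = Counter(
        (class_names[t], class_names[p])
        for t, p in zip(true_labels, pred_labels)
        if t != p  # only count misclassifications
    )
    return pairs.most_common(top_k)


# Toy example with invented labels for three albatross classes.
names = ["Black-footed Albatross", "Sooty Albatross", "Laysan Albatross"]
true = [0, 0, 0, 1, 1, 2]
pred = [1, 1, 0, 1, 0, 2]

for (actual, guessed), count in most_confused(true, pred, names):
    print(f"{actual} predicted as {guessed}: {count}")
```

On the toy data this prints the Black-footed/Sooty pair first, since it accounts for two of the three errors.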






I'm excited to start learning more about how to implement CNNs and tweak their structures and parameters to achieve better results and generalize to different datasets.
