A dataset and Jupyter notebook for exploring Lesson 1 of the Fast.ai Deep Learning 1 course using barbies vs. women instead of the original Kaggle Dogs vs. Cats dataset. This is a very small dataset, so it is difficult to get stable results. However, I can usually get above 90% accuracy and sometimes find weights that give 96% on the validation set.
@semih suggested classifying photos of barbies vs. women: http://forums.fast.ai/t/wiki-lesson-1/9398/
`barbieswomen.zip` contains training and validation data set up using the folder structure required for fast.ai.

To use this with the version of fast.ai used in the courses, it is best to clone this repo, move the notebooks into one of the course folders, then unzip the data file and move the `train` and `valid` directories into the course's `data` folder.
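After unpacking, the `data` folder should contain `train` and `valid`, each with one subfolder per class (e.g. `barbies` and `women`), mirroring the Dogs vs. Cats layout from Lesson 1. Below is a minimal sketch of loading and training on the data, assuming the course (fastai 0.7) library; `PATH` and `sz` are assumptions you may need to adjust.

```python
# Minimal Lesson 1-style sketch, assuming the course (fastai 0.7) library.
# Imports mirror the Lesson 1 notebook; PATH and sz are assumptions to adjust.
from fastai.imports import *
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *

PATH = "data/"   # folder containing the train/ and valid/ directories (assumption)
sz = 224         # image size used in Lesson 1
arch = resnet34  # pretrained architecture from the lesson

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 3)  # learning rate 0.01 for 3 epochs
```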
The `Barbie and Women Import` notebook contains sample code for creating the dataset.
I created the dataset using two Python scripts:

- `googleimagesdownload` (https://github.com/hardikvasa/google-images-download), which you can install with `pip install google-images-download`
- `make_train_valid.py` from https://github.com/prairie-guy/ai_utilities
`googleimagesdownload` requires a machine with a Chrome browser and the appropriate chromedriver (see the googleimagesdownload GitHub repo for instructions); otherwise, you are limited to 100 images.
Download the images using these commands.
googleimagesdownload -k "woman" -o "barbieswomen" --format jpg --usage_rights labeled-for-reuse -l 150 --chromedriver ./chromedriver
googleimagesdownload -k "barbie" -o "barbieswomen" --format jpg --usage_rights labeled-for-reuse -l 150 --chromedriver ./chromedriver
Examine the images and remove incorrect ones. I removed all paintings and any images that were not clearly women or barbies. I also removed images that contained both women and barbies, since the model is forced to choose one classification or the other.
Use ImageMagick to resize the images for easier uploading and processing:
cd women
convert -resize '640' *.jpg woman.jpg
You will now see your original files and new files titled `woman-n.jpg` in the same directory. If you are happy with the resizing, delete the originals and convert the other directory of images.
If you are sure the resize is exactly what you need, you can also use mogrify instead of convert to resize your originals in place:
mogrify -resize '640' *.jpg
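If ImageMagick is not available, a small Pillow script can do the same downscaling. Here is a minimal sketch, assuming the images live in a `women/` directory and that a 640-pixel width (height scaled to preserve aspect ratio) is what you want; it writes copies like the convert example above.

```python
# Sketch of an ImageMagick-free resize using Pillow (pip install Pillow).
# Writes resized copies (woman-0.jpg, woman-1.jpg, ...) alongside the originals.
from pathlib import Path
from PIL import Image

src_dir = Path("women")   # directory of downloaded images (assumption)
target_width = 640        # matches the '640' used with convert/mogrify

for i, path in enumerate(sorted(src_dir.glob("*.jpg"))):
    with Image.open(path) as img:
        # scale height to keep the aspect ratio, as '-resize 640' does
        ratio = target_width / img.width
        new_size = (target_width, max(1, round(img.height * ratio)))
        img.convert("RGB").resize(new_size).save(src_dir / f"woman-{i}.jpg")
```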
Make the train and valid datasets/directory structure:
make_train_valid.py barbieswomen --train .80 --valid .20
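For reference, the split amounts to moving a random 80/20 fraction of each class's images into `train/<class>` and `valid/<class>` folders. Below is a rough, hypothetical equivalent in Python; it is an illustration, not the actual `make_train_valid.py` from ai_utilities.

```python
# Hypothetical sketch of an 80/20 train/valid split over the class folders
# under barbieswomen/. Not the ai_utilities script itself.
import random
import shutil
from pathlib import Path

root = Path("barbieswomen")
valid_frac = 0.20

for class_dir in [d for d in root.iterdir() if d.is_dir() and d.name not in ("train", "valid")]:
    images = sorted(class_dir.glob("*.jpg"))
    random.shuffle(images)
    n_valid = int(len(images) * valid_frac)
    for split, files in (("valid", images[:n_valid]), ("train", images[n_valid:])):
        dest = root / split / class_dir.name
        dest.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.move(str(f), dest / f.name)
```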
Now compress the directory and upload it to your VM. If you are using Paperspace through SSH, execute:
scp barbieswomen.zip paperspace@<your machine's public IP address>:./barbieswomen.zip
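If you would rather create the archive from Python (for example, from the import notebook) than with a zip utility, here is a minimal sketch of the compression step:

```python
# Sketch: create barbieswomen.zip from the barbieswomen/ directory before uploading.
# shutil.make_archive appends the .zip extension itself.
import shutil

shutil.make_archive("barbieswomen", "zip", root_dir=".", base_dir="barbieswomen")
```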