CSC2541 Course Project By Shihao Ma, Yichun Zhang, and Zilun Zhang

How Multimodal Data Improves Few Shot Learning

Implementation of the course project for CSC2541 (Winter 2021), Topics in Machine Learning: Neural Net Training Dynamics.

Course website: https://www.cs.toronto.edu/~rgrosse/courses/csc2541_2021/

Abstract

Requirements

CUDA version: 11.2

cuDNN version: 8.1.1

Python: 3.8

To install dependencies (the requirements file in this repository is spelled requirments.txt):

sudo pip3 install -r requirments.txt

Dataset

The table below has three columns. The first links to the original dataset, the second names the paper whose text data and dataset split we follow, and the third links to the pickle (PKL) version of the data that we prepared for download.

Dataset	Original Split + Multimodal Version Text Data	Multimodal data in PKL format
Cub_200_2011	Learning Deep Representations of Fine-grained Visual Descriptions	Google Drive
vgg_102_flowers	Learning Deep Representations of Fine-grained Visual Descriptions	Google Drive

The dataset directory should sit next to the project directory, as shown here for cub_200_2011:

├── pkl_cub_200_2011
    ├── data.pkl
    ├── id_sentence_encoder.pkl
    ├── sentence_id_encoder.pkl
    
├── csc2541_project
    ├── main.py
    ├── trainer.py
    ├── models.py
    ├── ......
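
As a quick sanity check of the layout above, the three pickle files can be loaded and inspected before training. This is a minimal sketch, not code from the repository: the helper name load_pkl_dataset is hypothetical, only the file names come from the listing above, and nothing is assumed about the structure of the objects stored inside each file. Pickle files can execute arbitrary code when loaded, so only load files from a source you trust.

```python
import pickle
from pathlib import Path

# File names taken from the directory listing in this README.
EXPECTED_FILES = ("data.pkl", "id_sentence_encoder.pkl", "sentence_id_encoder.pkl")


def load_pkl_dataset(dataset_root):
    """Hypothetical helper: load each expected pickle file into a dict keyed by file stem."""
    root = Path(dataset_root)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing files under {root}: {missing}")
    loaded = {}
    for name in EXPECTED_FILES:
        with open(root / name, "rb") as f:
            loaded[Path(name).stem] = pickle.load(f)
    return loaded
```

Calling load_pkl_dataset("../pkl_cub_200_2011") from inside csc2541_project and printing the type of each returned value is enough to confirm that the download is complete and readable.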

Training

To train the model(s) in the paper, run:

python3 main.py --num_cpu 8 --num_gpu 1 --dataset_root ../pkl_cub_200_2011 --task_file config.yaml --num_epoch 100 --fusion_method fc
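
The --fusion_method flag in the command above selects how image and text features are combined; the Results section of this README lists Mean, FC, and two attention variants. The sketch below illustrates one plausible reading of the two simplest options. It is an assumption, not the code in models.py: we assume "Mean" averages the two embeddings element-wise and "FC" passes their concatenation through a single fully connected layer. The names mean_fusion and FCFusion are hypothetical, and the random weights stand in for parameters that would be learned.

```python
import random


def mean_fusion(image_feat, text_feat):
    """Average two feature vectors element-wise (assumes equal dimensions)."""
    if len(image_feat) != len(text_feat):
        raise ValueError("mean fusion requires features of equal dimension")
    return [(a + b) / 2.0 for a, b in zip(image_feat, text_feat)]


class FCFusion:
    """Hypothetical FC fusion: concatenate both features, then apply one linear layer."""

    def __init__(self, image_dim, text_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.in_dim = image_dim + text_dim
        # Random weights stand in for parameters that would be learned in training.
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(self.in_dim)]
                       for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, image_feat, text_feat):
        x = list(image_feat) + list(text_feat)  # concatenate the two modalities
        if len(x) != self.in_dim:
            raise ValueError("unexpected input dimension")
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]
```

Mean fusion has no parameters but needs both embeddings to share a dimension, while the FC variant accepts different dimensions at the cost of extra weights to train.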

Evaluation

To evaluate the model(s) in the paper, run:

python3 inference.py --num_cpu 8 --num_gpu 1 --test_size 600 --dataset_root ../pkl_cub_200_2011 --task_file config.yaml --ckpt_file xxx.ckpt
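
Few-shot results are commonly reported as the mean accuracy over many test episodes with a 95% confidence interval. The sketch below shows that calculation under the assumption that --test_size 600 is the number of test episodes; this README does not confirm that reading, and summarize_episodes is a hypothetical helper rather than part of inference.py. It uses the normal approximation (1.96 standard errors).

```python
import math
import statistics


def summarize_episodes(accuracies):
    """Return (mean, 95% CI half-width) for a list of per-episode accuracies."""
    n = len(accuracies)
    if n < 2:
        raise ValueError("need at least two episodes to estimate a confidence interval")
    mean = statistics.fmean(accuracies)
    std = statistics.stdev(accuracies)  # sample standard deviation
    half_width = 1.96 * std / math.sqrt(n)  # normal approximation
    return mean, half_width
```

With 600 episodes the half-width shrinks by a factor of about 24.5 (the square root of 600) relative to the per-episode standard deviation, which is why a few hundred episodes are usually enough for stable numbers.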

Results

# The default checkpoint directory is:
./saves

Multimodal Improvement

ID	Backbone	Model	Modality	Fusion Method	Accuracy (%)
0	4-Conv	ProtoNet	Image	-	46.99
5	4-Conv	ProtoNet	Image + Text	Mean	75.52
6	4-Conv	ProtoNet	Image + Text	FC	73.41
7	4-Conv	ProtoNet	Image + Text	Attention (text guided)	78.40
8	4-Conv	ProtoNet	Image + Text	Attention (text residual)	63.6
3	ResNet12	ProtoNet	Image	-	53.65
9	ResNet12	ProtoNet	Image + Text	Mean	76.87
10	ResNet12	ProtoNet	Image + Text	FC	75.63
11	ResNet12	ProtoNet	Image + Text	Attention (text guided)	77.98
12	ResNet12	ProtoNet	Image + Text	Attention (text residual)	67.08
2	4-Conv	MAML	Image	-	49.75
13	4-Conv	MAML	Image + Text	Mean	51.10
14	4-Conv	MAML	Image + Text	FC	53.97
15	4-Conv	MAML	Image + Text	Attention (text guided)	Failed to converge
16	4-Conv	MAML	Image + Text	Attention (text residual)	Failed to converge
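
To make the improvement explicit, the gain of each multimodal run over its image-only baseline (same backbone and model) can be computed directly from the table. The snippet below is a small worked calculation: all numbers are copied from the table above, the two MAML attention runs are left out because they did not converge, and the names BASELINES, MULTIMODAL, and accuracy_gains are ours.

```python
# Image-only accuracy per (backbone, model), copied from the table above.
BASELINES = {
    ("4-Conv", "ProtoNet"): 46.99,
    ("ResNet12", "ProtoNet"): 53.65,
    ("4-Conv", "MAML"): 49.75,
}

# Image + Text rows that converged: (backbone, model, fusion method, accuracy).
MULTIMODAL = [
    ("4-Conv", "ProtoNet", "Mean", 75.52),
    ("4-Conv", "ProtoNet", "FC", 73.41),
    ("4-Conv", "ProtoNet", "Attention (text guided)", 78.40),
    ("4-Conv", "ProtoNet", "Attention (text residual)", 63.6),
    ("ResNet12", "ProtoNet", "Mean", 76.87),
    ("ResNet12", "ProtoNet", "FC", 75.63),
    ("ResNet12", "ProtoNet", "Attention (text guided)", 77.98),
    ("ResNet12", "ProtoNet", "Attention (text residual)", 67.08),
    ("4-Conv", "MAML", "Mean", 51.10),
    ("4-Conv", "MAML", "FC", 53.97),
]


def accuracy_gains():
    """Return (backbone, model, fusion, gain in percentage points) for each multimodal row."""
    rows = []
    for backbone, model, fusion, acc in MULTIMODAL:
        gain = round(acc - BASELINES[(backbone, model)], 2)
        rows.append((backbone, model, fusion, gain))
    return rows


for backbone, model, fusion, gain in accuracy_gains():
    print(f"{backbone:9s} {model:9s} {fusion:26s} {gain:+.2f}")
```

The largest gain is 31.41 points (4-Conv ProtoNet with text-guided attention), while the converged MAML runs gain only 1.35 and 4.22 points.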
