This is the PyTorch code repository for the paper "Bidirectional Contrastive Split Learning for Visual Question Answering" (AAAI 2024). It includes implementations of the experiments on the VQA-v2 dataset based on five state-of-the-art VQA models.
Bidirectional Contrastive Split Learning (BiCSL) trains a global multi-modal model over the entire data distribution of decentralized clients. BiCSL employs a contrastive loss to enable more efficient self-supervised learning of the decentralized modules.
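For reference, below is a minimal sketch of a symmetric InfoNCE-style contrastive objective between the outputs of two split modules (e.g., client-side and server-side representations of the same inputs). The tensor names and temperature value are illustrative assumptions, not the exact implementation used in the repository.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss between two batches of module outputs.

    z_a, z_b: (batch, dim) representations of the same inputs from two
    split modules. Matching pairs lie on the diagonal of the similarity
    matrix; all other pairs in the batch serve as negatives.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature                      # (batch, batch) cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)    # positives on the diagonal
    # Symmetric ("bidirectional") form: a -> b and b -> a
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```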
Install the required libraries:
pip install -r requirements.txt
Install the spaCy word embeddings for question tokens:
python -m spacy download en_vectors_web_lg
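As a quick sanity check, the installed vectors can be loaded with spaCy (the question text below is only an example):

```python
import spacy

# Load the 300-d word vectors installed above
nlp = spacy.load('en_vectors_web_lg')
tokens = nlp('What color is the umbrella?')
print(tokens[0].vector.shape)  # (300,)
```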
The image features are extracted with the bottom-up-attention model, where each image is represented by 2048-dimensional region features. Download the extracted features from Google Drive and place the file under the folder './data/vqa/'.
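For illustration, the sketch below shows how such region features could be inspected, assuming they are stored as per-image .npz files; the filename and array key are hypothetical, and the actual layout depends on the downloaded archive.

```python
import numpy as np

# Illustrative only: assumes one .npz file per image holding a
# (num_boxes, 2048) array of bottom-up-attention region features.
feat = np.load('./data/vqa/example_image.npz')
region_features = feat['x']          # hypothetical key name
print(region_features.shape)         # e.g., (36, 2048)
```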
Choose a VQA model from {mcan_small, mcan_large, ban_4, butd, mmnasnet, mmnasnet_large, mfb}. The detailed settings of these models can be modified under './configs/vqa'.
python run.py --RUN='train' --MODEL='mcan_small' --DATASET='vqa'
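To train a different model, swap the --MODEL flag for any of the names listed above, for example:

python run.py --RUN='train' --MODEL='ban_4' --DATASET='vqa'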
If this repository is helpful for your research or you would like to refer to the results reported in this work, please cite it using the following BibTeX entry:
@inproceedings{sun2024bicsl,
  author    = {Yuwei Sun and Hideya Ochiai},
  title     = {Bidirectional Contrastive Split Learning for Visual Question Answering},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year      = {2024}
}