[ECCV 2024] Prompting Language-Informed Distribution for Compositional Zero-Shot Learning
Authors:
Wentao Bao,
Lichang Chen,
Heng Huang,
Yu Kong
Affiliations:
Michigan State University, University of Maryland
This repo contains the official source code of the above ECCV 2024 paper for compositional zero-shot learning (CZSL). The CZSL task is to learn from a subset of seen state-object compositions and to recognize both seen and unseen compositions, either in a closed world, where all compositional classes are assumed to be feasible, or in an open world, where infeasible compositional classes are also included in recognition. The figure CZSL Task illustrates the task. Our method, PLID, builds on the CLIP model and leverages large language models (LLMs) and Gaussian distributions to formulate informative and diverse prompts for the text input.
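To make the idea concrete, here is a minimal, self-contained sketch (not the actual PLID implementation; the tensors, shapes, and sampling step are purely illustrative) of how class-wise Gaussians over LLM-informed prompt features could be formed and used for classification:

```python
import torch

# Simplified sketch: one Gaussian per composition class, estimated from
# CLIP text features of several LLM-generated descriptions (random here).
num_classes, num_desc, dim = 10, 8, 512
text_feats = torch.randn(num_classes, num_desc, dim)             # [C, M, D]
text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

mu = text_feats.mean(dim=1)                                      # class means   [C, D]
sigma = text_feats.std(dim=1)                                    # diagonal std  [C, D]

# Diverse class prompts can be drawn from N(mu, sigma^2) via reparameterization.
eps = torch.randn_like(sigma)
sampled_prompts = mu + eps * sigma                               # [C, D]

# Classify an image feature by cosine similarity to the (sampled) class prompts.
img_feat = torch.randn(1, dim)
img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
prompts = sampled_prompts / sampled_prompts.norm(dim=-1, keepdim=True)
logits = 100.0 * img_feat @ prompts.t()                          # [1, C]
print(logits.argmax(dim=-1).item())
```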
conda create --name clip python=3.7
conda activate clip
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
pip3 install ftfy regex tqdm scipy pandas
pip3 install git+https://github.com/openai/CLIP.git
Alternatively, you can use pip install -r requirements.txt to install all the dependencies.
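To verify the environment, you can load CLIP and encode a toy prompt (the backbone name below is only for this quick check and is not necessarily the one used in training):

```python
import torch
import clip  # installed above from github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Encode a toy composition prompt to confirm the text encoder runs.
tokens = clip.tokenize(["a photo of a sliced apple"]).to(device)
with torch.no_grad():
    text_feat = model.encode_text(tokens)
print(text_feat.shape)  # e.g., torch.Size([1, 512]) for ViT-B/32
```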
We experiment with three datasets: MIT-States, UT-Zappos, and C-GQA.
sh download_data.sh
If you have already set up the datasets, you can use symlinks and ensure that the following paths exist:
data/<dataset>
where <dataset> = {'mit-states', 'ut-zappos', 'cgqa'}.
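As a convenience, the hypothetical snippet below (the source paths are placeholders for your own copies) checks the expected data/ layout and creates symlinks where possible:

```python
import os

datasets = ["mit-states", "ut-zappos", "cgqa"]
existing = {"mit-states": "/path/to/mit-states"}  # your local copies (example)

os.makedirs("data", exist_ok=True)
for name in datasets:
    target = os.path.join("data", name)
    if os.path.isdir(target):
        print(f"[ok] {target}")
    elif name in existing:
        os.symlink(existing[name], target)
        print(f"[linked] {existing[name]} -> {target}")
    else:
        print(f"[missing] {target} (run download_data.sh or add a symlink)")
```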
In this Gdrive folder, we provide the LLM-generated text descriptions and the corresponding CLIP text features, as well as the GloVe feasibility scores used in evaluation.
- By default, we use OPT-1.3B as the LLM, so only the files named opt_xxx.pkl need to be downloaded for reproducibility.
- To try other LLMs (e.g., GPT-3.5, Mistral-7B), please refer to the scripts text_augment.py and compute_db_features.py in the folder exp/ for text description generation and feature extraction; a simplified sketch of the feature-caching step is given below.
- Keep the folder structure unchanged for all downloaded files.
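The sketch below illustrates the feature-caching step only; the actual scripts in exp/ and the pickled format of the provided files may differ, and the compositions, descriptions, backbone, and output path are placeholders:

```python
import pickle
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)  # backbone is illustrative

# Placeholder LLM-generated descriptions per composition class.
descriptions = {
    "sliced apple": ["an apple cut into thin pieces", "apple slices on a plate"],
    "old car": ["a rusty vintage automobile", "a weathered car from decades ago"],
}

db = {}
with torch.no_grad():
    for comp, texts in descriptions.items():
        tokens = clip.tokenize(texts).to(device)
        feats = model.encode_text(tokens).float().cpu()   # [M, D]
        db[comp] = feats / feats.norm(dim=-1, keepdim=True)

with open("my_text_db.pkl", "wb") as f:                   # illustrative output path
    pickle.dump(db, f)
```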
cd exp/mit-states
bash train_model.sh 0
You can replace mit-states with ut-zappos or cgqa for training our model on the other datasets.
We evaluate our models in two settings: closed-world and open-world.
cd exp/mit-states
bash eval_model.sh 0 closed
You can change closed to open to switch the evaluation from closed-world to open-world.
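Conceptually, the open-world setting must handle infeasible compositions. The sketch below is not the repo's evaluation code; the scores and threshold are illustrative, and it only shows the common idea of suppressing compositions with low feasibility scores before prediction:

```python
import torch

num_compositions = 6
logits = torch.randn(1, num_compositions)              # image-vs-composition scores
feasibility = torch.rand(num_compositions)             # e.g., GloVe-based scores
threshold = 0.4                                        # tuned on validation data

open_world_logits = logits.clone()
open_world_logits[:, feasibility < threshold] = -1e4   # mask infeasible classes
print(open_world_logits.argmax(dim=-1).item())
```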
The project uses openly available models, code, and datasets. Please see the credits.
If you find PLID helpful, please cite our paper:
@InProceedings{bao2023eccv24,
title={Prompting Language-Informed Distribution for Compositional Zero-Shot Learning},
author={Bao, Wentao and Chen, Lichang and Huang, Heng and Kong, Yu},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year={2024}
}