
360+x : A Panoptic Multi-modal Scene Understanding Dataset

The MIx Group, University of Birmingham


Explore the Project Page »

Report Bug · Request Feature

Welcome to the 360+x dataset development kit repository.

Roadmap

This Development Toolbox is under construction 🚧.

  • Code Release - 09/06/2024
  • TAL Checkpoints Release - 04/07/2024
  • TAL Annotations Release - 13/08/2024
  • Release of Features Extracted by the 360+x Pretrained Extractor - 13/08/2024


Dataset Highlights

The 360+x dataset introduces a unique panoptic perspective to scene understanding, differentiating itself from existing datasets by offering multiple viewpoints and modalities captured across a variety of scenes. Our dataset contains:

  • 2,152 multi-modal videos captured by 360° cameras and Spectacles cameras (8,579k frames in total)
  • Captured in 17 cities across 5 countries
  • Captured in 28 scenes, from artistic spaces to natural landscapes
  • Temporal activity localisation labels for 38 action instances for each video

Dataset Access

The dataset is fully released on Hugging Face 🤗.

| Low Resolution Version | High Resolution Version |
| --- | --- |
| quchenyuan/360x_dataset_LR | quchenyuan/360x_dataset_HR |

The Hugging Face repo also contains the annotations.
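
For programmatic access, here is a minimal sketch using the `huggingface_hub` library (an assumption; any standard Hugging Face download method works):

```python
# Minimal sketch: download the low-resolution split with huggingface_hub.
# Assumes `pip install huggingface_hub`; swap in 360x_dataset_HR for the
# high-resolution version.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="quchenyuan/360x_dataset_LR",
    repo_type="dataset",
)
print(f"Dataset downloaded to: {local_dir}")
```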

Toolkit Structure

  • configs : configuration files for training and evaluation
  • libs : core libraries
    • dataset : the dataloaders
    • database : database utilities
  • models : model implementations

Training

To use ActionFormer, you first need to follow this compile guide.

For training the model, you can use the following example script:

python run/TemporalAction/train.py \
       ./configs/tridet/360_i3d.yaml \
       --method tridet \
       --modality 10011

"run/TemporalAction/configs/tridet/360_i3d.yaml" is the configuration file for training.

Method identifies the model you want to train.

Modality is the input modality for the model. These five digits represent whether the model uses panoramic video, front-view video, binocular video, audio, and direction audio respectively. For example, here "10011" means the model uses panoramic video, audio, and direction audio.
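
As an illustration (not part of the toolkit), the mask can be decoded like this:

```python
# Illustrative only: decode the five-digit --modality mask described above.
MODALITIES = [
    "panoramic video",
    "front-view video",
    "binocular video",
    "audio",
    "directional audio",
]

def decode_modality(mask: str) -> list[str]:
    """Return the modality names enabled by a mask such as '10011'."""
    assert len(mask) == len(MODALITIES), "the mask must have five digits"
    return [name for bit, name in zip(mask, MODALITIES) if bit == "1"]

print(decode_modality("10011"))
# -> ['panoramic video', 'audio', 'directional audio']
```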

Pretrained Models

All pretrained models are available on the Hugging Face Model Hub 🤗.

| TAL Pretrained Model | mAP@0.5 | mAP@0.75 | mAP@0.95 | Download Link |
| --- | --- | --- | --- | --- |
| ActionFormer | 27.4 | 17.0 | 6.53 | Model |
| TemporalMaxer | 29.8 | 20.9 | 10.0 | Model |
| TriDet | 26.98 | 19.4 | 7.21 | Model |

For evaluation, you can use the following example script:

python run/TemporalAction/eval.py \
       ./configs/tridet/360_i3d.yaml \
       {path_to_pretrained_model.pth.tar} \
       --method tridet \
       --modality 10011
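
The checkpoint path can be fetched from the Hugging Face Hub first. A sketch follows, where the repo id and filename are placeholders, not the actual values (take those from the Download links in the table above):

```python
# Hypothetical sketch: repo_id and filename below are placeholders; use the
# actual values from the Download links in the pretrained-models table.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="<org>/<tal-checkpoints-repo>",  # placeholder repo id
    filename="tridet_360_i3d.pth.tar",       # placeholder checkpoint name
)
# Pass ckpt_path as the {path_to_pretrained_model.pth.tar} argument above.
```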

Extracted Features

Features extracted for each modality by the 360+x pretrained extractors are also released on the Hugging Face Dataset Hub 🤗.
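
The exact file layout of the released features is not documented in this README; purely for illustration, assuming per-video `.npy` arrays, loading might look like:

```python
# Illustrative only: the feature file layout is an assumption, not documented
# in this README.
import numpy as np

features = np.load("path/to/some_video_feature.npy")  # hypothetical path
print(features.shape)  # e.g. (num_clips, feature_dim)
```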

License

Distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Cite

@inproceedings{chen2024x360,
  title={360+x: A Panoptic Multi-modal Scene Understanding Dataset},
  author={Chen, Hao and Hou, Yuqi and Qu, Chenyuan and Testini, Irene and Hong, Xiaohan and Jiao, Jianbo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}

Contact

You can contact us via https://mix.jianbojiao.com/contact/.

You can also email us at [email protected] or [email protected].

Acknowledgments

This README template is inspired by Best-README-Template.

(back to top)