
360+x : A Panoptic Multi-modal Scene Understanding Dataset

The MIx Group, University of Birmingham


Explore the Project Page »

Report Bug · Request Feature

Welcome to the 360+x dataset development kit repository.

Roadmap

This Development Toolbox is under construction 🚧.

  • Code Release - 09/06/2024
  • TAL Checkpoints Release - 04/07/2024
  • TAL Annotations Release - 13/08/2024
  • Release of Features Extracted by the 360+x Pretrained Extractor - 13/08/2024


Dataset Highlights

The 360+x dataset introduces a unique panoptic perspective to scene understanding, differentiating itself from existing datasets by offering multiple viewpoints and modalities captured across a variety of scenes. Our dataset contains:

  • 2,152 multi-modal videos captured by 360° cameras and Spectacles cameras (8,579k frames in total)
  • Captured in 17 cities across 5 countries
  • Captured in 28 scenes, from artistic spaces to natural landscapes
  • Temporal activity localisation labels for 38 action instances for each video

Dataset Access

The dataset is fully released on Hugging Face 🤗.

| Low Resolution Version | High Resolution Version |
| --- | --- |
| quchenyuan/360x_dataset_LR | quchenyuan/360x_dataset_HR |

The Hugging Face repo also contains the annotations.
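
For programmatic access, here is a minimal sketch using the `huggingface_hub` library (an assumption; any standard Hugging Face download method works):

```python
# Minimal sketch: download the low-resolution split with huggingface_hub.
# Assumes `pip install huggingface_hub`; swap in 360x_dataset_HR for the
# high-resolution version.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="quchenyuan/360x_dataset_LR",
    repo_type="dataset",
)
print(f"Dataset downloaded to: {local_dir}")
```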

Toolkit Structure

  • configs : configuration files for training and evaluation
  • libs : core libraries
    • dataset : the dataloaders
    • database : database utilities
  • models : model implementations

Training

To use ActionFormer, you first need to follow this compile guide.

For training the model, you can use the following example script:

python run/TemporalAction/train.py \
       ./configs/tridet/360_i3d.yaml \
       --method tridet \
       --modality 10011

"run/TemporalAction/configs/tridet/360_i3d.yaml" is the configuration file for training.

Method identifies the model you want to train.

Modality is the input modality for the model. These five digits represent whether the model uses panoramic video, front-view video, binocular video, audio, and direction audio respectively. For example, here "10011" means the model uses panoramic video, audio, and direction audio.
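
As an illustration (not part of the toolkit), the mask can be decoded like this:

```python
# Illustrative only: decode the five-digit --modality mask described above.
MODALITIES = [
    "panoramic video",
    "front-view video",
    "binocular video",
    "audio",
    "directional audio",
]

def decode_modality(mask: str) -> list[str]:
    """Return the modality names enabled by a mask such as '10011'."""
    assert len(mask) == len(MODALITIES), "the mask must have five digits"
    return [name for bit, name in zip(mask, MODALITIES) if bit == "1"]

print(decode_modality("10011"))
# -> ['panoramic video', 'audio', 'directional audio']
```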

Pretrained Models

All pretrained models are available on the Hugging Face Model Hub 🤗.

| TAL Pretrained Model | mAP@0.5 | mAP@0.75 | mAP@0.95 | Download Link |
| --- | --- | --- | --- | --- |
| ActionFormer | 27.4 | 17.0 | 6.53 | Model |
| TemporalMaxer | 29.8 | 20.9 | 10.0 | Model |
| TriDet | 26.98 | 19.4 | 7.21 | Model |

For evaluation, you can use the following example script:

python run/TemporalAction/eval.py \
       ./configs/tridet/360_i3d.yaml \
       {path_to_pretrained_model.pth.tar} \
       --method tridet \
       --modality 10011
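
The checkpoint path can be fetched from the Hugging Face Hub first. A sketch follows, where the repo id and filename are placeholders, not the actual values (take those from the Download links in the table above):

```python
# Hypothetical sketch: repo_id and filename below are placeholders; use the
# actual values from the Download links in the pretrained-models table.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="<org>/<tal-checkpoints-repo>",  # placeholder repo id
    filename="tridet_360_i3d.pth.tar",       # placeholder checkpoint name
)
# Pass ckpt_path as the {path_to_pretrained_model.pth.tar} argument above.
```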

Extracted Features

Features extracted for each modality by the 360+x pretrained extractors are also released on the Hugging Face Dataset Hub 🤗.
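
The exact file layout of the released features is not documented in this README; purely for illustration, assuming per-video `.npy` arrays, loading might look like:

```python
# Illustrative only: the feature file layout is an assumption, not documented
# in this README.
import numpy as np

features = np.load("path/to/some_video_feature.npy")  # hypothetical path
print(features.shape)  # e.g. (num_clips, feature_dim)
```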

License

Distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Cite

@inproceedings{chen2024x360,
  title={360+x: A Panoptic Multi-modal Scene Understanding Dataset},
  author={Chen, Hao and Hou, Yuqi and Qu, Chenyuan and Testini, Irene and Hong, Xiaohan and Jiao, Jianbo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}

Contact

You can contact us via https://mix.jianbojiao.com/contact/.

You can also email us at [email protected] or [email protected].

Acknowledgments

This README template is inspired by Best-README-Template.

(back to top)