The MIx Group, University of Birmingham
Welcome to the 360+x dataset development kit repo.
This Development Toolbox is under construction 🚧.
- Code Release - 09/06/2024
- TAL Checkpoints Release - 04/07/2024
- TAL Annotations Release - 13/08/2024
- Extracted Features (from the 360+x Pretrained Extractor) Release - 13/08/2024
- Dataset Highlights
- Dataset Access
- Toolkit Structure
- Training
- Pretrained Models
- Features
- License
- Cite
- Contact
- Acknowledgments
The 360+x dataset introduces a unique panoptic perspective to scene understanding, differentiating itself from existing datasets by offering multiple viewpoints and modalities captured from a variety of scenes. Our dataset contains:
- 2,152 multi-modal videos captured by 360° cameras and Spectacles cameras (8,579k frames in total)
- Captured in 17 cities across 5 countries
- Captured in 28 scenes, from artistic spaces to natural landscapes
- Temporal activity localisation labels for 38 action instances for each video
The dataset is fully released on Hugging Face 🤗.
| Low Resolution Version | High Resolution Version |
| --- | --- |
| quchenyuan/360x_dataset_LR | quchenyuan/360x_dataset_HR |
The Hugging Face repo also contains the annotations.
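If you prefer the command line, the snippet below is a minimal sketch of fetching the low-resolution release with the huggingface_hub CLI; check the dataset page for the exact access procedure (gated repos require `huggingface-cli login` first), and note that the high-resolution version is considerably larger.

```bash
# Download the low-resolution dataset repo into a local directory.
# Requires: pip install -U huggingface_hub
huggingface-cli download quchenyuan/360x_dataset_LR \
    --repo-type dataset \
    --local-dir ./360x_dataset_LR
```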
- configs : configuration files for training and evaluation
- libs : supporting libraries
- dataset : the dataloader for the dataset
- database : database utilities for the dataset
- models : model implementations
To use ActionFormer, you first need to follow this compile guide.
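For reference, the upstream ActionFormer guide compiles its C++ NMS extension roughly as follows; this is a sketch assuming the upstream repo layout, so adjust the path to wherever the NMS sources live in this toolkit.

```bash
# Build and install ActionFormer's C++ NMS module (upstream layout assumed)
cd ./libs/utils
python setup.py install --user
cd ../..
```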
To train a model, you can use the following example script:
python run/TemporalAction/train.py \
./configs/tridet/360_i3d.yaml \
--method tridet \
--modality 10011
"run/TemporalAction/configs/tridet/360_i3d.yaml" is the configuration file for training.
Method identifies the model you want to train.
Modality is the input modality for the model. These five digits represent whether the model uses panoramic video, front-view video, binocular video, audio, and direction audio respectively. For example, here "10011" means the model uses panoramic video, audio, and direction audio.
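As a further illustration of the mask (a hypothetical combination chosen only to show the encoding, not a recommended setting), the following would train on front-view and binocular video only:

```bash
# Mask digits, left to right:
# panoramic | front-view | binocular | audio | directional audio
python run/TemporalAction/train.py \
    ./configs/tridet/360_i3d.yaml \
    --method tridet \
    --modality 01100
```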
All pretrained models are available on the Hugging Face Model Hub 🤗.
| TAL Pretrained Model | [email protected] | [email protected] | [email protected] | Download Link |
| --- | --- | --- | --- | --- |
| ActionFormer | 27.4 | 17.0 | 6.53 | Model |
| TemporalMaxer | 29.8 | 20.9 | 10.0 | Model |
| TriDet | 26.98 | 19.4 | 7.21 | Model |
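One way to fetch a checkpoint is again the huggingface_hub CLI; this is a sketch in which `<model-repo-id>` is a placeholder for the repository behind the corresponding Model link in the table above.

```bash
# <model-repo-id> is a placeholder; use the repo id from the table's Download Link
huggingface-cli download <model-repo-id> --local-dir ./checkpoints
```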
For evaluation, you can use the following example script:
python run/TemporalAction/eval.py \
./configs/tridet/360_i3d.yaml \
{path_to_pretrained_model.pth.tar} \
--method tridet \
--modality 10011
Features extracted by the 360+x pretrained extractor for each modality are also released on the Hugging Face Dataset Hub 🤗.
Distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
@inproceedings{chen2024x360,
title={360+x: A Panoptic Multi-modal Scene Understanding Dataset},
author={Chen, Hao and Hou, Yuqi and Qu, Chenyuan and Testini, Irene and Hong, Xiaohan and Jiao, Jianbo},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2024}
}
You can contact us via https://mix.jianbojiao.com/contact/.
You can also email us at [email protected] or [email protected].
This README template is inspired by Best-README-Template.