T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis
Yoonjin Chung*, Junwon Lee*, Juhan Nam
This repository contains the implementation of the paper, T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis, accepted at ICASSP 2024.
In our paper, we propose T-Foley, a temporal-event-guided waveform generation model for Foley sound synthesis, which generates high-quality audio conditioned on both the sound class and the temporal events that specify when the sound should occur.
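For intuition, the temporal event condition can be thought of as an envelope-like feature computed from a reference sound. Below is a minimal sketch that computes an RMS-style energy envelope with PyTorch; the frame and hop sizes are illustrative assumptions, not the values used in this repository.

```python
# Sketch: RMS-style energy envelope of a mono waveform.
# frame_size/hop_size are illustrative, not the repository's settings.
import torch

def rms_envelope(wav: torch.Tensor, frame_size: int = 512, hop_size: int = 128) -> torch.Tensor:
    """wav: (num_samples,) mono waveform in [-1, 1]."""
    frames = wav.unfold(0, frame_size, hop_size)  # (num_frames, frame_size)
    return frames.pow(2).mean(dim=-1).sqrt()      # (num_frames,)

wav = torch.randn(22050)        # stand-in for a 1-second reference sound
print(rms_envelope(wav).shape)  # one energy value per frame
```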
To get started, prepare the code and Python environment:
- Clone this repository:

  ```bash
  $ git clone https://github.com/YoonjinXD/T-foley.git
  $ cd ./T-foley
  ```
- Install the required dependencies:

  ```bash
  # (Optional) Create a conda virtual environment
  $ conda create -n tfoley python=3.8.0
  $ conda activate tfoley
  # Install dependencies with pip. Choose the appropriate CUDA version.
  $ pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu118
  $ pip install -r requirements.txt
  ```
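After installing, you can optionally verify that the pinned PyTorch build sees your GPU:

```python
# Optional sanity check for the installed versions and CUDA visibility.
import torch
import torchaudio

print(torch.__version__, torchaudio.__version__)
print("CUDA available:", torch.cuda.is_available())
```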
To train and evaluate our model, we used the DCASE 2023 Challenge Task 7 dataset, which was constructed for Foley sound synthesis. To evaluate our model, we also used subsets of VocalImitationSet and VocalSketch. These vocal-imitation sets consist of vocal recordings that mimic event-based or environmental sounds. Click the links above to download the corresponding datasets.
To perform inference using our model, follow these steps:
- Download the pre-trained model weights and configuration from the following link: pretrained.zip.

  ```bash
  $ wget https://zenodo.org/records/10826692/files/pretrained.zip
  ```
- Unzip and place the downloaded model weights and config JSON file in the `./pretrained` directory:

  ```bash
  $ unzip pretrained.zip
  ```
- Run the inference script:

  ```bash
  $ python inference.py --class_name "DogBark"
  ```

  The `class_name` must be one of the class names in the DCASE 2023 Task 7 dataset: `"DogBark"`, `"Footstep"`, `"GunShot"`, `"Keyboard"`, `"MovingMotorVehicle"`, `"Rain"`, `"Sneeze_Cough"`. (A sketch for batch-generating all classes follows these steps.)
- The generated samples will be saved in the `./results` directory.
- For FAD evaluation, we used this toolkit: FAD toolkit.
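As referenced above, a hypothetical batch run that generates samples for every class via the same `inference.py` interface:

```python
# Hypothetical batch run over all seven DCASE 2023 Task 7 classes.
import subprocess

CLASSES = ["DogBark", "Footstep", "GunShot", "Keyboard",
           "MovingMotorVehicle", "Rain", "Sneeze_Cough"]
for cls in CLASSES:
    subprocess.run(["python", "inference.py", "--class_name", cls], check=True)
```

For FAD, here is a minimal sketch assuming the commonly used `frechet-audio-distance` pip package; the linked toolkit may expose a different API, so consult its README. The reference directory path is a placeholder:

```python
# Minimal FAD sketch; assumes `pip install frechet-audio-distance`.
from frechet_audio_distance import FrechetAudioDistance

fad = FrechetAudioDistance(model_name="vggish", use_pca=False,
                           use_activation=False, verbose=False)
# Compare a reference (background) set against the generated samples.
score = fad.score("path/to/reference_wavs", "./results")
print("FAD:", score)
```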
To train the T-Foley model, follow these steps:
- Download and unzip the DCASE 2023 Task 7 dataset. Due to a mismatch between the provided CSV and the actual data files, build valid filelists (.txt) with the provided scripts:

  ```bash
  $ wget http://zenodo.org/records/8091972/files/DCASE_2023_Challenge_Task_7_Dataset.tar.gz
  $ tar -zxvf DCASE_2023_Challenge_Task_7_Dataset.tar.gz
  $ sh rename_dirs.sh
  $ sh make_filelist.sh
  ```
  If you use another dataset, prepare a file path list of your training data in `.txt` format and point `params.py` to it (a hypothetical helper sketch follows these steps).
- Run the training:

  ```bash
  $ python train.py
  ```

  This starts the training process and saves the trained model weights in the `logs/` directory. To monitor training with TensorBoard, run:

  ```bash
  $ tensorboard --logdir logs/
  ```
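As referenced above, a hypothetical helper for building a filelist from a custom dataset; the dataset root and output filename are assumptions:

```python
# Hypothetical filelist builder for a custom dataset.
# Point params.py at the .txt file this script writes.
from pathlib import Path

data_dir = Path("my_dataset")  # assumed root directory of your .wav files
paths = sorted(str(p) for p in data_dir.rglob("*.wav"))
Path("custom_filelist.txt").write_text("\n".join(paths) + "\n")
print(f"Wrote {len(paths)} paths to custom_filelist.txt")
```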
If you use this work, please cite:

```bibtex
@inproceedings{t-foley,
  title={T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis},
  author={Chung, Yoonjin and Lee, Junwon and Nam, Juhan},
  booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2024},
  organization={IEEE}
}
```
This project is licensed under the MIT License. See the LICENSE file for more information.