Text-to-Music Generation with Rectified Flow Transformer

intelligencedev/FluxMusic

 
 

FluxMusic: Text-to-Music Generation with Rectified Flow Transformer
Official PyTorch Implementation

This repo contains PyTorch model definitions, pre-trained weights, and training/sampling code for the paper "Flux that Plays Music". It explores a simple extension of diffusion-based rectified flow Transformers for text-to-music generation.

[Figure: model architecture]

1. Training

Refer to the link to set up the runtime environment.

To launch latent-space training of the small model on one node with N GPUs using PyTorch DDP:

torchrun --nnodes=1 --nproc_per_node=N train.py \
--version small \
--data-path xxx \
--global_batch_size 128

Scripts for the other model sizes can be found in the scripts directory.
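
The command above sets a global batch size of 128 regardless of N. A minimal sketch of how such a value is commonly split across DDP processes follows. It assumes train.py divides the global batch evenly among GPUs, which this README does not confirm; the helper name `per_gpu_batch_size` is hypothetical and not part of the repository.

```python
def per_gpu_batch_size(global_batch_size: int, world_size: int) -> int:
    """Split a global batch evenly across DDP processes (one process per GPU).

    Assumption: the training script requires an even split, as many DDP
    scripts do. This helper is illustrative and not taken from train.py.
    """
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if global_batch_size % world_size != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible "
            f"by world size {world_size}"
        )
    return global_batch_size // world_size


# Per-GPU batch sizes for --global_batch_size 128 on a single node.
for n_gpus in (1, 2, 4, 8):
    print(f"N={n_gpus}: {per_gpu_batch_size(128, n_gpus)} samples per GPU")
```

Under this assumption, choosing an N that divides 128 keeps every GPU on the same per-step workload.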

2. Inference

We include a sample.py script that samples music clips from a FluxMusic model, conditioned on text prompts:

python sample.py \
--version small \
--ckpt_path /path/to/model \
--prompt_file config/example.txt

All prompts used in the paper are listed in config/example.txt.
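
To prepare your own prompt file, the sketch below reads prompts from a text file. It assumes one prompt per line with blank lines ignored; the README does not specify the format that sample.py expects, so check config/example.txt first. The function `load_prompts` and the two demo prompts are placeholders, not taken from the repository or the paper.

```python
import tempfile
from pathlib import Path


def load_prompts(path):
    """Return the non-empty lines of a text file, stripped of whitespace.

    Assumption: one prompt per line. This is not confirmed by the README.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


# Demo on a temporary file so the sketch runs without the repository.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "prompts.txt"
    demo_file.write_text(
        "A calm piano melody\n\nUpbeat electronic dance track\n",
        encoding="utf-8",
    )
    prompts = load_prompts(demo_file)

for index, prompt in enumerate(prompts):
    print(index, prompt)
```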

3. Download Ckpts and Data

We use the VAE and vocoder from AudioLDM2, together with CLAP-L and T5-XXL. All of them can be downloaded directly from the table below, which also links the training scripts used in our experiments.

Note that some training runs had to be restarted after a machine malfunction, so some scripts include resume options.

| Model | Url | Training scripts |
|---|---|---|
| VAE | link | - |
| Vocoder | link | - |
| T5-XXL | link | - |
| CLAP-L | link | - |
| FluxMusic-Small | link | link |
| FluxMusic-Base | link | link |
| FluxMusic-Large | link | link |
| FluxMusic-Giant | link | link |

To construct the training data, see the test.py file, which shows a simple way to combine different datasets into a single JSON file.
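
As a rough illustration of that idea, the sketch below merges several datasets into one JSON document. The record schema (`wav`, `caption`, `dataset` keys under a top-level `data` list), the dataset names, and the file paths are all hypothetical; test.py defines the format the training code actually expects.

```python
import json


def combine_datasets(sources):
    """Merge several datasets into a single JSON-serializable dict.

    `sources` maps a dataset name to a list of records. Assumption: each
    record holds an audio path under "wav" and a text description under
    "caption". This schema is illustrative only.
    """
    combined = []
    for name, records in sources.items():
        for record in records:
            combined.append(
                {
                    "dataset": name,  # remember where the clip came from
                    "wav": record["wav"],
                    "caption": record["caption"],
                }
            )
    return {"data": combined}


# Placeholder datasets; names and paths are made up for the example.
sources = {
    "dataset_a": [{"wav": "a/0001.wav", "caption": "slow acoustic guitar"}],
    "dataset_b": [
        {"wav": "b/0001.wav", "caption": "fast drum solo"},
        {"wav": "b/0002.wav", "caption": "ambient synth pad"},
    ],
}

merged = combine_datasets(sources)
json_text = json.dumps(merged, indent=2)
print(json_text)
```

Writing `json_text` to a file would then give a single manifest to pass as the training data path, assuming the training code reads this layout.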

Because of copyright concerns, you need to download the data used in the paper yourself. A quick download link is available on Hugging Face :).

Acknowledgments

The codebase is based on the awesome Flux and AudioLDM2 repos.
