LeFusion: Lesion-Focused Diffusion Model

LeFusion has been accepted to ICLR'25 with high reviewer scores (8888).

The top illustrates the training process of LeFusion, while the bottom shows the inference. During training, LeFusion avoids learning unnecessary background generation using a lesion-focused loss. In inference, by combining forward-diffused real backgrounds with reverse-diffused generated foregrounds, LeFusion ensures high-quality background generation. Additionally, we introduce histogram-based texture control to handle multi-peak lesions and multi-channel decomposition for multi-class lesions. (arXiv)
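The two ideas in this overview can be sketched in a few lines. The snippet below is a minimal, pure-Python illustration on flattened voxel lists, not the repository's implementation: `lesion_focused_mse`, `forward_diffuse`, and `blend` are hypothetical names, and the exact loss and noise schedule used by LeFusion may differ.

```python
import math
import random


def lesion_focused_mse(pred_noise, true_noise, mask):
    """Mean squared error over lesion voxels only (mask == 1).

    Assumption: the lesion-focused loss is a masked noise-prediction MSE.
    """
    terms = [(p - t) ** 2 for p, t, m in zip(pred_noise, true_noise, mask) if m]
    return sum(terms) / len(terms) if terms else 0.0


def forward_diffuse(x0, alpha_bar_t, rng):
    """Standard DDPM forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * v + b * rng.gauss(0.0, 1.0) for v in x0]


def blend(generated_fg, noised_bg, mask):
    """Keep generated values inside the mask and real (noised) values outside."""
    return [g if m else r for g, r, m in zip(generated_fg, noised_bg, mask)]


mask = [0, 1, 1, 0]
real = [0.2, 0.4, 0.6, 0.8]
generated = [9.0, 9.0, 9.0, 9.0]  # stand-in for a reverse-diffused sample

# Background errors (positions 0 and 3) do not contribute to the loss.
loss = lesion_focused_mse([5.0, 1.0, 2.0, 5.0], [0.0, 0.0, 0.0, 0.0], mask)

# With alpha_bar_t = 1.0 no noise is added, so the background stays intact.
background = forward_diffuse(real, 1.0, random.Random(0))
mixed = blend(generated, background, mask)
```

In a real sampler the blend would be applied at every reverse step, with the background re-noised to the matching timestep, so that only the masked region is ever synthesized.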

📑Data Preparation

We use the LIDC dataset, which includes 1,010 chest CT scans. From these, we extracted 2,624 pathological regions of interest (ROIs) containing lung nodules to train the LeFusion model. The dataset is split into 808 cases for training (2,104 lung nodule ROIs) and 202 cases for testing (520 lung nodule ROIs). This portion of the dataset is located in LIDC-IDRI/Pathological, where test.txt lists the data used for testing.
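As a quick consistency check on the numbers above, and to show how a test list could drive the split, here is a small sketch. It assumes test.txt holds one case identifier per line, which this README does not state; `split_cases` and the sample identifiers are hypothetical.

```python
def split_cases(all_cases, test_lines):
    """Partition case IDs into (train, test) using lines read from a test list."""
    test_ids = {line.strip() for line in test_lines if line.strip()}
    train = [c for c in all_cases if c not in test_ids]
    test = [c for c in all_cases if c in test_ids]
    return train, test


# Counts reported above: cases and nodule ROIs per split.
train_cases, test_cases = 808, 202
train_rois, test_rois = 2104, 520
total_cases = train_cases + test_cases  # 1010 scans
total_rois = train_rois + test_rois     # 2624 ROIs

# Toy example with made-up identifiers.
train, test = split_cases(["case_a", "case_b", "case_c"], ["case_b\n", "\n"])
```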

Additionally, we provide 20 normal ROIs from healthy patients, representing areas where lung nodules typically appear. This data is located in LIDC-IDRI/Normal, where Image contains the healthy images and Mask contains the corresponding masks, generated by matching lung masks with ground truth masks. You can use these masks to simulate lesion generation on the Normal dataset.

Furthermore, we provide pre-generated images with lesions based on the LIDC-IDRI/Normal dataset. These images are stored in LIDC-IDRI/Demo, where Image_i contains the images generated under the control information hist_i. They were generated with the pre-trained weights linked in the Get Started section below.

├── LIDC-IDRI
    ├── Pathological
    │   ├── Image
    │   ├── Mask
    │   └── test.txt
    ├── Normal
    │   ├── Image
    │   └── Mask
    └── Demo
        ├── Image
        │   ├── Image_1
        │   ├── Image_2
        │   └── Image_3
        └── Mask
            ├── Mask_1
            ├── Mask_2
            └── Mask_3
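After extracting the archives, a short check like the one below can confirm that the layout matches the tree above. It is a sketch only: `missing_entries` is a hypothetical helper, and the root path you pass depends on where you unpacked the data.

```python
from pathlib import Path

# Sub-directories and files taken from the tree above.
EXPECTED = [
    "Pathological/Image",
    "Pathological/Mask",
    "Pathological/test.txt",
    "Normal/Image",
    "Normal/Mask",
    "Demo/Image",
    "Demo/Mask",
]


def missing_entries(root):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

An empty list from `missing_entries(<your LIDC-IDRI folder>)` means every expected entry was found.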

We also provide the preprocessed EMIDEC dataset, which contains 57 pathological MRI scans and 43 healthy MRI scans.

├── EMIDEC
    ├── Pathological
    │   ├── images
    │   └── labels
    └── Normal
        └── images

🔩 Installation

1. Create a virtual environment with `conda create -n lefusion python=3.10` and activate it with `conda activate lefusion`.
2. Download the code: `git clone https://github.com/M3DV/LeFusion.git`
3. Check that your pip version is 22.3.1. If it is not, install it with `pip install pip==22.3.1`.
4. Enter the LeFusion folder with `cd LeFusion/LeFusion_LIDC` and run `pip install -r requirements.txt`.

💡Get Started

Download the LIDC-IDRI and EMIDEC datasets (HuggingFace🤗)

In our study, the LeFusion model focuses on generating lung nodule regions. If you want to train a diffusion model to synthesize lung nodules, you can use our preprocessed LIDC-IDRI dataset to train the LeFusion model. Simply place the LIDC-IDRI dataset in LeFusion/data.

mkdir data
cd data
mkdir LIDC
cd LIDC
wget https://huggingface.co/datasets/YuheLiuu/LeFusion_Preprocessed_Data/resolve/main/LIDC-IDRI/Pathological.tar -O Pathological.tar
tar -xvf Pathological.tar
wget https://huggingface.co/datasets/YuheLiuu/LeFusion_Preprocessed_Data/resolve/main/LIDC-IDRI/Normal.tar -O Normal.tar
tar -xvf Normal.tar
wget https://huggingface.co/datasets/YuheLiuu/LeFusion_Preprocessed_Data/resolve/main/LIDC-IDRI/Demo.tar -O Demo.tar
tar -xvf Demo.tar

Additionally, if you wish to train the LeFusion model on the EMIDEC dataset, you can download it as follows.

cd ..
mkdir EMIDEC
cd EMIDEC
wget https://huggingface.co/datasets/YuheLiuu/LeFusion_Preprocessed_Data/resolve/main/EMIDEC/Normal.tar -O Normal.tar
tar -xvf Normal.tar
wget https://huggingface.co/datasets/YuheLiuu/LeFusion_Preprocessed_Data/resolve/main/EMIDEC/Pathological.tar -O Pathological.tar
tar -xvf Pathological.tar
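
All of the archives above follow one URL pattern, which can be handy if you prefer to script the downloads. The sketch below only builds and prints the URLs listed in the commands above; `archive_urls` is a hypothetical helper, and the actual download and extraction are left to wget and tar as shown.

```python
BASE_URL = "https://huggingface.co/datasets/YuheLiuu/LeFusion_Preprocessed_Data/resolve/main"

# Archive names per dataset, in the order they appear in the commands above.
ARCHIVES = {
    "LIDC-IDRI": ["Pathological", "Normal", "Demo"],
    "EMIDEC": ["Normal", "Pathological"],
}


def archive_urls(dataset):
    """Return the .tar URLs for one dataset."""
    return [f"{BASE_URL}/{dataset}/{name}.tar" for name in ARCHIVES[dataset]]


for dataset in ARCHIVES:
    for url in archive_urls(dataset):
        print(url)
```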

Download the pre-trained LeFusion Model (HuggingFace🤗)

We provide pre-trained models for the LIDC and EMIDEC datasets. These models can be used directly for inference if you do not want to re-train the LeFusion model.
```
cd ../..
cd LeFusion
mkdir LeFusion_Model
cd LeFusion_Model
mkdir LIDC
cd LIDC
wget https://huggingface.co/YuheLiuu/LeFusion_Pretrained_model/resolve/main/lidc.pt -O lidc.pt
cd ..
mkdir EMIDEC
cd EMIDEC
wget https://huggingface.co/YuheLiuu/LeFusion_Pretrained_model/resolve/main/emidec.pt -O emidec.pt
```
If you have downloaded the pre-trained model, you can skip the training step and proceed directly to inference!

🔬Train LeFusion Model

Start training:

For LIDC:

chmod +x lidc_train.sh
./lidc_train.sh

Our model was trained for 50,000 steps on five 40GB A100 GPUs, which took two and a half days. However, we found that the model already performs very well after 20,000 steps, so when training your own model, anywhere between 20,000 and 50,000 steps should yield good results. By default, the weights are saved every 1,000 steps.

For EMIDEC:

chmod +x emidec_train.sh
./emidec_train.sh

📈Inference

Start inference:

For LIDC:

chmod +x lidc_inference.sh
./lidc_inference.sh

Three folders, Image_1, Image_2, and Image_3, will be generated under the target_img_path directory, each representing images generated under the control of hist_1, hist_2, and hist_3 respectively. Similarly, three folders will be generated under the Mask directory, but unlike the Image folders, files with the same name in each of the three Mask folders contain the same mask.

Larger values of jump_length and jump_n_sample generally result in longer image generation times. We found that the generated images maintain good quality when both parameters are between 2 and 10. With both parameters set to 2, generating an image takes about 40 seconds on a 40GB A100 GPU.
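
To see why larger jump_length and jump_n_sample values lengthen generation, it helps to count the denoising steps in a resampling schedule. The sketch below is modeled on the jump schedule from RePaint, which this code base acknowledges building on. It is an illustration under that assumption, not necessarily the repository's exact scheduler, and `resampling_schedule` and `count_denoising_steps` are hypothetical names.

```python
def resampling_schedule(num_steps, jump_length, jump_n_sample):
    """Timesteps visited when sampling with RePaint-style resampling jumps.

    Every jump_length steps the sampler re-noises by jump_length steps and
    denoises again, repeating until jump_n_sample passes have been made.
    """
    jumps = {t: jump_n_sample - 1
             for t in range(0, num_steps - jump_length, jump_length)}
    t = num_steps
    schedule = []
    while t >= 1:
        t -= 1
        schedule.append(t)
        if jumps.get(t, 0) > 0:
            jumps[t] -= 1
            for _ in range(jump_length):
                t += 1
                schedule.append(t)  # forward (re-noising) step
    return schedule


def count_denoising_steps(schedule, num_steps):
    """Count transitions that move to a lower timestep (one model call each)."""
    count, prev = 0, num_steps
    for t in schedule:
        if t < prev:
            count += 1
        prev = t
    return count


plain = resampling_schedule(4, jump_length=2, jump_n_sample=1)   # [3, 2, 1, 0]
jumped = resampling_schedule(4, jump_length=2, jump_n_sample=2)  # [3, 2, 1, 0, 1, 2, 1, 0]
```

In this toy setting the plain schedule needs 4 denoising steps and the jumped one needs 6, which is the kind of growth behind the longer generation times noted above.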

For EMIDEC:

chmod +x emidec_inference.sh
./emidec_inference.sh

🔎Visualization

The first image is a healthy image from LIDC-IDRI/Normal. The second image is the corresponding generated mask; lesions are generated in the areas it marks. Image_1, Image_2, and Image_3 show the lesions generated when the control information is set to hist_1, hist_2, and hist_3, respectively.
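
The control information hist_i is histogram-based. As a rough illustration of what such a descriptor could look like, the sketch below computes a normalized intensity histogram over the voxels inside a mask. The bin count, the [0, 1] intensity range, and the name `masked_histogram` are assumptions made for this example; the format of the repository's hist_i controls may differ.

```python
def masked_histogram(values, mask, bins=8, lo=0.0, hi=1.0):
    """Normalized histogram of the values where mask is truthy."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v, m in zip(values, mask):
        if m and lo <= v <= hi:
            # Clamp so that v == hi falls into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


# Three voxels lie inside the mask: two dark (0.1) and one bright (0.9).
hist = masked_histogram([0.1, 0.1, 0.9, 0.5], [1, 1, 1, 0], bins=4)
```

A histogram with mass in two separate bins, as in this toy case, is the sort of multi-peak intensity profile that a single mean intensity would not capture.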

☀️ DiffMask

The training and inference process of DiffMask is as follows.

Train:

chmod +x diffmask_train.sh
./diffmask_train.sh

Inference:

chmod +x diffmask_inference.sh
./diffmask_inference.sh

Citation

@misc{zhang2024lefusioncontrollablepathologysynthesis,
      title={LeFusion: Controllable Pathology Synthesis via Lesion-Focused Diffusion Models}, 
      author={Hantao Zhang and Yuhe Liu and Jiancheng Yang and Shouhong Wan and Xinyuan Wang and Wei Peng and Pascal Fua},
      year={2024},
      eprint={2403.14066},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2403.14066}, 
}

Acknowledgement

Some of our code is modified based on medicaldiffusion and RePaint, and we greatly appreciate the efforts of the respective authors for providing open-source code. We also thank DiffTumor for providing the segmentation model code.

Community Contribution: 3D Slicer Extension for LeFusion

For those who work with medical imaging and seek to bring LeFusion's inpainting model closer to real-world clinical practice, we are excited to introduce a community contribution: a 3D Slicer extension! This extension leverages our inpainting model as the backend, offering practical applications for radiologists and other medical professionals.

Special thanks to @pedr0sorio for developing this valuable tool.

ToDo List

✅ The preprocessed LIDC-IDRI dataset 🚀

✅ The LeFusion model applied to LIDC-IDRI 🚀

✅ The DiffMask model used for generating masks 🚀

✅ The preprocessed EMIDEC dataset 🚀

✅ The LeFusion model applied to EMIDEC 🚀
