This repository contains the code and results presented in the paper below.
The paper (arXiv: https://arxiv.org/abs/2412.12532):
Addressing Small and Imbalanced Medical Image Datasets Using Generative Models: A Comparative Study of DDPM and PGGANs with Random and Greedy K Sampling
Authors:
Iman Khazrak, Shakhnoza Takhirova, Mostafa M. Rezaee, Mehrdad Yadollahi, Robert C. Green II, Shuteng Niu
This project addresses the challenges of small and imbalanced medical image datasets by exploring two generative models: Denoising Diffusion Probabilistic Models (DDPM) and Progressive Growing Generative Adversarial Networks (PGGANs). These models are used to generate synthetic images to augment medical datasets, which improves the performance of classification algorithms.
We evaluate the impact of DDPM- and PGGAN-generated synthetic images on the performance of custom CNN, untrained VGG16, pretrained VGG16, and pretrained ResNet50 models, demonstrating significant improvements in model robustness and accuracy, especially in imbalanced scenarios.
For more details, please refer to the paper.
- An Evaluation Framework: A comprehensive framework to systematically evaluate and compare the quality of images produced by DDPM and PGGANs.
- High-Quality Image Generation: Demonstrates that producing high-quality and diverse synthetic images using small medical image datasets is feasible.
- Accuracy Improvement: Incorporating synthetic images into the training datasets significantly improves the accuracy of classification algorithms.
- Increased Robustness: Adding synthetic images to the original datasets enhances the robustness of classification algorithms.
- Faster Convergence: The inclusion of synthetic images accelerates the convergence of classification algorithms.
.
├── Cite us
│ └── README.md
├── Codes
│ ├── Classification Models
│ │ ├── VGG_help.py
│ │ ├── plots.py
│ │ ├── pretrained_balanced-VGG_ResNet-epo5.ipynb
│ │ ├── pretrained_imbalanced-VGG_ResNet-epo5.ipynb
│ │ ├── untrained_balanced-VGG_ResNet.ipynb
│ │ └── untrained_imbalanced-VGG_ResNet.ipynb
│ ├── DDPM
│ │ └── DDPM_Pytorch.ipynb
│ ├── FID
│ │ ├── FID.ipynb
│ │ ├── Results.txt
│ │ ├── fid.sh
│ │ ├── fid_comparison_plot.png
│ │ ├── fid_comparison_plot_full.png
│ │ └── fid_plot.ipynb
│ └── PGGANs
│ ├── ModelTrainingImages
│ │ ├── PGAN_Architecture.png
│ │ ├── PGAN_NRM_loss.png
│ │ └── PGAN_PNM_loss.png
│ ├── progan_modules.py
│ ├── train.py
│ ├── train_config_NRM200k_2024-04-11_20_17.txt
│ ├── train_config_PNM200k_2024-04-11_21_23.txt
│ ├── train_log_NRM200k_2024-04-11_20_17.txt
│ └── train_log_PNM200k_2024-04-11_21_23.txt
├── Dataset
│ ├── All_Data
│ │ ├── NORMAL
│ │ └── PNEUMONIA
│ ├── Balanced_data
│ │ ├── Greedy_K_Selection
│ │ │ ├── Mixed
│ │ │ │ ├── DDPM_Mixed
│ │ │ │ │ ├── NORMAL
│ │ │ │ │ └── PNEUMONIA
│ │ │ │ ├── PGGANS150_Mixed
│ │ │ │ │ ├── NORMAL
│ │ │ │ │ └── PNEUMONIA
│ │ │ │ └── PGGANS160_Mixed
│ │ │ │ ├── NORMAL
│ │ │ │ └── PNEUMONIA
│ │ │ ├── Original
│ │ │ │ └── selected_images
│ │ │ │ ├── NORMAL
│ │ │ │ └── PNEUMONIA
│ │ │ └── Test_greedy
│ │ │ └── Test
│ │ │ ├── NORMAL
│ │ │ └── PNEUMONIA
│ │ └── Randeom_Selection
│ │ ├── Mixed
│ │ │ ├── DDPM_Mixed
│ │ │ │ ├── NORMAL
│ │ │ │ └── PNEUMONIA
│ │ │ ├── PGGANS150_Mixed
│ │ │ │ ├── NORMAL
│ │ │ │ └── PNEUMONIA
│ │ │ └── PGGANS160_Mixed
│ │ │ ├── NORMAL
│ │ │ └── PNEUMONIA
│ │ ├── Original_Random
│ │ │ ├── NORMAL
│ │ │ └── PNEUMONIA
│ │ └── Test_random
│ │ ├── NORMAL
│ │ └── PNEUMONIA
│ └── Imbalanced_data
│ ├── Greedy_K_Selection
│ │ ├── Mixed
│ │ │ ├── DDPM_Mixed
│ │ │ │ ├── NORMAL
│ │ │ │ └── PNEUMONIA
│ │ │ ├── PGGANS150_Mixed
│ │ │ │ ├── NORMAL
│ │ │ │ └── PNEUMONIA
│ │ │ └── PGGANS160_Mixed
│ │ │ ├── NORMAL
│ │ │ └── PNEUMONIA
│ │ ├── Original
│ │ │ ├── NORMAL
│ │ │ └── PNEUMONIA
│ │ └── Test_greedy
│ │ ├── NORMAL
│ │ └── PNEUMONIA
│ └── Randeom_Selection
│ ├── Mixed
│ │ ├── DDPM_Mixed
│ │ │ ├── NORMAL
│ │ │ └── PNEUMONIA
│ │ ├── PGGANS150_Mixed
│ │ │ ├── NORMAL
│ │ │ └── PNEUMONIA
│ │ └── PGGANS160_Mixed
│ │ ├── NORMAL
│ │ └── PNEUMONIA
│ ├── Original
│ │ ├── NORMAL
│ │ └── PNEUMONIA
│ └── imbalanced_test
│ ├── NORMAL
│ └── PNEUMONIA
├── Figures
│ ├── Classification_boxplots.png
│ ├── Classification_boxplots_F1.png
│ ├── DDPM_forward.png
│ ├── Dataset.png
│ ├── FID_plot.png
│ ├── FID_table.png
│ ├── Flowchart2.png
│ ├── Logo_DDPM_X-Ray.jpg
│ ├── Normal_gallary.png
│ ├── Normal_vs_Original_ddpm_3images.png
│ ├── Pneumina_gallary.png
│ ├── Pneumonia_Original_ddpm_gans_3images.png
│ ├── README.md
│ ├── Run_results.png
│ └── VGG16_and_CNN_performance_5 runs_2.png
├── Results
│ ├── Descriptive_Statistics.xlsx
│ ├── Greedy-k_Method_Analysis.xlsx
│ ├── Model_Quality_Evaluation.xlsx
│ ├── README.md
│ └── Random_Method_Analysis.xlsx
├── .gitignore
├── Code.zip
├── DDPM_X_Ray___Paper.pdf
├── LICENSE
├── README.md
├── environment.yml
├── requirements.txt
└── tree.txt
85 directories, 64659 files
Step 1: Please consider starring the repository to support its development.
Step 2: Fork the repository to your GitHub account by using the "Fork" option available at the top of the repository page.
Step 3: Clone the repository, replacing your-username with your GitHub username in the command below. Then navigate to the project directory.

    git clone https://github.com/your-username/DDPM_X-Ray.git
    cd DDPM_X-Ray

Step 4: Install Python and the required packages by following one of the methods below:
- Method 1: Using Conda
  Create a Conda environment using the environment.yml file, then activate it:

      conda env create -f environment.yml
      conda activate DDPM_X-Ray

- Method 2: Using venv
  - Ensure you are in the project directory DDPM_X-Ray.
  - Create a virtual environment using venv:

        python -m venv DDPM_X-Ray

  - Activate the virtual environment:
    - On Mac/Linux: source DDPM_X-Ray/bin/activate
    - On Windows: DDPM_X-Ray\Scripts\activate
  - Install the required packages using the requirements.txt file:

        pip install -r requirements.txt

- Method 3: Without setting up an environment
  - Make sure you have python=3.10.14 installed on your machine.
  - Install the required packages using the requirements.txt file:

        pip install -r requirements.txt
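After installing, it can help to confirm that the interpreter and packages match what the project expects. The following is a minimal sketch, not part of the repository: the function names `python_matches` and `missing_requirements` are hypothetical, and the parser only handles simple lines such as `name==version` (it does not evaluate version pins, URLs, or environment markers).

```python
import re
import sys
from importlib import metadata


def python_matches(major=3, minor=10):
    """Return True if the running interpreter is Python major.minor."""
    return sys.version_info[:2] == (major, minor)


def missing_requirements(requirement_lines):
    """Return the package names from requirement lines that are not installed.

    Only the package name is checked; version specifiers are ignored.
    """
    missing = []
    for line in requirement_lines:
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        # The name is everything before a version specifier, extra, or marker.
        name = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]
        if not name:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

A typical use would be to read `requirements.txt` with `open("requirements.txt").read().splitlines()`, pass the lines to `missing_requirements`, and install anything it reports.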
The dataset used in this study consists of Chest X-ray (CXR) images with two classes: NORMAL and PNEUMONIA. The dataset is structured as follows:
- dataset/NORMAL: Contains normal CXR images.
- dataset/PNEUMONIA: Contains pneumonia CXR images.
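Before training, it is worth checking how many images each class folder actually holds, since class imbalance is the central concern here. The sketch below is illustrative only: the helper names `count_images_per_class` and `imbalance_ratio` and the set of accepted file extensions are assumptions, not part of the repository code.

```python
from pathlib import Path

# Extensions treated as images; an assumption, adjust to match your files.
IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png"}


def count_images_per_class(dataset_dir, class_labels=("NORMAL", "PNEUMONIA")):
    """Return {label: number of image files directly inside dataset_dir/label}."""
    counts = {}
    for label in class_labels:
        class_dir = Path(dataset_dir) / label
        if not class_dir.is_dir():
            # A missing class folder is reported as zero images.
            counts[label] = 0
            continue
        counts[label] = sum(
            1
            for p in class_dir.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


def imbalance_ratio(counts):
    """Largest class size divided by the smallest; 1.0 means balanced."""
    smallest = min(counts.values())
    if smallest == 0:
        raise ValueError("at least one class has no images")
    return max(counts.values()) / smallest
```

For example, `count_images_per_class('path/to/dataset')` returns a dictionary keyed by class label, which can then be passed to `imbalance_ratio`.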
- Prepare the dataset:

      from VGG_help import prepare_dataset

      dataset_dir = 'path/to/dataset'
      class_labels = ['NORMAL', 'PNEUMONIA']
      X, y = prepare_dataset(dataset_dir, class_labels)

- Train the VGG16 model using cross-validation:

      from VGG_help import cv_train_vgg_model

      fold_metrics_df, best_model = cv_train_vgg_model(X, y)

- Plot training history:

      from VGG_help import plot_train_history

      plot_train_history(fold_metrics_df, 'VGG16 Training History', 'vgg16_training_history.png')
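How `cv_train_vgg_model` forms its folds is not described in this README. As a general illustration of cross-validation on imbalanced labels, the sketch below builds stratified folds with the standard library only, so each validation fold keeps roughly the same class proportions as the full dataset. The function name `stratified_kfold_indices` and its parameters are hypothetical, and this is not necessarily the splitting scheme used in `VGG_help.py`.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs with near-equal class shares per fold."""
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    for k in range(n_splits):
        val_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, val_idx
```

Each yielded pair can be used to index `X` and `y` for one training and validation round; fixing `seed` makes the split reproducible across runs.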
- Load the dataset and prepare it as shown in the VGG16 training section.
- Train the custom CNN model using cross-validation:

      from CNN_Classification import fit_classification_model_cv

      fold_metrics_df, best_model = fit_classification_model_cv(X, y)

- Open the DDPM_Pytorch.ipynb notebook.
- Follow the instructions to train and evaluate the DDPM model.
- Train the PGGAN model using the train.py script:

      python train.py --path path/to/dataset --trial_name trial1 --gpu_id 0

- Open the fid_plot.ipynb notebook.
- Follow the instructions to calculate and plot the FID scores.
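For reference, the Fréchet Inception Distance (FID) compares the statistics of Inception-network features extracted from real and generated images. Writing $(\mu_r, \Sigma_r)$ for the feature mean and covariance of the real images and $(\mu_g, \Sigma_g)$ for those of the generated images, the standard definition is given below. Which feature layer and implementation the notebook and fid.sh rely on is not stated in this README, so treat this as the general formula and not as a description of the exact pipeline.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Lower values indicate that the generated feature distribution is closer to the real one; the score is zero when both means and both covariances coincide.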
The results from the cross-validation and test set evaluations will provide insights into the performance improvements achieved by using synthetic images generated by DDPM and PGGANs.
- For any questions or issues, feel free to reach out via email:
- Iman Khazrak: [email protected]
- Mostafa Rezaee: [email protected]
If you find our work helpful or relevant to your research, please consider citing it. Below are the citation formats:
- IEEE Style:
  I. Khazrak, S. Takhirova, M. M. Rezaee, M. Yadollahi, R. C. Green II, and S. Niu, "Addressing Small and Imbalanced Medical Image Datasets Using Generative Models: A Comparative Study of DDPM and PGGANs with Random and Greedy K Sampling," arXiv preprint, vol. 2412.12532, 2024. [Online]. Available: https://arxiv.org/abs/2412.12532.
- BibTeX:

      @misc{khazrak2024addressingsmallimbalancedmedical,
        title={Addressing Small and Imbalanced Medical Image Datasets Using Generative Models: A Comparative Study of DDPM and PGGANs with Random and Greedy K Sampling},
        author={Iman Khazrak and Shakhnoza Takhirova and Mostafa M. Rezaee and Mehrdad Yadollahi and Robert C. Green II and Shuteng Niu},
        year={2024},
        eprint={2412.12532},
        archivePrefix={arXiv},
        primaryClass={cs.CV},
        url={https://arxiv.org/abs/2412.12532},
      }

Thank you for your support!