GenAI Potentials to Enhance Medical Image Classification

This repository contains the code and results presented in the above paper.

The paper (Link):
Addressing Small and Imbalanced Medical Image Datasets Using Generative Models: A Comparative Study of DDPM and PGGANs with Random and Greedy K Sampling

Authors:
Iman Khazrak, Shakhnoza Takhirova, Mostafa M. Rezaee, Mehrdad Yadollahi, Robert C. Green II, Shuteng Niu

Table of Contents

1. Abstract

2. Our Contributions

3. Repository Contents

4. Installation

5. Methodology

6. Dataset

7. Code Execution Guide

          7.1. Training the VGG16 Model
          7.2. Training the Custom CNN Model
          7.3. Training the DDPM Model
          7.4. Training the PGGANs Model
          7.5. Calculating the FID Scores

8. Results

9. FAQ and Citation Guidelines

9.1. Frequently Asked Questions
9.2. Citation Guidelines

1. Abstract

This project addresses the challenges of small and imbalanced medical image datasets by exploring two generative models: Denoising Diffusion Probabilistic Models (DDPM) and Progressive Growing Generative Adversarial Networks (PGGANs). These models are used to generate synthetic images to augment medical datasets, which improves the performance of classification algorithms.

We evaluate the impact of DDPM- and PGGAN-generated synthetic images on the performance of custom CNN, untrained VGG16, pretrained VGG16, and pretrained ResNet50 models, demonstrating significant improvements in model robustness and accuracy, especially in imbalanced scenarios.

For more details, please refer to the paper.

2. Our Contributions

An Evaluation Framework:
A comprehensive framework to systematically evaluate and compare the quality of images produced by DDPM and PGGANs.
High-Quality Image Generation:
Demonstrates that producing high-quality and diverse synthetic images using small medical image datasets is feasible.
Accuracy Improvement:
Incorporating synthetic images into the training datasets significantly improves the accuracy of classification algorithms.
Increased Robustness:
Adding synthetic images to the original datasets enhances the robustness of classification algorithms.
Faster Convergence:
The inclusion of synthetic images accelerates the convergence of classification algorithms.

3. Repository Contents

.
├── Cite us
│   └── README.md
├── Codes
│   ├── Classification Models
│   │   ├── VGG_help.py
│   │   ├── plots.py
│   │   ├── pretrained_balanced-VGG_ResNet-epo5.ipynb
│   │   ├── pretrained_imbalanced-VGG_ResNet-epo5.ipynb
│   │   ├── untrained_balanced-VGG_ResNet.ipynb
│   │   └── untrained_imbalanced-VGG_ResNet.ipynb
│   ├── DDPM
│   │   └── DDPM_Pytorch.ipynb
│   ├── FID
│   │   ├── FID.ipynb
│   │   ├── Results.txt
│   │   ├── fid.sh
│   │   ├── fid_comparison_plot.png
│   │   ├── fid_comparison_plot_full.png
│   │   └── fid_plot.ipynb
│   └── PGGANs
│       ├── ModelTrainingImages
│       │   ├── PGAN_Architecture.png
│       │   ├── PGAN_NRM_loss.png
│       │   └── PGAN_PNM_loss.png
│       ├── progan_modules.py
│       ├── train.py
│       ├── train_config_NRM200k_2024-04-11_20_17.txt
│       ├── train_config_PNM200k_2024-04-11_21_23.txt
│       ├── train_log_NRM200k_2024-04-11_20_17.txt
│       └── train_log_PNM200k_2024-04-11_21_23.txt
├── Dataset
│   ├── All_Data
│   │   ├── NORMAL
│   │   └── PNEUMONIA
│   ├── Balanced_data
│   │   ├── Greedy_K_Selection
│   │   │   ├── Mixed
│   │   │   │   ├── DDPM_Mixed
│   │   │   │   │   ├── NORMAL
│   │   │   │   │   └── PNEUMONIA
│   │   │   │   ├── PGGANS150_Mixed
│   │   │   │   │   ├── NORMAL
│   │   │   │   │   └── PNEUMONIA
│   │   │   │   └── PGGANS160_Mixed
│   │   │   │       ├── NORMAL
│   │   │   │       └── PNEUMONIA
│   │   │   ├── Original
│   │   │   │   └── selected_images
│   │   │   │       ├── NORMAL
│   │   │   │       └── PNEUMONIA
│   │   │   └── Test_greedy
│   │   │       └── Test
│   │   │           ├── NORMAL
│   │   │           └── PNEUMONIA
│   │   └── Randeom_Selection
│   │       ├── Mixed
│   │       │   ├── DDPM_Mixed
│   │       │   │   ├── NORMAL
│   │       │   │   └── PNEUMONIA
│   │       │   ├── PGGANS150_Mixed
│   │       │   │   ├── NORMAL
│   │       │   │   └── PNEUMONIA
│   │       │   └── PGGANS160_Mixed
│   │       │       ├── NORMAL
│   │       │       └── PNEUMONIA
│   │       ├── Original_Random
│   │       │   ├── NORMAL
│   │       │   └── PNEUMONIA
│   │       └── Test_random
│   │           ├── NORMAL
│   │           └── PNEUMONIA
│   └── Imbalanced_data
│       ├── Greedy_K_Selection
│       │   ├── Mixed
│       │   │   ├── DDPM_Mixed
│       │   │   │   ├── NORMAL
│       │   │   │   └── PNEUMONIA
│       │   │   ├── PGGANS150_Mixed
│       │   │   │   ├── NORMAL
│       │   │   │   └── PNEUMONIA
│       │   │   └── PGGANS160_Mixed
│       │   │       ├── NORMAL
│       │   │       └── PNEUMONIA
│       │   ├── Original
│       │   │   ├── NORMAL
│       │   │   └── PNEUMONIA
│       │   └── Test_greedy
│       │       ├── NORMAL
│       │       └── PNEUMONIA
│       └── Randeom_Selection
│           ├── Mixed
│           │   ├── DDPM_Mixed
│           │   │   ├── NORMAL
│           │   │   └── PNEUMONIA
│           │   ├── PGGANS150_Mixed
│           │   │   ├── NORMAL
│           │   │   └── PNEUMONIA
│           │   └── PGGANS160_Mixed
│           │       ├── NORMAL
│           │       └── PNEUMONIA
│           ├── Original
│           │   ├── NORMAL
│           │   └── PNEUMONIA
│           └── imbalanced_test
│               ├── NORMAL
│               └── PNEUMONIA
├── Figures
│   ├── Classification_boxplots.png
│   ├── Classification_boxplots_F1.png
│   ├── DDPM_forward.png
│   ├── Dataset.png
│   ├── FID_plot.png
│   ├── FID_table.png
│   ├── Flowchart2.png
│   ├── Logo_DDPM_X-Ray.jpg
│   ├── Normal_gallary.png
│   ├── Normal_vs_Original_ddpm_3images.png
│   ├── Pneumina_gallary.png
│   ├── Pneumonia_Original_ddpm_gans_3images.png
│   ├── README.md
│   ├── Run_results.png
│   └── VGG16_and_CNN_performance_5 runs_2.png
├── Results
│   ├── Descriptive_Statistics.xlsx
│   ├── Greedy-k_Method_Analysis.xlsx
│   ├── Model_Quality_Evaluation.xlsx
│   ├── README.md
│   └── Random_Method_Analysis.xlsx
├── .gitignore
├── Code.zip
├── DDPM_X_Ray___Paper.pdf
├── LICENSE
├── README.md
├── environment.yml
├── requirements.txt
└── tree.txt

85 directories, 64659 files

4. Installation

Step 1:
Please consider starring the repository to support its development.

Step 2: Fork the repository to your GitHub account by using the "Fork" option available at the top of the repository page.

Step 3:
Clone the repository by replacing your-username with your GitHub username in the command below. Then, navigate to the project directory.

git clone https://github.com/your-username/DDPM_X-Ray.git    
cd DDPM_X-Ray

Step 4:
Install Python and the required packages by following one of the methods below:

Method 1: Using Conda

Create a Conda environment using the environment.yml file:

conda env create -f environment.yml      
conda activate DDPM_X-Ray

Method 2: Using venv
1. Ensure you are in the project directory DDPM_X-Ray.
2. Create a virtual environment using venv:
```
python -m venv DDPM_X-Ray
```
3. Activate the virtual environment:
  - On Mac/Linux:
```
source DDPM_X-Ray\Scripts\activate
```
  - On Windows:
```
DDPM_X-Ray\Scripts\activate
```
4. Install the required packages using the requirements.txt file:
```
pip install -r requirements.txt
```
Method 3: Without setting up an environment
1. Make sure you have python=3.10.14 installed on your machine.
2. Install the required packages using the requirements.txt file:
```
pip install -r requirements.txt
```

5. Methodology

Note:
Click on the figure to open it in a new window for a clearer and more detailed view of its content.

6. Dataset

The dataset used in this study consists of Chest X-ray (CXR) images with two classes: NORMAL and PNEUMONIA. The dataset is structured as follows:

dataset/NORMAL: Contains normal CXR images.
dataset/PNEUMONIA: Contains pneumonia CXR images.

7. Code Execution Guide

7.1. Training the VGG16 Model

Prepare the dataset:

from VGG_help import prepare_dataset
dataset_dir = 'path/to/dataset'
class_labels = ['NORMAL', 'PNEUMONIA']
X, y = prepare_dataset(dataset_dir, class_labels)

Train the VGG16 model using cross-validation:

from VGG_help import cv_train_vgg_model
fold_metrics_df, best_model = cv_train_vgg_model(X, y)

Plot training history:

from VGG_help import plot_train_history
plot_train_history(fold_metrics_df, 'VGG16 Training History', 'vgg16_training_history.png')

7.2. Training the Custom CNN Model

Load the dataset and prepare it as shown in the VGG16 training section.

Train the custom CNN model using cross-validation:

from CNN_Classification import fit_classification_model_cv
fold_metrics_df, best_model = fit_classification_model_cv(X, y)

7.3. Training the DDPM model

Open the DDPM_Pytorch.ipynb notebook.
Follow the instructions to train and evaluate the DDPM model.

7.4. Training the PGGANs model

Train the PGGAN model using the train.py script:

python train.py --path path/to/dataset --trial_name trial1 --gpu_id 0

7.5. Calculating the FID Scores

Open the fid_plot.ipynb notebook.
Follow the instructions to calculate and plot the FID scores.

8. Results

The results from the cross-validation and test set evaluations will provide insights into the performance improvements achieved by using synthetic images generated by DDPM and PGGANs.

9. FAQ and Citation Guidelines

Frequently Asked Questions

For any questions or issues, feel free to reach out via email:
- Iman Khazrak: [email protected]
- Mostafa Rezaee: [email protected]

Citation Guidelines

If you find our work helpful or relevant to your research, please consider citing it. Below are the citation formats:

IEEE Style:
I. Khazrak, S. Takhirova, M. M. Rezaee, M. Yadollahi, R. C. Green II, and S. Niu,
"Addressing Small and Imbalanced Medical Image Datasets Using Generative Models: A Comparative Study of DDPM and PGGANs with Random and Greedy K Sampling," arXiv preprint, vol. 2412.12532, 2024. [Online]. Available: https://arxiv.org/abs/2412.12532.

BibTeX:

@misc{khazrak2024addressingsmallimbalancedmedical,
    title={Addressing Small and Imbalanced Medical Image Datasets Using Generative Models: A Comparative Study of DDPM and PGGANs with Random and Greedy K Sampling}, 
    author={Iman Khazrak and Shakhnoza Takhirova and Mostafa M. Rezaee and Mehrdad Yadollahi and Robert C. Green II and Shuteng Niu},
    year={2024},
    eprint={2412.12532},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2412.12532}, 
}

Thank you for your support!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GenAI Potentials to Enhance Medical Image Classification

1. Abstract

2. Our Contributions

3. Repository Contents

4. Installation

5. Methodology

6. Dataset

7. Code Execution Guide

7.1. Training the VGG16 Model

7.2. Training the Custom CNN Model

7.3. Training the DDPM model

7.4. Training the PGGANs model

7.5. Calculating the FID Scores

8. Results

9. FAQ and Citation Guidelines

Frequently Asked Questions

Citation Guidelines

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 60 Commits
Cite us		Cite us
Codes		Codes
Dataset		Dataset
Figures		Figures
Results		Results
.gitignore		.gitignore
Code.zip		Code.zip
DDPM_X_Ray___Paper.pdf		DDPM_X_Ray___Paper.pdf
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
requirements.txt		requirements.txt

License

Mostafa-M-Rezaee/DDPM_X-Ray

Folders and files

Latest commit

History

Repository files navigation

GenAI Potentials to Enhance Medical Image Classification

1. Abstract

2. Our Contributions

3. Repository Contents

4. Installation

5. Methodology

6. Dataset

7. Code Execution Guide

7.1. Training the VGG16 Model

7.2. Training the Custom CNN Model

7.3. Training the DDPM model

7.4. Training the PGGANs model

7.5. Calculating the FID Scores

8. Results

9. FAQ and Citation Guidelines

Frequently Asked Questions

Citation Guidelines

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages