This project predicts heart disease using machine learning models. It covers data cleaning, exploratory data analysis (EDA), feature importance analysis, model selection, parameter tuning, and deployment as a web service. The solution is containerized with Docker and deployed to AWS Elastic Beanstalk.
- Project Overview
- Directory Structure
- Problem Description
- Installation and Setup
- Running the Project
  - Local Model Deployment
  - Docker Containerization
  - AWS Elastic Beanstalk Deployment
- Testing the Application
- Contributing
- License
Heart disease remains one of the leading causes of death globally. This project leverages machine learning techniques to predict the likelihood of heart disease based on patient data.
Key features include:
- Data preparation and cleaning.
- Exploratory Data Analysis (EDA) to uncover patterns and relationships.
- Model training, evaluation, and parameter optimization.
- Deployment via Flask and containerization using Docker for scalable web service hosting.
- Cloud deployment using AWS Elastic Beanstalk.
Heart-Disease-App/
│
├── data/ # Contains the dataset
├── images/ # Illustrations and deployment screenshots
├── midterm_project.ipynb # Jupyter Notebook with data preparation, analysis and model planning
├── train.py # Script for training and saving the model
├── predict.py # Web service for serving the model
├── no_app_predict_test.py # Test script for direct model testing
├── predict_test.py # Script for testing the web service
├── predict_test_cloud.py # Script for testing the app deployed on AWS Elastic Beanstalk
├── Pipfile # Dependencies for pipenv
├── Pipfile.lock # Locked versions of dependencies
├── Dockerfile # Docker configuration for containerization
├── LICENSE.txt # Project MIT License
└── README.md # Project description and instructions
Cardiovascular diseases are a major global health challenge. This project aims to use machine learning to:
- Identify individuals at risk of heart disease.
- Assist healthcare professionals in making informed decisions.
- Provide an easily deployable service for real-world applications.
The dataset combines five publicly available heart disease datasets, for a total of 2,181 records:
- Heart Attack Analysis & Prediction Dataset: 304 records from Rahman, 2021
- Heart Disease Dataset: 1,026 records from Lapp, 2019
- Heart Attack Prediction (Dataset 3): 295 records from Damarla, 2020
- Heart Attack Prediction (Dataset 4): 271 records from Anand, 2018
- Heart CSV Dataset: 290 records from Nandal, 2022
Merging these datasets provides a larger and more varied foundation for training machine learning models aimed at early detection and prevention of heart disease. The resulting dataset contains anonymized patient records whose features, such as age, cholesterol level, and blood pressure, cover both medical and demographic factors relevant to predicting heart attack and stroke risk.
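A minimal sketch of the merge step, assuming each source is a CSV file with a header row and that only the columns shared by all files are kept. The file names in `SOURCES` and the optional exact-duplicate filter are hypothetical illustrations, not necessarily what the notebook does; the sketch uses only the standard `csv` module.

```python
import csv

# Hypothetical file names; the actual files in data/ may be named differently.
SOURCES = ["rahman_2021.csv", "lapp_2019.csv", "damarla_2020.csv",
           "anand_2018.csv", "nandal_2022.csv"]


def merge_csv_files(paths, drop_duplicates=False):
    """Concatenate CSV files, keeping only the columns shared by all of them."""
    tables = []
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            tables.append((reader.fieldnames, list(reader)))

    # Columns present in every file, in the order of the first file.
    common = [c for c in tables[0][0] if all(c in cols for cols, _ in tables)]

    merged, seen = [], set()
    for _, rows in tables:
        for row in rows:
            record = {c: row[c] for c in common}
            key = tuple(record.values())
            if drop_duplicates and key in seen:
                continue  # skip a row identical to one already kept
            seen.add(key)
            merged.append(record)
    return common, merged


def write_csv(path, columns, rows):
    """Write the merged rows back to a single CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
```

For example, `columns, rows = merge_csv_files(["data/" + name for name in SOURCES])` followed by `write_csv("data/merged.csv", columns, rows)` would produce one combined file. Values stay as strings here, so numeric conversion is left to the analysis step.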
- age: age of the patient [years: Numeric]
- sex: gender of the patient [1: Male, 0: Female]
- cp: chest pain type [0: Typical Angina, 1: Atypical Angina, 2: Non-Anginal Pain, 3: Asymptomatic]
- trestbps: resting blood pressure [mm Hg: Numeric]
- chol: serum cholesterol level [mg/dl: Numeric]
- fbs: fasting blood sugar [1: if fasting blood sugar > 120 mg/dl, 0: otherwise]
- restecg: resting electrocardiographic results [0: Normal, 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), 2: showing probable or definite left ventricular hypertrophy by Estes' criteria]
- thalach: maximum heart rate achieved [Numeric value between 60 and 202]
- exang: exercise-induced angina [1: Yes, 0: No]
- oldpeak: ST depression induced by exercise relative to rest [Numeric]
- slope: slope of the peak exercise ST segment [0: Upsloping, 1: Flat, 2: Downsloping]
- ca: number of major vessels (0-3) colored by fluoroscopy [0, 1, 2, 3]
- thal: result of the thallium stress test (described as thalassemia type in some versions of the dataset) [1: Normal, 2: Fixed defect, 3: Reversible defect]
- target: outcome variable for heart attack risk [1: disease or more chance of heart attack, 0: normal or less chance of heart attack]
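The encodings above can double as a lightweight input check before a record reaches the model. The sketch below transcribes the documented codes into Python; the function name `validate_patient` is hypothetical, and since the documented codes may not cover every value present in the merged sources, a flagged value is something to inspect rather than a definite error.

```python
# Allowed codes for the categorical features, transcribed from the list above.
CATEGORICAL = {
    "sex": (0, 1),
    "cp": (0, 1, 2, 3),
    "fbs": (0, 1),
    "restecg": (0, 1, 2),
    "exang": (0, 1),
    "slope": (0, 1, 2),
    "ca": (0, 1, 2, 3),
    "thal": (1, 2, 3),
}
NUMERIC = ["age", "trestbps", "chol", "thalach", "oldpeak"]


def validate_patient(record):
    """Return a list of problems found in one patient record (empty if valid)."""
    problems = []
    for name in NUMERIC:
        value = record.get(name)
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(value, (int, float)) or isinstance(value, bool):
            problems.append(f"{name}: expected a number, got {value!r}")
    for name, allowed in CATEGORICAL.items():
        value = record.get(name)
        if value not in allowed:
            problems.append(f"{name}: {value!r} not in {list(allowed)}")
    # Documented range for the maximum heart rate achieved.
    thalach = record.get("thalach")
    if isinstance(thalach, (int, float)) and not 60 <= thalach <= 202:
        problems.append(f"thalach: {thalach} outside the documented 60-202 range")
    return problems
```

A complete record using only documented codes yields an empty list; each problem is reported as a short string that starts with the feature name.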
Requirements: Python 3.11, Ubuntu with WSL 2.0
git clone https://github.com/maxim-eyengue/Heart-Disease-App.git
cd Heart-Disease-App
Use `pipenv` to manage dependencies:
pip install pipenv
pipenv install flask scikit-learn==1.5.1 gunicorn
pipenv shell
NB: Instead of activating the shell, you can prefix any command with `pipenv run`, for example `pipenv run python train.py`.
Train the model and save it as a binary file:
python train.py
Serve the Flask application with Gunicorn:
gunicorn --bind 0.0.0.0:9696 predict:app
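For context, a Flask service in the spirit of `predict.py` might look like the sketch below. It assumes the model file holds a pickled `(DictVectorizer, model)` pair and exposes a hypothetical `/predict` route returning a probability; the route, the response keys, and the 0.5 threshold are illustrative, not taken from the project.

```python
import pickle

from flask import Flask, jsonify, request


def create_app(model_path="model.bin"):
    """Build a Flask app that serves a pickled (DictVectorizer, model) pair."""
    with open(model_path, "rb") as f_in:
        dv, model = pickle.load(f_in)

    app = Flask("heart-disease")

    @app.route("/predict", methods=["POST"])
    def predict():
        patient = request.get_json()  # one patient record as a JSON object
        X = dv.transform([patient])
        probability = float(model.predict_proba(X)[0, 1])
        return jsonify({
            "heart_disease_probability": probability,
            "heart_disease": probability >= 0.5,  # illustrative threshold
        })

    return app
```

Because `create_app` is a factory, a `predict.py` built this way would also need a module-level `app = create_app()` so that `gunicorn --bind 0.0.0.0:9696 predict:app` can find it.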
Send a test request using `predict_test.py`:
python predict_test.py
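A test client such as `predict_test.py` only needs to POST one patient record as JSON. This sketch uses the standard library's `urllib.request` (the project may use a different HTTP client); the URL and the `/predict` route are assumptions that must match whatever the service actually exposes.

```python
import json
import urllib.request

# Hypothetical endpoint; adjust host, port, and route to match predict.py.
URL = "http://localhost:9696/predict"


def send_patient(patient, url=URL, timeout=10):
    """POST one patient record as JSON and return the decoded JSON response."""
    data = json.dumps(patient).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The same function covers the local, Docker, and local Elastic Beanstalk setups, which all listen on port 9696; for the cloud deployment, pass the environment URL as `url`.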
The model is deployed using Flask in an environment created with pipenv.
Once the app is running, test its functionality:
python predict_test.py
You can now transition to containerized deployment with Docker.
Create a Docker image for the project:
docker build -t heart-prediction-app .
Run a container from the image, mapping port 9696:
docker run -it --rm -p 9696:9696 heart-prediction-app
Send a request to the service using:
python predict_test.py
Install the AWS Elastic Beanstalk CLI in your environment:
pipenv install awsebcli --dev
After activating the environment with `pipenv shell`, initialize the project for Elastic Beanstalk:
eb init -p docker -r us-east-1 heart-prediction-app
If this fails, specify the platform explicitly:
eb init -p "Docker running on 64bit Amazon Linux 2" heart-prediction-app -r us-east-1
Provide your AWS credentials when prompted. These can be generated from the AWS IAM service.
NB: You can follow Alexey's tutorial to create an account on AWS.
Run the application locally with the Elastic Beanstalk CLI:
eb local run --port 9696
Use `python predict_test.py` to send a request to the locally running app for testing.
Deploy the application to Elastic Beanstalk:
eb create heart-prediction-app-env --enable-spot
After deployment, the app is accessible at the environment URL provided by Elastic Beanstalk.
To test the deployment, run:
python predict_test_cloud.py
To terminate the Elastic Beanstalk environment:
eb terminate heart-prediction-app-env
We tested the model in three ways:
i. Without Flask: test the model directly using:
python no_app_predict_test.py
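Testing without Flask amounts to loading the saved artifacts and scoring a record in-process. A minimal sketch, assuming the model file contains a pickled `(DictVectorizer, model)` pair whose model has a `predict_proba` method; `predict_without_app` and `model.bin` are hypothetical names.

```python
import pickle


def predict_without_app(patient, model_path="model.bin"):
    """Load the pickled (DictVectorizer, model) pair and score one patient."""
    with open(model_path, "rb") as f_in:
        dv, model = pickle.load(f_in)
    X = dv.transform([patient])  # same feature encoding as during training
    return float(model.predict_proba(X)[0, 1])  # probability of class 1
```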
ii. Flask Web Service, Docker & Local EB: Send requests to the Flask app, the Docker container, or the app running locally with Elastic Beanstalk:
python predict_test.py
iii. Cloud Deployment: Test the application on AWS:
python predict_test_cloud.py
We welcome contributions to enhance this project. Please:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Submit a pull request with a detailed description of your changes.
This project is licensed under the MIT License.