- Identifying patients at high risk of heart disease is difficult when several risk factors, such as diabetes, high blood pressure, and high cholesterol, are present together. In such scenarios, ML can help with early diagnosis of the disease.
- A random sample can be drawn from the complete dataset to help avoid overfitting. The work focuses on training the model on data samples obtained from the UCI Machine Learning Repository.
- The aim of this study is to improve the prediction of heart disease.
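
The random sampling mentioned above could look roughly like the following. This is a minimal sketch using only the Python standard library; the function name `sample_and_split`, the 20% test fraction, the seed, and the stand-in record ids are illustrative assumptions, not values taken from this project.

```python
import random

def sample_and_split(rows, sample_size, test_fraction=0.2, seed=42):
    """Draw a random sample without replacement, then split it into train/test."""
    rng = random.Random(seed)               # fixed seed keeps the split reproducible
    sample = rng.sample(rows, sample_size)  # sampling without replacement
    n_test = int(round(len(sample) * test_fraction))
    return sample[n_test:], sample[:n_test]  # (train, test)

# Hypothetical stand-in for the dataset: 300 patient record ids
records = list(range(300))
train, test = sample_and_split(records, sample_size=200)
print(len(train), len(test))  # 160 40
```

Holding out a test portion that the model never sees during training is what makes the overfitting check possible: a large gap between training and test scores signals that the model has memorised the sample.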
The Intel® oneAPI Base Toolkit (Base Kit) is a core set of tools and libraries for developing high-performance, data-centric applications across diverse architectures. It features an industry-leading C++ compiler that implements SYCL*, an evolution of C++ for heterogeneous computing.
Domain-specific libraries and the Intel® Distribution for Python* provide drop-in acceleration across relevant architectures. Enhanced profiling, design assistance, and debug tools complete the kit.
* Age
* Sex
* Chest pain type
* BP
* Cholesterol
* FBS over 120
* EKG results
* Max HR
* Exercising angina
* ST depression
* Slope of ST
* Number of vessels fluro
* Thallium
* Health care and disease data comprise different types of outcomes: binary outcomes (0 or 1, e.g., death or another event), continuous outcomes (e.g., length of stay), ordinal outcomes (e.g., tumor grading or quality of life), and survival outcomes (e.g., from clinical trials or cancer survival).
* ML provides versatility in analyzing these data and can deliver more precise results.
- ML is an effective way to optimize the prediction of heart disease and its related effects.
- A good understanding of the parameters required for diagnosing the disease can be highly helpful in making precise and accurate predictions.
- Cardiovascular (CV) disease research and treatment, coupled with high-performance analysis tools, can improve knowledge of the domain.
- The first step is data gathering, referred to as ‘acquisition’. This involved evaluating physical conditions and converting the samples into numeric data that the computer can process.
- The second step is ‘pre-processing’, where issues in the data were tackled through missing-value handling, outlier detection, and redundancy removal to clean the dataset. Predictive analysis was performed in a uniform environment, which also leads the application towards exploratory data analysis (EDA).
- The third step is ‘integration’, where libraries and different subsets were combined by importing independent modules in Python and merging them to perform the necessary experiments.
- The fourth step is ‘analysis’, where EDA was done to understand the relationships between the different attributes of the data.
- The fifth step is ‘intervention’, which covers decision-making policies, i.e., a search strategy for reviewing previous experimental studies to determine when models can be used efficiently on real-world problems.
- The sixth step is ‘application’ of ML algorithms to make the predictions. In this work, four machine learning models were used: SVM, Naïve Bayes, Logistic Regression, and XGBoost.
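
The pre-processing step described above (redundancy removal, missing values, outlier detection) can be sketched as follows. This is an illustration under stated assumptions rather than the project's actual code: the helper names, the median imputation strategy, the 1.5 × IQR outlier rule, and the sample records are all hypothetical, and only the Python standard library is used.

```python
from statistics import median, quantiles

def drop_duplicates(rows):
    """Remove exact duplicate records, keeping the first occurrence of each."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def impute_median(rows, column):
    """Replace missing (None) values in `column` with the column median."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def iqr_outliers(rows, column, k=1.5):
    """Return rows whose `column` value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles([r[column] for r in rows], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if not low <= r[column] <= high]

# Made-up records using three of the attribute names listed earlier
rows = [
    {"Age": 54, "BP": 130, "Cholesterol": 240},
    {"Age": 54, "BP": 130, "Cholesterol": 240},   # exact duplicate
    {"Age": 61, "BP": None, "Cholesterol": 210},  # missing BP
    {"Age": 47, "BP": 120, "Cholesterol": 198},
    {"Age": 58, "BP": 140, "Cholesterol": 900},   # implausible value
    {"Age": 65, "BP": 150, "Cholesterol": 260},
    {"Age": 50, "BP": 125, "Cholesterol": 225},
]

clean = impute_median(drop_duplicates(rows), "BP")
outliers = iqr_outliers(clean, "Cholesterol")
print(len(clean), [r["Cholesterol"] for r in outliers])  # 6 [900]
```

Whether flagged outliers should be dropped, capped, or kept is a judgment call that depends on the clinical plausibility of the values, so the sketch only reports them.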
![image](https://github.com/SubarnaChinnadurai/HEART/assets/117588706/fb52ed9c-ffb9-4414-a8d8-cb3866c6a81e)
* Interface for entering symptoms
* Prediction shown after the interface
* Distribution of attribute values
* Box plots showing the quartiles and the median value of the attributes
* Training and test scores of machine learning classifiers
- Evaluation measures for different classifiers
- Evaluated results for machine learning classifiers
- Receiver operating characteristic (ROC) for different classifiers
- Area under the curve (AUC) for the performance of the classification model.
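
To make the ROC and AUC measures above concrete, here is a minimal sketch of how they can be computed from predicted scores. It uses only the Python standard library; the function names, labels, and scores are made up for illustration (label 1 is assumed to mean that heart disease is present), and this is not the evaluation code of this project. In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical labels and classifier scores
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

area = auc(roc_points(y_true, scores))
print(round(area, 3))  # 0.889
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes, which is why it is a convenient single number for comparing classifiers.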
The goal is to study and merge more datasets in order to create a more relevant dataset that covers a broad range of population types.
Feature selection can be used to identify more relevant features and obtain more effective results for the prediction of heart disease.