This repository contains the work and findings of Team Challengers (23) from the CSC177 Advanced Data Science and Machine Learning course, Fall 2023. The project applies several classification algorithms and compares their performance. The repository is divided into two main parts: Phase I (Guided Dataset Analysis) and Phase II (Independent Dataset Analysis).
Phase I - Guided Dataset Analysis
- Application of Logistic Regression, K-Nearest Neighbors, and Decision Trees to a guided dataset.
- Data preprocessing, feature selection, and model evaluation.
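The Phase I workflow above (preprocessing, feature selection, then fitting and evaluating the three classifiers) might look roughly like the following scikit-learn sketch. The guided dataset is not named in this README, so scikit-learn's bundled breast cancer dataset is used as a stand-in. The scaler, the `SelectKBest` selector with `k=10`, the hyperparameters, and the name `phase1_scores` are illustrative assumptions, not the team's actual choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): the guided dataset is not named in this README.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three Phase I classifiers, with illustrative hyperparameters.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

phase1_scores = {}
for name, model in models.items():
    # Preprocessing (scaling) and feature selection are fitted on the
    # training split only, because they live inside the pipeline.
    pipeline = make_pipeline(
        StandardScaler(),
        SelectKBest(score_func=f_classif, k=10),
        model,
    )
    pipeline.fit(X_train, y_train)
    phase1_scores[name] = accuracy_score(y_test, pipeline.predict(X_test))

for name, score in phase1_scores.items():
    print(f"{name}: test accuracy = {score:.3f}")
```

Keeping the scaler and selector inside the pipeline avoids leaking information from the test split into the preprocessing steps.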
Phase II - Independent Dataset Analysis
- Selection and analysis of an independent dataset.
- Implementation of SVC, KNN, Naive Bayes, Decision Tree, and Logistic Regression.
- Focus on parameter tuning and model optimization.
- Detailed analysis of various classification models.
- Comparative study of model performance on different datasets.
- Insights into the importance of data preprocessing and feature selection.
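The parameter tuning and model comparison listed under Phase II could be sketched with a cross-validated grid search over the five classifiers. This is a hedged example, not the team's actual code: the independent dataset is not named here, so scikit-learn's bundled wine dataset stands in, and the parameter grids, the `candidates` and `results` names, and the 5-fold setting are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): the independent dataset is not named in this README.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Each entry pairs an estimator with a small, illustrative parameter grid.
# The "clf__" prefix targets the classifier step of the pipeline below.
candidates = {
    "svc": (SVC(), {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]}),
    "knn": (KNeighborsClassifier(), {"clf__n_neighbors": [3, 5, 7]}),
    "naive_bayes": (GaussianNB(), {"clf__var_smoothing": [1e-9, 1e-7]}),
    "decision_tree": (
        DecisionTreeClassifier(random_state=0),
        {"clf__max_depth": [3, 5, None]},
    ),
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"clf__C": [0.1, 1, 10]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    pipe = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
    # 5-fold cross-validation on the training split picks the best setting.
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        "test_accuracy": search.score(X_test, y_test),
    }

for name, info in results.items():
    print(name, info["best_params"], f"test accuracy = {info['test_accuracy']:.3f}")
```

Reporting both the cross-validated score and the held-out test score makes it easier to spot a model that was tuned too closely to the training folds.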
Feel free to fork the project and submit pull requests. For major changes, please open an issue first to discuss what you would like to change.