Traffic Violations

Python Version Code style: black Imports: isort Linting: ruff Pre-commit

Authors: Mykyta Alekseiev, Elizaveta Barysheva, Joao Melo, Thomas Schneider, Harshit Shangari and Maria Stoelben

Description

The goal of this project is to predict a binary target (whether an officer issued a citation or only a warning) using white-box and black-box models. The performance and fairness of the models are then analysed with respect to two protected attributes, gender and race. Finally, the models' predictions are examined with interpretability methods.

Data

For this project, a dataset of traffic violations in Maryland, USA was selected. You can download the data here. The .arff file should be placed in a data/ folder in the root of the repository.

The processed data contains 65,203 instances with 15 columns, of which 5 are categorical and the rest binary or numeric. The target column is Citation, which equals 1 when an officer issued a citation and 0 when only a warning was given.

Setup

Create a virtual environment and install the requirements:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -e .
pre-commit install

Data Preprocessing

Check out the Jupyter notebooks to understand the data and the preprocessing decisions.

To run the data preprocessing and produce the data.csv output used in the following parts, run:

python -m spacy download en_core_web_sm
python src/data_preprocessing/data_preprocessor.py
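
Once data.csv exists, a quick look at the class balance of the Citation target can be useful before modeling. The following is a minimal sketch using only the standard library, not part of the project code. The helper name citation_balance, the toy rows, and the data/data.csv path in the trailing comment are assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def citation_balance(csv_file, target="Citation"):
    """Count how often each value of the target column occurs.

    Expects an open text file (or file-like object) with a header row.
    Values are counted as strings, exactly as they appear in the CSV.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[target] for row in reader)


# Tiny in-memory stand-in for the real file; "example_feature" is invented.
sample = io.StringIO("Citation,example_feature\n1,a\n0,b\n1,c\n")
counts = citation_balance(sample)
print(counts)

# With the real preprocessing output (the path is an assumption):
# with open("data/data.csv", newline="") as f:
#     print(citation_balance(f))
```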

Modeling

The parameters can be changed in config/config_modeling.py. By default, the data is split into 60% training, 20% validation, and 20% testing.
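
To make the default proportions concrete, here is a minimal sketch of a shuffled 60/20/20 split in plain Python. It is not the project's implementation, which may well stratify by the target or rely on a library routine. The function name, the seed, and the parameter names are assumptions.

```python
import random


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle rows reproducibly and cut them into train/val/test parts.

    The training part receives whatever remains after the validation and
    test parts are taken, so no row is dropped or duplicated.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_val = int(len(rows) * val_frac)
    n_test = int(len(rows) * test_frac)
    n_train = len(rows) - n_val - n_test
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test


# Toy example with 100 row indices instead of real records.
train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```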

Run the training with mlflow tracking with the following command:

python src/modeling/main.py

Results

The model selection was performed on the validation data. The results for the white-box and black-box models are shown below.

Model                Train AUC  Val AUC  Test AUC  Test Accuracy  Test F1 Score
XGB                  0.898      0.866    0.860     0.778          0.748
Random Forest        0.873      0.849    0.843     0.764          0.728
Decision Tree        0.825      0.818    0.818     0.742          0.703
GAM                  0.805      0.814    0.805     0.730          0.705
Logistic Regression  0.645      0.652    0.641     0.600          0.559
ANN                  0.641      0.649    0.637     0.537          0.097
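
As an illustration of selecting on validation data, the sketch below picks the model with the highest validation AUC from the reported numbers and computes the train-validation gap as a rough overfitting indicator. Using validation AUC as the selection criterion is an assumption, since the exact rule is not stated here. The AUC values are copied from the results above.

```python
# (train AUC, validation AUC, test AUC) as reported in the results.
results = {
    "XGB": (0.898, 0.866, 0.860),
    "Random Forest": (0.873, 0.849, 0.843),
    "Decision Tree": (0.825, 0.818, 0.818),
    "GAM": (0.805, 0.814, 0.805),
    "Logistic Regression": (0.645, 0.652, 0.641),
    "ANN": (0.641, 0.649, 0.637),
}

# Select on validation AUC only; the test AUC stays untouched until the end.
best = max(results, key=lambda name: results[name][1])

# Train minus validation AUC: larger positive gaps hint at more overfitting.
gaps = {
    name: round(train_auc - val_auc, 3)
    for name, (train_auc, val_auc, _test_auc) in results.items()
}

print(best)
for name, gap in gaps.items():
    print(f"{name}: {gap:+.3f}")
```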

Explainability and fairness

For our conclusions on how the model works and whether it is fair with respect to the protected attributes, see the explanation and fairness subfolders of the notebooks folder, respectively.
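
For readers new to group fairness, the sketch below shows one common check: comparing the rate of positive predictions (here, predicted citations) across the groups of a protected attribute, often called the demographic parity difference. It is not necessarily the metric used in the fairness notebooks. The function names and the toy data are invented for illustration.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    """Return the share of positive (1) predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(predictions, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Invented toy data: eight predictions for two groups "A" and "B".
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = positive_rate_by_group(preds, groups)
dp_diff = demographic_parity_difference(preds, groups)
print(rates, dp_diff)
```

A difference of 0 means every group receives positive predictions at the same rate. How large a gap is acceptable depends on the application.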