Pre-canned solution: Credit Card Fraud Detection

Credit card fraud represents a significant problem for financial institutions, and reliable fraud detection is generally challenging. This project can be used as a template, facilitating the training of a machine learning model on a real-world credit card fraud dataset. It also employs techniques like oversampling and threshold moving to address class imbalance.

The dataset used in this project has been collected as part of a research collaboration between Worldline and the Machine Learning Group of Université Libre de Bruxelles, and the raw data can be freely downloaded from Kaggle.

The assets included in the project are:

FraudDetection.ipynb - a notebook that performs exploratory data analysis, data wrangling, hyperparameter optimisation, model training and evaluation. The notebook introduces the usecases and discusses the key techniques needed for implementing a classification model (e.g. oversampling, threshold moving etc.)
model_train.py - a training script that can be operationalised and retrain the model on demand or on schedule. The script can be used as a template. The key elements that need to be customized for other datasets are:
- load_data - data ingestion function
- feature_eng - data wrangling
- xgboost_search - more specifically, the values in params, which define the grid search scope
model_api.py - a scoring function that exposes the persisted model as Model API. The score function accepts as arguments all independent parameters of the dataset and uses the model to compute the fraud probability for the individual transaction.

Note: You need to unzip the dataset/creditcard.csv.zip file before running any of the above.

Dockerfile

This project uses a compute environment based on dominodatalab/base:Ubuntu18_DAD_Py3.7_R3.6_20200508

Add the following entries to the Dockerfile:

RUN echo "ubuntu    ALL=NOPASSWD: ALL" >> /etc/sudoers
RUN pip install --upgrade pip
RUN pip install imblearn && pip install xgboost

Model API

You can test the Model API using the following observation:

{ "data": { "V1": -0.88, "V2": 0.40, "V3": 0.73, "V4":-1.65, "V5":2.73, "V6":3.41, "V7":0.23, "V8":0.71, "V9":-0.35, "V10":-0.45, "V11":-0.16, "V12":-0.36, "V13":-0.10, "V14":-0.06, "V15":0.86, "V16":0.83, "V17":-1.28, "V18":0.14, "V19":-0.27, "V20":0.10, "V21":-0.25, "V22":-0.90, "V23":-0.22, "V24":0.98, "V25":0.27, "V26":-0.001, "V27":-0.29, "V28":-0.14, "Amount":-68.74, "Hour":5.98 } }

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pre-canned solution: Credit Card Fraud Detection

Dockerfile

Model API

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
dataset		dataset
model		model
FraudDetection.ipynb		FraudDetection.ipynb
README.md		README.md
model_api.py		model_api.py
model_train.py		model_train.py
requirements.txt		requirements.txt

dominodatalab/reference-project-fraud-detection

Folders and files

Latest commit

History

Repository files navigation

Pre-canned solution: Credit Card Fraud Detection

Dockerfile

Model API

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages