Targeting potential customers is an essential task for companies. It helps them boost revenue and tailor their services to the right customers. Moreover, it helps them understand why particular segments of people do not use their services. In this project, we study ways of preprocessing a high-dimensional dataset and preparing it for analysis with machine learning algorithms. We then use machine learning to segment customers from a mail-order campaign, understand their demographics, and predict potential future customers.
- Numpy
- Pandas
- Scikit-learn
- Matplotlib
- Plotly
- XGBoost
- Optuna
- Tqdm
- Arvato Project Workbook.ipynb: The main Jupyter notebook with the ETL pipeline, modelling and evaluation of data.
- helper_function.py: This file contains all the data cleaning functions used in the main notebook.
- preprocess.py: This file comprises the complete preprocessing function along with the functions that extract important features for our work.
- hyperparameter_tuning.py: This file contains the functions to tune the parameters of the machine learning model.
- visualizations.py: This file contains all the functions to plot the visualizations using plotly.
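
The segmentation workflow described above (clean the data, reduce its dimensionality, then group customers) can be sketched roughly as follows. This is only a minimal illustration on synthetic data, since the real dataset is not public; the choice of median imputation, PCA with 10 components, and K-Means with 2 clusters are assumptions made for the example and are not necessarily the settings used in the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the demographic data (hypothetical, not the Arvato data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))           # 500 people, 40 features
X[:250] += 3.0                           # shift half the rows to form a second group
X[rng.random(X.shape) < 0.05] = np.nan   # knock out ~5% of values to mimic missing data

# Impute -> scale -> reduce dimensions -> cluster, chained in one pipeline.
segmenter = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    PCA(n_components=10, random_state=0),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)

# One cluster label per row; these play the role of customer segments.
labels = segmenter.fit_predict(X)
print(np.bincount(labels))  # number of rows assigned to each segment
```

Chaining the steps in a single pipeline means the same imputation, scaling and projection fitted on one population can later be applied unchanged to another, for example with `segmenter.predict(new_data)`.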
For a detailed explanation of the project, feel free to read the blog post on Medium here.
This project is part of a Kaggle competition, which can be accessed here. However, the data for this project is not publicly available and can be accessed only through Udacity's Data Scientist Nanodegree.
Current leaderboard position: leaderboard
We would like to thank Arvato Financial Services for providing this data. Special thanks to Udacity for providing us with an interesting problem. All this work can be used without any restrictions.