This is a project for predicting the number of hits on a web page, built in an IPython Notebook that demonstrates:
- Understanding Data
- Exploratory Data Analysis
- Visualization
- Manipulation
- Feature Engineering
- Data Preparation
- Model Selection
- Validation
The goal of this repository is to show how visualization and feature engineering build an understanding of the data that feeds into the prediction model.
Quick Start: View a static version of the notebook in the comfort of your own browser.

Dependencies:
- Python 2.7
- Pandas
- Sklearn
- NumPy
- Seaborn
- Matplotlib
- XGBoost
To run this notebook interactively:
- Clone this repo:

      $ git clone https://github.com/techedlaksh/website-hits-prediction
      $ cd website-hits-prediction

- Create a new virtual environment and install the requirements:

      $ sudo pip install virtualenv
      $ virtualenv venv
      $ source venv/bin/activate
      $ pip install -r requirements.txt

- Run the notebook:

      $ jupyter notebook
- Click on final-notebook.ipynb in the browser and enjoy!
- When you're done with the notebook, stop Jupyter from the terminal and deactivate the virtual environment with `deactivate`.
Dataset columns:
- row_num: a number uniquely identifying each row.
- locale: the platform of the session.
- day_of_week: Mon-Fri, the day of the week of the session.
- hour_of_day: 00-23, the hour of the day of the session.
- agent_id: the device used for the session.
- entry_page: describes the landing page of the session.
- path_id_set: shows all the locations that were visited during the session.
- traffic_type: indicates the channel the user came through.
- session_duration: the duration in seconds of the session.
- hits: the number of interactions with the trivago page during the session.
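One simple feature that can be derived from `path_id_set` is the number of distinct locations visited in a session. The sketch below assumes `path_id_set` is stored as a string of location IDs separated by `;` (for example `"31965;0"`) and treats missing values as zero locations; the separator, the function name `count_locations`, and the toy rows are illustrative assumptions, not taken from the notebook.

```python
def count_locations(path_id_set):
    """Return the number of distinct location IDs visited in a session.

    Assumes path_id_set is a ';'-separated string of IDs (e.g. "31965;0").
    Anything that is not a string (None, or NaN from pandas) counts as 0.
    """
    if not isinstance(path_id_set, str):
        return 0
    # Split on the assumed separator and drop empty fragments such as a trailing ';'.
    ids = {part.strip() for part in path_id_set.split(";") if part.strip()}
    return len(ids)


# Hypothetical session rows, for illustration only.
sessions = [
    {"row_num": 1, "path_id_set": "31965;0"},
    {"row_num": 2, "path_id_set": "0"},
    {"row_num": 3, "path_id_set": None},
]
for session in sessions:
    session["n_locations"] = count_locations(session["path_id_set"])
```

With pandas, the same function can be applied column-wise, e.g. `df["n_locations"] = df["path_id_set"].apply(count_locations)`; the `isinstance` check is what keeps pandas' float `NaN` from reaching `.split`.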
Use the data provided to build a model that predicts the number of hits per session from the other session features.
Predictions will be evaluated by the root mean squared error (RMSE).
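RMSE is the square root of the average squared difference between the true and predicted hit counts, so large misses are penalized more than small ones. A minimal pure-Python sketch of the metric (scikit-learn's `mean_squared_error` followed by a square root gives the same value):

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / float(len(squared_errors)))


# Errors are 1, 0 and 3, so the squared errors average to (1 + 0 + 9) / 3 = 10/3.
score = rmse([3, 5, 10], [2, 5, 13])
```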
Approach:
- Importing data with pandas
- Understanding data using statistics with pandas
- Exploring Data through Visualizations with Matplotlib
- Feature Engineering
- Data Preparation for the model
- Logistic Regression
- Random Forest
- XGBoost
- LightGBM
- K-fold cross-validation to evaluate results locally
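K-fold cross-validation splits the sessions into k folds, trains on k-1 of them, scores the held-out fold, and averages the k scores. The sketch below shows the mechanics in plain Python using a hypothetical baseline that always predicts the training-fold mean; in the notebook this role would be played by the actual models (Random Forest, XGBoost, etc.), and scikit-learn's `KFold` provides the same splitting. The helper names and toy `hits` values are illustrative assumptions.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / float(len(y_true)))


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, valid_idx) pairs for k shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, valid_idx
        start += size


def cross_validate_mean_baseline(hits, k=5):
    """Per-fold RMSE of a baseline that predicts the mean hits of the training folds."""
    scores = []
    for train_idx, valid_idx in k_fold_indices(len(hits), k):
        mean_hits = sum(hits[i] for i in train_idx) / float(len(train_idx))
        y_valid = [hits[i] for i in valid_idx]
        scores.append(rmse(y_valid, [mean_hits] * len(y_valid)))
    return scores


hits = [3, 7, 1, 12, 5, 8, 2, 9, 4, 6]  # toy values, not from the real dataset
fold_scores = cross_validate_mean_baseline(hits, k=5)
mean_score = sum(fold_scores) / len(fold_scores)
```

Fixing the shuffle seed keeps the folds identical across runs, so different models are compared on exactly the same splits.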
Note: The trained models are exported, so you can reuse them by loading a saved model into your own script and predicting on your data with it.
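The README does not say which format the models are exported in. As an assumption, the sketch below uses Python's standard `pickle` module (one of the options scikit-learn documents for model persistence, alongside `joblib`), with a hypothetical `MeanHitsModel` standing in for a real trained model and a temporary file path in place of the repository's actual model files.

```python
import os
import pickle
import tempfile


class MeanHitsModel(object):
    """Hypothetical stand-in for a trained model: predicts the mean training hits."""

    def fit(self, hits):
        self.mean_ = sum(hits) / float(len(hits))
        return self

    def predict(self, sessions):
        # One prediction per session, ignoring the session features.
        return [self.mean_] * len(sessions)


model = MeanHitsModel().fit([3, 7, 1, 12, 5])

# Export the trained model to disk.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later (e.g. in another script): load the saved model and predict.
with open(path, "rb") as f:
    restored = pickle.load(f)
predictions = restored.predict([{"row_num": 1}, {"row_num": 2}])
```

Unpickling in a different script only works if the model's class (for real models, the same library, ideally the same version) is importable there.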