DMT Assignment 2

This repository contains Assignment 2 of the Data Mining Techniques course at Vrije Universiteit (VU) Amsterdam.

Competition

The competition for this assignment is hosted on Kaggle. Click here for details. The original competition is also available on Kaggle here.

Dataset

The dataset is available here.

Metric

The evaluation metric for this competition is Normalized Discounted Cumulative Gain (NDCG). See https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG for more details.
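
To make the metric concrete, here is a minimal sketch of NDCG for a single ranked list. It assumes the exponential gain form `2^rel - 1` and a cutoff `k`; the competition's exact gain function, cutoff and relevance grades are not stated in this README, so the values below are illustrative only.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items (exponential gain, assumed)."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical relevance grades, listed in the order the model ranked the items.
ranked_relevances = [0, 5, 1, 0]
print(ndcg_at_k(ranked_relevances, k=3))
```

A perfectly ordered list scores 1.0, and a list with no relevant items is defined here to score 0.0.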

Exploratory Data Analysis (EDA)

1. Overview of dataset

Get a first overview of the dataset with functions such as head(), tail(), describe() and info(). Together they report the mean values, data types, data size, non-null counts and so on.
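
The calls above can be sketched on a small stand-in table. Only `prop_id` appears in this README; the other column names and all values are hypothetical, and in practice the frame would be loaded from the competition file.

```python
import io
import pandas as pd

# Tiny stand-in table (hypothetical columns and values, except prop_id).
df = pd.DataFrame({
    "srch_id": [1, 1, 2, 2],
    "prop_id": [10, 11, 10, 12],
    "price_usd": [120.0, 95.5, None, 210.0],
})

print(df.head(2))      # first rows
print(df.tail(2))      # last rows
print(df.describe())   # count, mean, std, min, quartiles, max per numeric column

# info() prints to stdout by default; a buffer lets us keep the text.
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)       # dtypes, non-null counts, memory usage
```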

2. Missing data and anomalies

Features are classified into date, categorical, numerical, and text features. For each feature, check the missing rate, the number of categories and the outliers.
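
A minimal sketch of these three checks with pandas follows. The column names other than `prop_id`, the sample values, and the choice of the 1.5 x IQR rule for outliers are assumptions made for illustration, not necessarily what the notebooks do.

```python
import pandas as pd

# Hypothetical sample: one categorical and one numerical feature.
df = pd.DataFrame({
    "prop_id": [10, 11, 10, 12, 13, 14],
    "site": ["a", "b", "a", None, "b", "c"],
    "price_usd": [100.0, 110.0, 105.0, 95.0, None, 5000.0],
})

# Fraction of missing values per column.
missing_rate = df.isna().mean()

# Number of distinct categories in each non-numeric column (NaN is not counted).
n_categories = df.select_dtypes(exclude="number").nunique()

# Outliers in a numerical column via the 1.5 * IQR rule (assumed rule).
price = df["price_usd"]
q1, q3 = price.quantile(0.25), price.quantile(0.75)
iqr = q3 - q1
outliers = df[(price < q1 - 1.5 * iqr) | (price > q3 + 1.5 * iqr)]

print(missing_rate)
print(n_categories)
print(outliers)
```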

3. Normalize

Group the rows by prop_id and fill missing values with the mode or the mean.
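
One way to read this note is per-group imputation: fill each missing value with the statistic of its own `prop_id` group, and fall back to the global statistic when the whole group is missing. The sketch below assumes that reading; `fill_by_group` and the two feature columns are hypothetical names.

```python
import pandas as pd

# Hypothetical sample; only prop_id is a name taken from this README.
df = pd.DataFrame({
    "prop_id": [10, 10, 10, 11, 11, 12],
    "review_score": [4.0, None, 5.0, 3.0, None, None],  # numerical: use the mean
    "brand_flag": [1, 1, None, 0, None, None],          # categorical: use the mode
})

def fill_by_group(frame, key, column, how):
    """Fill NaNs in `column` with a per-group mean or mode, then a global fallback."""
    if how == "mean":
        stats = frame.groupby(key)[column].agg("mean")
        fallback = frame[column].mean()
    else:
        # First mode of each group; NaN when the group has no observed value.
        stats = frame.groupby(key)[column].agg(
            lambda s: s.mode().iloc[0] if s.notna().any() else float("nan")
        )
        fallback = frame[column].mode().iloc[0]
    per_row = frame[key].map(stats)  # group statistic aligned to each row
    return frame[column].fillna(per_row).fillna(fallback)

df["review_score"] = fill_by_group(df, "prop_id", "review_score", "mean")
df["brand_flag"] = fill_by_group(df, "prop_id", "brand_flag", "mode")
print(df)
```

The global fallback matters for properties such as `prop_id` 12 here, which have no observed value at all.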

4. Correlation analysis and feature selection

Learn to rank

Representative shallow models include LR (logistic regression) and FM (factorization machines).

RankNet

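Insufficient feature engineering: if the feature engineering is insufficient, i.e. features that are meaningful for the ranking task have not been extracted, a simple linear model may be easier to understand and may adapt to the data more easily.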
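
For reference, the pairwise loss that RankNet optimizes can be sketched in a few lines. This shows only the standard loss for a pair in which item i should rank above item j; the function name and `sigma` default are illustrative, and it is not taken from the models in this repository.

```python
import math

def ranknet_pair_loss(score_i, score_j, sigma=1.0):
    """Cross-entropy loss for a pair where item i should rank above item j.

    RankNet models P(i above j) = sigmoid(sigma * (score_i - score_j)), so with
    a target probability of 1 the loss is log(1 + exp(-sigma * (s_i - s_j))).
    """
    diff = sigma * (score_i - score_j)
    # Numerically stable form of log(1 + exp(-diff)).
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))

# Hypothetical model scores for two items of the same query.
print(ranknet_pair_loss(2.0, 0.5))  # correctly ordered pair: small loss
print(ranknet_pair_loss(0.5, 2.0))  # wrongly ordered pair: larger loss
```

The loss depends only on the score difference, which is why a model trained this way learns an ordering rather than absolute relevance values.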

