This repository is for Assignment 2 of the Data Mining Techniques course at Vrije Universiteit (VU) Amsterdam.
The competition for this assignment is hosted on Kaggle. Click here for details. The original competition can also be found on Kaggle by clicking this.
The dataset is available here.
The evaluation metric for this competition is Normalized Discounted Cumulative Gain. See https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG for more details.
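To make the metric concrete, here is a minimal sketch of NDCG at a cutoff k. It assumes the linear-gain form (relevance divided by log2 of position + 1); the linked article also describes an exponential-gain variant. The cutoff and the relevance grades used by the competition are not stated here, so the function names and the example values below are illustrative only.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k items, taken in ranked order."""
    return sum(
        rel / math.log2(position + 1)
        for position, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = sorted(relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # assumption: a query with no relevant items scores 0
    return dcg_at_k(relevances, k) / ideal_dcg
```

The input is the list of true relevance grades in the order the model ranked the items, so a perfectly ordered list such as `ndcg_at_k([3, 2, 1], k=3)` scores 1.0.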
Get an overview of the dataset with functions such as head(), tail(), describe(), and info().
This information includes the mean, data types, dataset size, non-null counts, and so on.
Features are classified into date, categorical, numerical, and text features. For each feature, check the missing rate, the number of categories, and outliers.
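A minimal sketch of this overview step with pandas, using a toy DataFrame in place of the competition data. Apart from prop_id, the column names are hypothetical.

```python
import pandas as pd

# Toy stand-in for the real data; price_usd and prop_country are hypothetical columns.
df = pd.DataFrame(
    {
        "prop_id": [1, 1, 2, 3],
        "price_usd": [120.0, None, 80.0, 200.0],
        "prop_country": ["NL", "NL", "DE", "FR"],
    }
)

print(df.head(2))     # first rows
print(df.tail(2))     # last rows
print(df.describe())  # count, mean, std, min, quartiles, max of numeric columns
df.info()             # dtypes, non-null counts, memory usage (prints directly)
print(df.shape)       # (number of rows, number of columns)
```

Note that info() prints its report and returns None, so it is called without print().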
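The three checks can be sketched as follows. The data is a toy example, the column names other than prop_id are hypothetical, and the 1.5 x IQR rule is only one common outlier heuristic, not necessarily the one used in this repository.

```python
import pandas as pd

# Toy data; price_usd and site are hypothetical columns.
df = pd.DataFrame(
    {
        "prop_id": [1, 1, 2, 3, 4, 5],
        "price_usd": [100.0, 110.0, 90.0, 105.0, None, 5000.0],
        "site": ["a", "b", "a", None, "a", "c"],
    }
)

# Missing rate: fraction of missing values per column.
missing_rate = df.isna().mean()

# Number of categories: distinct non-missing values of a categorical column.
n_categories = df["site"].nunique()

# Outliers: values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
price = df["price_usd"].dropna()
q1, q3 = price.quantile(0.25), price.quantile(0.75)
iqr = q3 - q1
outliers = price[(price < q1 - 1.5 * iqr) | (price > q3 + 1.5 * iqr)]

print(missing_rate)
print(n_categories)
print(outliers)
```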
Group by prop_id and fill missing values with the mode or the mean of each group.
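A minimal pandas sketch of group-wise imputation by prop_id: the mean for a numerical column and the mode for a categorical-like column. The columns review_score and star_rating and the helper first_mode are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Toy data; review_score and star_rating are hypothetical columns.
df = pd.DataFrame(
    {
        "prop_id": [1, 1, 1, 2, 2, 2],
        "review_score": [4.0, None, 2.0, 3.0, 3.0, None],
        "star_rating": [3.0, 3.0, None, 5.0, None, 5.0],
    }
)

# Numerical column: fill with the mean of the same prop_id.
group_mean = df.groupby("prop_id")["review_score"].transform("mean")
df["review_score"] = df["review_score"].fillna(group_mean)


def first_mode(values: pd.Series) -> float:
    """Most frequent non-missing value, or NaN if the group has none."""
    modes = values.mode()
    return modes.iloc[0] if not modes.empty else float("nan")


# Categorical-like column: fill with the mode of the same prop_id.
mode_by_prop = df.groupby("prop_id")["star_rating"].agg(first_mode)
df["star_rating"] = df["star_rating"].fillna(df["prop_id"].map(mode_by_prop))

print(df)
```

A prop_id whose values are all missing stays missing after this step, so a global mean or mode may still be needed as a fallback.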
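Representative shallow models include LR (logistic regression) and FM (factorization machines).
Insufficient feature engineering: if feature engineering is insufficient, that is, it fails to extract features that are meaningful for the ranking task, a simple linear model may be easier to interpret and may fit the data more readily.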