Hi! I'm Ali (PmD) PourMohammad. This project is just a warm-up, mostly for educational and training purposes.
- data_description.txt has information about the data
- train.csv and House.csv are for training and evaluating a model
- test.csv doesn't have outcomes; it is for evaluation on the Kaggle site
- housePricePrediction.ipynp has the code that I've written to solve this problem
- Drop columns with many outliers
- Drop rows with outlier values
- Drop the ID column
- Convert all 'NaN' values in 14 categorical columns to 'NOT' (for example, "No Basement" is stored as "NaN")
- Drop columns in which most values are null
- Convert all categorical columns to numeric:
- Find correlated columns and drop one of each pair
- Apply get_dummies to all remaining categorical columns
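
The steps above can be sketched roughly with pandas as follows. This is only a minimal illustration, not the notebook's actual code: the function name `preprocess`, the column names in the demo frame, the 3-standard-deviation outlier rule, and the `null_frac` and `corr_limit` thresholds are all assumptions made for the example. The correlation check is simplified to numeric columns only, and the "drop columns with many outliers" step is left out because its criterion is not stated here.

```python
import numpy as np
import pandas as pd


def preprocess(df, id_col="Id", target="SalePrice",
               none_cols=("BsmtQual",), null_frac=0.5, corr_limit=0.8):
    """Hypothetical sketch of the listed steps; every threshold is an assumption."""
    df = df.copy()

    # Drop rows with outlier values: here, any numeric feature more than
    # 3 standard deviations from its mean (an assumed rule).
    feats = df.select_dtypes(include="number").columns.drop(
        [id_col, target], errors="ignore")
    z = (df[feats] - df[feats].mean()) / df[feats].std(ddof=0)
    df = df[~(z.abs() > 3).any(axis=1)]

    # Drop the ID column.
    df = df.drop(columns=[id_col], errors="ignore")

    # In these categorical columns NaN means "feature absent", so label it 'NOT'.
    for col in none_cols:
        if col in df.columns:
            df[col] = df[col].fillna("NOT")

    # Drop columns in which more than `null_frac` of the values are null.
    df = df.loc[:, df.isna().mean() <= null_frac]

    # Find highly correlated numeric columns and drop one of each pair
    # (the later column of the pair is the one removed).
    feats = df.select_dtypes(include="number").columns.drop(
        target, errors="ignore")
    corr = df[feats].corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > corr_limit).any()]
    df = df.drop(columns=to_drop)

    # One-hot encode all remaining categorical columns.
    return pd.get_dummies(df, dtype=int)


# Tiny made-up frame, only to show the function running.
demo = pd.DataFrame({
    "Id": range(1, 9),
    "LotArea": [9000, 8000, 9500, 8500, 8700, 9100, 8800, 9200],
    "GarageCars": [1, 2, 2, 3, 1, 2, 3, 2],
    "GarageArea": [250, 500, 500, 750, 250, 500, 750, 500],
    "BsmtQual": ["Gd", None, "TA", "Gd", None, "TA", "Gd", "TA"],
    "PoolQC": [None] * 7 + ["Ex"],
    "SalePrice": [200, 180, 210, 250, 190, 205, 240, 215],
})
clean = preprocess(demo)
print(clean.columns.tolist())
```

In this demo, `PoolQC` is removed because most of its values are null, `GarageArea` is removed because it is perfectly correlated with `GarageCars`, and the missing `BsmtQual` entries end up in a `BsmtQual_NOT` dummy column.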