Project 1 in the R language: Predicting whether a user will download an app after clicking a mobile app advertisement
The objective of this project is to predict whether a user will download an app after clicking a mobile app advertisement. The datasets come from Kaggle. This project is part of the Data Science training course of the Data Science Academy in Brazil.
The solution to this problem is divided into four parts. The first part covers the data munging and compares many machine learning models, training them on the train_sample.csv file and testing them on 1E+07 rows of train.csv. Rows from train.csv served as the test data because the provided test dataset does not include the target variable. The second part reuses the main tidying steps of part one to tidy the full training dataset, train.csv. The third part trains the best model found in part one on the tidied training dataset, with the number of trees in the random forest reduced because of the limited capacity of my notebook. The fourth part applies the trained model to the provided test dataset, test.csv, and then matches the predicted results with the click_id to produce the submission file.
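The final step of part four, matching predictions with the click_id, can be sketched roughly as follows. This is only an illustration: the toy `test` data frame, the `predicted` vector, and the output path are hypothetical stand-ins, and the `is_attributed` column name is assumed to be the one the competition expects in the submission file.

```r
# Hypothetical stand-ins for test.csv and for the model's predictions.
test <- data.frame(click_id = 0:4, app = c(3, 9, 12, 3, 18))
predicted <- c(0, 0, 1, 0, 0)

# Pair each prediction with its click_id; rows are assumed to be in the same order.
submission <- data.frame(click_id = test$click_id, is_attributed = predicted)

# Write the submission file (to a temporary folder in this sketch).
out_file <- file.path(tempdir(), "submission.csv")
write.csv(submission, out_file, row.names = FALSE)
```

Pairing by position only works if the predictions keep the row order of test.csv, so any sorting or filtering of the test data should happen before the prediction step.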
The parts of the script are listed below:
- PART 1 - Data munging and testing models
  - Data fields
  - Exploratory data analysis
  - Models
    - Model 1 - Logistic regression model
    - Model 2 - Logistic regression model with the most significant variables
    - Model 3 - KSVM model with rbf kernel
    - Model 4 - KSVM model with rbf kernel and the most significant variables
    - Model 5 - KSVM model with vanilladot Linear kernel
    - Model 6 - KSVM model with vanilladot Linear kernel and the most significant variables
    - Model 7 - SVM model with radial kernel
    - Model 8 - SVM model with radial kernel and the most significant variables
    - Model 9 - SVM model with linear kernel
    - Model 10 - SVM model with linear kernel and the most significant variables
    - Model 11 - Regression Tree model
    - Model 12 - Regression Tree model with the most significant variables
    - Model 13 - Another Regression Tree model
    - Model 14 - Another Regression Tree model with the most significant variables
    - Model 15 - Random Forest model
    - Model 15 - Random forest model balanced by reducing the major target class
    - Model 15 - Random forest model balanced by increasing the minor target class
    - Model 15 - Random forest model balanced by SMOTE
    - Model 15 - Random forest model balanced by ROSE
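The two simplest balancing strategies listed above, reducing the major target class and increasing the minor target class, can be sketched in base R as shown below. This is a minimal illustration on made-up data, not the project's actual code: the `clicks` data frame, its columns, and the class sizes are assumptions. SMOTE and ROSE generate synthetic rows and need dedicated packages, so they are not shown here.

```r
# Toy data standing in for the click log; column names and sizes are illustrative.
set.seed(42)
clicks <- data.frame(
  app = sample(1:20, 1000, replace = TRUE),
  hour = sample(0:23, 1000, replace = TRUE),
  is_attributed = factor(c(rep("0", 980), rep("1", 20)))
)

minor <- clicks[clicks$is_attributed == "1", ]  # rare class: app downloaded
major <- clicks[clicks$is_attributed == "0", ]  # frequent class: no download

# Reduce the major class: keep only as many major rows as there are minor rows.
down <- rbind(minor, major[sample(nrow(major), nrow(minor)), ])

# Increase the minor class: resample minor rows with replacement.
up <- rbind(major, minor[sample(nrow(minor), nrow(major), replace = TRUE), ])

# Inspect the class counts of each balanced data frame.
print(table(down$is_attributed))
print(table(up$is_attributed))
```

Reducing the major class discards data but keeps training fast, while increasing the minor class keeps every row at the cost of duplicated observations and a larger training set.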