XGBoost has many tunable parameters, but the three most important ones are:
- `eta` (default=0.3): Also called `learning_rate`. It prevents overfitting by shrinking the weights of new features after each boosting step, so smaller values make boosting more conservative. range: [0, 1]
- `max_depth` (default=6): Maximum depth of a tree. Increasing this value makes the model more complex and more likely to overfit. range: [0, inf]
- `min_child_weight` (default=1): Minimum sum of instance weights (hessian) needed in a child node. For regression with squared error, this equals the minimum number of samples in a leaf node. Larger values make the model more conservative. range: [0, inf]
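As a minimal sketch, the three parameters above could be collected in a dictionary for XGBoost's native Python API. The values are the defaults listed above. The `objective`, `eval_metric`, and `seed` entries are assumptions for a binary classification task and are not part of these notes.

```python
# Defaults for the three parameters described above.
xgb_params = {
    'eta': 0.3,
    'max_depth': 6,
    'min_child_weight': 1,

    # Assumed settings for a binary classification task (not from the notes).
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
    'seed': 1,
}
```

A dictionary like this is typically passed as the first argument to `xgb.train`, together with a `DMatrix` holding the training data.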
There are other ways of finding the best parameters for XGBoost models, but the approach implemented in the notebook tunes them in this sequence:
- First, find the best value for `eta`
- Second, find the best value for `max_depth`
- Third, find the best value for `min_child_weight`
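The sequence above can be sketched as a small loop that tunes one parameter at a time and carries the best value forward to the next step. This is only an illustration, not the notebook's code: `tune_sequentially` and `validation_auc` are hypothetical helper names, the candidate values in `grid` are arbitrary examples, and `validation_auc` assumes that `params` sets `eval_metric` to `auc` and that `dtrain` and `dval` are `xgb.DMatrix` objects.

```python
def tune_sequentially(evaluate, base_params, grid):
    """Tune one parameter at a time, in the order the keys appear in `grid`.

    `evaluate(params)` must return a score where higher is better.
    """
    best_params = dict(base_params)
    for name, candidates in grid.items():
        scores = {}
        for value in candidates:
            # Try this candidate while keeping the values chosen so far.
            trial = {**best_params, name: value}
            scores[value] = evaluate(trial)
        # Keep the best candidate before moving on to the next parameter.
        best_params[name] = max(scores, key=scores.get)
    return best_params


def validation_auc(params, dtrain, dval, num_boost_round=200):
    """Hypothetical scorer: best validation AUC seen during training."""
    import xgboost as xgb  # imported here so the helper above has no dependencies

    evals_result = {}
    xgb.train(
        params,
        dtrain,
        num_boost_round=num_boost_round,
        evals=[(dval, 'val')],
        evals_result=evals_result,
        verbose_eval=False,
    )
    return max(evals_result['val']['auc'])


# Starting point: the defaults from the notes.
base_params = {'eta': 0.3, 'max_depth': 6, 'min_child_weight': 1}

# Example candidate values (assumptions); dict order defines the tuning order.
grid = {
    'eta': [0.01, 0.05, 0.1, 0.3, 1.0],
    'max_depth': [3, 4, 6, 10],
    'min_child_weight': [1, 10, 30],
}

# With real data, something like:
# best = tune_sequentially(lambda p: validation_auc(p, dtrain, dval), base_params, grid)
```

Tuning one parameter at a time is cheaper than a full grid search, but it can miss combinations that only work well together.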
Other useful parameters are:
- `subsample` (default=1): Subsample ratio of the training instances. Setting it to 0.5 means that the model randomly samples half of the training data prior to growing trees. range: (0, 1]
- `colsample_bytree` (default=1): Subsample ratio of columns when constructing each tree. This is similar to random forest, where each tree is built from a randomly chosen subset of features. range: (0, 1]
- `lambda` (default=1): Also called `reg_lambda`. L2 regularization term on weights. Increasing this value makes the model more conservative.
- `alpha` (default=0): Also called `reg_alpha`. L1 regularization term on weights. Increasing this value makes the model more conservative.
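The aliases mentioned in these notes matter in Python because `lambda` is a reserved word: it can be used as a string key in a parameter dictionary, but not as a keyword argument. The sketch below shows one way to map the native names to their aliases. `to_sklearn_kwargs` is a hypothetical helper written for this illustration, not part of XGBoost.

```python
# Native parameter names, with the defaults listed above.
# 'lambda' is a reserved word in Python, so it can only appear as a string key.
regularization_params = {
    'subsample': 1,
    'colsample_bytree': 1,
    'lambda': 1,
    'alpha': 0,
}

# Aliases mentioned in these notes (native name -> scikit-learn style name).
ALIASES = {
    'eta': 'learning_rate',
    'lambda': 'reg_lambda',
    'alpha': 'reg_alpha',
}


def to_sklearn_kwargs(params):
    """Rename native parameter names to their scikit-learn style aliases."""
    return {ALIASES.get(name, name): value for name, value in params.items()}


sklearn_kwargs = to_sklearn_kwargs(regularization_params)
```

The renamed dictionary could then be unpacked into the scikit-learn wrapper, for example `xgb.XGBClassifier(**sklearn_kwargs)`.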
The notes are written by the community. If you see an error here, please create a PR with a fix.