Hello FLAML team, I used the FLAML-selected estimator (LGBMClassifier) to predict on the testing data and got 0.75 performance. Do you know what may have caused the difference?
Most likely, it's because the time budget is 60s, and within that budget either cv or holdout is used to compare different models. When you retrain the model on the full training data, the model's accuracy is higher because more data are used to fit the model. To confirm this, could you check the logger's info and see whether 'retrain' appears in the log? If not, then FLAML did not get a chance to retrain the model on the full training data within that time budget. BTW, we'll modify the default retraining behavior in the next version.