Uniformize all sklearn estimator parameters to those of core #321

folmos-at-orange · 2024-12-20T13:26:54Z

Description

Currently, various Khiops parameters are renamed in the sklearn estimators. Renaming them back to their core counterparts would simplify the code and ease the transition to the core API.

Note that these naming conventions come from very early versions of the sklearn estimators, when the core library was not compliant with the Python naming conventions.

This would also allow us to eliminate almost all parameter checks in the sklearn estimators: the error messages produced by the core checks would then match the sklearn parameter names, so we could reuse those checks directly.
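A minimal sketch of why shared names let the sklearn layer drop its own checks. It assumes a hypothetical core function `core_train_predictor` and a hypothetical estimator `HypotheticalClassifier` (neither is the actual khiops-python API). Once the estimator parameters carry the core names, they can be forwarded unchanged, and any error raised by core already names the parameter the user passed.

```python
# Hypothetical stand-in for a core API function: NOT the real khiops.core API.
def core_train_predictor(max_trees=10, max_pairs=0):
    # Core-side validation: the messages use the core parameter names
    for name, value in (("max_trees", max_trees), ("max_pairs", max_pairs)):
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError(
                f"'{name}' must be of type int, got {type(value).__name__}"
            )
        if value < 0:
            raise ValueError(f"'{name}' must be non-negative, got {value}")
    return {"max_trees": max_trees, "max_pairs": max_pairs}


class HypotheticalClassifier:
    """Hypothetical sklearn-style estimator whose parameters mirror core names."""

    def __init__(self, max_trees=10, max_pairs=0):
        self.max_trees = max_trees
        self.max_pairs = max_pairs

    def fit(self, X=None, y=None):
        # No estimator-side checks: the parameters are forwarded as-is, so any
        # error raised by core already mentions 'max_trees' or 'max_pairs'.
        self.core_params_ = core_train_predictor(
            max_trees=self.max_trees, max_pairs=self.max_pairs
        )
        return self


try:
    HypotheticalClassifier(max_trees=-1).fit()
except ValueError as error:
    print(error)  # 'max_trees' must be non-negative, got -1
```

With the current renamed parameters (e.g. n_trees), the same core message would mention a name the sklearn user never typed, which is why the estimators duplicate the checks today.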

Questions/Ideas

A plan for this change is:
1. Deprecate all concerned parameters and values.
2. Make the changes for Khiops 11.
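Step 1 could follow the usual sklearn deprecation pattern. The sketch below is hypothetical (the class name, the sentinel and the warning text are not taken from khiops-python) and uses n_trees -> max_trees as the example: the old name stays in the signature with a sentinel default, `__init__` only stores the parameters as sklearn expects, and a `FutureWarning` is emitted at fit time when the old name is used. Step 2 then amounts to deleting the old parameter and the shim.

```python
import warnings

# Sentinel that distinguishes "not passed" from any real value (even None)
_DEPRECATED = object()


class HypotheticalEstimator:
    """Hypothetical estimator during the deprecation cycle n_trees -> max_trees."""

    def __init__(self, max_trees=10, n_trees=_DEPRECATED):
        # sklearn convention: __init__ only stores the parameters as given
        self.max_trees = max_trees
        self.n_trees = n_trees

    def _resolved_max_trees(self):
        if self.n_trees is _DEPRECATED:
            return self.max_trees
        # stacklevel=3 points the warning at the caller of fit()
        warnings.warn(
            "'n_trees' is deprecated and will be removed in Khiops 11; "
            "use 'max_trees' instead.",
            FutureWarning,
            stacklevel=3,
        )
        # In this sketch the deprecated value wins if both names are passed
        return self.n_trees

    def fit(self, X=None, y=None):
        self.max_trees_ = self._resolved_max_trees()
        return self
```

Letting the deprecated value win keeps old scripts working unchanged; raising an error when both names are passed would be a stricter alternative.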
Impacted Parameters:
- Predictors and Encoder
  - n_pairs -> max_pairs
  - n_trees -> max_trees
  - n_features -> max_constructed_variables
  - n_evaluated_features -> max_evaluated_variables
  - n_selected_features -> max_selected_variables
- Encoder only
  - transform_type_numerical -> numerical_recoding_method
  - transform_type_categorical -> categorical_recoding_method
  - transform_type_pairs -> pairs_recoding_method
    - For these last three, we should also use the values accepted by core. For example, stop accepting "dummies" and accept "0-1 binarization" instead.
- Coclustering
  - build_frequency_vars -> build_frequency_variables
  - build_distance_vars -> build_distance_variables
  - build_name_vars -> build_cluster_variable
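The impacted parameters above fit in a single lookup table. The sketch below is hypothetical (the dictionaries and `to_core_names` are not part of khiops-python): it encodes only the renames listed here plus the one value rename the issue gives as an example ("dummies" -> "0-1 binarization"), and it rejects calls that set the same core parameter under both the old and the new name.

```python
# Renames listed in this issue: old sklearn name -> core name
PARAMETER_RENAMES = {
    # Predictors and Encoder
    "n_pairs": "max_pairs",
    "n_trees": "max_trees",
    "n_features": "max_constructed_variables",
    "n_evaluated_features": "max_evaluated_variables",
    "n_selected_features": "max_selected_variables",
    # Encoder only
    "transform_type_numerical": "numerical_recoding_method",
    "transform_type_categorical": "categorical_recoding_method",
    "transform_type_pairs": "pairs_recoding_method",
    # Coclustering
    "build_frequency_vars": "build_frequency_variables",
    "build_distance_vars": "build_distance_variables",
    "build_name_vars": "build_cluster_variable",
}

# Only the value rename given as an example in the issue; the remaining
# values would have to be checked against those accepted by core.
RECODING_VALUE_RENAMES = {"dummies": "0-1 binarization"}

RECODING_PARAMETERS = {
    "numerical_recoding_method",
    "categorical_recoding_method",
    "pairs_recoding_method",
}


def to_core_names(params):
    """Return a copy of `params` with core parameter names and values."""
    core_params = {}
    sources = {}  # core name -> name the caller actually used
    for name, value in params.items():
        core_name = PARAMETER_RENAMES.get(name, name)
        if core_name in core_params:
            raise ValueError(
                f"'{name}' and '{sources[core_name]}' both set '{core_name}'"
            )
        if core_name in RECODING_PARAMETERS and isinstance(value, str):
            value = RECODING_VALUE_RENAMES.get(value, value)
        core_params[core_name] = value
        sources[core_name] = name
    return core_params


print(to_core_names({"n_trees": 5, "transform_type_categorical": "dummies"}))
# {'max_trees': 5, 'categorical_recoding_method': '0-1 binarization'}
```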
n_* vs max_*: One can argue that renaming the n_* parameters makes them less sklearn-compliant. However, the max_* naming has the proper semantics, as these parameters are limits on how many pairs/trees/variables Khiops can construct. Also, sklearn itself has max_* parameters (e.g. max_depth and max_features in its decision tree estimators).
feature vs variable: sklearn uses "feature" semantics, whereas core uses "variable". For example, many of the fitted-state attributes have "feature" in their names.
- Two solutions that still uniformize the parameters:
  - We replace the "feature" parameters by "variable" parameters
    - This may create slight confusion with respect to the fitted estimator attributes (which will not change)
  - We keep the "feature" parameters as an alias for the "variable" parameters.
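A minimal sketch of the second option, assuming a hypothetical `resolve_alias` helper and a hypothetical `HypotheticalEncoder` (neither is part of khiops-python; the default of 0 is also an assumption). The "variable" name is canonical, the existing "feature" name is accepted as a permanent alias without any warning, and passing both with different values is an error.

```python
def resolve_alias(canonical, alias, canonical_name, alias_name, default=None):
    """Return the value of a parameter that may be passed under two names.

    None means "not passed" here; a real implementation might need a
    dedicated sentinel if None is a meaningful value.
    """
    if alias is None:
        return default if canonical is None else canonical
    if canonical is not None and canonical != alias:
        raise ValueError(
            f"'{alias_name}' is an alias of '{canonical_name}'; "
            f"got conflicting values {alias!r} and {canonical!r}"
        )
    return alias


class HypotheticalEncoder:
    """Hypothetical encoder accepting n_evaluated_features as an alias."""

    def __init__(self, max_evaluated_variables=None, n_evaluated_features=None):
        self.max_evaluated_variables = max_evaluated_variables
        self.n_evaluated_features = n_evaluated_features

    def fit(self, X=None, y=None):
        # The resolved value is always stored under the canonical name
        self.max_evaluated_variables_ = resolve_alias(
            self.max_evaluated_variables,
            self.n_evaluated_features,
            "max_evaluated_variables",
            "n_evaluated_features",
            default=0,
        )
        return self
```

Compared with the deprecation route, this keeps old scripts working indefinitely, at the cost of two names for the same setting in the public signature.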


popescu-v · 2025-01-08T14:55:54Z

This work should allow us to give up on some type checks, which are done at the Core API level anyway.

folmos-at-orange added the labels Status/Draft (The issue is still not well defined), Type/Feature (A new feature request or an improvement of a feature) and Size/Weeks (Needs some weeks (big)) on Dec 20, 2024

popescu-v added the labels Status/ReadyForDev (The issue is ready to be developed or to be investigated deeply) and Priority/1-Medium (To do after P0), and removed the label Status/Draft, on Jan 10, 2025

popescu-v assigned folmos-at-orange Jan 10, 2025
