Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uniformize all sklearn estimator parameters to those of core #321

Open
folmos-at-orange opened this issue Dec 20, 2024 · 1 comment
Open
Assignees
Labels
Priority/1-Medium To do after P0 Size/Weeks Needs some weeks (big) Status/ReadyForDev The issue is ready to be developed or to be investigated deeply Type/Feature A new feature request or an improvement of a feature

Comments

@folmos-at-orange
Copy link
Member

folmos-at-orange commented Dec 20, 2024

Description

Currently, there are various Khiops parameters that are renamed in the sklearn estimators. Renaming them to their core counterpart would simplify the code and ease the transition to the core API.

Note that these naming conventions come from very early versions of the sklearn estimators, when the core library was not compliant with the Python naming conventions.

This would enable also to eliminate almost all parameter checks in sklearn estimators, because the message checks made by core would match the parameter names. So, we can reuse them.

Questions/Ideas

  • A plan for this change is
    1. deprecate all concerned values and parameters
    2. make the changes for Khiops 11.
  • Impacted Parameters:
    • Predictors and Encoder
      • n_pairs -> max_pairs
      • n_trees -> max_trees
      • n_features -> max_constructed_variables
      • n_evaluated_features -> max_evaluated_variables
      • n_selected_features -> max_selected_variables
    • Encoder only
      • transform_type_numerical -> numerical_recoding_method
      • transform_type_categorical -> categorical_recoding_method
      • transform_type_pairs -> pairs_recoding_method
        • For these last three we should also use the values accepted by core. For example, do not accept anymore dummies and instead accept 0-1 binarization.
    • Coclustering
      • build_frequency_vars -> build_frequency_variables
      • build_distance_vars -> build_distance_variables
      • build_name_vars -> build_cluster_variable
  • n_* vs max_*: One can argue that the changes for the n_* parameters make them less sklearn compliant. But the max_* naming have the proper semantics, as they are the limit on how many pairs/trees/variables can be constructed by Khiops. Also, there are max_* parameters in the sklearn library.
  • feature vs variable: sklearn uses the "feature" semantics whereas core user "variable". For example many of the attributes in the fit state use feature in their names.
    • Two solutions that still uniformize the parameters:
      • We replace the "feature" parameters by "variable" parameters
        • This may create a light confusion with respect to the fitted estimator attributes (which are not going to change)
      • We keep the "feature" parameters as an alias for the "variable" parameters.
@folmos-at-orange folmos-at-orange added Status/Draft The issue is still not well defined Type/Feature A new feature request or an improvement of a feature Size/Weeks Needs some weeks (big) labels Dec 20, 2024
@popescu-v
Copy link
Collaborator

This work should allow us to give up on some type checks, which are done at the Core API level anyway.

@popescu-v popescu-v added Status/ReadyForDev The issue is ready to be developed or to be investigated deeply Priority/1-Medium To do after P0 and removed Status/Draft The issue is still not well defined labels Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority/1-Medium To do after P0 Size/Weeks Needs some weeks (big) Status/ReadyForDev The issue is ready to be developed or to be investigated deeply Type/Feature A new feature request or an improvement of a feature
Projects
None yet
Development

No branches or pull requests

2 participants