
Custom transform is not working #25

Open
leonardo-moraes-inbev opened this issue Apr 11, 2023 · 1 comment

Comments

@leonardo-moraes-inbev

Hello, I tried to implement a simple custom transform using OneHotEncoder, but it is not working.

I tried it both ways:

from sklearn.preprocessing import OneHotEncoder

def transformer_fn():
    # Returns an unfitted OneHotEncoder instance
    return OneHotEncoder()

and

from sklearn.preprocessing import OneHotEncoder

def transformer_fn():
    # Returns the OneHotEncoder class itself, not an instance
    return OneHotEncoder

Error

2023/04/11 15:02:04 INFO mlflow.recipes.utils.execution: ingest, split: No changes. Skipping.
Run MLFlow Recipe step: transform
2023/04/11 15:02:05 INFO mlflow.recipes.step: Running step transform...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/leonardo-moraes/Git/mlflow-recipes-titanic/.venv/lib/python3.10/site-packages/mlflow/recipes/step.py", line 139, in run
    self.step_card = self._run(output_directory=output_directory)
  File "/home/leonardo-moraes/Git/mlflow-recipes-titanic/.venv/lib/python3.10/site-packages/mlflow/recipes/steps/transform.py", line 148, in _run
    train_transformed = transform_dataset(train_df)
  File "/home/leonardo-moraes/Git/mlflow-recipes-titanic/.venv/lib/python3.10/site-packages/mlflow/recipes/steps/transform.py", line 144, in transform_dataset
    transformed_features = pd.DataFrame(transformed_features, columns=columns)
  File "/home/leonardo-moraes/Git/mlflow-recipes-titanic/.venv/lib/python3.10/site-packages/pandas/core/frame.py", line 797, in __init__
    mgr = ndarray_to_mgr(
  File "/home/leonardo-moraes/Git/mlflow-recipes-titanic/.venv/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 337, in ndarray_to_mgr
    _check_values_indices_shape_match(values, index, columns)
  File "/home/leonardo-moraes/Git/mlflow-recipes-titanic/.venv/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 408, in _check_values_indices_shape_match
    raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
ValueError: Shape of passed values is (712, 1), indices imply (712, 329)
make: *** [Makefile:31: steps/transform/outputs/transformer.pkl] Error 1
@leonardo-moraes-inbev
Author

I was able to work around the issue by downgrading to mlflow==2.2.1 and wrapping the encoder in a ColumnTransformer (or a Pipeline) instead of returning the transformer directly:

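from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

def transformer_fn():
    # Placeholder names: list the categorical columns of your dataset here
    categorical_features = ["feat1", "feat2", ..., "featn"]
    # Columns not listed above are dropped (ColumnTransformer's default remainder="drop")
    return ColumnTransformer(
        transformers=[
            ("onehot", OneHotEncoder(categories="auto"), categorical_features),
        ]
    )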
