Maintain row order after cross join (#463)
Fixes the failing polars tests.

By default, `polars` gives no guarantees on the resulting row order of a
join (see [here](pola-rs/polars#20725)),
meaning that our tests used to pass just by luck. This has changed since
`polars==0.19.0`, which apparently included changes that affect the row
order of our test dataframes. The PR fixes these tests by ignoring the
row order during the equality check.
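
To illustrate why an order-sensitive check can pass or fail by luck, here is a minimal standard-library sketch. The parameter names and values are hypothetical and not taken from the test suite. It builds the same cross join in two enumeration orders and compares the results both strictly and as multisets of rows.

```python
from collections import Counter
from itertools import product

# Hypothetical parameter values, chosen only for illustration.
temperatures = [10, 20]
solvents = ["water", "ethanol"]

# Two cross joins over the same inputs that enumerate rows in different orders.
rows_a = [(t, s) for t, s in product(temperatures, solvents)]
rows_b = [(t, s) for s, t in product(solvents, temperatures)]

# An order-sensitive comparison depends on the enumeration order ...
strict_equal = rows_a == rows_b

# ... whereas comparing the rows as multisets ignores it.
unordered_equal = Counter(rows_a) == Counter(rows_b)

print(strict_equal, unordered_equal)
```

Both lists hold the same four combinations, so only the order-sensitive comparison distinguishes them.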
AdrianSosic authored Jan 24, 2025
2 parents 76c504e + 63fecc9 commit e2678ec
Showing 3 changed files with 15 additions and 3 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -36,6 +36,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   compatible with `strtobool`
 - All arguments to `MetaRecommender.select_recommender` are now optional
 - `MetaRecommender`s can now be composed of other `MetaRecommender`s
+- For performance reasons, search space manipulation using `polars` is no longer
+  guaranteed to produce the same row order as the corresponding `pandas` operations

 ### Fixed
 - Rare bug arising from degenerate `SubstanceParameter.comp_df` rows that caused
6 changes: 6 additions & 0 deletions docs/userguide/envvars.md
@@ -85,6 +85,12 @@ changing the Python environment. To do so, you can set the environment variable
 `BAYBE_DEACTIVATE_POLARS` to any truthy value accepted by
 [`strtobool`](baybe.utils.boolean.strtobool).

+```{admonition} Row Order
+:class: caution
+For performance reasons, search space manipulation using `polars` is not
+guaranteed to produce the same row order as the corresponding `pandas` operations.
+```
+
 ## Disk Caching
 For some components, such as the
10 changes: 7 additions & 3 deletions tests/constraints/test_constraints_polars.py
@@ -189,7 +189,7 @@ def test_polars_product(constraints, parameters):
     # Do Pandas product
     df_pd = parameter_cartesian_prod_pandas(parameters)

-    # Assert equality of lengths before filtering
+    # Assert equality before filtering
     assert_frame_equal(df_pl.to_pandas(), df_pd)

     # Apply constraints
@@ -198,5 +198,9 @@ def test_polars_product(constraints, parameters):
         _apply_constraint_filter_polars(ldf, constraints)[0].collect().to_pandas()
     )

-    # Assert strict equality of two dataframes
-    assert_frame_equal(df_pl_filtered, df_pd_filtered)
+    # Assert order-agnostic equality of the two dataframes
+    cols = df_pd_filtered.columns.tolist()
+    assert_frame_equal(
+        df_pd_filtered.sort_values(cols).reset_index(drop=True),
+        df_pl_filtered.sort_values(cols).reset_index(drop=True),
+    )
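
The final assertion in the test could also be factored into a small reusable helper. The sketch below is an assumption about how that might look, not part of this commit: the function name `assert_frame_equal_ignoring_row_order` and the example data are hypothetical. It applies the same approach as the test, sorting both frames by all columns and resetting the index before comparing.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def assert_frame_equal_ignoring_row_order(
    left: pd.DataFrame, right: pd.DataFrame
) -> None:
    """Assert that two dataframes contain the same rows, in any order.

    Hypothetical helper: sorts both frames by all columns of `left` and
    discards the original index so that only the row contents are compared.
    """
    cols = left.columns.tolist()
    assert_frame_equal(
        left.sort_values(cols).reset_index(drop=True),
        right.sort_values(cols).reset_index(drop=True),
    )


# Illustrative data: the same four rows in two different orders.
df_a = pd.DataFrame({"x": [1, 1, 2, 2], "y": ["a", "b", "a", "b"]})
df_b = df_a.iloc[[2, 0, 3, 1]]

# Passes although a direct assert_frame_equal(df_a, df_b) would raise.
assert_frame_equal_ignoring_row_order(df_a, df_b)
```

Sorting by every column keeps the comparison deterministic even when individual columns contain repeated values. The column order and dtypes of the two frames must still match for the assertion to pass.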
