Adds stub of NarwhalsAdapter (#998)
The purpose of this adapter is to showcase how you can write
transforms that are agnostic of the dataframe type.

Assumptions for this plugin:

* You can only have one "backend"; you can't mix and match. As far as I can tell, that means you can't load some data in pandas and some in polars. This is a narwhals limitation.
* This change uses the narwhals decorator. It assumes that the decorator leaves non-pandas/polars objects alone. If not, we could skip adding it when we don't detect a supported type.
* The user chooses the result builder and then nests it in the narwhals result builder, which just converts the outputs to the backend that is being used.
* I think this is a good enough integration to get out; we'll likely tweak it and add more functionality as feedback comes in.
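
The second assumption above (wrap what is recognized, leave everything else alone) can be sketched without any dataframe library. This is only an illustration of the idea, not the narwhals implementation: `FakeFrame`, `Wrapped`, and `wrap_known` are hypothetical names invented for this sketch.

```python
import functools
from dataclasses import dataclass


@dataclass
class FakeFrame:
    """Hypothetical stand-in for a native (pandas/polars) dataframe."""

    rows: list


@dataclass
class Wrapped:
    """Hypothetical stand-in for a backend-agnostic wrapper."""

    native: FakeFrame

    def __len__(self) -> int:
        return len(self.native.rows)


def wrap_known(fn):
    """Wrap FakeFrame arguments; pass every other argument through untouched."""

    @functools.wraps(fn)
    def inner(*args, **kwargs):
        def convert(value):
            return Wrapped(value) if isinstance(value, FakeFrame) else value

        result = fn(
            *(convert(a) for a in args),
            **{k: convert(v) for k, v in kwargs.items()},
        )
        # Unwrap on the way out so callers only ever see native objects.
        return result.native if isinstance(result, Wrapped) else result

    return inner


@wrap_known
def describe(df, label: str) -> str:
    # `df` arrives wrapped; `label` is a plain str and is left alone.
    return f"{label}: {len(df)} rows ({type(df).__name__})"


@wrap_known
def identity(value):
    return value
```

If the real decorator did not behave this way for unsupported types, the fallback mentioned above (only applying it when a supported type is detected) would be the alternative.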



Squashed commits:

* Adds stub of NarwhalsAdapter

Assumptions narwhals has (I believe):
1. You can only have one "backend"; you can't mix and match.
As far as I can tell, that means you can't load some data
in pandas and some in polars.
2. This change uses the narwhals decorator. This assumes
that non pandas/polars stuff would be left alone by it.
If not, we could just skip adding it if we don't detect
a type.

Otherwise we probably need a better example from narwhals.

* Adds one attempt at a result builder

This makes the user choose what the return type
is and then requires them to nest it in the
narwhals result builder that just converts
the outputs to the backend that is being used.
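
The nesting described above can be sketched in plain Python. This is a minimal illustration under assumptions, not the plugin's code: `Wrapped`, `DictResultBuilder`, and `UnwrappingResultBuilder` are hypothetical names, and the `build_result(**outputs)` method is only loosely modeled on the shape of a result builder.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Wrapped:
    """Hypothetical backend-agnostic wrapper around a native object."""

    native: Any


class DictResultBuilder:
    """Hypothetical user-chosen builder: returns outputs as a plain dict."""

    def build_result(self, **outputs: Any) -> Dict[str, Any]:
        return dict(outputs)


class UnwrappingResultBuilder:
    """Converts each output back to its native form, then delegates."""

    def __init__(self, inner):
        self.inner = inner

    def build_result(self, **outputs: Any):
        native = {
            name: value.native if isinstance(value, Wrapped) else value
            for name, value in outputs.items()
        }
        # The nested builder decides the final shape of the result.
        return self.inner.build_result(**native)


builder = UnwrappingResultBuilder(DictResultBuilder())
result = builder.build_result(frame=Wrapped([1, 2, 3]), count=3)
```

The outer builder only handles conversion, so the choice of final result type stays with whichever builder the user nests inside it.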

* Adds narwhals plugin v1

First version of narwhals support.

* Completes Narwhals example

Adds README and notebook so that people can run
this example easily.

Also adds circleci tests.

* Adds missing dependency

* Fixes polars test for polars 1.0+

* Adds narwhals to integration docs
skrawcz authored Jul 2, 2024
1 parent d12f4dc commit 096d210
Showing 14 changed files with 619 additions and 0 deletions.
7 changes: 7 additions & 0 deletions .ci/test.sh
@@ -51,6 +51,13 @@ if [[ ${TASK} == "vaex" ]]; then
    exit 0
fi

if [[ ${TASK} == "narwhals" ]]; then
    pip install -e .
    pip install polars pandas narwhals
    pytest plugin_tests/h_narwhals
    exit 0
fi

if [[ ${TASK} == "tests" ]]; then
    pip install .
    pytest \
18 changes: 18 additions & 0 deletions .circleci/config.yml
@@ -155,3 +155,21 @@ workflows:
          name: integrations-py312
          python-version: '3.12'
          task: integrations
      - test:
          requires:
            - check_for_changes
          name: narwhals-py39
          python-version: '3.9'
          task: narwhals
      - test:
          requires:
            - check_for_changes
          name: narwhals-py310
          python-version: '3.10'
          task: narwhals
      - test:
          requires:
            - check_for_changes
          name: narwhals-py311
          python-version: '3.11'
          task: narwhals
1 change: 1 addition & 0 deletions docs/integrations/index.rst
@@ -26,3 +26,4 @@ This section showcases how Hamilton integrates with popular frameworks.
Slack <https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/slack>
Spark <https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/spark>
Vaex <https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/vaex>
Narwhals <https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/narwhals>
28 changes: 28 additions & 0 deletions examples/narwhals/README.md
@@ -0,0 +1,28 @@
# Narwhals

[Narwhals](https://narwhals-dev.github.io/narwhals/) is a library that aims
to unify expressions across dataframe libraries. It is meant to be lightweight
and focuses on Python-first dataframe libraries.

This example shows how you can write dataframe-agnostic code
and then load pandas or polars data to use with it.

## Running the example

You can run the example with:

```bash
# cd examples/narwhals/
python example.py
```
This will run both variants one after the other.

Or run the notebook:

```bash
# cd examples/narwhals
jupyter notebook # pip install jupyter if you don't have it
```
Or you can open up the notebook in Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dagworks-inc/hamilton/blob/main/examples/narwhals/notebook.ipynb)
Binary file added examples/narwhals/example.png
70 changes: 70 additions & 0 deletions examples/narwhals/example.py
@@ -0,0 +1,70 @@
import narwhals as nw
import pandas as pd
import polars as pl

from hamilton.function_modifiers import config, tag


@config.when(load="pandas")
def df__pandas() -> nw.DataFrame:
    return pd.DataFrame({"a": [1, 1, 2, 2, 3], "b": [4, 5, 6, 7, 8]})


@config.when(load="pandas")
def series__pandas() -> nw.Series:
    return pd.Series([1, 3])


@config.when(load="polars")
def df__polars() -> nw.DataFrame:
    return pl.DataFrame({"a": [1, 1, 2, 2, 3], "b": [4, 5, 6, 7, 8]})


@config.when(load="polars")
def series__polars() -> nw.Series:
    return pl.Series([1, 3])


@tag(nw_kwargs=["eager_only"])
def example1(df: nw.DataFrame, series: nw.Series, col_name: str) -> int:
    return df.filter(nw.col(col_name).is_in(series.to_numpy())).shape[0]


def group_by_mean(df: nw.DataFrame) -> nw.DataFrame:
    return df.group_by("a").agg(nw.col("b").mean()).sort("a")


if __name__ == "__main__":
    import __main__ as example

    from hamilton import base, driver
    from hamilton.plugins import h_narwhals, h_polars

    # pandas
    dr = (
        driver.Builder()
        .with_config({"load": "pandas"})
        .with_modules(example)
        .with_adapters(
            h_narwhals.NarwhalsAdapter(),
            h_narwhals.NarwhalsDataFrameResultBuilder(base.PandasDataFrameResult()),
        )
        .build()
    )
    r = dr.execute([example.group_by_mean, example.example1], inputs={"col_name": "a"})
    print(r)

    # polars
    dr = (
        driver.Builder()
        .with_config({"load": "polars"})
        .with_modules(example)
        .with_adapters(
            h_narwhals.NarwhalsAdapter(),
            h_narwhals.NarwhalsDataFrameResultBuilder(h_polars.PolarsDataFrameResult()),
        )
        .build()
    )
    r = dr.execute([example.group_by_mean, example.example1], inputs={"col_name": "a"})
    print(r)
    dr.display_all_functions("example.png")
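
The `__pandas` and `__polars` suffixes in the example rely on `@config.when`: several implementations share one node name (the part before `__`), and the driver config picks which one is used. The sketch below imitates that selection in plain Python so the mechanism is visible. It is an assumption-laden illustration, not Hamilton's implementation: `when`, `resolve`, and `_REGISTRY` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: node name -> list of (required config, implementation).
_REGISTRY: Dict[str, List[Tuple[dict, Callable]]] = {}


def when(**required):
    """Hypothetical analogue of @config.when: register under the name before '__'."""

    def decorator(fn):
        node_name = fn.__name__.split("__")[0]
        _REGISTRY.setdefault(node_name, []).append((required, fn))
        return fn

    return decorator


def resolve(node_name: str, config: dict) -> Callable:
    """Return the single implementation whose requirements match the config."""
    matches = [
        fn
        for required, fn in _REGISTRY[node_name]
        if all(config.get(key) == value for key, value in required.items())
    ]
    if len(matches) != 1:
        raise ValueError(
            f"expected one implementation of {node_name!r}, found {len(matches)}"
        )
    return matches[0]


@when(load="pandas")
def df__pandas() -> str:
    return "pandas frame"


@when(load="polars")
def df__polars() -> str:
    return "polars frame"
```

Because exactly one implementation is selected per config, downstream functions such as `group_by_mean` can depend on `df` without knowing which backend produced it, which is consistent with the one-backend-at-a-time assumption in the commit message.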