Nit fixes to URL-encoding of partition field names #1499

smaheshwar-pltr · 2025-01-08T17:59:47Z

Follow-up to #1457 that addresses nits on that PR.

This reverts commit 61cdd08.

pyiceberg/partitioning.py

smaheshwar-pltr · 2025-01-08T18:00:26Z

tests/integration/test_partitioning_key.py


 import pytest
 from pyspark.sql import SparkSession
 from pyspark.sql.utils import AnalysisException

 from pyiceberg.catalog import Catalog
 from pyiceberg.partitioning import PartitionField, PartitionFieldValue, PartitionKey, PartitionSpec
-from pyiceberg.schema import Schema
+from pyiceberg.schema import Schema, make_compatible_name


The changes in this file address #1457 (comment)

kevinjqliu

LGTM! minor comments

kevinjqliu · 2025-01-08T18:15:06Z

pyiceberg/partitioning.py

@@ -237,8 +237,7 @@ def partition_to_path(self, data: Record, schema: Schema) -> str:
            value_str = quote_plus(value_str, safe="")


we can collapse this too
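For reference, partition_to_path URL-encodes each partition value with `quote_plus(value_str, safe="")`, and #1457 also URL-encodes partition field names. Below is a minimal sketch of that idea. It assumes a hypothetical helper `partition_path` that takes already-stringified (name, value) pairs and encodes both sides with `quote_plus`. It is not pyiceberg's actual partition_to_path.

```python
from typing import Iterable, Tuple
from urllib.parse import quote_plus


def partition_path(fields: Iterable[Tuple[str, str]]) -> str:
    """Build a "name=value/name=value" path, URL-encoding names and values.

    Hypothetical helper for illustration only; not pyiceberg's partition_to_path.
    """
    segments = []
    for name, value_str in fields:
        # safe="" (also quote_plus's default) escapes "/" as well, so neither a
        # name nor a value can introduce an extra directory level.
        encoded_name = quote_plus(name, safe="")
        encoded_value = quote_plus(value_str, safe="")
        segments.append(f"{encoded_name}={encoded_value}")
    return "/".join(segments)


print(partition_path([("special#string+field", "a b/c"), ("dt", "2025-01-08")]))
# special%23string%2Bfield=a+b%2Fc/dt=2025-01-08
```

Note that `quote_plus` turns spaces into `+` and escapes a literal `+` as `%2B`, so the two stay distinguishable when the path is decoded.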

kevinjqliu · 2025-01-09T17:43:22Z

tests/integration/test_partitioning_key.py

-            if make_compatible_name
-            else expected_partition_record
-        )
+        sanitized_record = Record(**{make_compatible_name(k): v for k, v in vars(expected_partition_record).items()})


we can either do this and run make_compatible_name for every test case, or we can make make_compatible_name optional and only run it for the specific use case:

`make_compatible_name: Optional[Callable[[str], str]] = None, .... if make_compatible_name: ....`

which one do you prefer?
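The two options can be compared side by side. This is a minimal sketch under a few assumptions. `sanitize_name` is a simplified stand-in for pyiceberg.schema.make_compatible_name, and the real function's escaping rules may differ. Plain dicts stand in for the `vars(expected_partition_record)` mapping used in the test. `expected_keys_always` and `expected_keys_optional` are hypothetical names.

```python
import re
from typing import Callable, Dict, Optional


def sanitize_name(name: str) -> str:
    # Simplified stand-in for make_compatible_name (assumption): replace every
    # character outside [A-Za-z0-9_] with "_x" plus its uppercase hex code point.
    return re.sub(r"[^A-Za-z0-9_]", lambda m: "_x" + format(ord(m.group()), "X"), name)


def expected_keys_always(record: Dict[str, object]) -> Dict[str, object]:
    # Option 1: sanitize for every test case; already-valid names pass through unchanged.
    return {sanitize_name(k): v for k, v in record.items()}


def expected_keys_optional(
    record: Dict[str, object],
    make_compatible_name: Optional[Callable[[str], str]] = None,
) -> Dict[str, object]:
    # Option 2: sanitize only when a test case passes a callable.
    if make_compatible_name:
        return {make_compatible_name(k): v for k, v in record.items()}
    return record


print(expected_keys_always({"int_field": 1, "special#string+field": "a"}))
# {'int_field': 1, 'special_x23string_x2Bfield': 'a'}
```

Option 1 only behaves like Option 2 for ordinary test cases because sanitizing an already-valid name is a no-op. The "sanitize everything" choice relies on that property.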

Sorry, not sure I understand. If it's always going to be pyiceberg.schema.make_compatible_name, doesn't it make more sense to be enabled by a boolean argument that we only set to True for the special field test case? I'm probably misunderstanding here again

(Aside: I don't think it's easy to give a pytest.mark.parametrize arg a default value, which is why I specified None each time before, and probably why there was no e.g. `spark_create_table_sql_for_justification: str = None` before that either.)

> (Aside: it's not easy to have default value for a pytest.mark.parametrize arg ...)

oh i didn't know that! My comment was based on the fact that we can set a default None value.

> doesn't it make more sense to be enabled by a boolean argument that we only set to True for the special field test case?

yea, that's also the point I want to make: make it clear that only certain test cases require make_compatible_name

but also, this is a nit comment, we don't necessarily have to do this.

Yeah, not sure. I marginally prefer leaving it as it is now: it reads more nicely, avoids passing Nones/Falses almost everywhere, and sanitisation is fast, so I don't think it'll cause a cumulative slowdown even when done for several test cases.

kevinjqliu

i think we'd need to run make lint, due to #1507

kevinjqliu · 2025-01-10T19:34:42Z

tests/integration/test_partitioning_key.py


smaheshwar-pltr · 2025-01-10T20:07:40Z

mkdocs/docs/api.md

@@ -1077,6 +1077,7 @@ with table.update_schema() as update:
 with table.update_schema() as update:
    update.add_column(("details", "confirmed_by"), StringType(), "Name of the exchange")
 ```
+


(By make lint - see #1507)

kevinjqliu

LGTM

kevinjqliu · 2025-01-10T20:43:36Z

Thanks for following up on this @smaheshwar-pltr

Sreesh Maheshwar added 3 commits January 8, 2025 17:46

Revert "Add make_name_compatible suggestion so test passes"

8ba2c2f

This reverts commit 61cdd08.

Nit fixes to URL-encoding of partition field names

d303e13

Fix tests

a4bb503

smaheshwar-pltr commented Jan 8, 2025

pyiceberg/partitioning.py (resolved)

smaheshwar-pltr commented Jan 8, 2025

smaheshwar-pltr mentioned this pull request Jan 8, 2025

URL-encode partition field names in file locations #1457

Merged

kevinjqliu reviewed Jan 9, 2025

Collapse

312b442

kevinjqliu reviewed Jan 10, 2025

Sreesh Maheshwar added 2 commits January 10, 2025 19:59

Merge branch 'main' into url-encode-nits

0aa6442

Make lint

c75d637

smaheshwar-pltr commented Jan 10, 2025

kevinjqliu approved these changes Jan 10, 2025

kevinjqliu merged commit 19ad24e into apache:main Jan 10, 2025
8 checks passed

kevinjqliu mentioned this pull request Jan 10, 2025

[ci] fix make lint #1507

Closed

smaheshwar-pltr deleted the url-encode-nits branch January 10, 2025 21:03

Fokko mentioned this pull request Jan 10, 2025

Support Location Providers #1452

Merged
