Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nit fixes to URL-encoding of partition field names #1499

Merged
merged 6 commits into from
Jan 10, 2025

Conversation

smaheshwar-pltr
Copy link
Contributor

Follow-up to #1457 that addresses nits on that PR.


import pytest
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

from pyiceberg.catalog import Catalog
from pyiceberg.partitioning import PartitionField, PartitionFieldValue, PartitionKey, PartitionSpec
from pyiceberg.schema import Schema
from pyiceberg.schema import Schema, make_compatible_name
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this file address #1457 (comment)

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! minor comments

@@ -237,8 +237,7 @@ def partition_to_path(self, data: Record, schema: Schema) -> str:
value_str = quote_plus(value_str, safe="")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can collapse this too

if make_compatible_name
else expected_partition_record
)
sanitized_record = Record(**{make_compatible_name(k): v for k, v in vars(expected_partition_record).items()})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can either do this and run make_compatible_name for every test case
or we can set make_compatible_name as optional and only run for the specific use case

    make_compatible_name: Optional[Callable[[str], str]] = None,
....

    if make_compatible_name:
        ....

which one do you prefer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, not sure I understand. If it's always going to be pyiceberg.schema.make_compatible_name, doesn't it make more sense to be enabled by a boolean argument that we only set to True for the special field test case? I'm probably misunderstanding here again

Copy link
Contributor Author

@smaheshwar-pltr smaheshwar-pltr Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Aside: it's not easy to have default value for a pytest.mark.parametrize arg I think which is why I specified None each time before - and probably why there was no e.g. spark_create_table_sql_for_justification: str = None before that too)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Aside: it's not easy to have default value for a pytest.mark.parametrize arg I think which is why I specified None each time before - and probably why there was no e.g. spark_create_table_sql_for_justification: str = None before that too)

oh i didn't know that! My comment was based on the fact that we can set a default None value.

doesn't it make more sense to be enabled by a boolean argument that we only set to True for the special field test case?

yea thats also the point i want to make. make it clear that only certain test cases requires make_compatible_name

but also, this is a nit comment, we dont necessary have to do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah not sure. I marginally prefer leaving it as it is now - it reads more nicely, prevents Nones/Falses mostly everywhere, sanitisation is fast so I don't think it'll causes a cumulative slowdown even when done for several test cases.

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we'd need to run make lint, due to #1507

if make_compatible_name
else expected_partition_record
)
sanitized_record = Record(**{make_compatible_name(k): v for k, v in vars(expected_partition_record).items()})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Aside: it's not easy to have default value for a pytest.mark.parametrize arg I think which is why I specified None each time before - and probably why there was no e.g. spark_create_table_sql_for_justification: str = None before that too)

oh i didn't know that! My comment was based on the fact that we can set a default None value.

doesn't it make more sense to be enabled by a boolean argument that we only set to True for the special field test case?

yea thats also the point i want to make. make it clear that only certain test cases requires make_compatible_name

but also, this is a nit comment, we dont necessary have to do this.

@@ -1077,6 +1077,7 @@ with table.update_schema() as update:
with table.update_schema() as update:
update.add_column(("details", "confirmed_by"), StringType(), "Name of the exchange")
```

Copy link
Contributor Author

@smaheshwar-pltr smaheshwar-pltr Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(By make lint - see #1507)

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kevinjqliu kevinjqliu merged commit 19ad24e into apache:main Jan 10, 2025
8 checks passed
@kevinjqliu
Copy link
Contributor

Thanks for following up on this @smaheshwar-pltr

@kevinjqliu kevinjqliu mentioned this pull request Jan 10, 2025
@smaheshwar-pltr smaheshwar-pltr deleted the url-encode-nits branch January 10, 2025 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants