Nit fixes to URL-encoding of partition field names #1499
Conversation
import pytest
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException

from pyiceberg.catalog import Catalog
from pyiceberg.partitioning import PartitionField, PartitionFieldValue, PartitionKey, PartitionSpec
from pyiceberg.schema import Schema, make_compatible_name
The changes in this file address #1457 (comment)
LGTM! minor comments
pyiceberg/partitioning.py (Outdated)
@@ -237,8 +237,7 @@ def partition_to_path(self, data: Record, schema: Schema) -> str:
    value_str = quote_plus(value_str, safe="")
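As a side note on the quoted line: `urllib.parse.quote_plus` with `safe=""` percent-encodes every reserved character, including `/`, so an encoded partition value cannot escape its path segment. A minimal sketch with a made-up value:

```python
from urllib.parse import quote_plus

# With safe="" every reserved character is percent-encoded, including "/",
# so the encoded value stays inside one path segment; spaces become "+".
value_str = "a/b c=d"  # made-up partition value, not from the PR
encoded = quote_plus(value_str, safe="")
print(encoded)  # a%2Fb+c%3Dd
```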
we can collapse this too
    if make_compatible_name
    else expected_partition_record
)
sanitized_record = Record(**{make_compatible_name(k): v for k, v in vars(expected_partition_record).items()})
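The pattern in that one-liner (rebuilding a record with sanitized field names) can be sketched with stand-ins: `sanitize` below is a hypothetical placeholder for `pyiceberg.schema.make_compatible_name` (whose real escaping scheme differs), and `SimpleNamespace` stands in for pyiceberg's `Record`; only the dict-comprehension shape is the point.

```python
import re
from types import SimpleNamespace

def sanitize(name: str) -> str:
    # Hypothetical stand-in for pyiceberg.schema.make_compatible_name;
    # it just replaces non-word characters with "_".
    return re.sub(r"\W", "_", name)

# SimpleNamespace stands in for pyiceberg's Record type here.
record = SimpleNamespace(**{"field#1": 1, "ok_field": 2})
sanitized = {sanitize(k): v for k, v in vars(record).items()}
print(sanitized)  # {'field_1': 1, 'ok_field': 2}
```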
We can either do this and run make_compatible_name for every test case, or we can make make_compatible_name optional and only run it for the specific use case:

make_compatible_name: Optional[Callable[[str], str]] = None,
....
if make_compatible_name:
....

Which one do you prefer?
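The optional-argument alternative being floated here could look something like the sketch below; `expected_record` is a hypothetical helper name, not code from the PR.

```python
from typing import Callable, Dict, Optional

def expected_record(
    fields: Dict[str, object],
    make_compatible_name: Optional[Callable[[str], str]] = None,
) -> Dict[str, object]:
    # Sanitize keys only when a callable is supplied; most call sites
    # would simply omit the argument.
    if make_compatible_name:
        return {make_compatible_name(k): v for k, v in fields.items()}
    return fields

print(expected_record({"a b": 1}))                                  # {'a b': 1}
print(expected_record({"a b": 1}, lambda s: s.replace(" ", "_")))   # {'a_b': 1}
```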
Sorry, not sure I understand. If it's always going to be pyiceberg.schema.make_compatible_name, doesn't it make more sense to be enabled by a boolean argument that we only set to True for the special field test case? I'm probably misunderstanding here again.
(Aside: it's not easy to have a default value for a pytest.mark.parametrize arg I think, which is why I specified None each time before - and probably why there was no e.g. spark_create_table_sql_for_justification: str = None before that too.)
> (Aside: it's not easy to have a default value for a pytest.mark.parametrize arg I think, which is why I specified None each time before - and probably why there was no e.g. spark_create_table_sql_for_justification: str = None before that too.)

Oh, I didn't know that! My comment was based on the fact that we can set a default None value.

> doesn't it make more sense to be enabled by a boolean argument that we only set to True for the special field test case?

Yeah, that's also the point I want to make: make it clear that only certain test cases require make_compatible_name. But also, this is a nit comment; we don't necessarily have to do this.
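To illustrate the parametrize constraint mentioned in the aside: every name in the argument string must be supplied in each parameter tuple, so the "default" ends up written as an explicit None per case. A toy example (the lambda here is a stand-in sanitizer, not pyiceberg's real escaping scheme):

```python
import pytest

@pytest.mark.parametrize(
    "field_name, make_compatible_name",
    [
        ("plain_field", None),  # ordinary case: no sanitization, but None
                                # must still be spelled out in the tuple
        ("field#1", lambda s: s.replace("#", "_x23")),  # special-field case
    ],
)
def test_field_name(field_name, make_compatible_name):
    # Apply the sanitizer only when one was provided for this case.
    name = make_compatible_name(field_name) if make_compatible_name else field_name
    assert "#" not in name
```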
Yeah, not sure. I marginally prefer leaving it as it is now - it reads more nicely, prevents Nones/Falses mostly everywhere, and sanitisation is fast so I don't think it'll cause a cumulative slowdown even when done for several test cases.
I think we'd need to run make lint, due to #1507.
@@ -1077,6 +1077,7 @@ with table.update_schema() as update:
with table.update_schema() as update:
    update.add_column(("details", "confirmed_by"), StringType(), "Name of the exchange")
```
(By make lint - see #1507)
LGTM
Thanks for following up on this @smaheshwar-pltr
Follow-up to #1457 that addresses nits on that PR.