Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve LocationProvider unit tests #1511

Merged
merged 3 commits into from
Jan 13, 2025

Conversation

smaheshwar-pltr
Copy link
Contributor

@smaheshwar-pltr smaheshwar-pltr commented Jan 12, 2025

See #1509 (comment) for context/motivation.

This PR better highlights location provider behaviour in the presence of partitions in unit tests. It tests related behaviour more carefully - e.g. it tests that partition values and entropy included in paths when object storage location provision is used (without partitioned paths disabled) that can be confusing. It also tests the injected hash itself in this scenario because it's the hash method of both the file name and the partition key - this matches the Java implementation and I've tested that the Java code produces the same output for this case (https://github.com/apache/iceberg/blob/main/core/src/test/java/org/apache/iceberg/TestLocationProvider.java has no tests for this as of now).

@kevinjqliu maybe, mind taking a look?

@smaheshwar-pltr smaheshwar-pltr force-pushed the object-storage-better-tests branch from 216e9a9 to a88a484 Compare January 12, 2025 18:09
Comment on lines +100 to +101
# key AND the data file name are used as the hash input. This matches Java behaviour; the hash below is what the
# Java implementation produces for this input too.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've verified this. There's no test of this in https://github.com/apache/iceberg/blob/main/core/src/test/java/org/apache/iceberg/TestLocationProvider.java that we can take the hash from sadly.

I like testing this here. It's not obvious that both partition key and file name are used as hash input.

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this, i have one nit comment

tests/table/test_locations.py Show resolved Hide resolved
@kevinjqliu kevinjqliu merged commit a09bcde into apache:main Jan 13, 2025
7 checks passed
@kevinjqliu
Copy link
Contributor

Thanks for the follow up @smaheshwar-pltr

@smaheshwar-pltr smaheshwar-pltr deleted the object-storage-better-tests branch January 18, 2025 19:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants