[Managed Iceberg] support BQMS catalog #33511

Merged: ahmedabu98 merged 11 commits into apache:master on Jan 9, 2025.


ahmedabu98 (Contributor)

Downloads the BigQuery Metastore (BQMS) catalog jar at build time (best effort)
Adds validation tests for the BQMS catalog


github-actions bot commented Jan 7, 2025

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment "assign set of reviewers".


def bqmsLocation = "$buildDir/libs"
task downloadBqmsJar(type: Copy) {
    def jarUrl = 'https://storage.googleapis.com/spark-lib/bigquery/iceberg-bigquery-catalog-1.5.2-0.1.0.jar'
    // ... (rest of the task not shown in this review excerpt)
}
Contributor


Can we define the version in one central place so it's easier to control, and add a comment about when to change it?

Abacn (Contributor), Jan 7, 2025


Related: it currently downloads from a GCS bucket via storage.googleapis.com. Is iceberg-bigquery-catalog available from a public (trusted) source, such as a Maven repository?

Contributor


> Is iceberg-bigquery-catalog available in some public (trusted) source / a maven repository?

Not yet, but I believe the BigQuery team guarantees the jar's availability at that location, so it can be trusted. I don't think there's a clear ETA for when it will be available in a public repository.

Contributor


+1 for defining the version elsewhere for easier tracking. We can probably just define it in BeamModulePlugin.

Contributor Author


I'm not entirely convinced about defining a version here, since this jar is downloaded from a plain GCS bucket path, not a repository that keeps track of versions.

For example, there's no guarantee the BigQuery team will keep the same naming template if they release a new catalog version.

chamikaramj (Contributor), Jan 8, 2025


I think we just define versions as constants there; they aren't tied to any repository.

Contributor


I mean, we shouldn't define the dependency there, just the version, so that it's more visible at the top level.

Contributor Author


I mean that a new GCS path may not follow the same naming template, so defining a version may not help much here. I don't have strong feelings about it, though; I can add it if it's a blocker.

Contributor


I would just define it there for consistency. We can always change or define a new constant if the naming pattern changes.

Contributor Author


Done in the latest commit. I'm not the biggest fan 😅 but it should work.

Let me know if this is what you had in mind.
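For reference, here is a minimal sketch of the pattern discussed in this thread, assuming a Groovy Gradle build script: the catalog version lives in a single constant, the download URL is derived from it, and the download is best effort (a failure logs a warning instead of failing the build). The constant name `bqmsCatalogVersion` and the task body are illustrative assumptions. This is not the PR's actual code, which uses a `Copy` task and defines the version in BeamModulePlugin.

```groovy
// Hypothetical sketch, not the PR's actual implementation.
// Keep the version in one visible place. When the BigQuery team publishes a new
// catalog jar, update this constant (and the URL template if their naming changes).
def bqmsCatalogVersion = '1.5.2-0.1.0'
def bqmsLocation = "$buildDir/libs"

tasks.register('downloadBqmsJar') {
    def jarName = "iceberg-bigquery-catalog-${bqmsCatalogVersion}.jar"
    def jarUrl = "https://storage.googleapis.com/spark-lib/bigquery/${jarName}"
    def destFile = file("${bqmsLocation}/${jarName}")
    outputs.file(destFile)

    doLast {
        destFile.parentFile.mkdirs()
        try {
            // Stream the jar from the public GCS URL into build/libs.
            new URL(jarUrl).withInputStream { input ->
                destFile.withOutputStream { output -> output << input }
            }
        } catch (Exception e) {
            // Best effort: remove any partial file and warn instead of failing the build.
            destFile.delete()
            logger.warn("Could not download BQMS catalog jar from ${jarUrl}: ${e.message}")
        }
    }
}
```

With the version in one constant, a version bump becomes a one-line change, and if the naming pattern does change, only the URL template needs updating. That is the consistency argument made in the thread.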


github-actions bot commented Jan 7, 2025

Assigning reviewers. If you would like to opt out of this review, comment "assign to next reviewer":

R: @robertwb for label java.
R: @Abacn for label build.
R: @Abacn for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).


chamikaramj commented Jan 8, 2025

LGTM other than the existing comments.


chamikaramj left a comment


Thanks. LGTM.

ahmedabu98 merged commit 6b3783f into apache:master on Jan 9, 2025
26 of 28 checks passed
ahmedabu98 added a commit to ahmedabu98/beam that referenced this pull request Jan 9, 2025
* Add BQMS catalog

* trigger integration tests

* build fix

* use shaded jar

* shadowClosure

* use global timeout for tests

* define version in BeamModulePlugin

* address comments

4 participants