Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-12116. Customizable Protobuf shaded prefix in ozonefs-hadoop3-client. #7729

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

apotheque
Copy link

What changes were proposed in this pull request?

I tried running Trino together with Ozone over OFS but encountered a Protobuf shading incompatibility between the Trino Hadoop library and the Ozone Hadoop client:

2024-12-27T09:09:30.178Z    DEBUG   hive-hive-0     org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem   Ozone URI for OFS initialization is ofs://ozone-om-1
2024-12-27T09:09:30.183Z    WARN    hive-hive-0     org.apache.hadoop.ozone.om.helpers.OzoneFSUtils Ignoring ozone.fs.hsync.enabled = true because HBase enhancements are disallowed. To enable it, set ozone.client.hbase.enhancements.allowed = true as well.
2024-12-27T09:09:30.184Z    DEBUG   hive-hive-0     org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem   hsyncEnabled = false
2024-12-27T09:09:30.473Z    INFO    dispatcher-query-3      io.trino.event.QueryMonitor     TIMELINE: Query 20241227_090929_00003_x4ukx :: FINISHED :: elapsed 480ms :: planning 134ms :: waiting 23ms :: scheduling 202ms :: running 128ms :: finishing 16ms :: begin 2024-12-27T09:09:29.959Z :: end 
2024-12-27T09:09:30.641Z    INFO    hive-hive-0     org.apache.hadoop.hdds.utils.LeakDetector       Starting leak detector thread OzoneClientObject0.
2024-12-27T09:09:30.816Z    DEBUG   hive-hive-0     org.apache.hadoop.ozone.om.protocolPB.OmTransportFactory        Loading OM transport implementation org.apache.hadoop.ozone.om.protocolPB.Hadoop3OmTransportFactory as specified by configuration.
2024-12-27T09:09:31.241Z    DEBUG   hive-hive-0     org.apache.hadoop.ozone.om.ha.OMFailoverProxyProviderBase       RetryProxy: OM om1: class org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OMRequest cannot be cast to class io.trino.hadoop.$internal.com.google.protobuf.Message (org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OMRequest and io.trino.hadoop.$internal.com.google.protobuf.Message are in unnamed module of loader io.trino.filesystem.manager.HdfsClassLoader @43fffec2)
2024-12-27T09:09:31.243Z    DEBUG   hive-hive-0     org.apache.hadoop.ozone.om.ha.OMFailoverProxyProviderBase       Incrementing OM proxy index to 0, nodeId: om1
2024-12-27T09:09:33.246Z    DEBUG   hive-hive-0     org.apache.hadoop.ozone.om.ha.OMFailoverProxyProviderBase       Failing over OM from om1:0 to om1:0
2024-12-27T09:09:33.246Z    DEBUG   hive-hive-0     org.apache.hadoop.ozone.om.ha.OMFailoverProxyProviderBase       RetryProxy: OM om1: null

The simplest way to resolve this issue is to adjust the ozonefs-hadoop3-client module, which already shades Protobuf, but currently uses a fixed prefix. I suggest making the prefix configurable as a separate property to allow customization if needed.

With the proposed changes, running mvn clean package -Dproto.shaded.prefix='io.trino.hadoop.\$internal' inside the ozonefs-hadoop3-client module will properly shade Protobuf and allow the output JAR to be used with Trino.

Related Trino issue: trinodb/trino#18026

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-12116

How was this patch tested?

Manually tested.

@apotheque apotheque marked this pull request as ready for review January 21, 2025 16:54
@myskov
Copy link
Contributor

myskov commented Jan 22, 2025

Thanks @apotheque for the patch!
I think using the same JAR file but building it differently can cause issues: once built, we can't determine whether it was built for Spark or Trino. Wouldn't it be better to use a separate artifact for Trino?

@myskov myskov self-requested a review January 22, 2025 08:54
@kerneltime kerneltime requested a review from adoroszlai January 23, 2025 03:31
@apotheque
Copy link
Author

apotheque commented Jan 23, 2025

I didn’t even know that the ozonefs-hadoop3-client is intended for Spark; it’s not clear as it is. Perhaps a better solution would be to add a suffix to this jar. For example, the final jar could be named ozone-filesystem-hadoop3-client-2.0.0-spark.jar (by default) or ozone-filesystem-hadoop3-client-2.0.0-trino.jar if a different suffix is specified in the build command. I don’t see a reason to add a separate module just to change one shading rule. What do you think, @myskov?

@adoroszlai
Copy link
Contributor

Thanks @apotheque for the patch.

I didn’t even know that the ozonefs-hadoop3-client is intended for Spark

ozonefs-hadoop3-client is intended for anything that uses hadoop-client-runtime, one example is Spark. shadedPattern is not specific to Spark, but Hadoop. So I don't think Spark should be included in the artifact name.

On the other hand, the prefix io.trino.hadoop.\$internal is specific to Trino.

I don’t see a reason to add a separate module just to change one shading rule.

I think it depends.

  • If Ozone release is to include the Trino-specific jar, either a separate module or a change in the build process is necessary. Separate module seems to be less intrusive of the two.
  • If it's enough to let adventurous Ozone users locally build their own jar for Trino, then this PR may be a good solution.

Merging this PR still allows working towards a separate module later.

@jojochuang
Copy link
Contributor

It's not a scalable solution.
We have ozonefs-hadoop3-client built for Spark (because Spark expects the hadoop-client-runtime which contains shaded protobuf jars)
Now we need to build another for Trino.
I know that Apache Phoenix has a similar issue and we decided to not support that particular use case for now. But the list is only going to grow.

We could built convenience jar to encourage adoption, for now. But long term I feel like the applications themselves should maintain these jars.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants