-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drop spark31x shims [databricks] #11159
Conversation
1, Remove spark31x json lines from the source code 2, Remove the files those only for spark31x shims 3, Move the files for spark31x and spark32x+ shims into sql-plugin/src/main/spark320 folder Signed-off-by: Tim Liu <[email protected]>
Signed-off-by: Tim Liu <[email protected]>
tests/src/test/spark311/scala/com/nvidia/spark/rapids/shims/OrcStatisticShim.scala --> tests/src/test/spark321cdh/scala/com/nvidia/spark/rapids/shims/OrcStatisticShim.scala check if we chan merge this file into? tests/src/test/spark320/scala/com/nvidia/spark/rapids/shims/OrcStatisticShim.scala Signed-off-by: Tim Liu <[email protected]>
Signed-off-by: Tim Liu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scala2.13 pom files need to be regenerated to pick up the latest diffs.
There are still references to spark31x or 3.1.x in the following files that should be easy to update:
- CONTRIBUTING.md
- api_validation/README.md
- api_validation/src/main/scala/com/nvidia/spark/rapids/api/ApiValidation.scala
- dist/README.md
- docs/dev/shims.md
- jenkins/hadoop-def.sh
- jenkins/spark-nightly-build.sh
- jenkins/version-def.sh
- sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala
There are a number of things that should be done as followups to this, and I'll file issues to track:
- Some files no longer need to be treated as a shim after the removal of the spark31x shims (i.e.: code is now the same for all supported Spark versions).
- Tests need to be updated to remove tests specific to Spark 3.1.x
@@ -44,7 +44,7 @@ import org.apache.spark.util.MutableURLClassLoader | |||
3. a smaller fraction of classes that differ under one of the supported Spark versions | |||
com/nvidia/spark/SQLPlugin.class | |||
spark-shared/com/nvidia/spark/rapids/CastExprMeta.class | |||
spark311/org/apache/spark/sql/rapids/GpuUnaryMinus.class | |||
spark320/org/apache/spark/sql/rapids/GpuUnaryMinus.class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change doesn't make sense given the context. This line is now duplicated with the one below. Probably need to pick a different example class to use here (can examine the contents of a dist jar after a buildall for a class that has a spark320 version and one other spark version)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me, GpuLast.class should be good to replace current class here
Binary files ./spark331/org/apache/spark/sql/rapids/aggregate/GpuLast.class and ./spark320/org/apache/spark/sql/rapids/aggregate/GpuLast.class differ
Fixed
sql-plugin-api/src/main/scala/com/nvidia/spark/rapids/ShimLoader.scala
Outdated
Show resolved
Hide resolved
Were you able to use shimplify.remove.shim per shims.md?
|
Remove the 2 shims(e.g. 312+313) are good with the CLI, but when the last shim(311) failed
|
Signed-off-by: Tim Liu <[email protected]>
Yes, I'll push a commit to update all the spark31x/3.1.x references --> Fixed |
commit changes after every run |
@razajafri please discuss with Tim about #10994. |
Signed-off-by: Tim Liu <[email protected]>
Change the default shim to spark320 from spark311 in the shims in docs, source code and build scripts Signed-off-by: Tim Liu <[email protected]>
@@ -8,7 +8,7 @@ parent: Developer Overview | |||
# Shim Development | |||
|
|||
RAPIDS Accelerator For Apache Spark supports multiple feature version lines of | |||
Apache Spark such as 3.1.x, 3.2.x, 3.3.0 and a number of vendor releases that contain | |||
Apache Spark such as 3.2.x, 3.3.x, 3.4.x, 3.5.x and a number of vendor releases that contain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with the source code/classes that the Upstream base apache/spark classes we derive from to fix the incompatible shims building
@gerashegalov @jlowe @razajafri @revans2 @tgravescs Can you help to update this part in the file:docs/dev/shims.md
? Thanks a lot!
### Compile-time issues
Upstream base classes we derive from might be incompatible in the sense that one version
requires us to implement/override the method `M` whereas the other prohibits it by marking
the base implementation `final`, E.g. `org.apache.spark.sql.catalyst.trees.TreeNode` changes
between Spark 3.1.x and Spark 3.2.x. So instead of deriving from such classes directly we
inject an intermediate trait e.g. `com.nvidia.spark.rapids.shims.ShimExpression` that
has a varying source code depending on the Spark version we compile against to overcome this
issue as you can see e.g., comparing TreeNode:
1. [ShimExpression For 3.1.x](https://github.com/NVIDIA/spark-rapids/blob/6a82213a798a81a5f32f8cf8b4c630e38d112f65/sql-plugin/src/main/spark311/scala/com/nvidia/spark/rapids/shims/TreeNode.scala#L28)
2. [ShimExpression For 3.2.x](https://github.com/NVIDIA/spark-rapids/blob/6a82213a798a81a5f32f8cf8b4c630e38d112f65/sql-plugin/src/main/spark320/scala/com/nvidia/spark/rapids/shims/TreeNode.scala#L37)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would simply leave it as is. I don't want to necessarily search for new examples of this problem. And the links are permalinks that will work even after this PR is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's just an example demonstrating how the shims work, I'm fine leaving as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me
@gerashegalov BTW, the result of the shimplify run and my manually update is the same in this PR. |
Signed-off-by: Tim Liu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's here looks good to me. In addition to the test cleanup (tracked by #11160) and the unshimming of classes that no longer need to be shims (tracked by #11161) , there are generated tool files under tools/generated/31* that should also be removed. I'm OK with this being a followup.
cc: @amahussein @tgravescs for awareness, as the main tools files are now showing Spark 3.2.0 support and the tools/generated/31* directories are going to be deleted soon.
build |
build |
To fix: NVIDIA#11175 Clean up unused and duplicated 'org/roaringbitmap' in the spark320 shim folder to walk around for the JACOCO error 'different class with same name', after we drop 31x shims and change the default shim to spark320 Signed-off-by: Tim Liu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
1, Remove spark31x json lines from the source code
2, Remove the files those only for spark31x shims
3, Move the files for spark31x and spark32x+ shims into sql-plugin/src/main/spark320 folder
4, Drop spark31x shims in the build scripts and pom files