
Enable Hybrid test cases in premerge/nightly CIs [databricks] #11906

Draft · res-life wants to merge 36 commits into base branch-25.02 from premerge-nightly-for-hybrid
Conversation

@res-life (Collaborator) commented Dec 25, 2024

Enable Hybrid test cases in premerge.

Depends on

This PR:

- Enables Hybrid test cases in premerge/nightly.
- Is temporarily based on #11720. Please review the tail commits; the other commits are from #11720.

#11720 and this PR will be merged after the premerge/nightly CIs pass.

Signed-off-by: Chong Gao [email protected]

Chong Gao and others added 30 commits November 25, 2024 18:18
Signed-off-by: Chong Gao <[email protected]>
…ly supports 3.2.2, 3.3.1, 3.4.2, and 3.5.1.
Signed-off-by: Chong Gao <[email protected]>
@res-life res-life changed the title Add Hybrid cases to premerge Enable Hybrid test cases in premerge [databricks] Jan 15, 2025
@res-life res-life force-pushed the premerge-nightly-for-hybrid branch 3 times, most recently from a523285 to 889f32e Compare January 15, 2025 04:28
@res-life (Collaborator, Author) commented:

build


prepare_for_hybrid_feature() {
mvn -B dependency:get -DgroupId=com.nvidia \
-DartifactId=gluten-velox-bundle \
Collaborator:
What artifacts are we going to release?

From the scripts, we'll release gluten-velox-bundle, rapids-4-spark-hybrid_2.12, and gluten-thirdparty-lib?

Or we could consider packaging all these dependencies into one jar.

Collaborator (Author):

Yes, we will release 3 jars; we cannot package all these dependencies into one jar, because a customer may have their own customized gluten-velox-bundle jar. E.g., a customer that has developed a file system like AFS.

@@ -160,6 +163,36 @@ rapids_shuffle_smoke_test() {
$SPARK_HOME/sbin/stop-master.sh
}

# parameters for Hybrid feature
spark_prefix="${SPARK_VER:0:3}" # get prefix from SPARK_VER, e.g.: 3.2, 3.3 ... 3.5
GLUTEN_BUNDLE_JAR="gluten-velox-bundle-spark${spark_prefix}_2.12-ubuntu_20.04_x86_64-1.2.0.jar"
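The `${SPARK_VER:0:3}` expansion in the hunk above is plain bash substring slicing. A minimal sketch, where the example `SPARK_VER` value is an assumption (in CI it comes from the environment):

```shell
#!/usr/bin/env bash
# Sketch of the prefix extraction used above. SPARK_VER is an assumed
# example value here; the CI environment sets the real one.
SPARK_VER="3.4.2"
spark_prefix="${SPARK_VER:0:3}"   # substring: offset 0, length 3 -> "3.4"
GLUTEN_BUNDLE_JAR="gluten-velox-bundle-spark${spark_prefix}_2.12-ubuntu_20.04_x86_64-1.2.0.jar"
echo "${spark_prefix}"
echo "${GLUTEN_BUNDLE_JAR}"
```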
Collaborator:
Can we define a global Gluten version 1.2.0 somewhere, e.g. in pom.xml?

Then we can share this version between the docs and the CI scripts, and easily update it when the Gluten version gets updated.

Collaborator (Author):

Let me add a config in version-def.sh

GLUTEN_VERSION=1.2.0

@res-life res-life force-pushed the premerge-nightly-for-hybrid branch from 889f32e to de7a570 Compare January 15, 2025 06:10
@res-life res-life force-pushed the premerge-nightly-for-hybrid branch from de7a570 to fd2b513 Compare January 15, 2025 06:13
@res-life (Collaborator, Author) commented:

build

@res-life res-life changed the title Enable Hybrid test cases in premerge [databricks] Enable Hybrid test cases in premerge/nightly CIs [databricks] Jan 15, 2025
-DartifactId=gluten-velox-bundle \
-Dversion=${GLUTEN_VERSION} \
-Dpackaging=jar \
-Dclassifier=spark${spark_prefix}_2.12-ubuntu_${GLUTEN_FOR_OS}
Collaborator:

typo?

- `\` --> `-Dclassifier=spark${spark_prefix}_2.12-ubuntu_${GLUTEN_FOR_OS} '`

@@ -26,6 +26,9 @@ for VAR in $OVERWRITE_PARAMS; do
done
IFS=$PRE_IFS

# configs for Hybrid feature
Collaborator:

Better to use the default-value pattern instead of hardcoding; then we can override these vars from outside without changing this script:

GLUTEN_VERSION=${GLUTEN_VERSION:-"1.2.0"}
GLUTEN_FOR_OS=${GLUTEN_FOR_OS:-"20.04"}

GLUTEN_THIRD_PARTY_JAR="gluten-thirdparty-lib-${GLUTEN_VERSION}-ubuntu-${GLUTEN_FOR_OS}-x86_64.jar"
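A quick demonstration of the `${VAR:-default}` pattern suggested above: the pinned value acts as a fallback, and a caller-supplied value wins without any script change.

```shell
#!/usr/bin/env bash
# Demonstration of bash default-value expansion, ${VAR:-default}.

# Unset -> fall back to the pinned default:
unset GLUTEN_VERSION
GLUTEN_VERSION=${GLUTEN_VERSION:-"1.2.0"}
echo "$GLUTEN_VERSION"

# Provided by the caller -> the override wins, no script edit needed.
# (GLUTEN_VERSION_OVERRIDE is a hypothetical name for illustration.)
GLUTEN_VERSION_OVERRIDE="1.3.0"
GLUTEN_VERSION=${GLUTEN_VERSION_OVERRIDE:-"1.2.0"}
echo "$GLUTEN_VERSION"
```

In practice the override would simply be set in the environment, e.g. `GLUTEN_VERSION=1.3.0 ./run_ci.sh`.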

# download Gluten, Hybrid jars
mvn -B dependency:get -DgroupId=com.nvidia \
Collaborator:

nit: avoid using `mvn dependency:get` if possible; the IT run does not require a mvn dependency. Use wget/curl instead.
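A hedged sketch of that suggestion: build the jar's repository URL and fetch it with curl instead of invoking Maven. The repository base URL and the classifier below are illustrative assumptions, not the real release coordinates.

```shell
#!/usr/bin/env bash
# Sketch: construct a Maven-layout URL for the jar and download it with curl.
# MVN_REPO and CLASSIFIER are assumptions for illustration only.
MVN_REPO="${MVN_REPO:-https://repo1.maven.org/maven2}"   # assumed repo base
GROUP_PATH="com/nvidia"
ARTIFACT="gluten-velox-bundle"
GLUTEN_VERSION="${GLUTEN_VERSION:-1.2.0}"
CLASSIFIER="spark3.4_2.12-ubuntu_20.04"                  # assumed classifier
JAR="${ARTIFACT}-${GLUTEN_VERSION}-${CLASSIFIER}.jar"
URL="${MVN_REPO}/${GROUP_PATH}/${ARTIFACT}/${GLUTEN_VERSION}/${JAR}"
echo "would fetch: ${URL}"
# curl -fsSL -o "${JAR}" "${URL}"   # the actual download, commented out here
```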

@@ -109,6 +109,10 @@ mvn_verify() {
do
TZ=$tz ./integration_tests/run_pyspark_from_build.sh -m tz_sensitive_test
done

# test Hybrid feature
source "${WORKSPACE}/jenkins/hybrid_execution.sh"
Collaborator:

How long will this test take? We may need its duration to determine whether to put it into the ci1 or ci2 stage for balancing the workloads.

@@ -0,0 +1,51 @@
#!/bin/bash
@pxLi (Collaborator) commented Jan 15, 2025:

It's better to make this script prepare artifacts only and have the caller run the test command directly, or at least to separate the artifact-preparation function from the actual test-call function.

Something like:

prepare() {
    # detect the OS, JVM, etc.
    echo "success" # or "skip"/"fail"/the actual reason; any non-"success" value skips run_test

    # the func still errors out if it passes the env detection but fails
    # during the preparation work, e.g. a download error
}

run_test() {
    # run the case
}

# from caller
# source the script
result=$(prepare)

if [[ "$result" == "success" ]]; then
    run_test
else
    echo "skipped test run blahblah: $result"
fi

@pxLi (Collaborator) left a comment:

Please document the steps for running the hybrid cases in local development, in a section like https://github.com/NVIDIA/spark-rapids/blob/branch-25.02/integration_tests/README.md#enabling-cudf_udf-tests
