update rapids dependencies #809

Merged 2 commits on Dec 24, 2024
2 changes: 1 addition & 1 deletion ci/Dockerfile
@@ -38,5 +38,5 @@ RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86

# install cuML
ARG CUML_VER=24.12
-RUN conda install -y -c rapidsai-nightly -c conda-forge -c nvidia cuml=$CUML_VER cuvs=$CUML_VER python=3.10 cuda-version=11.8 numpy~=1.0 \
+RUN conda install -y -c rapidsai -c conda-forge -c nvidia cuml=$CUML_VER cuvs=$CUML_VER python=3.10 cuda-version=11.8 numpy~=1.0 \
&& conda clean --all -f -y
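Note: this hunk switches the conda channel from rapidsai-nightly to the rapidsai release channel, consistent with 24.12 having reached a stable release. A quick sanity check before building the image, assuming a local conda install (the query below is illustrative, not part of the PR):

```bash
# confirm the pinned cuML build is published on the release channel
conda search -c rapidsai -c conda-forge -c nvidia "cuml=24.12"
```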
4 changes: 2 additions & 2 deletions docker/Dockerfile.pip
@@ -1,5 +1,5 @@
#
-# Copyright (c) 2023, NVIDIA CORPORATION.
+# Copyright (c) 2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -18,7 +18,7 @@ ARG CUDA_VERSION=11.8.0
FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04

ARG PYSPARK_VERSION=3.3.1
-ARG RAPIDS_VERSION=24.10.0
+ARG RAPIDS_VERSION=24.12.0
ARG ARCH=amd64
#ARG ARCH=arm64
# Install packages to build spark-rapids-ml
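Since RAPIDS_VERSION is a build arg, the bump here only changes the default; it can also be overridden at build time without editing the file. A sketch, where the image tag and build-context path are placeholders:

```bash
# build the pip-based image, pinning the RAPIDS version explicitly
docker build -f docker/Dockerfile.pip \
    --build-arg RAPIDS_VERSION=24.12.0 \
    -t spark-rapids-ml:pip .
```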
4 changes: 2 additions & 2 deletions docker/Dockerfile.python
@@ -1,5 +1,5 @@
#
-# Copyright (c) 2023, NVIDIA CORPORATION.
+# Copyright (c) 2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -17,7 +17,7 @@
ARG CUDA_VERSION=11.8.0
FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu20.04

-ARG CUML_VERSION=24.10
+ARG CUML_VERSION=24.12

# Install packages to build spark-rapids-ml
RUN apt update -y \
2 changes: 1 addition & 1 deletion notebooks/aws-emr/init-bootstrap-action.sh
@@ -27,7 +27,7 @@ sudo bash -c "wget https://www.python.org/ftp/python/3.10.9/Python-3.10.9.tgz && \
tar xzf Python-3.10.9.tgz && cd Python-3.10.9 && \
./configure --enable-optimizations && make altinstall"

-RAPIDS_VERSION=24.10.0
+RAPIDS_VERSION=24.12.0

sudo /usr/local/bin/pip3.10 install --upgrade pip

2 changes: 1 addition & 1 deletion notebooks/databricks/README.md
@@ -51,7 +51,7 @@ If you already have a Databricks account, you can run the example notebooks on a
spark.task.resource.gpu.amount 1
spark.databricks.delta.preview.enabled true
spark.python.worker.reuse true
-spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-24.08.1.jar:/databricks/spark/python
+spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-24.10.1.jar:/databricks/spark/python
spark.sql.execution.arrow.maxRecordsPerBatch 100000
spark.rapids.memory.gpu.minAllocFraction 0.0001
spark.plugins com.nvidia.spark.SQLPlugin
4 changes: 2 additions & 2 deletions notebooks/databricks/init-pip-cuda-11.8.sh
@@ -18,8 +18,8 @@ SPARK_RAPIDS_ML_ZIP=/dbfs/path/to/zip/file
# IMPORTANT: specify RAPIDS_VERSION fully 23.10.0 and not 23.10
# also in general, RAPIDS_VERSION (python) fields should omit any leading 0 in month/minor field (i.e. 23.8.0 and not 23.08.0)
# while SPARK_RAPIDS_VERSION (jar) should have leading 0 in month/minor (e.g. 23.08.2 and not 23.8.2)
-RAPIDS_VERSION=24.10.0
-SPARK_RAPIDS_VERSION=24.08.1
+RAPIDS_VERSION=24.12.0
+SPARK_RAPIDS_VERSION=24.10.1

curl -L https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/${SPARK_RAPIDS_VERSION}/rapids-4-spark_2.12-${SPARK_RAPIDS_VERSION}-cuda11.jar -o /databricks/jars/rapids-4-spark_2.12-${SPARK_RAPIDS_VERSION}.jar

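The comments in this script encode a naming convention worth noting: the python-side RAPIDS_VERSION drops the leading zero in the month field, while the SPARK_RAPIDS_VERSION jar keeps it. A minimal sketch of deriving one from the other (the variable names and sed pattern are illustrative, not part of the script):

```bash
# convert a jar-style version (24.08.1) to a python-style one (24.8.1)
# by stripping the leading zero from the month field
JAR_VER=24.08.1
PY_VER=$(echo "$JAR_VER" | sed 's/\.0\([1-9]\)/.\1/')
echo "$PY_VER"   # prints 24.8.1
```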
2 changes: 1 addition & 1 deletion notebooks/dataproc/README.md
@@ -29,7 +29,7 @@ If you already have a Dataproc account, you can run the example notebooks on a D
- Create a cluster with at least two single-gpu workers. **Note**: in addition to the initialization script from above, this also uses the standard [initialization actions](https://github.com/GoogleCloudDataproc/initialization-actions) for installing the GPU drivers and RAPIDS:
```
export CUDA_VERSION=11.8
-export RAPIDS_VERSION=24.10.0
+export RAPIDS_VERSION=24.12.0

gcloud dataproc clusters create $USER-spark-rapids-ml \
--image-version=2.1-ubuntu \
2 changes: 1 addition & 1 deletion notebooks/dataproc/spark_rapids_ml.sh
@@ -14,7 +14,7 @@
# limitations under the License.


-RAPIDS_VERSION=24.10.0
+RAPIDS_VERSION=24.12.0

# patch existing packages
mamba install "llvmlite<0.40,>=0.39.0dev0" "numba>=0.56.2"
6 changes: 3 additions & 3 deletions python/README.md
@@ -8,9 +8,9 @@ For simplicity, the following instructions just use Spark local mode, assuming a

First, install RAPIDS cuML per [these instructions](https://rapids.ai/start.html). Example for CUDA Toolkit 11.8:
```bash
-conda create -n rapids-24.10 \
+conda create -n rapids-24.12 \
-c rapidsai -c conda-forge -c nvidia \
-cuml=24.10 cuvs=24.10 python=3.10 cuda-version=11.8 numpy~=1.0
+cuml=24.12 cuvs=24.12 python=3.10 cuda-version=11.8 numpy~=1.0
```

**Note**: while testing, we recommend using conda or docker to simplify installation and isolate your environment while experimenting. Once you have a working environment, you can then try installing directly, if necessary.
@@ -19,7 +19,7 @@ conda create -n rapids-24.10 \

Once you have the conda environment, activate it and install the required packages.
```bash
-conda activate rapids-24.10
+conda activate rapids-24.12

## for development access to notebooks, tests, and benchmarks
git clone --branch main https://github.com/NVIDIA/spark-rapids-ml.git
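As an aside, for environments where conda is not an option, RAPIDS libraries are also distributed as CUDA-specific pip wheels from NVIDIA's package index. A hedged pip equivalent of the conda install above, assuming the cu11 wheel variants of cuml and cuvs (not covered by this PR):

```bash
# install cuML and cuVS wheels matching the pinned RAPIDS release;
# the -cu11 suffix selects the CUDA 11 builds
pip install cuml-cu11==24.12.* cuvs-cu11==24.12.* \
    --extra-index-url=https://pypi.nvidia.com
```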
2 changes: 1 addition & 1 deletion python/benchmark/databricks/init-pip-cuda-11.8.sh
@@ -19,7 +19,7 @@ BENCHMARK_ZIP=/dbfs/path/to/benchmark.zip
# IMPORTANT: specify rapids fully 23.10.0 and not 23.10
# also, in general, RAPIDS_VERSION (python) fields should omit any leading 0 in month/minor field (i.e. 23.8.0 and not 23.08.0)
# while SPARK_RAPIDS_VERSION (jar) should have leading 0 in month/minor (e.g. 23.08.2 and not 23.8.2)
-RAPIDS_VERSION=24.10.0
+RAPIDS_VERSION=24.12.0
SPARK_RAPIDS_VERSION=24.10.1

curl -L https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/${SPARK_RAPIDS_VERSION}/rapids-4-spark_2.12-${SPARK_RAPIDS_VERSION}-cuda11.jar -o /databricks/jars/rapids-4-spark_2.12-${SPARK_RAPIDS_VERSION}.jar
2 changes: 1 addition & 1 deletion python/benchmark/dataproc/init_benchmark.sh
@@ -22,7 +22,7 @@ function get_metadata_attribute() {
/usr/share/google/get_metadata_value "attributes/${attribute_name}" || echo -n "${default_value}"
}

-RAPIDS_VERSION=$(get_metadata_attribute rapids-version 24.10.0)
+RAPIDS_VERSION=$(get_metadata_attribute rapids-version 24.12.0)

# patch existing packages
mamba install "llvmlite<0.40,>=0.39.0dev0" "numba>=0.56.2"
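Because this script reads the version through get_metadata_attribute, the 24.12.0 bumped here is only a fallback default; a per-cluster pin still wins. A sketch of overriding it at cluster creation (the cluster name and region flag are placeholders, not from this PR):

```bash
# the init script reads attributes/rapids-version at startup,
# so cluster metadata takes precedence over the baked-in default
gcloud dataproc clusters create my-benchmark-cluster \
    --region=us-central1 \
    --metadata=rapids-version=24.12.0
```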