The ideal use case for Spark-Rapids for the best performance/$ improvement #11937
-
Hi everybody. I wanted to ask about the ideal use case for Spark-Rapids in terms of the best possible performance/$ improvement. From what I understand from different sources, that would likely mean:
Is my general understanding correct?
-
@MaxNevermind Your general understanding is correct, with a few caveats.

Large amounts of data are important to keep the GPU busy, but at the same time you need I/O that can keep up with that data. CPU-based Spark clusters tend to be compute bound; when we add a GPU, they quickly start to look much more I/O bound. This includes the disks holding shuffle data and the network, for both shuffle and reading/writing from S3/HDFS/...

Long-running queries typically have more room for improvement. We do see improvements in sub-minute queries, but you typically want several of them in an application to see large cost savings.
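To make those caveats concrete, here is a minimal sketch of the configs that touch the I/O side (the keys come from the Spark and spark-rapids config docs; the sizes and paths are illustrative placeholders, not tuned recommendations):

```properties
# Illustrative spark-defaults.conf fragment for a RAPIDS-accelerated cluster.
# Enable the RAPIDS Accelerator plugin on executors that each expose one GPU.
spark.plugins                        com.nvidia.spark.SQLPlugin
spark.executor.resource.gpu.amount   1
# Pinned host memory speeds up CPU<->GPU transfers, which matters once jobs become I/O bound.
spark.rapids.memory.pinnedPool.size  2g
# Let a few tasks share the GPU so it stays busy while other tasks wait on shuffle or S3/HDFS I/O.
spark.rapids.sql.concurrentGpuTasks  2
# Put shuffle data on fast local disks (e.g. NVMe) so disk I/O can keep up with the GPU.
spark.local.dir                      /local/nvme1,/local/nvme2
```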
-
@revans2 Could you also recommend an optimal CPU/GPU ratio range? I guess that depends on pipeline logic and other variables, but I believe some average ranges can still be established. I see that most AWS L4/T4-based instances provide 12 CPU cores / 1 GPU. Do you think that is reasonable, or is it overkill and the ratio should be lower, like 4 CPU cores / 1 GPU?
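For reference, a 12 cores : 1 GPU executor shape is expressed through Spark's resource scheduling configs roughly like this (a sketch under the assumption of one GPU per executor; the concurrency value is something to tune, not a recommendation):

```properties
# Illustrative executor shape for a 12 CPU core / 1 GPU instance.
spark.executor.cores                 12
spark.executor.resource.gpu.amount   1
# ~1/12: gives each of the 12 task slots a fractional share of the single GPU.
spark.task.resource.gpu.amount       0.0833
# How many of those tasks may run on the GPU at once; higher trades GPU memory for utilization.
spark.rapids.sql.concurrentGpuTasks  2
```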