
The ideal use-case for Spark-Rapids for the best performance/$ improvements. #11937

Answered by revans2
MaxNevermind asked this question in Q&A

@MaxNevermind Your general understanding is correct with a few caveats.

Large amounts of data are important to keep the GPU busy, but at the same time you need I/O that can keep up with that data. CPU-based Spark clusters tend to be compute bound; when we add a GPU they quickly start to look much more I/O bound. This includes disks for the shuffle data and networking for both shuffle and reading/writing from S3/HDFS/...
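As a rough illustration of the kind of GPU-enabled setup being described, here is a minimal PySpark sketch assuming a Spark 3.x cluster with one GPU per executor. The config keys are standard Spark / RAPIDS Accelerator settings, but the sizes and fractions are illustrative placeholders rather than tuning advice.

```python
# Minimal sketch: enabling the RAPIDS Accelerator on a Spark 3.x cluster
# with one GPU per executor. Values below are illustrative, not tuned.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("rapids-example")
    # Load the RAPIDS Accelerator plugin (the plugin jar must be on the classpath).
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
    .config("spark.rapids.sql.enabled", "true")
    # One GPU per executor; a fractional task amount lets multiple tasks
    # share that GPU concurrently.
    .config("spark.executor.resource.gpu.amount", "1")
    .config("spark.task.resource.gpu.amount", "0.125")
    # Pinned host memory speeds up host<->device transfers, which matters
    # once the job is I/O bound rather than compute bound.
    .config("spark.rapids.memory.pinnedPool.size", "2g")
    # Larger input partitions help keep the GPU fed when reading from
    # S3/HDFS (illustrative value).
    .config("spark.sql.files.maxPartitionBytes", "512m")
    .getOrCreate()
)
```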

Long-running queries typically have more room for improvement. We do see improvements in sub-minute queries, but you typically want several of them in an application to see large cost savings.
