Spill framework refactor for better performance and extensibility [databricks] #11747
Conversation
Signed-off-by: Alessandro Bellina <[email protected]>
@@ -44,22 +44,20 @@ class RapidsSerializerManager (conf: SparkConf) {

  private lazy val compressionCodec: CompressionCodec = TrampolineUtil.createCodec(conf)
I'd like to remove this class, or make it much simpler.
didn't finish, but things keep changing under me so I thought I would publish what I have so far
      true
    }
    shouldRetry
nit: why is this shouldRetry change in there? I assume it was for debugging at some point.
undoing
@@ -1110,7 +1110,7 @@ class CudfSpillableHostConcatResult(
    val hmb: HostMemoryBuffer) extends SpillableHostConcatResult {

  override def toBatch: ColumnarBatch = {
-   closeOnExcept(buffer.getHostBuffer()) { hostBuf =>
+   closeOnExcept(buffer.getHostBuffer) { hostBuf =>
nit: why change this?
will fix
@@ -307,7 +307,6 @@ class LazySpillableColumnarBatchImpl(
      spill = Some(SpillableColumnarBatch(cached.get,
        SpillPriorities.ACTIVE_ON_DECK_PRIORITY))
    } finally {
-     // Putting data in a SpillableColumnarBatch takes ownership of it.
Is this no longer true?
I was debugging something here, and I must have forgotten to undo this change. coming up
but to answer your question, the spill framework takes ownership, always.

Sorry for the rapid movement @revans2. I'll pause for a bit, and come back to address the ChunkedPacker comments and other comments I get.
likewise leaving partial review
I still have a lot more to go through, but I thought I would at least get some of my comments in.
  /**
   * Set a new spill priority.
   */
  override def setSpillPriority(priority: Long): Unit = {
    // TODO: handle.setSpillPriority(priority)
Or remove this entirely.
Other code in the plugin calls this. I kept it for now, but can clean up if you really want me to.
@@ -245,18 +338,29 @@ object SpillableColumnarBatch {
   */
  def apply(batch: ColumnarBatch,
      priority: Long): SpillableColumnarBatch = {
    Cuda.DEFAULT_STREAM.sync()
why?
I'll add comments in the code around why we have the sync.

The reason is that if you hand off an object to the framework, we could turn around and spill it immediately. We need to make sure the object is immutable when it hits the store.

We can move the sync back to the factory methods in the spill framework if that is desired.
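A minimal sketch of what that hand-off sync might look like (object and method names are assumptions; only `Cuda.DEFAULT_STREAM.sync()` comes from the actual diff):

```scala
import ai.rapids.cudf.Cuda
import org.apache.spark.sql.vectorized.ColumnarBatch

// Sketch, not the actual factory method: sync the default stream before
// handing the batch to the framework, because the store may spill it
// immediately and the data must be fully materialized on the device by then.
object SpillableBatchFactorySketch {
  def handOff(batch: ColumnarBatch, priority: Long): Unit = {
    // wait for any kernels/copies still producing `batch` to finish
    Cuda.DEFAULT_STREAM.sync()
    // ...now it is safe for the spill framework to spill `batch` right away
  }
}
```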
      RapidsBufferCatalog.addBatch(batch, initialSpillPriority)
    }
  }
  Cuda.DEFAULT_STREAM.sync()
again why?
      catalog: RapidsBufferCatalog): RapidsBufferHandle = {
    withResource(batch) { batch =>
      catalog.addBatch(batch, initialSpillPriority)
      val handle = SpillableHostColumnarBatchHandle(batch)
Why no stream sync if it is needed for the other APIs?
this is a host batch, so we didn't need a device sync on this :)
    val handle = withResource(buffer) { _ =>
      RapidsBufferCatalog.addBuffer(buffer, meta, priority)
    }
    Cuda.DEFAULT_STREAM.sync()
why?
Still have a lot more to look at, but I am making progress
        buffOffset += blockRange.rangeSize()
      }
      needsCleanup = false
    } catch {
-     case ioe: IOException =>
+     case ex: Throwable =>
Why do we need to catch Errors?
Let me fix it to catch Exception.
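A hedged sketch of the distinction, using `scala.util.control.NonFatal` as one way to avoid catching fatal Errors (the fix above simply narrows the catch to Exception; the function shape here is an assumption):

```scala
import scala.util.control.NonFatal

object CleanupSketch {
  // NonFatal matches Exceptions but lets things like OutOfMemoryError
  // propagate; catching Exception directly behaves similarly.
  def copyWithCleanup(copy: () => Unit, cleanup: () => Unit): Unit = {
    var needsCleanup = true
    try {
      copy()
      needsCleanup = false
    } catch {
      case NonFatal(e) =>
        // handle/log the failure; `finally` below still runs the cleanup
        throw e
    } finally {
      if (needsCleanup) {
        cleanup()
      }
    }
  }
}
```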
Still not done, but I need to switch to some other things so I am going to comment on what I have looked at so far.
 *
 * CUDA/Host synchronization:
 *
 * We assume all device backed handles are completely materialized on the device (before adding
Why do we need this? Please add that to the comments so that it is clear what is happening.
I added info in this comment. We could, at least for the GPU handles, add an event per handle and record on that event instead of synchronizing. I think this means I should move the synchronization into the framework; right now (as you spotted) I let callers synchronize before they call the factory methods.
 * with extra locking is the `SpillableHostStore`, to maintain a `totalSize` number that is
 * used to figure out cheaply when it is full.
 *
 * Handles hold a lock to protect the user against when it is either in the middle of
This is a bit confusing. So essentially the lock prevents race conditions when the object is in the middle of spilling or being closed?
Yeah, this is a confusing comment. It seems important to someone designing a new spillable handle, not to an end user. I have reworded it.
 * - If sizeInBytes is 0, the object is tracked by the stores so it can be
 *   removed on shutdown, or by handle.close, but 0-byte handles are not spillable.
 */
val sizeInBytes: Long
If this is approximate we should name it such. So there is no confusion about how that can be used.
Also this is a val, but if something spills wouldn't it change to 0 from whatever it was before? Shouldn't it be a def?
So it is, and it isn't. That's the issue with `sizeInBytes`. I am going to rework that and make it clearer. I think something like `approxSizeInBytes` makes sense, or `trackedSizeInBytes`, and it should be `private[spill]`, going with the other things that are marked that way. But some handles do have a size that is not approximate, and IS used for creating buffers, so I kind of want that size reported with a different API, specific to each handle.

It is a val because I didn't see a point in changing it. Whether the object is spilled is determined by `dev` or `host` being empty.
It is also a val because I wanted to set `spillable` at construction for 0-byte handles. Any other handle is spillable by definition at the level of the trait `SpillableHandle`, but if it's not a val then there's a chance that `approxSizeInBytes` is not ready at construction time, so I don't know if I can call it reliably.
Agree this is confusing. The val part is because I want to mark objects that are 0-sized as not spillable, and I'd like to do that at construction time. It removes some locking that I otherwise have to do.

I have renamed this `approxSizeInBytes` and left `sizeInBytes` public in handles that do support a non-approximate value. I default it so that in most cases `approxSizeInBytes == sizeInBytes`, but I do think this makes it clearer. Please take a look at d9490ee
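A minimal sketch, assuming the names from this discussion (not the actual trait), of how the renamed size API could hang together:

```scala
package com.nvidia.spark.rapids.spill

// Sketch: the approximate size drives spill accounting, defaults to the
// exact size where one exists, and 0-byte handles are never spillable.
trait SpillableHandleSketch {
  /** Exact size, for handles that know it; used when creating buffers. */
  val sizeInBytes: Long

  /** Approximate size used for spill accounting. */
  private[spill] def approxSizeInBytes: Long = sizeInBytes

  /** 0-byte handles are tracked by the stores but never spillable. */
  private[spill] def spillable: Boolean = approxSizeInBytes > 0
}
```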
 * or directly against the handle.
 * @return sizeInBytes if spilled, 0 for any other reason (not spillable, closed)
 */
def spill: Long
nit: I thought the convention in Scala was that a method with no parameters should have parens if it did work (like side effects). Should this be `def spill(): Long` then?
fixing all of these
Should be fixed with d9dbf36
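A small sketch of the convention in question (hypothetical trait):

```scala
// Parameterless methods that perform work take parens; pure accessors do not.
trait SpillConventionSketch {
  def approxSizeInBytes: Long // accessor, no side effects: no parens
  def spill(): Long           // moves data (side effects): parens
}
```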
/**
 * Method called to spill this handle. It can be triggered from the spill store,
 * or directly against the handle.
 * @return sizeInBytes if spilled, 0 for any other reason (not spillable, closed)
Just to clarify: do we want them to return the result of calling sizeInBytes, especially if it is just an approximate size? Can we clarify a little here what this API is expected to return?
Hopefully this is clearer with a comment I added.
 * is a `val`.
 * @return true if currently spillable, false otherwise
 */
private[spill] def spillable: Boolean = sizeInBytes > 0
Why is this private to spill? Just curious, because if we want others to add spillable things in does that mean they have to put the handles in com.nvidia.spark.rapids.spill to make them work?
I think so. We can try to open that up if we want spillables in other packages, but right now it's all in here, so I made this change in response to a comment by @jlowe. I did want to access this from unit tests, and that's why it is `private[spill]` specifically.
// do we care if:
//   handle didn't fit in the store as it is too large.
//   we made memory for this so we are going to hold our noses and keep going
//   we could spill `handle` at this point.
I thought we spoke about this already. There are currently two cases:
- We are running with no limit on the host memory, but with a spill store limit.
- We have a host memory limit.

If we have a host memory limit, then the spill store is constrained by the host memory limit, so we can ignore it here; we will not have allocated a HostMemoryBuffer that is too large to fit.

If we have unlimited host memory, then making a SpillableHostBufferHandle is there for code compatibility, but it should not be added to the host store at all. It should never spill. We have unlimited host memory.

We should also document this somewhere. The two cases are sketched below.
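A toy sketch of those two cases (all names here are hypothetical):

```scala
// With a host memory limit the store is already bounded by that limit, so a
// too-large handle cannot arise; with unlimited host memory the handle
// exists for code compatibility but is never tracked and never spills.
class HostStoreSketch(hostMemoryLimit: Option[Long]) {
  private var handles = List.empty[AnyRef]

  def numHandles: Int = handles.size

  def track(handle: AnyRef): Unit = hostMemoryLimit match {
    case Some(_) =>
      handles ::= handle // bounded by the host memory limit, may spill
    case None =>
      // unlimited host memory: keep the handle out of the store entirely
  }
}
```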
…which is different than before
build - re-deploy jenkins instance for an internal mandatory ops. rekicked the blossom-ci

build
  private val MAX_TABLE_ID = Integer.MAX_VALUE
  private val TABLE_ID_UPDATER = new IntUnaryOperator {
    override def applyAsInt(i: Int): Int = if (i < MAX_TABLE_ID) i + 1 else 0
  }

  def getColumnarBatchAndRemove(handle: RapidsShuffleHandle,
I'm wondering why this method exists on the catalog rather than a getColumnarBatch method and a sizeInBytes method on the RapidsShuffleHandle. Then the caller can just use withResource on the shuffle handle directly and call those methods within the withResource block.
`tableMeta` is not part of the spill framework for UCX; it is stored in the shuffle catalogs instead, using a `RapidsShuffleHandle` (which doesn't live in the spill framework... yet).

I think we can make these first-class spill handles.
I just did a pass on the SpillFramework tests and had some questions
  }
}

def initialize(rapidsConf: RapidsConf): Unit = synchronized {
Should we check if it was already initialized, and if so, throw `IllegalStateException`?
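A sketch of the suggested guard (the `initialized` flag and object shape are assumptions):

```scala
object SpillFrameworkInitSketch {
  private var initialized = false

  def initialize(): Unit = synchronized {
    // fail fast on double initialization, as suggested above
    if (initialized) {
      throw new IllegalStateException("SpillFramework is already initialized")
    }
    initialized = true
    // ...create the device/host/disk stores here
  }
}
```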
 *
 * We handle aliasing of objects, either in the spill framework or outside, by looking at the
 * reference count. All objects added to the store should support a ref count.
 * If the ref count is greater than the expected value, we assume it is being aliased,
How does this work? What is the "expected value" derived from?
Each handle overrides `def spillable` and does things differently. A buffer is super easy: it's just `MemoryBuffer.getRefCount == 1`. A ColumnarBatch needs to figure out if it has repetition: a CB of [col0, col0, col0] will have three columns, but they will have a refcount of 3, so 3 is the spillable ref count, not 1, and the `spillable` method checks this for every column of the batch. A minimal sketch of that check follows.
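A minimal sketch of that repetition-aware check (toy types, not the plugin's code):

```scala
import scala.collection.mutable

object BatchSpillableSketch {
  trait RefCounted { def getRefCount: Int }

  // A column is held by the framework alone only when its ref count equals
  // the number of times that same column object repeats within the batch.
  def isBatchSpillable(columns: Seq[RefCounted]): Boolean = {
    // count repetitions of each column object within this batch
    val repetition = mutable.Map.empty[RefCounted, Int]
    columns.foreach { c => repetition(c) = repetition.getOrElse(c, 0) + 1 }
    // [col0, col0, col0] is spillable only when col0's ref count is 3
    columns.forall(c => c.getRefCount == repetition(c))
  }
}
```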
  }
}

test("an aliased contiguous table is not spillable (until closing the original)") {
Shouldn't this be "until closing the alias?"
  assertResult(2)(SpillFramework.stores.deviceStore.numHandles)
  assert(!handle.spillable)
  assert(!aliasHandle.spillable)
} // we now have two copies in the store
Is this comment supposed to apply to the code inside the curly brackets, or just after closing? It's a bit confusing
test("a buffer is not spillable until the owner closes columns referencing it") {
  val (ct, _) = buildContiguousTable()
  // the contract for spillable handles is that they take ownership
naive question - why can't the call to `getBuffer` contain the `incRefCount`? Are there instances where we want/need a non-owning reference (despite the contract)?
It should. We should file a follow-on to fix that.
assertResult(0)(SpillFramework.stores.hostStore.numHandles)
assertResult(1)(SpillFramework.stores.diskStore.numHandles)
assert(handle.dev.isEmpty)
assert(handle.host.isDefined)
can you explain/add a comment as to why handle.host is defined but handle.host.get.host is not? I'm sure there's a good reason but it's not obvious
The way we have architected the spill handles is that they can cascade device->host->disk. But a device handle doesn't have both a host AND a disk handle; it just has a host handle. If the host handle itself spilled, or could not fit on host to begin with, its host component is empty and its disk component is set. So from a handle's perspective, the object is either set in the place where it is supposed to be, or the handle points to something that can help find the object later. A toy sketch of this layout follows.
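A toy sketch of that cascade (hypothetical classes, mirroring the `dev`/`host` fields asserted on in the test above):

```scala
// A device handle only points at a host handle; if that host handle itself
// spilled (or never fit on host), its own `host` is empty and `disk` is set.
// That is why `handle.host` can be defined while `handle.host.get.host` is not.
class DiskHandleSketch(val path: String)

class HostHandleSketch(
    var host: Option[Array[Byte]],
    var disk: Option[DiskHandleSketch])

class DeviceHandleSketch(
    var dev: Option[Array[Byte]],
    var host: Option[HostHandleSketch])
```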
Ah I see, it's kind of like how the spill stores had a `next` store reference.
  }
}

test("host originated: get host memory buffer") {
  val spillPriority = -10
Why?
We had this value in the test before; I just copied it. I didn't want to remove spill priorities because we still want to implement them, and I was hoping for tests to fail on me when I change the interface (or so I could easily find/replace things).
test("host originated: a host batch supports aliasing and duplicated columns") { | ||
SpillFramework.shutdown() | ||
val sc = new SparkConf | ||
// disables the host store limit |
Is this just a confusing config name where setting it (enabled) to true disables it (because it enables some alternative mechanism or something), or is the comment wrong?
The way we have defined host limits is that, if they are set, the host spill store's own limits are ignored.

The idea is that host spills will be triggered only by a host OOM, not via a host store limit. So yes, enabling the off-heap limit disables the host store limit. I'll try to change the comment.
// this is a key behavior that we wanted to keep during the spill refactor
// where host objects that are added directly to the store do not cause a
// host->disk spill on their own, instead they will get spilled later
// due to device->host spills.
Is this because we are assuming the memory is already allocated/accounted for? Are we still adding the new buffer size to the total store allocated size?
Yes and yes. The host memory *was* allocated, and we track it as such. But we are not going to actively spill right now; we'll spill later when the device wants to spill.

This may change later. And when host limits are enabled (not host store limits), it fits that model better.
  }
}

val hostSpillStorageSizes = Seq("-1", "1MB", "16MB")
what does size -1 do?
-1 means no host store limit. Will add comment.
Signed-off-by: Alessandro Bellina <[email protected]>
build

build

build

build
if (!spillable) {
  0L
} else {
  synchronized {
Is there a race condition here? If thread t1 calls `spill()` and sees that `spillable` is true, but then thread t2 jumps in, calls `materialize()`, gets the lock, and materializes. Say the buffer is on the device only at the moment, so t2 gets a ref to the `DeviceMemoryBuffer` and the ref count goes to 2. Now t2 drops the lock and goes on to actually use the buffer, and t1 proceeds and starts spilling even though it's no longer spillable.
Yes, this is an acceptable race.

In this case we copy to host but we can't free (we call close but we don't actually free). From the perspective of the spill framework, we synchronize up to the point where we call `.close` on the `DeviceMemoryBuffer`. The rest is left to the caller. The caller needs to synchronize with respect to the APIs that it calls, and I think that part is understood (it's part of the original spill framework as well). A toy sketch of this interaction follows.
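A toy sketch of that interaction (simplified ref counting; not the actual handle code):

```scala
// spill() checks `spillable` before taking the lock, so a racing
// materialize() may grab a reference first; spill() still copies to host,
// but its close() then only drops a reference instead of freeing, so the
// caller's buffer stays valid.
class RaceSketchHandle {
  private var refs = 1   // stands in for the DeviceMemoryBuffer's ref count
  private var dev = true // buffer currently lives on the device

  def spillable: Boolean = synchronized { dev && refs == 1 }

  def materialize(): Unit = synchronized {
    refs += 1 // caller now owns a reference
  }

  def spill(): Long = {
    if (!spillable) {
      0L // unlocked check: a racing materialize() can win after this
    } else {
      synchronized {
        if (!dev) {
          0L
        } else {
          // copy to host here, then release the framework's device
          // reference; if a racing materialize() won, this only decrements
          refs -= 1
          dev = false
          1L // would return the spilled size
        }
      }
    }
  }
}
```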
Signed-off-by: Alessandro Bellina <[email protected]>
build

running it again with db enabled.

build

build
This is a very large PR that I'd like some 👀 on. I marked it as a draft as I still have some TODOs around more tests. The PR is NOT going to go into 24.12; it's just that we don't have a 25.02 available.

The main file I think one should focus on is `SpillFramework.scala` (yep, one file; let me know if you want me to break that into multiple files). `SpillFramework.scala` has a comment describing how things should work, please take a look at that.

The main contribution here is a simplification of the framework where we remove the idea of a `RapidsBuffer` that has to be acquired and unacquired, in favor of a handle that just knows how to `materialize`. There isn't a concept of acquisition in the new framework (see the sketch at the end of this description).

There is a `SpillableColumnarBatch` API and a lazy-spillable API for join that I did not touch and left there on purpose, but we can start to remove that API and create spillable handles that replicate the lazy behavior we wanted in lazy spillable, or the recomputing behavior we want for broadcasts. This is the second contribution of the PR: handles decide how to spill, not the framework.

There is one easily fixable shortcoming today in the multiple-spiller case that I will fix in a follow-on PR. While we are spilling a handle, the handle holds a lock, and the same lock is used to figure out if the handle is spillable. A second thread that is trying to spill may need to wait for this lock (and the spill) to finish in order to figure out whether it needs to spill that handle or not. We can make this more straightforward by handling the spill state separately from the materialization/data state, but I'd like to submit that work as an improvement.

I have run this against NDS @ 3TB in our perf cluster and I don't see regressions, and I have run it against spill-prone cases where I am able to see multiple threads in the "spill path" and no deadlocks. I'll post more results when I can run them.
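A toy before/after sketch of that API shift (hypothetical traits, not the actual plugin types):

```scala
// Old model (sketch): callers had to acquire/unacquire a RapidsBuffer.
trait RapidsBufferLike {
  def acquire(): Unit
  def unacquire(): Unit
}

// New model (sketch): a handle just knows how to materialize an owning
// reference, and decides for itself how to spill.
trait SpillableHandleLike[T] {
  def materialize(): T // returns an owning reference to the data
  def spill(): Long    // handles decide how to spill, not the framework
  def close(): Unit
}
```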