Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform handle spill IO outside of locked section in SpillFramework #11880

Open
wants to merge 22 commits into
base: branch-25.02
Choose a base branch
from

Conversation

zpuller
Copy link
Collaborator

@zpuller zpuller commented Dec 16, 2024

Addresses #11830

Moves the IO outside of the critical section in the spillable buffer handle spill functions to allow threads interacting with the SpillFramework to manage the spill state of a handle consistently without being blocked on IO. Eg. if thread t1 is in the middle of spilling, and thread t2 wants to check whether this handle is currently spilling, it doesn't need to wait for the spill IO operation to complete in order to check whether the handle it spillable.

Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
@zpuller zpuller changed the title Spill lock Add an explicit spill lock to the Spill Framework Dec 16, 2024
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please update the docs for all of the public APIs in SpillableHandle and for SpillableHandle itself to make it clear what APIs need to be thread safe and how they are supposed to be protected. For example spill has no indication as to how it should behave when multiple people call it. We can infer that it one wins and the others fail. I want this mostly so it is clear what the contract is for each of these APIs. spillable being a best effort API intended only for quick filtering is fine, but it needs to be documented so if someone tries to use it in a way that requires it to be exact we know that is wrong and it violates the contract. This also will let me reason about the code and who is calling it so that I can better reason about the correctness.

Also could you please explain how error handling/error recovery is intended to happen when spilling? Like if an exception is thrown while we are in the middle of trying to spill something what should happen. Are we supposed to just keep it in the spilling state?

@zpuller
Copy link
Collaborator Author

zpuller commented Dec 17, 2024

Could you please update the docs for all of the public APIs in SpillableHandle and for SpillableHandle itself to make it clear what APIs need to be thread safe and how they are supposed to be protected.

Absolutely yeah, I'm just planning to resolve the other outstanding issues in the PR and ensure I have a clear understanding and answer to the other questions, and then will document accordingly.

@zpuller zpuller marked this pull request as draft December 19, 2024 18:30
@zpuller
Copy link
Collaborator Author

zpuller commented Dec 19, 2024

I discussed offline with @abellina and he gave me some ideas on how to rework this to not require a separate lock, but to just move the IO component out of the protected section of spill and swap the buffers atomically. I pushed some changes but I'm also converting this back to draft as it's not fully ready for review yet. I still have to address some comments above, plus I am planning to do some manual testing with pulling in the most recent 25.02 branch changes.

@zpuller zpuller changed the title Add an explicit spill lock to the Spill Framework Perform handle spill IO outside of locked section in SpillFramework Dec 19, 2024
@jihoonson
Copy link
Collaborator

I'm triggering a build job to see if some test consistently fails for all PRs.

@jihoonson
Copy link
Collaborator

build

zpuller added 10 commits January 8, 2025 10:10
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
@zpuller
Copy link
Collaborator Author

zpuller commented Jan 14, 2025

I've pushed several updates to the branch. Per my previous comment, now the changes are structured in such a way that the interface does not need to change.

I was able to test this and see evidence in traces showing that it allows multiple threads to spill concurrently.

I considered refactoring some duplicate code fragments between the handle types around how we do the buffer swaps etc. but I'm not sure if it will actually simplify the overall code readability/PR so I held off for now.

@zpuller zpuller marked this pull request as ready for review January 14, 2025 19:28
Signed-off-by: Zach Puller <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants