Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CancelScope shielding fails to protect against cancellation until task_status.started is called on asyncio #837

Open
2 tasks done
tapetersen opened this issue Dec 9, 2024 · 9 comments · May be fixed by #839
Open
2 tasks done
Labels
bug Something isn't working

Comments

@tapetersen
Copy link
Contributor

tapetersen commented Dec 9, 2024

Things to check first

  • I have searched the existing issues and didn't find my bug already reported there

  • I have checked that my bug is still present in the latest release

AnyIO version

4.7.0

Python version

3.10

What happened?

Running some quite involved code that needs to do shielded cleanup if it fails sometime exits a shielded scope due to cancellation of the host-task calling TaskGroup.start before the child-task has called task_status.started()

I know these are edge cases and may also be really hard to solve so feel free to resolve as such.

How can we reproduce the bug?

I've managed to reduce it to the following test-case.

async def test_anyio_cancel_shielding_start():
    entered_inner_scope = anyio.Event()
    async with (
        anyio.create_task_group() as tg,
        anyio.create_task_group() as inner_tg,
    ):
        async def inner_task(*, task_status: anyio.abc.TaskStatus = anyio.TASK_STATUS_IGNORED):
            # task_status.started() # Uncommenting this lets the task hang forever instead

            with anyio.CancelScope(shield=True):
                entered_inner_scope.set()
                try:
                    await anyio.sleep_forever()
                except anyio.get_cancelled_exc_class():
                    assert False, "Shouldn't be cancelled in a shielded scope"

        async def start_inner_task():
            await inner_tg.start(inner_task)

        tg.start_soon(start_inner_task)
        await entered_inner_scope.wait()
        tg.cancel_scope.cancel()



if __name__ == "__main__":
    anyio.run(test_anyio_cancel_shielding_start)
@tapetersen tapetersen added the bug Something isn't working label Dec 9, 2024
@tapetersen
Copy link
Contributor Author

I guess the issue is the direct asyncio.Task.cancel here that doesn't take CancelScopes into account.

We can work around it but needed to debug where the issue actually was anyway to make the workaround reliable.

@agronholm
Copy link
Owner

If you were led to understand that AnyIO cancel scopes shield from native asyncio cancellation then you were told wrong. It doesn't. AnyIO cancellation is level cancellation while asyncio cancellation is edge cancellation. Edge cancellation does not work together with level cancellation. AnyIO cancel scopes only shield from AnyIO cancellation on asyncio. On Trio, they shield from both native and AnyIO cancellation, as both use level cancellation.

@agronholm
Copy link
Owner

I think I may have spoken too soon, and you are well versed in level cancellation semantics. Apologies. I'll try the example and see what's what.

@tapetersen
Copy link
Contributor Author

tapetersen commented Dec 9, 2024

No problem and I know this is really tricky. I just took a look through the test-cases in a "how hard can it be excercise" knowing full-well that I could expect it to be very hard.

I wouldn't expect the scopes to shield from external cancellation I've started myself but the confusing issue issue here is that TaskGroup.start uses a native cancellation if itself is cancelled using only anyio API's.

That you even support the case where the actual start call is natively cancelled is more than I would've expected and complicates the ideas, at least I had, to maybe correct it.

@agronholm
Copy link
Owner

The example does not stumble onto the assertion error on Trio, so there is a bug of some sort. Hard to say at this point how difficult fixing this will be.

@tapetersen
Copy link
Contributor Author

I get that and wouldn't have expected it to error on trio as I would've assumed TaskGroup.start is just the native method there (and can rely on all tasks having proper CancelScopes) .

As said before I don't expect a fast resolution here but needed to dig this out to know what workaround was needed.

The tricky part I see here is that it needs to support (and tries to from what I see in the test-cases such as test_start_native_host_cancelled) the case where the caller of TaskGroup.start isn't itself in a CancelScope and therefore can't rely on the new task being automatically cancelled by the same scope that threw the CancelledError we just caught (and therefore resorts to using a native task.cancel() call)

@agronholm
Copy link
Owner

Flat out removing the call makes the new test in the associated PR pass but others hang.

@tapetersen
Copy link
Contributor Author

Yes, and I first tried some variations with more checks to try doing it only in the case of not coming from an anyio task (checking if we actually have a cancel_scope).

That would maybe solve it in the case of only using anyio tasks like my example while leaving the issue if you call start from a non-anyio managed task but even then it caused some issues.

Looking into the alternative of creating a new CancelScope for start that could be cancelled instead of using the native cancel directly (not sure if that would've been acceptable performance wise) I realised it may be tricky how that would behave with parent/child scope relations after task_status.started was called.

Maybe it could work to create the new scope in the TaskGroup directly even if we haven't reparented the task there yet but I don't grasp those part well enough yet to know.

Like I said, we can work around it knowing the problem and I'm mostly poking around using it as an excuse to dig into the nitty-gritty details of the code-base even if it doesn't lead to a fix.

@agronholm
Copy link
Owner

I'm sure I'll find a workable solution in due time. Good to hear that you can at least work around the problem!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants