Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable unwinding for catch_unwind error payloads. #99032

Conversation

danielhenrymantilla
Copy link
Contributor

@danielhenrymantilla danielhenrymantilla commented Jul 7, 2022

Fixes #86027.

This does something similar to what was suggested over #86027 (comment), that is, to tweak/amend the Box<dyn Any…> obtained from catch_unwind:

  • keep the .type_id() the same to avoid breakage with downcasting;
  • but make it so the virtual .drop_in_place() is somehow overridden with a shim around the real drop glue that prevents unwinding (e.g., by aborting when that happens).

This is achieved through the DropNoUnwindSameAnyTypeId<T>, wrapper:

  • with the very same layout as the T it wraps;
  • with an overridden/fake .type_id() so as to impersonate its inner T;
  • with a manual impl Drop which uses an abort bomb to ensure no unwinding can happen.

That way, the Box<DropNoUnwindSameAnyTypeId<T>>, when box-erased to a Box<dyn Any…>, becomes, both layout-wise, and type_id-wise, undistinguishable from a Box<T>, thence avoiding any breakage.

And yet, when that Box<dyn Any…> is implicitly dropped with catch_unwinds, no further unwinding can happen.


Alternative implementation

FWIW, another approach to achieve the same idea would be to create one's own tweaked version of the vtable of Any, and directly use it to manually unsize a Box<T> to a Box<dyn Any…> (using ptr::from_raw_parts_mut) to achieve what this PR automagically does with Box<Wrapper<T>>. That being said, since the exact vtable layout of Any, much like the vtable layout of any trait, is only known to the compiler, this other approach could only be achieved by exploiting stdlib magic knowledge of lang stuff. For instance, it would be brittle: Any's definition, the vtable generation algorithm of the compiler, and this hand-rolled vtable would all three have to remain in sync at all times.

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Jul 7, 2022
@rustbot
Copy link
Collaborator

rustbot commented Jul 7, 2022

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-highfive
Copy link
Collaborator

r? @joshtriplett

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 7, 2022
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@danielhenrymantilla danielhenrymantilla force-pushed the no_unwind_drop_panic_payload branch from 3061c68 to 2a7e2d7 Compare July 8, 2022 11:31
@rust-log-analyzer

This comment has been minimized.

library/core/src/any.rs Outdated Show resolved Hide resolved
@danielhenrymantilla
Copy link
Contributor Author

danielhenrymantilla commented Jul 20, 2022

@rustbot review

(cc @joshtriplett)

@danielhenrymantilla
Copy link
Contributor Author

r? @yaahc

@rust-highfive rust-highfive assigned yaahc and unassigned joshtriplett Jul 21, 2022
library/core/src/any.rs Outdated Show resolved Hide resolved
@nbdd0121
Copy link
Contributor

How about resume_unwind?

@danielhenrymantilla danielhenrymantilla force-pushed the no_unwind_drop_panic_payload branch 2 times, most recently from cb6eb89 to d502f6e Compare August 6, 2022 15:27
@danielhenrymantilla
Copy link
Contributor Author

@nbdd0121 good catch: handling that one would require more tweaks, so it may be better to leave it for a follow-up PR, to keep each PR easier to review?

@nbdd0121
Copy link
Contributor

nbdd0121 commented Aug 6, 2022

@nbdd0121 good catch: handling that one would require more tweaks, so it may be better to leave it for a follow-up PR, to keep each PR easier to review?

I'd like to know how you intend to solve it. If it can't be solved, then IMO this approach would be ditched in a whole and we'd need an alternative approach.

@CAD97
Copy link
Contributor

CAD97 commented Aug 7, 2022

AIUI, this approach relies on the fact that panic_any boxes the payload to instead box ButDropUnwindsAbort<Payload> instead. This is not possible to do for resume_unwind, as it receives Box<dyn Any> directly and doesn't construct a new box; the only way for resume_unwind to attach a different vtable drop_in_place would be a dynamically generated vtable (and worse than that, it'd have to be dynamically monomorphized to call the prior drop_in_place vtable entry).

Thus, without an approach like vtable chaining, I don't think it's possible to solve this problem purely in library code.

@bjorn3
Copy link
Member

bjorn3 commented Aug 7, 2022

For resume_unwind would it be possible for the thrown Box<dyn Any> to be a ButDropUnwindsAbort<Box<dyn Any>> any forward all methods to make it transparently behave as if there is no double boxing going on?

@danielhenrymantilla
Copy link
Contributor Author

I'd like to know how you intend to solve it.

An extra hidden (virtual) method on Any, that performs a *mut dyn Any -> *mut dyn Any conversion (assuming *mut is Box, much like the panic machinery does), using something akin to https://github.com/rust-lang/rust/pull/99032/files#diff-88e2a536317b831c2e958b9205fde12f5edaabefba963bdd3a7503bbdedf8da9R689 (just imagine the input already being boxed, and thus using something à la transmute::<Box<T /* : Sized */>, Box<Wrapper<T>>>() to perform the Box<Wrapper<T>> construction (which is then widened to a Box<dyn Any>)

@rust-log-analyzer

This comment has been minimized.

@Mark-Simulacrum Mark-Simulacrum removed the I-libs-nominated Nominated for discussion during a libs team meeting. label Aug 14, 2022
@Mark-Simulacrum
Copy link
Member

I think it makes sense for T-libs-api to discuss whether we want such a change in the catch_unwind impl; I'm going to nominate for that purpose. There was some past discussion in T-libs it sounds like (#86027 (comment)), but not in T-libs-api, and that discussion is fairly old.

I think my main concern is that this seems like a relatively invasive solution (e.g., adding some overhead to dyn Any), and I'm not sure the relative benefit to a small amount of code using catch_unwind is worth it. At minimum, we should be adding a note to the function IMO that we're adding this behavior, or, if we don't decide to merge this PR adding a note that the caller should be careful to catch possible unwinds from the Drop of the Box. (Either way I think we have to document this).

I also don't see anything immediately wrong with the Any TypeId override, but it doesn't make me particularly confident that we're not opening up some other soundness hole (e.g., some way to observe that wrapper and cause soundness problems as a result). I think it should be OK, but it doesn't inspire confidence in my eyes.

Have we considered a compiler intrinsic based implementation here, that avoids the overhead by directly replacing the vtable with one where the drop_in_place is wrapped in an abort-on-unwind wrapper?

@Mark-Simulacrum Mark-Simulacrum added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Aug 14, 2022
@CAD97
Copy link
Contributor

CAD97 commented Aug 14, 2022

Have we considered a compiler intrinsic based implementation here, that avoids the overhead by directly replacing the vtable with one where the drop_in_place is wrapped in an abort-on-unwind wrapper?

Either way, the cost is going to be the same. The vtable needs to be codegenned if it's going to be used, because it doesn't have any state to store the old pointer.

The "better" solution would be to change catch_unwind from returning Box<dyn Any> to Box<PanicPayload>1. Without an overload to allow taking a function pointer with the Box<dyn Any> type, though, it'd be a breaking change.

Footnotes

  1. struct PanicPayload(ManuallyDrop<dyn Any>);
    impl Drop for PanicPayload {
        fn drop(&mut self) {
            panic_count::increment();
            if let Err(_err) = catch_unwind(|| unsafe { ManuallyDrop::drop(&mut self.0) }) {
                // should have aborted already
                rtprintpanic!("fatal error: unwound while dropping panic payload");
                rtabort!();
            }
            panic_count::decrement();
        }
    }
    impl Any for PanicPayload {
        // questionable; what this PR is doing
        fn type_id(&self) -> TypeId { self.0.type_id() }
    }
    

@Mark-Simulacrum
Copy link
Member

Ah, right. I suppose that could change if we managed to move this to the inputs of panic (e.g., resume_unwind, etc), or were willing to runtime create them, but neither seem particularly appealing.

Anyway, will leave this nominated.

@danielhenrymantilla
Copy link
Contributor Author

danielhenrymantilla commented Aug 16, 2022

It might worth to only conditionally have these facilities when compiling libcore with unwind enabled?

This would be nice, yeah. I don't know if it's actually doable, or at least I do know that I don't know how to do it 😅. Very easily left as a follow-up PR, though, see below.


Since I won't be able to attend the libs meeting, here are some notes / my take for t-libs to discuss:

I think my main concern is that this seems like a relatively invasive solution (e.g., adding some overhead to dyn Any), and I'm not sure the relative benefit to a small amount of code using catch_unwind is worth it

I'd like to nuance two things here:

So I'm kind of tempted to reverse the question: imagine if we already had dyn Any with bigger vtables, and somebody made a PR to reduce them at the cost of introducing the catch_unwind pitfall, would we go for it?

  • Imho, thus, investigating ways to follow-up with improvements over my "naïve" implemenation is rather the way to go 🙂

Mainly: it is very plausible that, mid-term, we end up with a compiler / lang-based implementation of this feature. The compiler is more fit to properly interact with the panic="abort" dichotomy, as well as patching vtables without involving extra types (e.g., no type_id specialization).

With that in mind, this PR then appears to be a way to directly fix and close #86027 today, replacing the technical debt of "we have a potentially soundness-endangering footgun" with the technical debt of "we could try to make dyn Anys vtable smaller1, at least on panic="abort"2, and remove type_id specialization" (through a lang-based implementation).


Speaking of type_id specialization, I agree it's something with which one has to be very careful. That's why I've made both the trait for such a specialization, and the only instance specializing it, private to the ::core::any module: the soundness contract can be locally audited.

  • And regarding such local audit, here is mine. The only instance of the wrapper ever constructed is when casting a *mut T to a *mut SameTypeIdWrapper<T>, which is then immediately type-erased into a *mut dyn Any: the static SameTypeIdWrapper is thus only ever so much as glimpsed inside this one function performing that cast, right before the type is immediately erased, so before any statically dispatchable .type_id() whatsoever is used on it. That's why any soundness-endangering type_id mistmatch is impossible to observe (TypeId::of::<SameTypeIdWrapper<…>>(), itself, is impossible to observe).

Footnotes

  1. as CAD mentioned, it is unlikely that we'll actually be able to avoid the vtable churn unless we get rid of catch_unwind's current signature

  2. or some future panic-in-drop=abort knob (the existence of such a knob, by the way, is thus orthogonal to the problem of fixing Footgun with catch_unwind when catching panic-on-drop types #86027, as showcased by this PR).

@Mark-Simulacrum
Copy link
Member

I think there's a lot of code using catch_unwind, but very little code that panics with some T whose Drop panics. That doesn't mean we shouldn't fix this, but I think the cost/benefit is not crystal clear to me. At the same time, I agree the cost is relatively low, so I'm mostly in the middle on whether this is the right choice for std. On the other hand, preventing nasty behavior like this seems like the right thing in general, so maybe it is worth it. (My calculus would be a little different if this was an actual soundness bug, but in practice it is true that the bug is in the code not guarding against that extra panic -- and we already know that panic from any Drop is a "common" source of bugs in unsafe code, at least at the theoretical level; it'd be interesting to do an audit of a bunch of cases to see whether fixing catch_unwind actually fixes their code or just removes one of multiple such "oops" points).

I definitely agree that we can iterate in the future so long as we're okay with the general concept of this special guard.

re:soundness, I agree, it seems pretty local and correct to me :) It's just the kind of code that makes me passively unhappy.

@LegionMammal978
Copy link
Contributor

LegionMammal978 commented Aug 16, 2022

(My calculus would be a little different if this was an actual soundness bug, but in practice it is true that the bug is in the code not guarding against that extra panic -- and we already know that panic from any Drop is a "common" source of bugs in unsafe code, at least at the theoretical level; it'd be interesting to do an audit of a bunch of cases to see whether fixing catch_unwind actually fixes their code or just removes one of multiple such "oops" points)

A did a survey a while back of ~160 crates that use catch_unwind(); I mainly focused on whether they can unsoundly unwind under current semantics. Proving soundness with these modified semantics would be rather time consuming, since it can depend on the entire crate's abstractions as well as the often-poorly-documented guarantees of foreign libraries. The main problem is, many crates tend to use catch_unwind() as their outermost line of defense against panics, not knowing about the payload footgun. Unsafe catch_unwind() usage is mainly in extern "C" callback wrappers, which very rarely do user-supplied-type dropping or any other processing after they call the user-supplied callback. (Most don't account for allocations like Box::new() unwinding, though; if that ever became a possibility in stable Rust, a number of crates would become unsound.)

@Mark-Simulacrum
Copy link
Member

Yeah, that makes sense -- I think surveying sites that start panics (e.g., panic!, resume_unwind, etc) would be more effective there, but the point stands.

@Mark-Simulacrum Mark-Simulacrum added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Aug 20, 2022
@bors
Copy link
Contributor

bors commented Aug 23, 2022

☔ The latest upstream changes (presumably #99917) made this pull request unmergeable. Please resolve the merge conflicts.

@danielhenrymantilla danielhenrymantilla force-pushed the no_unwind_drop_panic_payload branch from f9ce33f to 67cd60f Compare August 24, 2022 10:13
@m-ou-se m-ou-se removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Aug 30, 2022
@bors
Copy link
Contributor

bors commented Sep 26, 2022

☔ The latest upstream changes (presumably #102051) made this pull request unmergeable. Please resolve the merge conflicts.

danielhenrymantilla and others added 3 commits September 26, 2022 21:51
This does something similar to what was suggested over
rust-lang#86027 (comment)
that is, to cheat a little bit and tweak/amend the `Box<dyn Any…>` obtained
from `catch_unwind`:
  - keep the `.type_id()` the same to avoid breakage with downcasting;
  - but make it so the virtual `.drop_in_place()` is somehow overridden with
    a shim around the real drop glue that prevents unwinding (_e.g._, by
    aborting when that happens).

This is achieved through the `DropNoUnwindSameAnyTypeId<T>`, wrapper:
  - with the very same layout as the `T` it wraps;
  - with an overridden/fake `.type_id()` so as to impersonate its inner `T`;
  - with a manual `impl Drop` which uses an abort bomb to ensure no
    unwinding can happen.

That way, the `Box<DropNoUnwindSameAnyTypeId<T>>`, when box-erased to a
`Box<dyn Any…>`, becomes, both layout-wise, and `type_id`-wise,
undistinguishable from a `Box<T>`, thence avoiding any breakage.

And yet, when that `Box<dyn Any…>` is implicitly dropped with
`catch_unwind`s, no further unwinding can happen.

Handle `resume_unwind` payloads too

Clean up logic: centralize drop-override in catch_unwind & virtual method
Mark `panic_abort` as `no-unwind`.

Co-Authored-By: Christopher Durham <[email protected]>
Co-Authored-By: Gary Guo <[email protected]>
 1. Fix runtime UI test that expected drop of panic payload to unwind

 2. Add panic payload tests: ensure proper drop, and add one with catch_unwind

 3. Add test asserting proper abort of unwind in panic payload
@danielhenrymantilla danielhenrymantilla force-pushed the no_unwind_drop_panic_payload branch from 67cd60f to 94118a4 Compare September 26, 2022 20:00
@bors
Copy link
Contributor

bors commented Oct 12, 2022

☔ The latest upstream changes (presumably #102732) made this pull request unmergeable. Please resolve the merge conflicts.

@m-ou-se m-ou-se added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Nov 15, 2022
@m-ou-se
Copy link
Member

m-ou-se commented Nov 15, 2022

As discussed in the libs-api meeting: marking this as blocked. Let's first see where rust-lang/rfcs#3288 goes.

@Dylan-DPC
Copy link
Member

Closing this as the rfc hasn't moved much in 2 years and still open with new concerns being raised. So it's going to be a long time before it gets accepted (if it even does) and unblocks this pr. It is better to create a new pr once that happens.

@Dylan-DPC Dylan-DPC closed this Jul 23, 2024
@danielhenrymantilla
Copy link
Contributor Author

Which concerns? There has only been one concern for this PR, about a minor regression w.r.t. slightly bigger Any codegen (when panic = "unwind"). In practice, the actual unofficial concern that this PR has had is that it subsumes the main argument on which the bigger and way more regressing rust-lang/rfcs#3288 hinges 😕

@Dylan-DPC
Copy link
Member

@danielhenrymantilla uhm sorry, I meant concerns on the rfc not on this pr :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-blocked Status: Blocked on something else such as an RFC or other implementation work. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Footgun with catch_unwind when catching panic-on-drop types