Fix #[inline(always)] on closures with target feature 1.1 #111836
r? @davidtwco (rustbot has picked a reviewer for you, use r? to override) |
@calebzulawski we can re-roll reviewer if you'd like 🙂 |
I've gotten rid of my other hundreds of notifications, I can work on this finally. |
I know I basically suggested it but now that I have a good look at the code and am a few months of thinking about target features wiser, I'm concerned about the behavior of this approach for this particular, admittedly somewhat contrived example:
#![feature(target_feature_11)]
use core::arch::x86_64::*;

#[target_feature(enable = "avx")]
pub unsafe fn escape(a: f64, b: f64, c: f64, d: f64) -> impl Fn() -> __m256d {
    #[inline(always)]
    move || _mm256_set_pd(a, b, c, d)
}

#[target_feature(enable = "avx")]
pub unsafe fn way_out() -> fn(__m256d) -> i32 {
    #[inline(always)]
    move |a| _mm256_movemask_pd(a)
}

pub fn unsafe_haven(a: f64, b: f64, c: f64, d: f64) -> i32 {
    // Problem: Even though this code declared
    // that it met escape()'s and way_out()'s unsafe preconditions,
    // THIS function doesn't have the target features!
    let escapee = unsafe { escape(a, b, c, d) };
    let escaping_avx_type = escapee();
    let opening = unsafe { way_out() };
    opening(escaping_avx_type)
}
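For contrast, here is a minimal sketch of how a caller would usually discharge that unsafe precondition, using runtime feature detection before entering the AVX code. It is not part of this PR: the names `avx::sign_mask`, `sign_mask_fallback`, and `sign_mask` are hypothetical, and the scalar fallback is an assumption added so the sketch also builds on non-x86_64 targets.

```rust
#[cfg(target_arch = "x86_64")]
mod avx {
    use core::arch::x86_64::*;

    // Hypothetical helper: the AVX work stays inside the function that
    // actually carries the target feature, so nothing AVX-typed escapes.
    #[target_feature(enable = "avx")]
    pub unsafe fn sign_mask(a: f64, b: f64, c: f64, d: f64) -> i32 {
        _mm256_movemask_pd(_mm256_set_pd(a, b, c, d))
    }
}

// Scalar fallback (an assumption for this sketch). `_mm256_set_pd` places its
// first argument in the highest lane, so `d` maps to bit 0 and `a` to bit 3.
pub fn sign_mask_fallback(a: f64, b: f64, c: f64, d: f64) -> i32 {
    [d, c, b, a]
        .iter()
        .enumerate()
        .fold(0, |mask, (lane, x)| mask | ((x.is_sign_negative() as i32) << lane))
}

// Safe entry point: the unsafe call is reached only after AVX was detected.
pub fn sign_mask(a: f64, b: f64, c: f64, d: f64) -> i32 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was checked at runtime on the line above.
            return unsafe { avx::sign_mask(a, b, c, d) };
        }
    }
    sign_mask_fallback(a, b, c, d)
}
```

With this shape, `sign_mask(-1.0, 1.0, 1.0, -1.0)` sets bits 3 and 0 on either path. The contrived example above is different precisely because the closures leave the feature-enabled functions and are called from a function that has no such check.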
    #[inline(always)]
    move || {}
Thinking about this forced me to check if you can annotate closures with target_feature(enable). (You cannot, fortunately.)
// would result in this closure being compiled without the inherited target features, but this
// is probably a poor usage of `#[inline(always)]` and easily avoided by not using the attribute.
if tcx.features().target_feature_11
    && tcx.is_closure(did.to_def_id())
...apparently is_closure will return true if this is a generator, also. I frankly have no idea how that should work, but dropping the features should remain safe in that case, at least...
// its parent function, which effectively inherits the features anyway. Boxing this closure
// would result in this closure being compiled without the inherited target features, but this
// is probably a poor usage of `#[inline(always)]` and easily avoided by not using the attribute.
Boxing seems like a waste, yes, but now that I am thinking about it, this seems like it could result in confusing behavior in the "escaping closure" case, when that would result, instead of the IIFE? Does that even make sense?
Inlining with closures is unfortunately always confusing. Box<dyn FnOnce()>, for example, implements FnOnce itself:
rust/library/alloc/src/boxed.rs
Lines 2003 to 2009 in c4083fa
    impl<Args: Tuple, F: FnOnce<Args> + ?Sized, A: Allocator> FnOnce<Args> for Box<F, A> {
        type Output = <F as FnOnce<Args>>::Output;
        extern "rust-call" fn call_once(self, args: Args) -> Self::Output {
            <F as FnOnce<Args>>::call_once(*self, args)
        }
    }
This call_once doesn't have any inline attribute at all! Therefore, the boxed closure's call_once inlines into this call_once, and then it's up in the air after that.
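A small sketch of the forwarding described above, on stable Rust. The helper names `call_boxed` and `call_generic` are hypothetical and exist only for illustration. Both calls go through the `FnOnce for Box<F, A>` impl quoted above before reaching the closure body, and that extra layer is where inlining decisions become hard to predict.

```rust
// Hypothetical helper: calling a boxed closure directly. The call `f()`
// resolves to `Box`'s own `call_once`, which forwards to the closure's.
pub fn call_boxed(f: Box<dyn FnOnce() -> i32>) -> i32 {
    f()
}

// Hypothetical helper: a generic caller. Because `Box<dyn FnOnce() -> i32>`
// itself implements `FnOnce() -> i32`, a boxed closure can be passed here too.
pub fn call_generic<F: FnOnce() -> i32>(f: F) -> i32 {
    f()
}
```

For example, `call_generic(Box::new(move || x + 2) as Box<dyn FnOnce() -> i32>)` compiles only because of that forwarding impl. Whether the closure body ends up inlined into either helper is left to the optimizer.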
I don't think there is actually anything wrong here. The |
Yes, I'm handwaving feature detection for this example. Technically it's not unsound until someone actually calls it. :^) The "ultimate" question seems to be if this is truly preferable over demoting the Currently, the Relevant issues and commits: |
I'm not sure if Regardless, I think this behavior is probably best for now because I'm sure there is existing code with |
It is because we say it is, as I understand it. LLVM is allowed to choose to not error on that case and simply silently ignore it, and as I understand it has in the past, and as you observed, it only applies to direct calls.

I guess the matter of indirection is most of what's really becoming pertinent, now that I think about it: If we do this, then "featureful inlining" stops before the closure, but if we don't, it continues into the closure, but the closure itself may not get inlined. So if there is some reason that the closure's exterior gets "outlined" anyways, like the

I might be wrong, obviously. However, one thing I am confident about is that we should not have to guess: This needs, at minimum, codegen tests in order to validate the LLVM IR is what we expect for both the direct call and indirect call cases, and we're going to need enough nesting that we can see all the consequences. This will help clarify what LLVM actually does, illuminate which approaches might actually lead to performance regressions, and catch whether LLVM decides to change its mind. |
I recommend making a There is no way this is going to be the last of these. |
To take a step back for a moment, extending That said, like the added comment indicates, using I'm basically saying it's not worth overthinking it. I'm confident this change won't do anything unsound, it might not have completely optimal codegen in unusual edge cases, but I think it's easy to work around. At worst, this behavior could be adjusted in a follow up PR, since it's just codegen and not language semantics :) |
I agree (re: "At worst, this behavior could be adjusted in a follow up PR, since it's just codegen"), I just still want to see codegen tests so that if LLVM changes their inlining rules again for target features we can catch it. :^) |
@rustbot author |
All tests good 🙂 |
Let's give this a whirl. @bors r+ rollup=never |
☀️ Test successful - checks-actions |
Finished benchmarking commit (1c44af9): comparison URL.
Overall result: no relevant changes - no action needed
@rustbot label: -perf-regression
Instruction count: This benchmark run did not return any relevant results for this metric.
Max RSS (memory usage): This benchmark run did not return any relevant results for this metric.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This benchmark run did not return any relevant results for this metric.
Bootstrap: 651.172s -> 651.296s (0.02%) |
Fixes #108655. I think this is the most obvious solution that isn't overly complicated. The comment includes more justification, but I think this is likely better than demoting the #[inline(always)] to #[inline], since existing code is unaffected.