Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FMV] Multi Versioned Inlining #71714

Open
jroelofs opened this issue Nov 8, 2023 · 10 comments
Open

[FMV] Multi Versioned Inlining #71714

jroelofs opened this issue Nov 8, 2023 · 10 comments

Comments

@jroelofs
Copy link
Contributor

jroelofs commented Nov 8, 2023

When calling an ACLE Multi Versioned function from another that is also versioned, we should statically resolve that specific callsite to allow inlining.

Proposed semantics: take the caller's feature set, and consider all callee versions with features that are in that subset, sort them according to their priority the same way the resolver would choose, and emit a remark explaining which one was picked. Worst case, pick the default implementation of the callee.

I.e. given:

__attribute__((target_version("simd")))
int callee() {
  return 1;
}

__attribute__((target_version("default")))
int callee() {
  return 0;
}

__attribute__((target_version("simd")))
int caller() {
  return 4 + callee();
}

__attribute__((target_version("default")))
int caller() {
  return 2 + callee();
}

int main() {
    return caller();
}

We should resolve some of the call sites in the IR to the equivalent of:

__attribute__((target_version("simd")))
int callee() {
  return 1;
}

__attribute__((target_version("default")))
int callee() {
  return 0;
}

__attribute__((target_version("simd")))
int caller() {
  return 4 + _callee._Msimd();
}

__attribute__((target_version("default")))
int caller() {
  return 2 + _callee();
}

int main() {
    return caller();
}

so that the simd version of caller can be optimized to:

__attribute__((target_version("simd")))
int caller() {
  return 5;
}
@jroelofs
Copy link
Contributor Author

jroelofs commented Nov 8, 2023

cc: @ilinpv

@Gerolf-Apple
Copy link
Contributor

For FMV the caller and callee attributes need to be compatible. This may not be true for the default implementation.

@DanielKristofKiss
Copy link
Member

Thanks for bringing this up.
The elimination definitely worthwhile in cases.

In the worst case when nothing can be picked just bail out as probably not worth doing it. ( and issue a remark )

"default" implementation might pick up features from -march so it could have more features then a given caller. Runtime I don't think it could go wrong but we would generate wrong code.

Wondering how much of it to be added to ACLE, to be sure all toolchain behaves in the same way.

@jroelofs
Copy link
Contributor Author

jroelofs commented Dec 4, 2023

"default" implementation might pick up features from -march so it could have more features then a given caller. Runtime I don't think it could go wrong but we would generate wrong code.

This sounds like an issue with inlining in general, and might not be specific to this FMV optimization.

@jroelofs
Copy link
Contributor Author

jroelofs commented Dec 5, 2023

I think there's a choice to be made here as to how to manage partial resolution when the caller/callee have different target_version's in their respective cohorts:

__attribute__((target_version("neon")))
int callee() {
  return 1;
}

__attribute__((target_version("neon+dotprod")))
int callee() {
  return 2;
}

__attribute__((target_version("default")))
int callee() {
  return 3;
}

__attribute__((target_version("neon")))
int caller() {
  return 10 + callee();
}

__attribute__((target_version("default")))
int caller() {
  return 20 + callee();
}

The simplest answer is probably to declare that FMV'd callers will directly call the FMV'd callee with the same feature set, if such a callee exists. Otherwise, dispatch through the resolver.

If we wanted to be precise, the correct best choice is to take the feature sets of the caller and callee, and take their intersection, selecting the result with the highest priority.

Either one of these options can reveal a behavioral change compared to dispatching through the resolver on platforms that support a feature enabled in a callee's target_version when the corresponding caller's target_version doesn't exist.

@jroelofs
Copy link
Contributor Author

jroelofs commented Dec 5, 2023

... and given that, IMO the "simple answer" is the best one, and should be how this gets codified into the ACLE.

@andrewcarlotti
Copy link

andrewcarlotti commented Dec 22, 2023

The simplest answer is probably to declare that FMV'd callers will directly call the FMV'd callee with the same feature set, if such a callee exists. Otherwise, dispatch through the resolver.

This "simple" answer would be wrong in some cases. If you have a function foo with "sve" and "default" versions calls function bar with "sve2", "sve" and "default" versions, then you can't remove the indirection from the sve version of foo because you don't know whether the correct version of bar is the"sve" one or the "sve2" one.

Let's use foo.featsA to denote a version of foo which specifies a reequirement for feature set featsA.

For a caller version foo.featsA, a correct algorithm would require checking whether there is a callee version bar.featsZ satisfying:

  • The callee feature set featsZ is implied by the caller feature set featsA (perhaps after including any features enabled globally at the command line, or by other function attributes).
  • For every higher priority callee version bar.featsY, there is a higher priority caller version foo.featsB where the feature set featsY implies that the feature set featsB is also available.

The second of these points is checking that the resolver wouldn't select any previous callee version, and the first point is checking that the resolver wouldn't skip over the candidate callee version.

Aside from this, there's also a requirement to check that the function can't be interposed by a different implementation at link or load time; this is similar to the checks required to establish whether inlining is safe.

Updated 2024-04-02: The last references to featsY and featsB were originally the wrong way around; I've corrected them now.

@jroelofs
Copy link
Contributor Author

This "simple" answer would be wrong in some cases. If you have a function foo with "sve" and "default" versions calls function bar with "sve2", "sve" and "default" versions, then you can't remove the indirection from the sve version of foo because you don't know whether the correct version of bar is the"sve" one or the "sve2" one.

Sure, but it isn't "wrong" if we're re-defining the behavior of the spec. Agreed re: optimizing according to the current spec.

Aside from this, there's also a requirement to check that the function can't be interposed by a different implementation at link or load time; this is similar to the checks required to establish whether inlining is safe.

Yes, good point!

@labrinea
Copy link
Collaborator

Either one of these options can reveal a behavioral change compared to dispatching through the resolver on platforms that support a feature enabled in a callee's target_version when the corresponding caller's target_version doesn't exist.

This doesn't seem right to me. An optimization (which is what we are talking about here) should not alter the behavior of the original program, therefore I don't believe there is something to document in ACLE about it. That said we can still bypass the resolver as long as:

  • there is a callee version whose feature set is a subset of the caller's feature set,
  • among such candidates the one of highest priority is the best match,
  • there is no callee version whose feature set is not a subset of the caller's feature set with higher priority than the best match.

@labrinea
Copy link
Collaborator

labrinea commented Apr 7, 2024

@andrewcarlotti reading your comment more carefully I believe I now understand what you meant. Can you confirm the behavior with #87939 adheres to your criteria? Just look at the test file. Cheers

qihangkong pushed a commit to rvgpu/llvm that referenced this issue Apr 18, 2024
… callers

... when there is a callee with a matching feature set, and no other higher
priority callee.  This optimization helps the inliner see past the
ifunc+resolver to the callee that we know it will always land on.

This is a conservative implementation of: llvm/llvm-project#71714
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants