Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FMV][AArch64][clang] Emit fmv-features metadata in LLVM IR. #118544

Merged
merged 4 commits into from
Jan 7, 2025

Conversation

labrinea
Copy link
Collaborator

@labrinea labrinea commented Dec 3, 2024

We need to be able to propagate information about FMV attribute strings from C/C++ source to LLVM IR. This is necessary so that we can distinguish which target-features are coming from the cmdline, which are coming from the target attribute, and which are coming from feature dependency expansion. We need this for static resolution of calls in LLVM. Here's a motivating example:

Suppose you have target_version("i8mm+dotprod") and target_version("fcma"). The first version clearly has higher priority. Now suppose you specify -march=armv8-a+i8mm on the command line. Then the versions would have target-features "+i8mm,+dotprod" and "+i8mm,+fcma" respectively. If you are using those to deduce version priority, then you would incorrectly deduce that the second version was higher priority than the first.

We need to be able to propagate information about FMV attribute strings from
C/C++ source to LLVM IR. This is necessary so that we can distinguish which
target-features are coming from the cmdline, which are coming from the target
attribute, and which are coming from feature dependency expansion. We need
this for static resolution of calls in LLVM. Here's a motivating example:

Suppose you have target_version("i8mm+dotprod") and target_version("fcma").
The first version clearly has higher priority. Now suppose you specify
-march=armv8-a+i8mm on the command line. Then the versions would have
target-features "+i8mm,+dotprod" and "+i8mm,+fcma" respectively. If you
are using those to deduce version priority, then you would incorrectly
deduce that the second version was higher priority than the first.
@labrinea labrinea requested a review from jroelofs December 3, 2024 20:50
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen labels Dec 3, 2024
@llvmbot
Copy link
Member

llvmbot commented Dec 3, 2024

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Alexandros Lamprineas (labrinea)

Changes

We need to be able to propagate information about FMV attribute strings from C/C++ source to LLVM IR. This is necessary so that we can distinguish which target-features are coming from the cmdline, which are coming from the target attribute, and which are coming from feature dependency expansion. We need this for static resolution of calls in LLVM. Here's a motivating example:

Suppose you have target_version("i8mm+dotprod") and target_version("fcma"). The first version clearly has higher priority. Now suppose you specify -march=armv8-a+i8mm on the command line. Then the versions would have target-features "+i8mm,+dotprod" and "+i8mm,+fcma" respectively. If you are using those to deduce version priority, then you would incorrectly deduce that the second version was higher priority than the first.


Patch is 45.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118544.diff

7 Files Affected:

  • (modified) clang/lib/AST/ASTContext.cpp (+1-1)
  • (modified) clang/lib/CodeGen/CodeGenModule.cpp (+15-1)
  • (modified) clang/test/CodeGen/AArch64/fmv-dependencies.c (+48-47)
  • (added) clang/test/CodeGen/AArch64/fmv-features.c (+202)
  • (modified) clang/test/CodeGen/AArch64/fmv-streaming.c (+11-14)
  • (modified) clang/test/CodeGen/attr-target-clones-aarch64.c (+37-37)
  • (modified) clang/test/CodeGen/attr-target-version.c (+25-25)
diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index 80e8c5b9df58e7..93d24aabbe04b2 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -14294,7 +14294,7 @@ QualType ASTContext::getCorrespondingSignedFixedPointType(QualType Ty) const {
 // corresponding backend features (which may contain duplicates).
 static std::vector<std::string> getFMVBackendFeaturesFor(
     const llvm::SmallVectorImpl<StringRef> &FMVFeatStrings) {
-  std::vector<std::string> BackendFeats{{"+fmv"}};
+  std::vector<std::string> BackendFeats;
   llvm::AArch64::ExtensionSet FeatureBits;
   for (StringRef F : FMVFeatStrings)
     if (auto FMVExt = llvm::AArch64::parseFMVExtension(F))
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index d3d5c0743a520b..e89fadabc57fee 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -2745,7 +2745,21 @@ bool CodeGenModule::GetCPUAndFeaturesAttributes(GlobalDecl GD,
     Attrs.addAttribute("target-features", llvm::join(Features, ","));
     AddedAttr = true;
   }
-
+  if (getTarget().getTriple().isAArch64()) {
+    llvm::SmallVector<StringRef, 8> Feats;
+    if (TV)
+      TV->getFeatures(Feats);
+    else if (TC)
+      TC->getFeatures(Feats, GD.getMultiVersionIndex());
+    if (!Feats.empty()) {
+      llvm::sort(Feats);
+      std::string FMVFeatures;
+      for (StringRef F : Feats)
+        FMVFeatures.append(",+" + F.str());
+      Attrs.addAttribute("fmv-features", FMVFeatures.substr(1));
+      AddedAttr = true;
+    }
+  }
   return AddedAttr;
 }
 
diff --git a/clang/test/CodeGen/AArch64/fmv-dependencies.c b/clang/test/CodeGen/AArch64/fmv-dependencies.c
index f74b7aa32c7dca..6537545135a335 100644
--- a/clang/test/CodeGen/AArch64/fmv-dependencies.c
+++ b/clang/test/CodeGen/AArch64/fmv-dependencies.c
@@ -42,7 +42,7 @@ __attribute__((target_version("flagm"))) int fmv(void) { return 0; }
 // CHECK: define dso_local i32 @fmv._Mflagm2() #[[flagm2:[0-9]+]] {
 __attribute__((target_version("flagm2"))) int fmv(void) { return 0; }
 
-// CHECK: define dso_local i32 @fmv._Mfp() #[[default:[0-9]+]] {
+// CHECK: define dso_local i32 @fmv._Mfp() #[[fp:[0-9]+]] {
 __attribute__((target_version("fp"))) int fmv(void) { return 0; }
 
 // CHECK: define dso_local i32 @fmv._Mfp16() #[[fp16:[0-9]+]] {
@@ -99,7 +99,7 @@ __attribute__((target_version("sha2"))) int fmv(void) { return 0; }
 // CHECK: define dso_local i32 @fmv._Msha3() #[[sha3:[0-9]+]] {
 __attribute__((target_version("sha3"))) int fmv(void) { return 0; }
 
-// CHECK: define dso_local i32 @fmv._Msimd() #[[default]] {
+// CHECK: define dso_local i32 @fmv._Msimd() #[[simd:[0-9]+]] {
 __attribute__((target_version("simd"))) int fmv(void) { return 0; }
 
 // CHECK: define dso_local i32 @fmv._Msm4() #[[sm4:[0-9]+]] {
@@ -150,48 +150,49 @@ int caller() {
   return fmv();
 }
 
-// CHECK: attributes #[[aes]] = { {{.*}} "target-features"="+aes,+fmv,+fp-armv8,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[bf16]] = { {{.*}} "target-features"="+bf16,+fmv,+fp-armv8,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[bti]] = { {{.*}} "target-features"="+bti,+fmv,+fp-armv8,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[crc]] = { {{.*}} "target-features"="+crc,+fmv,+fp-armv8,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[dit]] = { {{.*}} "target-features"="+dit,+fmv,+fp-armv8,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[dotprod]] = { {{.*}} "target-features"="+dotprod,+fmv,+fp-armv8,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[dpb]] = { {{.*}} "target-features"="+ccpp,+fmv,+fp-armv8,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[dpb2]] = { {{.*}} "target-features"="+ccdp,+ccpp,+fmv,+fp-armv8,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[f32mm]] = { {{.*}} "target-features"="+f32mm,+fmv,+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+v8a"
-// CHECK: attributes #[[f64mm]] = { {{.*}} "target-features"="+f64mm,+fmv,+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+v8a"
-// CHECK: attributes #[[fcma]] = { {{.*}} "target-features"="+complxnum,+fmv,+fp-armv8,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[flagm]] = { {{.*}} "target-features"="+flagm,+fmv,+fp-armv8,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[flagm2]] = { {{.*}} "target-features"="+altnzcv,+flagm,+fmv,+fp-armv8,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[default]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[fp16]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+fullfp16,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[fp16fml]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+fp16fml,+fullfp16,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[frintts]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+fptoint,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[i8mm]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+i8mm,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[jscvt]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+jsconv,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[ls64]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+ls64,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[lse]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+lse,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[memtag]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+mte,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[mops]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+mops,+neon,+outline-atomics,+v8a"
-// CHECK: attributes #[[predres]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+neon,+outline-atomics,+predres,+v8a"
-// CHECK: attributes #[[rcpc]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+neon,+outline-atomics,+rcpc,+v8a"
-// CHECK: attributes #[[rcpc2]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+neon,+outline-atomics,+rcpc,+rcpc-immo,+v8a"
-// CHECK: attributes #[[rcpc3]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+neon,+outline-atomics,+rcpc,+rcpc-immo,+rcpc3,+v8a"
-// CHECK: attributes #[[rdm]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+neon,+outline-atomics,+rdm,+v8a"
-// CHECK: attributes #[[rng]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+neon,+outline-atomics,+rand,+v8a"
-// CHECK: attributes #[[sb]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+neon,+outline-atomics,+sb,+v8a"
-// CHECK: attributes #[[sha2]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+neon,+outline-atomics,+sha2,+v8a"
-// CHECK: attributes #[[sha3]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+neon,+outline-atomics,+sha2,+sha3,+v8a"
-// CHECK: attributes #[[sm4]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+neon,+outline-atomics,+sm4,+v8a"
-// CHECK: attributes #[[sme]] = { {{.*}} "target-features"="+bf16,+fmv,+fp-armv8,+neon,+outline-atomics,+sme,+v8a"
-// CHECK: attributes #[[sme_f64f64]] = { {{.*}} "target-features"="+bf16,+fmv,+fp-armv8,+neon,+outline-atomics,+sme,+sme-f64f64,+v8a"
-// CHECK: attributes #[[sme_i16i64]] = { {{.*}} "target-features"="+bf16,+fmv,+fp-armv8,+neon,+outline-atomics,+sme,+sme-i16i64,+v8a"
-// CHECK: attributes #[[sme2]] = { {{.*}} "target-features"="+bf16,+fmv,+fp-armv8,+neon,+outline-atomics,+sme,+sme2,+v8a"
-// CHECK: attributes #[[ssbs]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+neon,+outline-atomics,+ssbs,+v8a"
-// CHECK: attributes #[[sve]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+v8a"
-// CHECK: attributes #[[sve2]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+sve2,+v8a"
-// CHECK: attributes #[[sve2_aes]] = { {{.*}} "target-features"="+aes,+fmv,+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+sve-aes,+sve2,+sve2-aes,+v8a"
-// CHECK: attributes #[[sve2_bitperm]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+sve2,+sve2-bitperm,+v8a"
-// CHECK: attributes #[[sve2_sha3]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+fullfp16,+neon,+outline-atomics,+sha2,+sha3,+sve,+sve2,+sve2-sha3,+v8a"
-// CHECK: attributes #[[sve2_sm4]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+fullfp16,+neon,+outline-atomics,+sm4,+sve,+sve2,+sve2-sm4,+v8a"
-// CHECK: attributes #[[wfxt]] = { {{.*}} "target-features"="+fmv,+fp-armv8,+neon,+outline-atomics,+v8a,+wfxt"
+// CHECK: attributes #[[aes]] = { {{.*}} "target-features"="+aes,+fp-armv8,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[bf16]] = { {{.*}} "target-features"="+bf16,+fp-armv8,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[bti]] = { {{.*}} "target-features"="+bti,+fp-armv8,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[crc]] = { {{.*}} "target-features"="+crc,+fp-armv8,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[dit]] = { {{.*}} "target-features"="+dit,+fp-armv8,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[dotprod]] = { {{.*}} "target-features"="+dotprod,+fp-armv8,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[dpb]] = { {{.*}} "target-features"="+ccpp,+fp-armv8,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[dpb2]] = { {{.*}} "target-features"="+ccdp,+ccpp,+fp-armv8,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[f32mm]] = { {{.*}} "target-features"="+f32mm,+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+v8a"
+// CHECK: attributes #[[f64mm]] = { {{.*}} "target-features"="+f64mm,+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+v8a"
+// CHECK: attributes #[[fcma]] = { {{.*}} "target-features"="+complxnum,+fp-armv8,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[flagm]] = { {{.*}} "target-features"="+flagm,+fp-armv8,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[flagm2]] = { {{.*}} "target-features"="+altnzcv,+flagm,+fp-armv8,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[fp]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[fp16]] = { {{.*}} "target-features"="+fp-armv8,+fullfp16,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[fp16fml]] = { {{.*}} "target-features"="+fp-armv8,+fp16fml,+fullfp16,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[frintts]] = { {{.*}} "target-features"="+fp-armv8,+fptoint,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[i8mm]] = { {{.*}} "target-features"="+fp-armv8,+i8mm,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[jscvt]] = { {{.*}} "target-features"="+fp-armv8,+jsconv,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[ls64]] = { {{.*}} "target-features"="+fp-armv8,+ls64,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[lse]] = { {{.*}} "target-features"="+fp-armv8,+lse,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[memtag]] = { {{.*}} "target-features"="+fp-armv8,+mte,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[mops]] = { {{.*}} "target-features"="+fp-armv8,+mops,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[predres]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+predres,+v8a"
+// CHECK: attributes #[[rcpc]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+rcpc,+v8a"
+// CHECK: attributes #[[rcpc2]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+rcpc,+rcpc-immo,+v8a"
+// CHECK: attributes #[[rcpc3]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+rcpc,+rcpc-immo,+rcpc3,+v8a"
+// CHECK: attributes #[[rdm]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+rdm,+v8a"
+// CHECK: attributes #[[rng]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+rand,+v8a"
+// CHECK: attributes #[[sb]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+sb,+v8a"
+// CHECK: attributes #[[sha2]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+sha2,+v8a"
+// CHECK: attributes #[[sha3]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+sha2,+sha3,+v8a"
+// CHECK: attributes #[[simd]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+v8a"
+// CHECK: attributes #[[sm4]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+sm4,+v8a"
+// CHECK: attributes #[[sme]] = { {{.*}} "target-features"="+bf16,+fp-armv8,+neon,+outline-atomics,+sme,+v8a"
+// CHECK: attributes #[[sme_f64f64]] = { {{.*}} "target-features"="+bf16,+fp-armv8,+neon,+outline-atomics,+sme,+sme-f64f64,+v8a"
+// CHECK: attributes #[[sme_i16i64]] = { {{.*}} "target-features"="+bf16,+fp-armv8,+neon,+outline-atomics,+sme,+sme-i16i64,+v8a"
+// CHECK: attributes #[[sme2]] = { {{.*}} "target-features"="+bf16,+fp-armv8,+neon,+outline-atomics,+sme,+sme2,+v8a"
+// CHECK: attributes #[[ssbs]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+ssbs,+v8a"
+// CHECK: attributes #[[sve]] = { {{.*}} "target-features"="+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+v8a"
+// CHECK: attributes #[[sve2]] = { {{.*}} "target-features"="+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+sve2,+v8a"
+// CHECK: attributes #[[sve2_aes]] = { {{.*}} "target-features"="+aes,+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+sve-aes,+sve2,+sve2-aes,+v8a"
+// CHECK: attributes #[[sve2_bitperm]] = { {{.*}} "target-features"="+fp-armv8,+fullfp16,+neon,+outline-atomics,+sve,+sve2,+sve2-bitperm,+v8a"
+// CHECK: attributes #[[sve2_sha3]] = { {{.*}} "target-features"="+fp-armv8,+fullfp16,+neon,+outline-atomics,+sha2,+sha3,+sve,+sve2,+sve2-sha3,+v8a"
+// CHECK: attributes #[[sve2_sm4]] = { {{.*}} "target-features"="+fp-armv8,+fullfp16,+neon,+outline-atomics,+sm4,+sve,+sve2,+sve2-sm4,+v8a"
+// CHECK: attributes #[[wfxt]] = { {{.*}} "target-features"="+fp-armv8,+neon,+outline-atomics,+v8a,+wfxt"
diff --git a/clang/test/CodeGen/AArch64/fmv-features.c b/clang/test/CodeGen/AArch64/fmv-features.c
new file mode 100644
index 00000000000000..24082413d9e2c7
--- /dev/null
+++ b/clang/test/CodeGen/AArch64/fmv-features.c
@@ -0,0 +1,202 @@
+// Test all of the AArch64 fmv-features metadata without any dependency expansion.
+// It is used to propagate the attribute string information from C/C++ source to LLVM IR.
+
+// RUN: %clang --target=aarch64-linux-gnu --rtlib=compiler-rt -emit-llvm -S -o - %s | FileCheck %s
+
+// CHECK: define dso_local i32 @fmv._Maes() #[[aes:[0-9]+]] {
+__attribute__((target_version("aes"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mbf16() #[[bf16:[0-9]+]] {
+__attribute__((target_version("bf16"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mbti() #[[bti:[0-9]+]] {
+__attribute__((target_version("bti"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mcrc() #[[crc:[0-9]+]] {
+__attribute__((target_version("crc"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mdit() #[[dit:[0-9]+]] {
+__attribute__((target_version("dit"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mdotprod() #[[dotprod:[0-9]+]] {
+__attribute__((target_version("dotprod"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mdpb() #[[dpb:[0-9]+]] {
+__attribute__((target_version("dpb"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mdpb2() #[[dpb2:[0-9]+]] {
+__attribute__((target_version("dpb2"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mf32mm() #[[f32mm:[0-9]+]] {
+__attribute__((target_version("f32mm"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mf64mm() #[[f64mm:[0-9]+]] {
+__attribute__((target_version("f64mm"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mfcma() #[[fcma:[0-9]+]] {
+__attribute__((target_version("fcma"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mflagm() #[[flagm:[0-9]+]] {
+__attribute__((target_version("flagm"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mflagm2() #[[flagm2:[0-9]+]] {
+__attribute__((target_version("flagm2"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mfp() #[[fp:[0-9]+]] {
+__attribute__((target_version("fp"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mfp16() #[[fp16:[0-9]+]] {
+__attribute__((target_version("fp16"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mfp16fml() #[[fp16fml:[0-9]+]] {
+__attribute__((target_version("fp16fml"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mfrintts() #[[frintts:[0-9]+]] {
+__attribute__((target_version("frintts"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mi8mm() #[[i8mm:[0-9]+]] {
+__attribute__((target_version("i8mm"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mjscvt() #[[jscvt:[0-9]+]] {
+__attribute__((target_version("jscvt"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mls64() #[[ls64:[0-9]+]] {
+__attribute__((target_version("ls64"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mlse() #[[lse:[0-9]+]] {
+__attribute__((target_version("lse"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mmemtag() #[[memtag:[0-9]+]] {
+__attribute__((target_version("memtag"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mmops() #[[mops:[0-9]+]] {
+__attribute__((target_version("mops"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mpredres() #[[predres:[0-9]+]] {
+__attribute__((target_version("predres"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mrcpc() #[[rcpc:[0-9]+]] {
+__attribute__((target_version("rcpc"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mrcpc2() #[[rcpc2:[0-9]+]] {
+__attribute__((target_version("rcpc2"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mrcpc3() #[[rcpc3:[0-9]+]] {
+__attribute__((target_version("rcpc3"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mrdm() #[[rdm:[0-9]+]] {
+__attribute__((target_version("rdm"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mrng() #[[rng:[0-9]+]] {
+__attribute__((target_version("rng"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Msb() #[[sb:[0-9]+]] {
+__attribute__((target_version("sb"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Msha2() #[[sha2:[0-9]+]] {
+__attribute__((target_version("sha2"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Msha3() #[[sha3:[0-9]+]] {
+__attribute__((target_version("sha3"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Msimd() #[[simd:[0-9]+]] {
+__attribute__((target_version("simd"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Msm4() #[[sm4:[0-9]+]] {
+__attribute__((target_version("sm4"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Msme() #[[sme:[0-9]+]] {
+__attribute__((target_version("sme"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Msme-f64f64() #[[sme_f64f64:[0-9]+]] {
+__attribute__((target_version("sme-f64f64"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Msme-i16i64() #[[sme_i16i64:[0-9]+]] {
+__attribute__((target_version("sme-i16i64"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Msme2() #[[sme2:[0-9]+]] {
+__attribute__((target_version("sme2"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mssbs() #[[ssbs:[0-9]+]] {
+__attribute__((target_version("ssbs"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Msve() #[[sve:[0-9]+]] {
+__attribute__((target_version("sve"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Msve2() #[[sve2:[0-9]+]] {
+__attribute__((target_version("sve2"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Msve2-aes() #[[sve2_aes:[0-9]+]] {
+__attribute__((target_version("sve2-aes"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Msve2-bitperm() #[[sve2_bitperm:[0-9]+]] {
+__attribute__((target_version("sve2-bitperm"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Msve2-sha3() #[[sve2_sha3:[0-9]+]] {
+__attribute__((target_version("sve2-sha3"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Msve2-sm4() #[[sve2_sm4:[0-9]+]] {
+__attribute__((target_version("sve2-sm4"))) int fmv(void) { return 0; }
+
+// CHECK: define dso_local i32 @fmv._Mwfxt() #[[wfxt:[0-9]+]] {
+__attribute__((target_version("wfxt"))) int fmv(void) { return 0; }
+
+// CHECK: define ds...
[truncated]

@labrinea labrinea marked this pull request as draft December 3, 2024 21:10
@jroelofs
Copy link
Contributor

jroelofs commented Dec 3, 2024

Suppose you have target_version("i8mm+dotprod") and target_version("fcma"). The first version clearly has higher priority

According to the current rules yes, but IMO that rule is broken and doesn't match user expectation. A motivating example might be target_version("simd+dotprod") and target_version("sve"): when sve is available we should pick that version, but the current rule prioritizes the number of attribute-specified features over their overall weight.

I think if we fix that rule, and instead sort priorities lexicographically after adding implicit features implied by command-line options, then this change becomes unnecessary. Mind starting a thread with the gcc folks to sort this out?

@labrinea
Copy link
Collaborator Author

labrinea commented Dec 3, 2024

Suppose you have target_version("i8mm+dotprod") and target_version("fcma"). The first version clearly has higher priority

According to the current rules yes, but IMO that rule is broken and doesn't match user expectation. A motivating example might be target_version("simd+dotprod") and target_version("sve"): when sve is available we should pick that version, but the current rule prioritizes the number of attribute-specified features over their overall weight.

I think if we fix that rule, and instead sort priorities lexicographically after adding implicit features implied by command-line options, then this change becomes unnecessary. Mind starting a thread with the gcc folks to sort this out?

@andrewcarlotti is planning a PR in ACLE to fix the priorities and the selection algorithm. I will tag you there when this happens.

Copy link
Contributor

@jroelofs jroelofs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch itself LGTM though, despite my gripes on the current prioritization rule.

@labrinea
Copy link
Collaborator Author

labrinea commented Dec 4, 2024

A motivating example might be target_version("simd+dotprod") and target_version("sve"): when sve is available we should pick that version, but the current rule prioritizes the number of attribute-specified features over their overall weight.

I agree about this. As I said we will adress the priorities and selection algorithm separately.

I think if we fix that rule, and instead sort priorities lexicographically after adding implicit features implied by command-line options, then this change becomes unnecessary

The lexicographic ordering discards the ACLE priorities. I don't think it's what we want here. I am not sure how I feel about mixing the features inherited from the cmdline option with the fmv features. If we do that we should be detecting them at runtime if we want the static resolution optimization to behave the same as the resolver. However, making the version selection not only depend on the source code but also the cmdline may not be the right thing to do.

@labrinea labrinea marked this pull request as ready for review December 4, 2024 10:49
@jroelofs
Copy link
Contributor

jroelofs commented Dec 4, 2024

A motivating example might be target_version("simd+dotprod") and target_version("sve"): when sve is available we should pick that version, but the current rule prioritizes the number of attribute-specified features over their overall weight.

I agree about this. As I said we will adress the priorities and selection algorithm separately.

I think if we fix that rule, and instead sort priorities lexicographically after adding implicit features implied by command-line options, then this change becomes unnecessary

The lexicographic ordering discards the ACLE priorities. I don't think it's what we want here.

I should clarify that I mean: take the listed features for a given function, sort them according to their priorities highest to lowest as if each priority were a letter in a ${highest_priority+1} sized alphabet and the whole set of priorities for a function comprised a word made up of those letters, and then sort those words lexicographically with ties broken in favor of word length.

Consider a multi-versioned function with simd+dotprod, simd, sve, and default:

  • simd+dotprod has "letters" {100, 104}. Sorted, that's {104, 100}.
  • simd has one "letter" {100}
  • sve has one "letter" {310}
  • default has no "letters" {}

Now with those sorted according to that lexicographic order, we have a total order for the set:

default < simd < simd+dotprod < sve

The selection process then filters out elements that cannot be chosen per runtime availability of features, and selects the remaining choice that is "greatest" according to that order.

I am not sure how I feel about mixing the features inherited from the cmdline option with the fmv features. If we do that we should be detecting them at runtime if we want the static resolution optimization to behave the same as the resolver. However, making the version selection not only depend on the source code but also the cmdline may not be the right thing to do.

The way I see it, setting command-line options sets the floor for minimum feature set: one cannot expect to run code on a machine that does not at least meet that minimum. Because of that assumption, every version in the multi-version set inherits those implicit features. Imagine a multi-versioned function with three implementations: sha2, sha2+sha3, default. Compiled with -mcpu=generic, we'd get the usual behavior. Compiled with -mcpu=generic+sha2, the default implementation will never be called, and runtime selection will only need to decide between the version with/without sha3. Likewise, compiling with -mcpu=generic+sha3 will guarantee that both sha2 and sha3 are present in the feature set, as the former is implied by the latter in the ExtensionWithMArch sense, and thus the sha2+sha3 implementation is the only one that matters.

Do you have a case in mind where this would behave differently than the runtime resolver? I suppose we should distinguish two cases: both with the current sorting rule, and with that lexicographic-ish rule I'm proposing.

@andrewcarlotti
Copy link

Suppose you have target_version("i8mm+dotprod") and target_version("fcma"). The first version clearly has higher priority

According to the current rules yes, but IMO that rule is broken and doesn't match user expectation.

I chose my example (which @labrinea posted here) specifically so that it would still be valid after changing the priorities to use lexicographic ordering (for which I just created ARM-software/acle#370). Your alternative example doesn't exhibit the issue that this patch is intended to address.

@jroelofs
Copy link
Contributor

Suppose you have target_version("i8mm+dotprod") and target_version("fcma"). The first version clearly has higher priority

According to the current rules yes, but IMO that rule is broken and doesn't match user expectation.

I chose my example (which @labrinea posted here) specifically so that it would still be valid after changing the priorities to use lexicographic ordering (for which I just created ARM-software/acle#370). Your alternative example doesn't exhibit the issue that this patch is intended to address.

Ah, I see now. Sounds good to me.

@labrinea
Copy link
Collaborator Author

labrinea commented Dec 10, 2024

Thanks for looking into this Andrew. There's one thing I would like to clarify here. Jon expressed the opinion that in your example the command line should determine the highest priority version (i8mm+fcma), which isn't what ARM-software/acle#370 suggests. That said, this PR is still need after the amendment to ACLE. Personally I don't have a strong opinion which way to go. I am open to discuss.

I think if we fix that rule, and instead sort priorities lexicographically after adding implicit features implied by command-line options, then this change becomes unnecessary.

The proposal ARM-software/acle#370 does not take the implicit features of the cmdline into account.

@labrinea
Copy link
Collaborator Author

Just a gentle reminder that I would like us to clarify the impact of the cmdline over the version selection algorithm if any. With the Christmas break approaching I fear #87939 may miss the LLVM20 branch date.

@labrinea labrinea merged commit 93011fe into llvm:main Jan 7, 2025
8 checks passed
@labrinea labrinea deleted the fmv-features-metadata branch January 7, 2025 08:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants