Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding change to IndVarSimplify pass to optimize IVs stuck in trivial vector operations #122248

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

anilavakundu007
Copy link

@anilavakundu007 anilavakundu007 commented Jan 9, 2025

This patch helps LLVM to unroll loop patterns like:

typedef int ivec2 __attribute__((ext_vector_type(2)));
int data0;
void fn()
{
  int u_xlati59 = 1;
  while (true)
    {
      bool u_xlatb12 = u_xlati59 >= 3;
      if (u_xlatb12)
	{
	  break;
	}
      ivec2 u_xlati12 = (ivec2){u_xlati59, u_xlati59} + (ivec2){-1, 1};
      data0 += u_xlati12.x;
      u_xlati59 = u_xlati12.y;
    }
}

In the existing method the Induction variable fails as SCEV fails and the loop cannot be unrolled. With this patch the loop gets unrolled and the final code generated is a lot smaller.

linked issue: #121742

Copy link

github-actions bot commented Jan 9, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Jan 9, 2025

@llvm/pr-subscribers-llvm-transforms

Author: None (anilavakundu007)

Changes

This patch helps LLVM to unroll loop patterns like:

typedef int ivec2 __attribute__((ext_vector_type(2)));
int data0;
void fn()
{
  int u_xlati59 = 1;
  while (true)
    {
      bool u_xlatb12 = u_xlati59 >= 3;
      if (u_xlatb12)
	{
	  break;
	}
      ivec2 u_xlati12 = (ivec2){u_xlati59, u_xlati59} + (ivec2){-1, 1};
      data0 += u_xlati12.x;
      u_xlati59 = u_xlati12.y;
    }
}

In the existing method the Induction variable fails as SCEV fails and the loop cannot be unrolled. With this patch the loop gets unrolled and the final code generated is a lot smaller.

linked issue: #121742


Full diff: https://github.com/llvm/llvm-project/pull/122248.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/IndVarSimplify.cpp (+226)
diff --git a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
index 8a3e0bc3eb9712..37361080a1d766 100644
--- a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
+++ b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
@@ -138,6 +138,7 @@ class IndVarSimplify {
   bool RunUnswitching = false;
 
   bool handleFloatingPointIV(Loop *L, PHINode *PH);
+  bool breakVectorOpsOnIVs(Loop *L);
   bool rewriteNonIntegerIVs(Loop *L);
 
   bool simplifyAndExtend(Loop *L, SCEVExpander &Rewriter, LoopInfo *LI);
@@ -1891,6 +1892,212 @@ bool IndVarSimplify::predicateLoopExits(Loop *L, SCEVExpander &Rewriter) {
   return Changed;
 }
 
+// Get the latch condition instruction.
+static ICmpInst *getLatchCmpInst(const Loop &L) {
+  if (BasicBlock *Latch = L.getLoopLatch())
+    if (BranchInst *BI = dyn_cast_or_null<BranchInst>(Latch->getTerminator()))
+      if (BI->isConditional())
+        return dyn_cast<ICmpInst>(BI->getCondition());
+
+  return nullptr;
+}
+
+// get the vector which contains the IV
+// This function will return the vector which only contains the IV
+static Value* getVectorContaingIV(PHINode* IV)
+{
+
+   for (User* use: IV->users())
+   {
+      // check if the IV is a part of any vector or not
+      InsertElementInst *vecInsert = dyn_cast<InsertElementInst>(use);
+      if(!vecInsert)
+        continue;
+      // We need to check if this vector contains only the IV
+      FixedVectorType* vecTy = dyn_cast<FixedVectorType>(vecInsert->getType());
+      if (!vecTy)
+        continue;
+      // if it is vector of a single element with the IV as an element
+      if(vecTy->getNumElements() == 1)
+        return vecInsert;
+      // if we have larger vectors
+      if(vecTy->getNumElements() > 1)
+      {
+         //check the vector we are inserting into an empty vector
+         Value* srcVec = vecInsert->getOperand(0);
+         if (!isa<UndefValue>(srcVec) || !isa<PoisonValue>(srcVec))
+            continue;
+         //check if we are later inserting more elements into the vector or not
+         InsertElementInst *insertOtherVal = nullptr;
+         for (User* vecUse: vecInsert->users())
+         {
+            insertOtherVal = dyn_cast<InsertElementInst>(vecUse);
+            if(insertOtherVal)
+              break;
+         }
+         if(insertOtherVal)
+            continue;
+
+         // vector contains only IV
+         return vecInsert;
+      }
+   }
+   return nullptr;
+}
+
+
+// check if a PHINode is a possbile Induction variable or not
+// The existing functions do not work as SCEV fails
+// This happens when the IV is stuck in a vector operation
+// And the incoming value comes from an extract element instruction
+static PHINode* isPossibleIDVar(Loop *L)
+{
+  BasicBlock *Header = L->getHeader();
+  assert(Header && "Expected a valid loop header");
+  ICmpInst *CmpInst = getLatchCmpInst(*L);
+  if (!CmpInst)
+    return nullptr;
+
+  Value *LatchCmpOp0 = CmpInst->getOperand(0);
+  Value *LatchCmpOp1 = CmpInst->getOperand(1);
+
+  // check if the compare instruction operands are Extract element instructions
+  // We do this as we expect the extract element to be reading from the vector which has the IV
+  // If none of the operands are extract element instructions we do not proceed
+  if (!isa<ExtractElementInst>(LatchCmpOp0) && !isa<ExtractElementInst>(LatchCmpOp1))
+    return nullptr;
+
+  for (PHINode &IndVar : Header->phis())
+  {
+    if (!IndVar.getType()->isIntegerTy() && !IndVar.getType()->isFloatTy())
+      continue;
+
+    BasicBlock *Latch = L->getLoopLatch();
+    Value *StepInst = IndVar.getIncomingValueForBlock(Latch);
+
+    // case 1:
+    // IndVar = phi[{InitialValue, preheader}, {StepInst, latch}]
+    // StepInst = IndVar + step
+    // cmp = StepInst < FinalValue
+    if (StepInst == LatchCmpOp0 || StepInst == LatchCmpOp1)
+      return (getVectorContaingIV(&IndVar)) ? &IndVar : nullptr;
+
+    // case 2:
+    // IndVar = phi[{InitialValue, preheader}, {StepInst, latch}]
+    // StepInst = IndVar + step
+    // cmp = IndVar < FinalValue
+    if (&IndVar == LatchCmpOp0 || &IndVar == LatchCmpOp1)
+      return (getVectorContaingIV(&IndVar)) ? &IndVar : nullptr;
+    }
+
+    return nullptr;
+}
+
+// Function to check if the source vector for the extract element
+// is the same as the vector containing the IV
+// If that can be proved the extract element can be removed
+static bool checkVecOp(Value* BinOperand, Value* VecIV)
+{
+   if (BinOperand == VecIV)
+      return true;
+
+   ShuffleVectorInst *shuffleInst = dyn_cast<ShuffleVectorInst>(BinOperand);
+   if (!shuffleInst)
+      return false;
+
+   // Handle patterns where:
+   //  first operand is the vector containing only the IV
+   //  Mask only selects the first element from the first source vector
+   // TODO: Add more patterns?
+   bool isFirstSrc = (shuffleInst->getOperand(0) == VecIV);
+   auto shuffleMask = shuffleInst->getShuffleMask();
+
+   return (isFirstSrc && shuffleInst->isZeroEltSplatMask(shuffleMask, shuffleMask.size()));
+}
+
+
+static bool ReplaceExtractInst(ConstantDataVector* values, unsigned Opcode,
+                               ExtractElementInst* elemInst, PHINode* IV)
+{
+    unsigned extIdx = cast<ConstantInt>(elemInst->getIndexOperand())->getZExtValue();
+    IRBuilder<> B(elemInst);
+    Value* extractedVal = values->getElementAsConstant(extIdx);
+    Value* newInst = nullptr;
+    bool changed = true;
+    switch(Opcode)
+    {
+       case Instruction::Add:
+          newInst = B.CreateAdd(IV, extractedVal);
+          break;
+       case Instruction::Sub:
+          newInst = B.CreateSub(IV, extractedVal);
+          break;
+       case Instruction::Mul:
+          newInst = B.CreateMul(IV, extractedVal);
+          break;
+       case Instruction::FMul:
+          newInst = B.CreateFMul(IV, extractedVal);
+          break;
+       case Instruction::UDiv:
+          newInst = B.CreateUDiv(IV, extractedVal);
+          break;
+       case Instruction::SDiv:
+          newInst = B.CreateSDiv(IV, extractedVal);
+          break;
+       default:
+          changed = false;
+    };
+
+    if (changed)
+    {
+      LLVM_DEBUG(dbgs() << "INDVARS: Rewriting Extract Element:\n" << *elemInst <<"\n"
+                    << " With :" << *newInst <<"\n");
+      elemInst->replaceAllUsesWith(newInst);
+      elemInst->eraseFromParent();
+    }
+    return changed;
+}
+
+
+bool IndVarSimplify::breakVectorOpsOnIVs(Loop *L) {
+
+  PHINode *IV = isPossibleIDVar(L);
+  if(!IV)
+    return false;
+
+  // Process the vector operation
+  ICmpInst *CmpInst = getLatchCmpInst(*L);
+  unsigned idx = isa<ExtractElementInst>(CmpInst->getOperand(0)) ? 0 : 1;
+  ExtractElementInst *exElem = cast<ExtractElementInst>(CmpInst->getOperand(idx));
+
+  // check if the idx is consant
+  if (!isa<ConstantInt>(exElem->getIndexOperand()))
+    return false;
+
+  // check if the extract element comes from a binary operation
+  BinaryOperator *SrcVec = dyn_cast<BinaryOperator>(exElem->getVectorOperand());
+  if (!SrcVec)
+    return false;
+
+  // if both operands of the binary op is not a constant data vector then let go
+  Value *BinOperand0 = SrcVec->getOperand(0);
+  Value *BinOperand1 = SrcVec->getOperand(1);
+  if(!isa<ConstantDataVector>(BinOperand0) && !isa<ConstantDataVector>(BinOperand1))
+   return false;
+
+  unsigned ConstVecIdx = isa<ConstantDataVector>(BinOperand0) ? 0 : 1;
+  Value* VecWithIV = getVectorContaingIV(IV);
+
+  if(!checkVecOp(SrcVec->getOperand(!ConstVecIdx), VecWithIV))
+    return false;
+
+  ConstantDataVector *DataVec = cast<ConstantDataVector>(SrcVec->getOperand(ConstVecIdx));
+  return ReplaceExtractInst(DataVec, SrcVec->getOpcode(), exElem, IV);
+
+}
+
+
+
 //===----------------------------------------------------------------------===//
 //  IndVarSimplify driver. Manage several subpasses of IV simplification.
 //===----------------------------------------------------------------------===//
@@ -1913,6 +2120,25 @@ bool IndVarSimplify::run(Loop *L) {
     return false;
 
   bool Changed = false;
+  // Breaks trivial operation on vector which contain just the Induction variable
+  // This pass looks for the following pattern in the IR
+      // %header:
+        // %phiVar = i32/float [%r1, %pre_header], [%extVal, %latch_block]
+        // more instructions
+        // %initVec = insertelement <k x i32/float> undef/posion, %phiVar
+        // %extendVec = shufflevector <k x i32/float> %initVec, <k x i32/float> undef/poison, <m x i32/float> zeroInitializer 
+        // more instructions
+        // %Op = binOp %extendVec, %constDataVec
+        // more instructions
+        // %extVal = extractElement %Op, constIdx
+        // %branchVal = icmp unaryOp %extVal, %loopBound
+        // branch %branchVal B1, B2
+  // Essentially the pass looks for a possible induction variable being extracted from a vector
+  // the vector should have a splat value that is equal to the IV
+  // replaces the extractelement on the IV (%extVal = extractElement %Op, constIdx) with a scalar as:
+        // binOp %extVal = binOp %phiVar, %constDataVec[constIdx]
+        // %branchVal = icmp unaryOp %extVal, %loopBound
+  Changed |= breakVectorOpsOnIVs(L);
   // If there are any floating-point recurrences, attempt to
   // transform them to use integer recurrences.
   Changed |= rewriteNonIntegerIVs(L);

@dtcxzyw dtcxzyw requested review from nikic and fhahn January 9, 2025 10:59
Copy link

github-actions bot commented Jan 9, 2025

⚠️ undef deprecator found issues in your code. ⚠️

You can test this locally with the following command:
git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 459d413b74b7f41e820328fefc38ff93b2e42b00 86f9ce395a940f062afb9acc6d8dc0cfd31555e8 llvm/lib/Transforms/Scalar/IndVarSimplify.cpp

The following files introduce new uses of undef:

  • llvm/lib/Transforms/Scalar/IndVarSimplify.cpp

Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. You should use poison values for placeholders instead.

In tests, avoid using undef and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.

Copy link

github-actions bot commented Jan 9, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 459d413b74b7f41e820328fefc38ff93b2e42b00 86f9ce395a940f062afb9acc6d8dc0cfd31555e8 --extensions cpp -- llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
index 37361080a1..42c5d13bb1 100644
--- a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
+++ b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
@@ -1904,54 +1904,48 @@ static ICmpInst *getLatchCmpInst(const Loop &L) {
 
 // get the vector which contains the IV
 // This function will return the vector which only contains the IV
-static Value* getVectorContaingIV(PHINode* IV)
-{
-
-   for (User* use: IV->users())
-   {
-      // check if the IV is a part of any vector or not
-      InsertElementInst *vecInsert = dyn_cast<InsertElementInst>(use);
-      if(!vecInsert)
-        continue;
-      // We need to check if this vector contains only the IV
-      FixedVectorType* vecTy = dyn_cast<FixedVectorType>(vecInsert->getType());
-      if (!vecTy)
+static Value *getVectorContaingIV(PHINode *IV) {
+
+  for (User *use : IV->users()) {
+    // check if the IV is a part of any vector or not
+    InsertElementInst *vecInsert = dyn_cast<InsertElementInst>(use);
+    if (!vecInsert)
+      continue;
+    // We need to check if this vector contains only the IV
+    FixedVectorType *vecTy = dyn_cast<FixedVectorType>(vecInsert->getType());
+    if (!vecTy)
+      continue;
+    // if it is vector of a single element with the IV as an element
+    if (vecTy->getNumElements() == 1)
+      return vecInsert;
+    // if we have larger vectors
+    if (vecTy->getNumElements() > 1) {
+      // check the vector we are inserting into an empty vector
+      Value *srcVec = vecInsert->getOperand(0);
+      if (!isa<UndefValue>(srcVec) || !isa<PoisonValue>(srcVec))
         continue;
-      // if it is vector of a single element with the IV as an element
-      if(vecTy->getNumElements() == 1)
-        return vecInsert;
-      // if we have larger vectors
-      if(vecTy->getNumElements() > 1)
-      {
-         //check the vector we are inserting into an empty vector
-         Value* srcVec = vecInsert->getOperand(0);
-         if (!isa<UndefValue>(srcVec) || !isa<PoisonValue>(srcVec))
-            continue;
-         //check if we are later inserting more elements into the vector or not
-         InsertElementInst *insertOtherVal = nullptr;
-         for (User* vecUse: vecInsert->users())
-         {
-            insertOtherVal = dyn_cast<InsertElementInst>(vecUse);
-            if(insertOtherVal)
-              break;
-         }
-         if(insertOtherVal)
-            continue;
-
-         // vector contains only IV
-         return vecInsert;
+      // check if we are later inserting more elements into the vector or not
+      InsertElementInst *insertOtherVal = nullptr;
+      for (User *vecUse : vecInsert->users()) {
+        insertOtherVal = dyn_cast<InsertElementInst>(vecUse);
+        if (insertOtherVal)
+          break;
       }
-   }
-   return nullptr;
-}
+      if (insertOtherVal)
+        continue;
 
+      // vector contains only IV
+      return vecInsert;
+    }
+  }
+  return nullptr;
+}
 
 // check if a PHINode is a possbile Induction variable or not
 // The existing functions do not work as SCEV fails
 // This happens when the IV is stuck in a vector operation
 // And the incoming value comes from an extract element instruction
-static PHINode* isPossibleIDVar(Loop *L)
-{
+static PHINode *isPossibleIDVar(Loop *L) {
   BasicBlock *Header = L->getHeader();
   assert(Header && "Expected a valid loop header");
   ICmpInst *CmpInst = getLatchCmpInst(*L);
@@ -1962,13 +1956,14 @@ static PHINode* isPossibleIDVar(Loop *L)
   Value *LatchCmpOp1 = CmpInst->getOperand(1);
 
   // check if the compare instruction operands are Extract element instructions
-  // We do this as we expect the extract element to be reading from the vector which has the IV
-  // If none of the operands are extract element instructions we do not proceed
-  if (!isa<ExtractElementInst>(LatchCmpOp0) && !isa<ExtractElementInst>(LatchCmpOp1))
+  // We do this as we expect the extract element to be reading from the vector
+  // which has the IV If none of the operands are extract element instructions
+  // we do not proceed
+  if (!isa<ExtractElementInst>(LatchCmpOp0) &&
+      !isa<ExtractElementInst>(LatchCmpOp1))
     return nullptr;
 
-  for (PHINode &IndVar : Header->phis())
-  {
+  for (PHINode &IndVar : Header->phis()) {
     if (!IndVar.getType()->isIntegerTy() && !IndVar.getType()->isFloatTy())
       continue;
 
@@ -1988,87 +1983,85 @@ static PHINode* isPossibleIDVar(Loop *L)
     // cmp = IndVar < FinalValue
     if (&IndVar == LatchCmpOp0 || &IndVar == LatchCmpOp1)
       return (getVectorContaingIV(&IndVar)) ? &IndVar : nullptr;
-    }
+  }
 
-    return nullptr;
+  return nullptr;
 }
 
 // Function to check if the source vector for the extract element
 // is the same as the vector containing the IV
 // If that can be proved the extract element can be removed
-static bool checkVecOp(Value* BinOperand, Value* VecIV)
-{
-   if (BinOperand == VecIV)
-      return true;
+static bool checkVecOp(Value *BinOperand, Value *VecIV) {
+  if (BinOperand == VecIV)
+    return true;
 
-   ShuffleVectorInst *shuffleInst = dyn_cast<ShuffleVectorInst>(BinOperand);
-   if (!shuffleInst)
-      return false;
+  ShuffleVectorInst *shuffleInst = dyn_cast<ShuffleVectorInst>(BinOperand);
+  if (!shuffleInst)
+    return false;
 
-   // Handle patterns where:
-   //  first operand is the vector containing only the IV
-   //  Mask only selects the first element from the first source vector
-   // TODO: Add more patterns?
-   bool isFirstSrc = (shuffleInst->getOperand(0) == VecIV);
-   auto shuffleMask = shuffleInst->getShuffleMask();
+  // Handle patterns where:
+  //  first operand is the vector containing only the IV
+  //  Mask only selects the first element from the first source vector
+  // TODO: Add more patterns?
+  bool isFirstSrc = (shuffleInst->getOperand(0) == VecIV);
+  auto shuffleMask = shuffleInst->getShuffleMask();
 
-   return (isFirstSrc && shuffleInst->isZeroEltSplatMask(shuffleMask, shuffleMask.size()));
+  return (isFirstSrc &&
+          shuffleInst->isZeroEltSplatMask(shuffleMask, shuffleMask.size()));
 }
 
+static bool ReplaceExtractInst(ConstantDataVector *values, unsigned Opcode,
+                               ExtractElementInst *elemInst, PHINode *IV) {
+  unsigned extIdx =
+      cast<ConstantInt>(elemInst->getIndexOperand())->getZExtValue();
+  IRBuilder<> B(elemInst);
+  Value *extractedVal = values->getElementAsConstant(extIdx);
+  Value *newInst = nullptr;
+  bool changed = true;
+  switch (Opcode) {
+  case Instruction::Add:
+    newInst = B.CreateAdd(IV, extractedVal);
+    break;
+  case Instruction::Sub:
+    newInst = B.CreateSub(IV, extractedVal);
+    break;
+  case Instruction::Mul:
+    newInst = B.CreateMul(IV, extractedVal);
+    break;
+  case Instruction::FMul:
+    newInst = B.CreateFMul(IV, extractedVal);
+    break;
+  case Instruction::UDiv:
+    newInst = B.CreateUDiv(IV, extractedVal);
+    break;
+  case Instruction::SDiv:
+    newInst = B.CreateSDiv(IV, extractedVal);
+    break;
+  default:
+    changed = false;
+  };
 
-static bool ReplaceExtractInst(ConstantDataVector* values, unsigned Opcode,
-                               ExtractElementInst* elemInst, PHINode* IV)
-{
-    unsigned extIdx = cast<ConstantInt>(elemInst->getIndexOperand())->getZExtValue();
-    IRBuilder<> B(elemInst);
-    Value* extractedVal = values->getElementAsConstant(extIdx);
-    Value* newInst = nullptr;
-    bool changed = true;
-    switch(Opcode)
-    {
-       case Instruction::Add:
-          newInst = B.CreateAdd(IV, extractedVal);
-          break;
-       case Instruction::Sub:
-          newInst = B.CreateSub(IV, extractedVal);
-          break;
-       case Instruction::Mul:
-          newInst = B.CreateMul(IV, extractedVal);
-          break;
-       case Instruction::FMul:
-          newInst = B.CreateFMul(IV, extractedVal);
-          break;
-       case Instruction::UDiv:
-          newInst = B.CreateUDiv(IV, extractedVal);
-          break;
-       case Instruction::SDiv:
-          newInst = B.CreateSDiv(IV, extractedVal);
-          break;
-       default:
-          changed = false;
-    };
-
-    if (changed)
-    {
-      LLVM_DEBUG(dbgs() << "INDVARS: Rewriting Extract Element:\n" << *elemInst <<"\n"
-                    << " With :" << *newInst <<"\n");
-      elemInst->replaceAllUsesWith(newInst);
-      elemInst->eraseFromParent();
-    }
-    return changed;
+  if (changed) {
+    LLVM_DEBUG(dbgs() << "INDVARS: Rewriting Extract Element:\n"
+                      << *elemInst << "\n"
+                      << " With :" << *newInst << "\n");
+    elemInst->replaceAllUsesWith(newInst);
+    elemInst->eraseFromParent();
+  }
+  return changed;
 }
 
-
 bool IndVarSimplify::breakVectorOpsOnIVs(Loop *L) {
 
   PHINode *IV = isPossibleIDVar(L);
-  if(!IV)
+  if (!IV)
     return false;
 
   // Process the vector operation
   ICmpInst *CmpInst = getLatchCmpInst(*L);
   unsigned idx = isa<ExtractElementInst>(CmpInst->getOperand(0)) ? 0 : 1;
-  ExtractElementInst *exElem = cast<ExtractElementInst>(CmpInst->getOperand(idx));
+  ExtractElementInst *exElem =
+      cast<ExtractElementInst>(CmpInst->getOperand(idx));
 
   // check if the idx is consant
   if (!isa<ConstantInt>(exElem->getIndexOperand()))
@@ -2082,22 +2075,21 @@ bool IndVarSimplify::breakVectorOpsOnIVs(Loop *L) {
   // if both operands of the binary op is not a constant data vector then let go
   Value *BinOperand0 = SrcVec->getOperand(0);
   Value *BinOperand1 = SrcVec->getOperand(1);
-  if(!isa<ConstantDataVector>(BinOperand0) && !isa<ConstantDataVector>(BinOperand1))
-   return false;
+  if (!isa<ConstantDataVector>(BinOperand0) &&
+      !isa<ConstantDataVector>(BinOperand1))
+    return false;
 
   unsigned ConstVecIdx = isa<ConstantDataVector>(BinOperand0) ? 0 : 1;
-  Value* VecWithIV = getVectorContaingIV(IV);
+  Value *VecWithIV = getVectorContaingIV(IV);
 
-  if(!checkVecOp(SrcVec->getOperand(!ConstVecIdx), VecWithIV))
+  if (!checkVecOp(SrcVec->getOperand(!ConstVecIdx), VecWithIV))
     return false;
 
-  ConstantDataVector *DataVec = cast<ConstantDataVector>(SrcVec->getOperand(ConstVecIdx));
+  ConstantDataVector *DataVec =
+      cast<ConstantDataVector>(SrcVec->getOperand(ConstVecIdx));
   return ReplaceExtractInst(DataVec, SrcVec->getOpcode(), exElem, IV);
-
 }
 
-
-
 //===----------------------------------------------------------------------===//
 //  IndVarSimplify driver. Manage several subpasses of IV simplification.
 //===----------------------------------------------------------------------===//
@@ -2120,24 +2112,21 @@ bool IndVarSimplify::run(Loop *L) {
     return false;
 
   bool Changed = false;
-  // Breaks trivial operation on vector which contain just the Induction variable
-  // This pass looks for the following pattern in the IR
-      // %header:
-        // %phiVar = i32/float [%r1, %pre_header], [%extVal, %latch_block]
-        // more instructions
-        // %initVec = insertelement <k x i32/float> undef/posion, %phiVar
-        // %extendVec = shufflevector <k x i32/float> %initVec, <k x i32/float> undef/poison, <m x i32/float> zeroInitializer 
-        // more instructions
-        // %Op = binOp %extendVec, %constDataVec
-        // more instructions
-        // %extVal = extractElement %Op, constIdx
-        // %branchVal = icmp unaryOp %extVal, %loopBound
-        // branch %branchVal B1, B2
-  // Essentially the pass looks for a possible induction variable being extracted from a vector
-  // the vector should have a splat value that is equal to the IV
-  // replaces the extractelement on the IV (%extVal = extractElement %Op, constIdx) with a scalar as:
-        // binOp %extVal = binOp %phiVar, %constDataVec[constIdx]
-        // %branchVal = icmp unaryOp %extVal, %loopBound
+  // Breaks trivial operation on vector which contain just the Induction
+  // variable This pass looks for the following pattern in the IR %header:
+  // %phiVar = i32/float [%r1, %pre_header], [%extVal, %latch_block]
+  // more instructions
+  // %initVec = insertelement <k x i32/float> undef/posion, %phiVar
+  // %extendVec = shufflevector <k x i32/float> %initVec, <k x i32/float>
+  // undef/poison, <m x i32/float> zeroInitializer more instructions %Op = binOp
+  // %extendVec, %constDataVec more instructions %extVal = extractElement %Op,
+  // constIdx %branchVal = icmp unaryOp %extVal, %loopBound branch %branchVal
+  // B1, B2
+  // Essentially the pass looks for a possible induction variable being
+  // extracted from a vector the vector should have a splat value that is equal
+  // to the IV replaces the extractelement on the IV (%extVal = extractElement
+  // %Op, constIdx) with a scalar as: binOp %extVal = binOp %phiVar,
+  // %constDataVec[constIdx] %branchVal = icmp unaryOp %extVal, %loopBound
   Changed |= breakVectorOpsOnIVs(L);
   // If there are any floating-point recurrences, attempt to
   // transform them to use integer recurrences.

@nikic
Copy link
Contributor

nikic commented Jan 9, 2025

Can you share an IR sample before and after the pass, so it's easier to understand what it does?

@nikic
Copy link
Contributor

nikic commented Jan 9, 2025

Kind of related issue is #121796. I haven't tried to understand what your transform is doing, but probably an easier / more general fix is to teach SCEV to look through some vector operations.

@anilavakundu007
Copy link
Author

Can you share an IR sample before and after the pass, so it's easier to understand what it does?

Before IndVarSimplify Pass (in the -O2 pipeline):

 %vecinit = insertelement <2 x i32> poison, i32 %u_xlati59.06, i64 0
 %vecinit1 = shufflevector <2 x i32> %vecinit, <2 x i32> poison, <2 x i32> zeroinitializer
 %add = add <2 x i32> %vecinit1, <i32 -1, i32 1>
 %0 = extractelement <2 x i32> %add, i64 0
 %add3 = add nsw i32 %add345, %0
 %1 = extractelement <2 x i32> %add, i64 1
 %cmp = icmp sgt i32 %1, 2

After my patch in IndVarSimplify Pass:

%vecinit = insertelement <2 x i32> poison, i32 %u_xlati59.06, i64 0                                                                 
%vecinit1 = shufflevector <2 x i32> %vecinit, <2 x i32> poison, <2 x i32> zeroinitializer                                            
%add = add <2 x i32> %vecinit1, <i32 -1, i32 1>                                                                                      
%0 = extractelement <2 x i32> %add, i64 0                                                                                            
%add3 = add nsw i32 %add345, %0                                                                                                      
%1 = add nsw i32 %u_xlati59.06, 1                                                                                                    
%cmp = icmp sgt i32 %1, 2 

Here the:

%1 = extractelement <2 x i32> %add, i64 1 
%add = add <2 x i32> %vecinit1, <i32 -1, i32 1>

is broken to have a scalar bin op instruction:

%1 = add nsw i32 %u_xlati59.06, 1 

@anilavakundu007
Copy link
Author

Kind of related issue is #121796. I haven't tried to understand what your transform is doing, but probably an easier / more general fix is to teach SCEV to look through some vector operations.

I had looked through the SCEV implementation and it appears to bail out on extractelement Instructions if I remember correctly

@nikic
Copy link
Contributor

nikic commented Jan 9, 2025

Thanks. I think in your particular case, I suspect it probably makes sense to scalarize the vector op independently of whether it contains an induction variable? This case creates a splat vector, does a vector add and then immediately extracts both elements. It seems unlikely that this is going to be faster than just doing two adds on any architecture.

@anilavakundu007
Copy link
Author

Thanks. I think in your particular case, I suspect it probably makes sense to scalarize the vector op independently of whether it contains an induction variable? This case creates a splat vector, does a vector add and then immediately extracts both elements. It seems unlikely that this is going to be faster than just doing two adds on any architecture.

The reason this primarily focused on IVs as once the vector operation is broken LLVM can basically strip everything down to couple of instructions (following loop unroll).

@anilavakundu007
Copy link
Author

Also, we mostly observed such patterns on IVs from couple of GLSL shaders that prevented loop unrolling hence wanted to focus on IVs particularly!

@anilavakundu007
Copy link
Author

Hi, a gentle ping; any suggestions/modification for this change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants