
VectorToXeGPU: Allows lowering vector.transfer_read and vector.transfer_write to XeGPU #773

Open · wants to merge 4 commits into main
Conversation

@Scarlet1ssimo commented Jun 10, 2024

Please review these guidelines to help with the review process:

  • Have you provided a meaningful PR description? Hope so.
  • Have you added a test, a reproducer, or a reference to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • If this PR is a work in progress, are you filing the PR as a draft?
  • Have you organized your commits logically and ensured each can be built by itself?

This patch adds a lowering of vector.transfer_read and vector.transfer_write, which commonly appear after vectorization, to the corresponding XeGPU dialect ops.
Namely, it first creates a descriptor and then applies either a LoadNdOp or a StoreNdOp.
Directly accessing a 1-D vector runs into an as-yet-undiagnosed issue: compilation and execution report no errors, but the resulting memory access pattern is wrong. As a temporary workaround, a 1-D access is first lowered as an access to a 1x? vector, which is then reshaped back to 1-D. This works as expected; hopefully the underlying issue can be fixed elsewhere.
Tested on a PVC device.
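
For illustration, a rough IR-level sketch of the intended lowering (the shapes, operand names, and exact XeGPU assembly syntax here are illustrative assumptions, not copied from the patch; the op names follow the CreateNdDescOp/LoadNdOp mentioned above):

    // Input: a 2-D transfer_read from a memref into a vector
    %v = vector.transfer_read %src[%c0, %c0], %pad
           : memref<8x16xf32>, vector<8x16xf32>

    // After lowering (sketch): create a descriptor, then load through it
    %desc = xegpu.create_nd_tdesc %src[%c0, %c0]
              : memref<8x16xf32> -> !xegpu.tensor_desc<8x16xf32>
    %v = xegpu.load_nd %desc
           : !xegpu.tensor_desc<8x16xf32> -> vector<8x16xf32>

    // 1-D workaround (sketch): access through a 1xN shape, then cast back
    %v2d = xegpu.load_nd %desc1d
             : !xegpu.tensor_desc<1x16xf32> -> vector<1x16xf32>
    %v1d = vector.shape_cast %v2d : vector<1x16xf32> to vector<16xf32>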

@silee2 self-requested a review June 17, 2024 20:20

@silee2 (Contributor) commented Jun 17, 2024

@Scarlet1ssimo I have a general question about the placement of this code.
Both the source and target dialects of this conversion pass are upstream MLIR dialects, so the pass could live in upstream MLIR or in your target project instead.
Is there a specific reason IMEX is the best place for it?

mlir::Value desc;
if (auto MemRefTypedSource =
        mlir::cast<mlir::TypedValue<mlir::MemRefType>>(source)) {
  desc = rewriter.create<mlir::xegpu::CreateNdDescOp>(
Shouldn't there be a check for memref rank here? XeGPU supports limited ranks.
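
A minimal sketch of such a guard, assuming the supported set is ranks 1 and 2 (the exact set should follow the XeGPU spec) and that this code sits inside a matchAndRewrite returning mlir::LogicalResult:

    // Hypothetical rank check before building the descriptor:
    auto memrefType = mlir::cast<mlir::MemRefType>(source.getType());
    if (memrefType.getRank() != 1 && memrefType.getRank() != 2)
      return mlir::failure(); // unsupported shape, give up on this pattern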

mlir::Value desc;
if (auto MemRefTypedSource =
        mlir::cast<mlir::TypedValue<mlir::MemRefType>>(source)) {
  desc = rewriter.create<mlir::xegpu::CreateNdDescOp>(

Same as my comment above: check the rank and return failure for unsupported shapes.

@@ -0,0 +1,114 @@
// RUN: %python_executable %imex_runner --requires=l0-runtime -i %s --pass-pipeline-file=%p/vector-to-llvm.pp \

This test is an integration test running on a GPU.
I would suggest placing it somewhere like
test/Integration/Dialect/Vector/

@@ -0,0 +1,18 @@
builtin.module(

This pipeline file should move to the integration test folder as well, along with the test case above.
