[QST] Getting confused when reason about Volta's MMA_Atom. #1271

umiswing · 2023-12-14T08:13:44Z

umiswing
Dec 14, 2023

According to https://github.com/NVIDIA/cutlass/blob/main/media/docs/cute/0x_gemm_tutorial.md, there is a mapping between the BLAS transpose flags (NT, TN, NN, TT) and the layouts of the matrices.
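For readers without the tutorial open, here is a minimal sketch of the mapping being referred to. It is derived from column-major BLAS semantics with A viewed as logical (M,K) and B as logical (N,K). The names `Strides`, `a_strides`, `b_strides`, and `check_blas_mapping` are hypothetical helpers for illustration, not part of CuTe.

```cpp
#include <cassert>

// Strides of a 2-D layout over logical coordinates.
struct Strides {
    int mode0;  // stride of the M coordinate (for A) or N coordinate (for B)
    int mode1;  // stride of the K coordinate
};

// 'N' = not transposed, 'T' = transposed, in the column-major BLAS sense.
// A is viewed as logical (M,K): 'N' makes M contiguous, 'T' makes K contiguous.
constexpr Strides a_strides(char trans, int ldA) {
    return trans == 'N' ? Strides{1, ldA} : Strides{ldA, 1};
}

// B is viewed as logical (N,K): 'N' makes K contiguous, 'T' makes N contiguous.
constexpr Strides b_strides(char trans, int ldB) {
    return trans == 'N' ? Strides{ldB, 1} : Strides{1, ldB};
}

// Hypothetical self-check: "TN" means A is K-major and B is K-major.
inline void check_blas_mapping() {
    assert(a_strides('T', 16).mode0 == 16 && a_strides('T', 16).mode1 == 1);
    assert(b_strides('N', 16).mode0 == 16 && b_strides('N', 16).mode1 == 1);
}
```

Under this reading, "TN" data has K contiguous for both A and B, which is why a `Stride<_1,_8>` in a TN atom looks surprising at first glance.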

I can't reconcile Volta's MMA_Atom in https://github.com/NVIDIA/cutlass/blob/main/media/docs/cute/0t_mma_atom.md with the table in that tutorial.

For example, the MMA_Atom doc gives the following layouts for TN. As I understand it, they assume that matrix A is M-major and matrix B is N-major, which contradicts the table.

  // (T8,V4) -> (m,k) 
  using ALayout = Layout<Shape <_8,_4>,
                         Stride<_1,_8>>;
  // (T8,V4) -> (n,k) 
  using BLayout = Layout<Shape <_8,_4>,
                         Stride<_1,_8>>;

Have I misunderstood anything?

Answered by ccecka

ccecka · 2023-12-14T08:59:19Z

ccecka
Dec 14, 2023

The NT, TN, NN, and TT of the instructions are often mistakenly conflated with the layout of the data as well. The instructions and their traits don't say anything about the layout of data they operate on. The different instructions can, in principle, work on any data in any layout.

The MMA_Traits that you show do not describe the layout of the data, they describe the partitioning pattern of the instruction. Those partitioning patterns can be applied to any tensor of data with any layout.

See a similar question here with example code:
#1226
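To make the "partitioning pattern" reading concrete, here is a small sketch that evaluates the `ALayout` from the question by hand, without CuTe. It assumes the layout's output is a 1-D index that encodes `(m,k)` in column-major order over the 8x4 tile, as described later in this thread. `Coord`, `tv_to_index`, `index_to_coord`, and `a_coord` are hypothetical names for illustration.

```cpp
#include <cassert>

// Logical tile extents of A for the Volta 8x8x4 atom: M = 8, K = 4.
constexpr int kM = 8;
constexpr int kK = 4;

struct Coord {
    int m;
    int k;
};

// Layout<Shape<_8,_4>, Stride<_1,_8>> evaluated by hand:
// (thread t in 0..7, value v in 0..3) -> 1-D index t*1 + v*8.
constexpr int tv_to_index(int t, int v) { return t * 1 + v * 8; }

// Assumption: the 1-D index is a column-major encoding m + k*M of (m,k).
constexpr Coord index_to_coord(int idx) { return Coord{idx % kM, idx / kM}; }

// Which logical element of A does (thread, value) own?
constexpr Coord a_coord(int t, int v) { return index_to_coord(tv_to_index(t, v)); }

// Thread 3, value 2 owns logical element (m=3, k=2), however A is stored.
static_assert(a_coord(3, 2).m == 3 && a_coord(3, 2).k == 2, "TV -> (m,k)");
// The largest index stays inside the 8x4 tile.
static_assert(tv_to_index(kM - 1, kK - 1) == kM * kK - 1, "index range");
```

Nothing in this sketch refers to a memory address: the strides `1` and `8` only say that thread `t` owns row `m = t` and its four values walk along `k`.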

1 reply

umiswing Dec 14, 2023
Author

Thanks!

umiswing · 2023-12-18T13:38:45Z

umiswing
Dec 18, 2023
Author

CUTLASS follows the convention that matrix A is logically MxK, B is logically NxK, and C is logically MxN, and that all of them are column-major. CUTLASS constructs the layouts based on this convention.

"NT" indicates an A.col B.row instruction, while "TN" indicates an A.row B.col instruction. An mma instruction never gives us a real memory index, so A.col B.row and A.row B.col only describe the partitioning pattern. CUTLASS labels the pattern "NT" or "TN" following the column-major convention.

It would be equally valid to follow the row-major convention and describe an A.row B.col instruction as "NT", as long as the resulting layout is right.

Is this a correct understanding?

1 reply

foreverlms Dec 31, 2024

I think it is: ALayout only maps (thread, value) pairs (TV) to coordinates of A, and CuTe encodes those coordinates in column-major order (row-major would work just as well). Once we have the coordinates, we can use the layout of A itself to get the offset/index in memory and fetch the value.

NN, NT, and so on just describe the data access pattern of the instruction. If you pass a row-major or a column-major A to a TN instruction, both will produce the output, but performance is better with row-major. Correct me if I am wrong.
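The two steps described here (TV to coordinate, then coordinate to memory offset) can be sketched in plain C++. This is an illustration under stated assumptions, not CuTe's implementation: the TN pattern is taken from the `ALayout` in the question, the tile is assumed to be densely packed, and all function names are hypothetical.

```cpp
#include <cassert>

constexpr int kM = 8;  // tile rows (M)
constexpr int kK = 4;  // tile depth (K)

// Step 1: TN partitioning, (thread t, value v) -> (m,k).
// The index t + 8*v is decoded as a column-major encoding over the 8x4 tile.
constexpr int coord_m(int t, int v) { return (t + 8 * v) % kM; }
constexpr int coord_k(int t, int v) { return (t + 8 * v) / kM; }

// Step 2: two possible storage layouts for the same logical (M,K) tile.
constexpr int offset_row_major(int m, int k, int lda) { return m * lda + k; }  // K contiguous
constexpr int offset_col_major(int m, int k, int lda) { return m + k * lda; }  // M contiguous

// Composition: memory offset touched by (thread, value) under each storage.
constexpr int tn_offset_row_major(int t, int v) {
    return offset_row_major(coord_m(t, v), coord_k(t, v), kK);  // packed: lda = K
}
constexpr int tn_offset_col_major(int t, int v) {
    return offset_col_major(coord_m(t, v), coord_k(t, v), kM);  // packed: lda = M
}

// Hypothetical self-check for thread 1: its four values are adjacent in
// row-major storage and 8 elements apart in column-major storage.
inline void check_thread1() {
    for (int v = 0; v < kK; ++v) {
        assert(tn_offset_row_major(1, v) == 4 + v);
        assert(tn_offset_col_major(1, v) == 1 + 8 * v);
    }
}
```

Both storage layouts are valid inputs to the same partitioning. One plausible reason for the performance difference is that the row-major case gives each thread four adjacent elements, which could be fetched with a single wider load, while the column-major case needs strided accesses. That is an assumption worth checking with a profiler.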
