Tuning wgrad kernels with split-k, in practice #396

Answered by manishucsd
masahi asked this question in Q&A

Below are some guidelines and information on finding the best tile shape, alignment, split-k-mode (serial, parallel), and split-k slice.

1. Tile Shape: Larger tile shapes give the most data reuse, but a large tile may leave the GPU underutilized because of quantization effects: the problem may split into too few threadblock tiles to fill every SM, or the last wave of tiles may be only partly full. It is therefore best to sweep through all available tile shapes for each problem size.

2. Alignment: This one is straightforward: the largest alignment the problem supports will always win. Thus, for F16 input, go with align8 wgrad kernels (8 half-precision elements, i.e. 128-bit vectorized accesses).

3. Split-k-mode: Parallel split-k-mode always surpasses serial split-k-mode. Parallel split-k-mode runs a reduction k…
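
To make the tile-shape and alignment guidelines a bit more concrete, here is a rough Python sketch for narrowing down candidates before benchmarking. Everything in it is an assumption for illustration: the helper names (`wave_efficiency`, `largest_alignment`, `rank_tile_shapes`) are hypothetical and not part of CUTLASS, the occupancy model assumes one threadblock per SM per wave, the SM count of 108 assumes an A100, and the wgrad problem is treated as an implicit GEMM with M = K (output channels) and N = C * R * S. The alignment check is simplified to "both channel counts are divisible by the alignment". It only estimates how well a tile shape fills the GPU; it does not predict runtime, so the real sweep still has to be measured.

```python
import math


def wave_efficiency(gemm_m, gemm_n, tile_m, tile_n, num_sms, split_k_slices=1):
    """Fraction of SM slots that hold a tile, assuming one threadblock per SM per wave."""
    tiles = math.ceil(gemm_m / tile_m) * math.ceil(gemm_n / tile_n) * split_k_slices
    waves = math.ceil(tiles / num_sms)
    return tiles / (waves * num_sms)


def largest_alignment(in_channels, out_channels, candidates=(8, 4, 2, 1)):
    """Largest candidate alignment that divides both channel counts (simplified rule)."""
    for alignment in candidates:
        if in_channels % alignment == 0 and out_channels % alignment == 0:
            return alignment
    return 1


def rank_tile_shapes(gemm_m, gemm_n, tile_shapes, num_sms, split_k_slices=1):
    """Sort candidate (tile_m, tile_n) shapes by estimated occupancy, best first."""
    scored = [
        (wave_efficiency(gemm_m, gemm_n, tm, tn, num_sms, split_k_slices), (tm, tn))
        for tm, tn in tile_shapes
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Hypothetical wgrad problem: K = 64 output channels, C = 64 input channels, 3x3 filter.
    K, C, R, S = 64, 64, 3, 3
    gemm_m, gemm_n = K, C * R * S
    num_sms = 108  # assumed: A100

    print("alignment:", largest_alignment(C, K))
    for slices in (1, 4, 12):
        ranking = rank_tile_shapes(
            gemm_m, gemm_n, [(128, 128), (128, 64), (64, 64)], num_sms, slices
        )
        print("split-k slices:", slices, "ranking:", ranking)
```

With a small GEMM M x N like this one, no tile shape comes close to filling the GPU without split-k, which is why the split-k slice count has to be swept together with the tile shape rather than tuned separately.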

Answer selected by masahi
hwu36 Feb 5, 2022
Maintainer