Tianxing/moe gemm #685

Merged: 28 commits merged into main_perf from tianxing/moe-gemm on Jan 15, 2025

Commits
71649a6
Implemented moe gemm, tests, and benchmarking.
Chi-Chu319 Dec 2, 2024
5adb971
removed benchmark files
Chi-Chu319 Dec 31, 2024
6852a96
Merge branch 'main_perf' into tianxing/moe-gemm
Chi-Chu319 Dec 31, 2024
ce4c4e8
removed all the benchmark files
Chi-Chu319 Dec 31, 2024
826cf5d
updated readme
Chi-Chu319 Dec 31, 2024
c06c61e
removed the -tune option and consolidated the config files
Chi-Chu319 Jan 3, 2025
34130b9
pre commit
Chi-Chu319 Jan 3, 2025
c2b85ff
Merge branch 'main_perf' into tianxing/moe-gemm
Chi-Chu319 Jan 7, 2025
dc68ce1
updated M_THRESHOLD and configs after tuning
Chi-Chu319 Jan 7, 2025
04a1629
Merge branch 'main_perf' into tianxing/moe-gemm
Chi-Chu319 Jan 7, 2025
53fa702
mistral model benchmarking
Chi-Chu319 Jan 7, 2025
1698d79
pre commit
Chi-Chu319 Jan 7, 2025
42d8dbc
noqa: E402
Chi-Chu319 Jan 7, 2025
3a04e90
pre commit
Chi-Chu319 Jan 7, 2025
9c83fd7
more fine-tuned model config; show memory throughput in benchmark
Chi-Chu319 Jan 8, 2025
e9d3dc2
pre-commit
Chi-Chu319 Jan 8, 2025
ad2daad
fixed bandwidth computation
Chi-Chu319 Jan 8, 2025
30a488c
First and second gemm model benchmarking
Chi-Chu319 Jan 10, 2025
39eca09
reversed k n
Chi-Chu319 Jan 10, 2025
0da016e
pre commit
Chi-Chu319 Jan 10, 2025
be6520b
pre-commit fix format
Chi-Chu319 Jan 10, 2025
54d207d
pre commit fix
Chi-Chu319 Jan 10, 2025
2fe49dd
pre commit fix
Chi-Chu319 Jan 10, 2025
6959222
pre commit
Chi-Chu319 Jan 10, 2025
8a84237
Update python/perf-kernels/fused_moe/moe-gemm.py
Chi-Chu319 Jan 13, 2025
d0160e6
print time for all the benchmark cases. added comment for byte calcul…
Chi-Chu319 Jan 14, 2025
a406e79
pre commit
Chi-Chu319 Jan 14, 2025
475a5b5
Merge branch 'tianxing/moe-gemm' of github.com:ROCm/triton into tianx…
Chi-Chu319 Jan 14, 2025
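
Several commits above touch the benchmark's memory-throughput reporting ("show mem throughput in benchmark", "fixed bandwidth computation", "added comment for byte calcul…"). As a rough illustration of the kind of byte accounting such a benchmark needs, here is a minimal sketch; the traffic model and all names are assumptions for illustration, not the PR's actual code:

```python
def moe_gemm_bandwidth_gbps(m, n, k, e, top_k, dtype_bytes, ms):
    """Estimate achieved memory bandwidth (GB/s) for one MoE GEMM call.

    Illustrative traffic model (an assumption, not the PR's formula):
    each of the M*top_k routed token slots reads a K-vector of A, each
    of the E experts' (K, N) weight matrices is read once, and an
    (M*top_k, N) output is written.
    """
    bytes_a = m * top_k * k * dtype_bytes  # token activations read
    bytes_b = e * k * n * dtype_bytes      # each expert's weights read once
    bytes_c = m * top_k * n * dtype_bytes  # routed outputs written
    total_bytes = bytes_a + bytes_b + bytes_c
    return total_bytes / (ms * 1e-3) / 1e9  # bytes per second -> GB/s

# Example: 4096 tokens, N=K=4096, 8 experts, top_k=2, fp16, 1.5 ms/call
# print(moe_gemm_bandwidth_gbps(4096, 4096, 4096, 8, 2, 2, 1.5))
```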
1 change: 1 addition & 0 deletions .github/workflows/amd_perf_kernel_Integration_tests.yml
@@ -130,6 +130,7 @@ jobs:
pytest -vvvv ./python/perf-kernels/softmax.py
pytest -vvv ./python/perf-kernels/rmsnorm.py
pytest -vvv ./python/perf-kernels/layernorm.py
pytest -vvv ./python/perf-kernels/fused_moe/moe-gemm.py
sh ./python/perf-kernels/streamk/utils/unittest.sh
pytest -vvv ./python/perf-kernels/multreduce_matmul_kernel.py
- name: Run Perf Kernels Benchmark
3 changes: 3 additions & 0 deletions python/perf-kernels/README.md
@@ -99,3 +99,6 @@
Kernel that implements RMS Norm over a row of a tensor.

## `layernorm.py`
Kernel that implements Layer Normalization over a row of a tensor

## `fused_moe/moe-gemm.py`
Kernel that implements MoE (Mixture of Experts) GEMM.
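
For readers unfamiliar with the operation: an MoE GEMM multiplies each token's activation vector by the weight matrix of the expert(s) it was routed to. A minimal PyTorch sketch of those semantics follows; it is a reference for the math only, not the Triton kernel in this PR, and all names and shapes are illustrative:

```python
import torch

def moe_gemm_reference(a, w, topk_ids):
    """Reference semantics for MoE GEMM (illustrative, not the PR's kernel).

    a:        (M, K) token activations
    w:        (E, K, N) per-expert weight matrices
    topk_ids: (M, top_k) expert index chosen for each token slot
    returns:  (M, top_k, N) per-token, per-slot GEMM results
    """
    M, K = a.shape
    E, _, N = w.shape
    top_k = topk_ids.shape[1]
    out = torch.empty((M, top_k, N), dtype=a.dtype, device=a.device)
    for e in range(E):
        # Gather every (token, slot) routed to expert e and run one GEMM.
        token_idx, slot_idx = torch.where(topk_ids == e)
        if token_idx.numel():
            out[token_idx, slot_idx] = a[token_idx] @ w[e]
    return out

# Example: 8 tokens, K=16, N=32, 4 experts, top_k=2
# a = torch.randn(8, 16); w = torch.randn(4, 16, 32)
# topk_ids = torch.randint(0, 4, (8, 2))
# y = moe_gemm_reference(a, w, topk_ids)  # shape (8, 2, 32)
```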
@@ -0,0 +1,35 @@
{
"small_M": {
"BLOCK_SIZE_M": 64,
"BLOCK_SIZE_N": 64,
"BLOCK_SIZE_K": 64,
"GROUP_SIZE_M": 4,
"num_warps": 8,
"num_stages": 2,
"waves_per_eu": 0,
"matrix_instr_nonkdim": 16,
"kpack": 2
},
"medium_M": {
"BLOCK_SIZE_M": 128,
"BLOCK_SIZE_N": 128,
"BLOCK_SIZE_K": 128,
"GROUP_SIZE_M": 1,
"num_warps": 8,
"num_stages": 2,
"waves_per_eu": 0,
"matrix_instr_nonkdim": 16,
"kpack": 2
},
"large_M": {
"BLOCK_SIZE_M": 256,
"BLOCK_SIZE_N": 256,
"BLOCK_SIZE_K": 64,
"GROUP_SIZE_M": 1,
"num_warps": 8,
"num_stages": 2,
"waves_per_eu": 0,
"matrix_instr_nonkdim": 16,
"kpack": 2
}
}
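
The three buckets above, together with the "updated M_THRESHOLD and configs after tuning" commit, suggest the kernel picks a tuning config by the size of M. A hedged sketch of how such a lookup might work; the threshold values and function name here are assumptions, not the PR's actual code:

```python
import json

# Hypothetical thresholds; the PR tunes its own M_THRESHOLD values.
SMALL_M_THRESHOLD = 256
MEDIUM_M_THRESHOLD = 1024

def get_config(m: int, config_path: str) -> dict:
    """Pick the kernel config bucket for a problem with M rows.

    Bucket names match the JSON above; the thresholds are invented
    for this example.
    """
    with open(config_path) as f:
        configs = json.load(f)
    if m <= SMALL_M_THRESHOLD:
        return configs["small_M"]
    if m <= MEDIUM_M_THRESHOLD:
        return configs["medium_M"]
    return configs["large_M"]

# Example: a 2048-row problem would use the "large_M" tile sizes.
# cfg = get_config(2048, "moe-gemm-configs.json")
# print(cfg["BLOCK_SIZE_M"], cfg["BLOCK_SIZE_N"], cfg["BLOCK_SIZE_K"])
```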