misaligned memory access from transpose kernel #3701
This was broken by the code at Fuser `csrc/scheduler/transpose.cpp`, line 869 (commit `ef6f169`).
The following patch can be a workaround:

```diff
diff --git a/csrc/scheduler/transpose.cpp b/csrc/scheduler/transpose.cpp
index bef6b2b7..ecfa5615 100644
--- a/csrc/scheduler/transpose.cpp
+++ b/csrc/scheduler/transpose.cpp
@@ -860,15 +860,9 @@ void scheduleTranspose(Fusion* fusion, const TransposeParams* tparams) {
       grouped_inputs_outputs[1].begin(), grouped_inputs_outputs[1].end());
   for (auto tv : grouped_inputs_outputs[1]) {
     if (tv->isFusionInput()) {
-      auto existing_cache = ir_utils::consumerTvsOf(tv)[0];
-      if (ir_utils::consumerTvsOf(existing_cache).size() > 1) {
-        auto new_cache = tv->cacheAfter();
-        new_cache->setMemoryType(MemoryType::Shared);
-        group2_and_cached_inputs.emplace(new_cache);
-      } else {
-        existing_cache->setMemoryType(MemoryType::Shared);
-        group2_and_cached_inputs.emplace(existing_cache);
-      }
+      auto new_cache = tv->cacheAfter();
+      new_cache->setMemoryType(MemoryType::Shared);
+      group2_and_cached_inputs.emplace(new_cache);
     }
   }
   // set cached outputs of group 2 to shared memory
```

Note that, IIUC, any scheduler that caches inputs in shared memory can be broken by #3621. Transpose and matmul are of that type, and normalization would likely suffer too. cc @naoyam
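To make the behavioral difference in the patch concrete, here is a minimal C++ sketch built on a hypothetical toy IR. `ToyTv`, `ToyFusion`, `pickSharedCacheOld`, and `pickSharedCacheNew` are illustrative names, not nvFuser APIs, and `ToyFusion::cacheAfter` only loosely mirrors `cacheAfter()` by inserting a new tensor between a tensor and its current consumers. The sketch shows that the old code could switch an existing input cache to shared memory in place, whereas the patched code always inserts a dedicated shared-memory cache and leaves the existing cache's memory type untouched. It does not attempt to model why this avoids the misaligned access.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR for illustration only; these are NOT nvFuser types.
enum class MemoryType { Global, Local, Shared };

struct ToyTv {
  std::string name;
  MemoryType mem = MemoryType::Global;
  bool is_fusion_input = false;
  std::vector<ToyTv*> consumers;  // tensors that read this one
};

struct ToyFusion {
  std::vector<std::unique_ptr<ToyTv>> tvs;  // owns every tensor

  ToyTv* make(const std::string& name, MemoryType mem, bool input = false) {
    auto tv = std::make_unique<ToyTv>();
    tv->name = name;
    tv->mem = mem;
    tv->is_fusion_input = input;
    tvs.push_back(std::move(tv));
    return tvs.back().get();
  }

  // Loosely models cacheAfter(): a new tensor is inserted between `tv`
  // and all of its current consumers.
  ToyTv* cacheAfter(ToyTv* tv) {
    ToyTv* cache = make(tv->name + "_cache", MemoryType::Local);
    cache->consumers = tv->consumers;
    tv->consumers = {cache};
    return cache;
  }
};

// Before the patch: reuse the existing cache unless it has several consumers.
ToyTv* pickSharedCacheOld(ToyFusion& fusion, ToyTv* tv) {
  assert(tv->is_fusion_input && !tv->consumers.empty());
  ToyTv* existing_cache = tv->consumers[0];
  if (existing_cache->consumers.size() > 1) {
    ToyTv* new_cache = fusion.cacheAfter(tv);
    new_cache->mem = MemoryType::Shared;
    return new_cache;
  }
  existing_cache->mem = MemoryType::Shared;  // repurposes the existing cache
  return existing_cache;
}

// After the patch: always insert a fresh shared-memory cache.
ToyTv* pickSharedCacheNew(ToyFusion& fusion, ToyTv* tv) {
  assert(tv->is_fusion_input);
  ToyTv* new_cache = fusion.cacheAfter(tv);
  new_cache->mem = MemoryType::Shared;
  return new_cache;
}
```

For an input with a single-consumer cache, `pickSharedCacheOld` returns that same cache (now marked `Shared`), while `pickSharedCacheNew` returns a new tensor sitting between the input and the untouched existing cache.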
Looks like it's caused by #3621.
Repro script:
Throws: