-
Notifications
You must be signed in to change notification settings - Fork 294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Inductor][Optimus] Fix group fusion stride layout #2442
base: main
Are you sure you want to change the base?
Conversation
This pull request was exported from Phabricator. Differential Revision: D61888433 |
This pull request was exported from Phabricator. Differential Revision: D61888433 |
Summary: Pull Request resolved: pytorch#2442 context: https://fb.workplace.com/groups/1075192433118967/permalink/1401282167176657/ moving the changes to the group gemm op has compilation errors, see details in D55606636 Differential Revision: D61888433
d218f8b
to
0bc66e5
Compare
Summary: X-link: pytorch/benchmark#2442 context: https://fb.workplace.com/groups/1075192433118967/permalink/1401282167176657/ moving the changes to the group gemm op has compilation errors, see details in D55606636 Test Plan: # local reproduce ``` CUDA_LAUNCH_BLOCKING=1 buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split-group --model_type "afoc" --flow_id 544109991 ``` Counter({'pattern_matcher_nodes': 1215, 'pattern_matcher_count': 1090, 'normalization_pass': 430, 'remove_split_with_size_one_pass': 416, 'batch_aten_mul': 13, 'scmerge_split_sections_removed': 11, 'scmerge_cat_removed': 5, 'scmerge_cat_added': 4, 'batch_linear_post_grad': 4, 'scmerge_split_removed': 3, 'batch_aten_sub': 2, 'batch_layernorm': 1, 'group_linear': 1}) ``` CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode group-batch-split --model_type "cmf_shrink" --flow_id 587303213 ``` P1551948670 Counter({'pattern_matcher_nodes': 2244, 'pattern_matcher_count': 1738, 'normalization_pass': 404, 'extern_calls': 370, 'benchmarking.TritonBenchmarker.benchmark_gpu': 293, 'remove_split_with_size_one_pass': 269, 'merge_splits_pass': 74, 'normalization_aten_pass': 56, 'batch_aten_mul': 11, 'fxgraph_cache_miss': 10, 'group_linear': 9, 'scmerge_split_sections_removed': 5, 'scmerge_split_removed': 4, 'scmerge_cat_removed': 4, 'unbind_stack_pass': 4, 'batch_sigmoid': 2, 'batch_linear': 2, 'move_reshape_out_of_split_stack_pass': 2, 'batch_aten_sub': 2, 'batch_aten_add': 2, 'batch_layernorm': 1, 'scmerge_split_added': 1, 'scmerge_cat_added': 1, 'split_stack_to_cats_pass': 1, 'split_cat_to_slices_pass': 1, 'benchmarking.TritonBenchmarker.triton_do_bench': 1, 'batch_relu': 1}) # e2e ### AFOC baseline: f545589474 proposal: f545589302 {F1474302182} ### cmf shrink ads_dper3:0e442d2994ad1421377489d53ef99593 training_platform:be4b7015f1582fb1760bd72cf83ff38d baseline f635512197 baseline + group_fusion f635975547 The group fusion can be enabled but has qps regression by using group fusion. Differential Revision: D61888433
Summary: X-link: pytorch/pytorch#134696 Pull Request resolved: pytorch#2442 context: https://fb.workplace.com/groups/1075192433118967/permalink/1401282167176657/ moving the changes to the group gemm op has compilation errors, see details in D55606636 Differential Revision: D61888433
This pull request was exported from Phabricator. Differential Revision: D61888433 |
0bc66e5
to
878b2b5
Compare
Summary: Pull Request resolved: pytorch#134696 X-link: pytorch/benchmark#2442 context: https://fb.workplace.com/groups/1075192433118967/permalink/1401282167176657/ moving the changes to the group gemm op has compilation errors, see details in D55606636 Test Plan: # local reproduce ``` CUDA_LAUNCH_BLOCKING=1 buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split-group --model_type "afoc" --flow_id 544109991 ``` Counter({'pattern_matcher_nodes': 1215, 'pattern_matcher_count': 1090, 'normalization_pass': 430, 'remove_split_with_size_one_pass': 416, 'batch_aten_mul': 13, 'scmerge_split_sections_removed': 11, 'scmerge_cat_removed': 5, 'scmerge_cat_added': 4, 'batch_linear_post_grad': 4, 'scmerge_split_removed': 3, 'batch_aten_sub': 2, 'batch_layernorm': 1, 'group_linear': 1}) ``` CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode group-batch-split --model_type "cmf_shrink" --flow_id 587303213 ``` P1551948670 Counter({'pattern_matcher_nodes': 2244, 'pattern_matcher_count': 1738, 'normalization_pass': 404, 'extern_calls': 370, 'benchmarking.TritonBenchmarker.benchmark_gpu': 293, 'remove_split_with_size_one_pass': 269, 'merge_splits_pass': 74, 'normalization_aten_pass': 56, 'batch_aten_mul': 11, 'fxgraph_cache_miss': 10, 'group_linear': 9, 'scmerge_split_sections_removed': 5, 'scmerge_split_removed': 4, 'scmerge_cat_removed': 4, 'unbind_stack_pass': 4, 'batch_sigmoid': 2, 'batch_linear': 2, 'move_reshape_out_of_split_stack_pass': 2, 'batch_aten_sub': 2, 'batch_aten_add': 2, 'batch_layernorm': 1, 'scmerge_split_added': 1, 'scmerge_cat_added': 1, 'split_stack_to_cats_pass': 1, 'split_cat_to_slices_pass': 1, 'benchmarking.TritonBenchmarker.triton_do_bench': 1, 'batch_relu': 1}) # e2e ### AFOC baseline: f545589474 proposal: f545589302 {F1474302182} ### cmf shrink ads_dper3:0e442d2994ad1421377489d53ef99593 training_platform:be4b7015f1582fb1760bd72cf83ff38d baseline f635512197 baseline + group_fusion f635975547 {F1832419326} {F1832419319} {F1832419401} The group fusion can be enabled but has qps regression by using group fusion, will do a dive deep study. Differential Revision: D61888433
@xuzhao9 Is this the correct place to change this file? I remember you mentioned this |
@FindHao Exporting to pytorch/benchmark is optional. We will keep this PR and when it merges in fbcode or pytorch/pytorch, the changes will automatically be applied to pytorch/benchmark. |
Summary:
context:
https://fb.workplace.com/groups/1075192433118967/permalink/1401282167176657/
moving the changes to the group gemm op has compilation errors, see details in D55606636
Differential Revision: D61888433