Fix group fusion stride layout #2441

Open
wants to merge 1 commit into
base: main
from


mengluy0125
Contributor

Summary:
X-link: pytorch/pytorch#122839

context:
https://fb.workplace.com/groups/1075192433118967/permalink/1401282167176657/

Moving the changes into the group GEMM op causes compilation errors; see D55606636 for details.

Differential Revision: D55449814

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D55449814

@mengluy0125 changed the title from "Fix group fusion for AFOC" to "Fix group fusion stride layout" on Aug 27, 2024
Summary:
Pull Request resolved: pytorch#2441

X-link: pytorch/pytorch#122839

context:
https://fb.workplace.com/groups/1075192433118967/permalink/1401282167176657/

Moving the changes into the group GEMM op causes compilation errors; see D55606636 for details.

Differential Revision: D55449814
mengluy0125 added a commit to mengluy0125/pytorch that referenced this pull request Aug 27, 2024
Summary:
X-link: pytorch/benchmark#2441

Pull Request resolved: pytorch#122839

context:
https://fb.workplace.com/groups/1075192433118967/permalink/1401282167176657/

Moving the changes into the group GEMM op causes compilation errors; see D55606636 for details.

Test Plan:
# local reproduce
```
CUDA_LAUNCH_BLOCKING=1 buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split-group --model_type "afoc" --flow_id 544109991
```

Counter({'pattern_matcher_nodes': 1215, 'pattern_matcher_count': 1090, 'normalization_pass': 430, 'remove_split_with_size_one_pass': 416, 'batch_aten_mul': 13, 'scmerge_split_sections_removed': 11, 'scmerge_cat_removed': 5, 'scmerge_cat_added': 4, 'batch_linear_post_grad': 4, 'scmerge_split_removed': 3, 'batch_aten_sub': 2, 'batch_layernorm': 1, 'group_linear': 1})

```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode group-batch-split --model_type "cmf_shrink" --flow_id 587303213
```
P1551948670
Counter({'pattern_matcher_nodes': 2244, 'pattern_matcher_count': 1738, 'normalization_pass': 404, 'extern_calls': 370, 'benchmarking.TritonBenchmarker.benchmark_gpu': 293, 'remove_split_with_size_one_pass': 269, 'merge_splits_pass': 74, 'normalization_aten_pass': 56, 'batch_aten_mul': 11, 'fxgraph_cache_miss': 10, 'group_linear': 9, 'scmerge_split_sections_removed': 5, 'scmerge_split_removed': 4, 'scmerge_cat_removed': 4, 'unbind_stack_pass': 4, 'batch_sigmoid': 2, 'batch_linear': 2, 'move_reshape_out_of_split_stack_pass': 2, 'batch_aten_sub': 2, 'batch_aten_add': 2, 'batch_layernorm': 1, 'scmerge_split_added': 1, 'scmerge_cat_added': 1, 'split_stack_to_cats_pass': 1, 'split_cat_to_slices_pass': 1, 'benchmarking.TritonBenchmarker.triton_do_bench': 1, 'batch_relu': 1})
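To see at a glance which fusion passes fired differently between the two runs, the counters can be diffed. The sketch below is illustrative only: it copies a subset of the values reported above, and the helper names `fusion_counts` and `diff` are hypothetical, not part of PyTorch. The two runs use different models and test modes, so the differences are not an apples-to-apples comparison.

```python
from collections import Counter

# Subset of the counters reported above (values copied from the two runs).
afoc = Counter({
    "group_linear": 1,
    "batch_linear_post_grad": 4,
    "batch_aten_mul": 13,
    "batch_layernorm": 1,
})
cmf_shrink = Counter({
    "group_linear": 9,
    "batch_linear": 2,
    "batch_aten_mul": 11,
    "batch_layernorm": 1,
})


def fusion_counts(counters, prefixes=("group_", "batch_")):
    """Keep only entries whose key starts with one of the given prefixes."""
    return {k: v for k, v in counters.items() if k.startswith(prefixes)}


def diff(a, b):
    """Per-key difference b - a over the union of keys.

    Counter returns 0 for missing keys, so a pass that fired in only
    one run still shows up in the result.
    """
    keys = sorted(set(a) | set(b))
    return {k: b[k] - a[k] for k in keys if b[k] != a[k]}


if __name__ == "__main__":
    for name, delta in diff(afoc, cmf_shrink).items():
        print(f"{name}: {delta:+d}")
```

Keys with equal counts in both runs (here `batch_layernorm`) are omitted from the output.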

# e2e

### AFOC
baseline:
f545589474
proposal:
f545589302

 {F1474302182}

### cmf shrink

baseline
f635512197

baseline + group_fusion

Group fusion can be enabled for this model, but enabling it causes a QPS regression.

Differential Revision: D55449814

2 participants