Changed blocking heuristics for M,K,N in BRGEMM Matmul to enable better memory/thread utilization in aarch64 #2103

Open
wants to merge 3 commits into base: main

Conversation

Shreyas-fuj
Contributor

Description

This PR optimizes the brgemm matmul operator on aarch64 by improving memory utilization and multithreading.

This PR contains the following changes:

  • Modified blocking parameters for M, K, and N, based on heuristics obtained by testing matmul on shapes used by the majority of language models.
  • An assembly-level optimization that removes the need for the fadd() instruction before storing the accumulator results in the destination matrix.
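
For readers unfamiliar with blocking, the following is a minimal sketch of what tiling a matmul over M, N, and K looks like. It is not the code in this PR: the `BlockSizes` struct, the `blocked_matmul` function, and any block values passed to it are hypothetical and chosen only for illustration. The PR's actual heuristics live in the brgemm matmul configuration and are not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block sizes (all must be > 0). The heuristics that pick
// these values in the PR are not shown in this sketch.
struct BlockSizes {
    std::size_t m_blk;
    std::size_t n_blk;
    std::size_t k_blk;
};

// Computes C (M x N) += A (M x K) * B (K x N) for row-major matrices,
// walking the iteration space tile by tile.
void blocked_matmul(const std::vector<float> &A, const std::vector<float> &B,
        std::vector<float> &C, std::size_t M, std::size_t K, std::size_t N,
        const BlockSizes &bs) {
    for (std::size_t m0 = 0; m0 < M; m0 += bs.m_blk) {
        const std::size_t m1 = std::min(m0 + bs.m_blk, M);
        for (std::size_t n0 = 0; n0 < N; n0 += bs.n_blk) {
            const std::size_t n1 = std::min(n0 + bs.n_blk, N);
            for (std::size_t k0 = 0; k0 < K; k0 += bs.k_blk) {
                const std::size_t k1 = std::min(k0 + bs.k_blk, K);
                // Work on one (m, n, k) tile; tail tiles are clipped above.
                for (std::size_t m = m0; m < m1; ++m) {
                    for (std::size_t n = n0; n < n1; ++n) {
                        // Start from the value already in C so partial
                        // sums from earlier K tiles are carried forward.
                        float acc = C[m * N + n];
                        for (std::size_t k = k0; k < k1; ++k)
                            acc += A[m * K + k] * B[k * N + n];
                        C[m * N + n] = acc;
                    }
                }
            }
        }
    }
}
```

Smaller M and N blocks give more independent tiles to spread across threads, while the K block controls how much of A and B is touched before moving on; which trade-off is best depends on the shapes and the cache hierarchy, which is presumably what the heuristics in this PR tune.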

General

  • [y] Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
make test

99% tests passed, 2 tests failed out of 200

Total Test time (real) = 2142.22 sec

The following tests FAILED:
	159 - test_graph_unit_dnnl_large_partition_usm_cpu (Failed)
	181 - test_benchdnn_modeC_graph_ci_cpu (Failed)
Errors while running CTest
Output from these tests are in: /home/shreyas/G/shr-fuj/oneDNN_open_source/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
make: *** [Makefile:71: test] Error 8

  • [y] Have you formatted the code using clang-format?

@Shreyas-fuj Shreyas-fuj requested review from a team as code owners September 20, 2024 06:31
@Shreyas-fuj Shreyas-fuj changed the title Changed blocking in heuristics for M,K,N in BRGEMM Matmul to enable better memory/thread utilization Changed blocking heuristics for M,K,N in BRGEMM Matmul to enable better memory/thread utilization Sep 20, 2024
@Shreyas-fuj Shreyas-fuj changed the title Changed blocking heuristics for M,K,N in BRGEMM Matmul to enable better memory/thread utilization Changed blocking heuristics for M,K,N in BRGEMM Matmul to enable better memory/thread utilization in aarch64 Sep 20, 2024
@@ -191,6 +191,8 @@ struct brgemm_t {
int LDB = 0;
int LDC = 0;
int LDD = 0;

int M, K, N;
Contributor


M, K, and N are literally declared above as bcast_dim, reduce_dim and load_dim. Any specific reason to add duplicated entries for them?
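
One way to address this concern, sketched here under assumptions, is to expose M, K, and N as read-only accessors over the existing fields rather than as new members that have to be kept in sync. The struct below is a hypothetical stand-in, not the real `brgemm_t`; only the field names `bcast_dim`, `reduce_dim`, and `load_dim` and their mapping to M, K, and N come from the comment above.

```cpp
#include <cassert>

// Hypothetical stand-in for the descriptor; the real brgemm_t has many
// more members and may use a different integer type for these fields.
struct brgemm_desc_sketch {
    int bcast_dim = 0;  // M
    int reduce_dim = 0; // K
    int load_dim = 0;   // N

    // Accessors alias the existing fields, so no duplicated state exists.
    int M() const { return bcast_dim; }
    int K() const { return reduce_dim; }
    int N() const { return load_dim; }
};
```

With this shape, code that wants the shorter names can call `desc.M()` and there is a single source of truth for each dimension.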

2 participants