-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalise the SME state management attributes #276
Conversation
92eedc9
to
9d319db
Compare
9d319db updates the patch to include the SME2 intrinsics. It also tries to clarify that the requirements on inline asms apply to the asm instructions themselves; they're not something that the compiler is supposed to do. |
From a cursory look, I don't see any issues. But it does need review by someone with more knowledge/experience than me. |
This patch replaces the __arm_new_za, __arm_shared_za and __arm_preserves_za attributes in favour of: * `__arm_new("za")` * `__arm_in("za")` * `__arm_out("za")` * `__arm_inout("za")` * `__arm_preserves("za")` As described in ARM-software/acle#276. One change is that `__arm_in/out/inout/preserves(S)` are all mutually exclusive, whereas previously it was fine to write `__arm_shared_za __arm_preserves_za`. This case is now represented with `__arm_in("za")`. The current implementation uses the same LLVM attributes under the hood, since `__arm_in/out/inout` are all variations of "shared ZA", so can use the existing `aarch64_pstate_za_shared` attribute in LLVM. A future patch will add support for the new "zt0" state as introduced with SME2.
This patch replaces the __arm_new_za, __arm_shared_za and __arm_preserves_za attributes in favour of: * `__arm_new("za")` * `__arm_in("za")` * `__arm_out("za")` * `__arm_inout("za")` * `__arm_preserves("za")` As described in ARM-software/acle#276. One change is that `__arm_in/out/inout/preserves(S)` are all mutually exclusive, whereas previously it was fine to write `__arm_shared_za __arm_preserves_za`. This case is now represented with `__arm_in("za")`. The current implementation uses the same LLVM attributes under the hood, since `__arm_in/out/inout` are all variations of "shared ZA", so can use the existing `aarch64_pstate_za_shared` attribute in LLVM. A future patch will add support for the new "zt0" state as introduced with SME2.
#76971) This patch replaces the `__arm_new_za`, `__arm_shared_za` and `__arm_preserves_za` attributes in favour of: * `__arm_new("za")` * `__arm_in("za")` * `__arm_out("za")` * `__arm_inout("za")` * `__arm_preserves("za")` As described in ARM-software/acle#276. One change is that `__arm_in/out/inout/preserves(S)` are all mutually exclusive, whereas previously it was fine to write `__arm_shared_za __arm_preserves_za`. This case is now represented with `__arm_in("za")`. The current implementation uses the same LLVM attributes under the hood, since `__arm_in/out/inout` are all variations of "shared ZA", so can use the existing `aarch64_pstate_za_shared` attribute in LLVM. #77941 will add support for the new "zt0" state as introduced with SME2.
Since ARM-software/acle#276 the ACLE defines attributes to better describe the use of a given SME state. Previously the attributes merely described the possibility of it being 'shared' or 'preserved', whereas the new attributes have more semantics and also describe how the data flows through the program. For ZT0 we already had to add new LLVM IR attributes: * arm_new_zt0 * arm_in_zt0 * arm_out_zt0 * arm_inout_zt0 * arm_preserves_zt0 We have now done the same for ZA, such that we add: * arm_new_za (previously `aarch64_pstate_za_new`) * arm_in_za (more specific variation of `aarch64_pstate_za_shared`) * arm_out_za (more specific variation of `aarch64_pstate_za_shared`) * arm_inout_za (more specific variation of `aarch64_pstate_za_shared`) * arm_preserves_za (previously `aarch64_pstate_za_shared, aarch64_pstate_za_preserved`) This explicitly removes 'pstate' from the name, because with SME2 and the new ACLE attributes there is a difference between "sharing ZA" (sharing the ZA matrix register with the caller) and "sharing PSTATE.ZA" (sharing either the ZA or ZT0 register, both part of PSTATE.ZA with the caller).
20a6f51
to
e01cda9
Compare
This enables specifing "za" or "zt0" to the clobber list for inline asm. This complies with the acle SME addition to the asm extension here: ARM-software/acle#276
llvm#76971) This patch replaces the `__arm_new_za`, `__arm_shared_za` and `__arm_preserves_za` attributes in favour of: * `__arm_new("za")` * `__arm_in("za")` * `__arm_out("za")` * `__arm_inout("za")` * `__arm_preserves("za")` As described in ARM-software/acle#276. One change is that `__arm_in/out/inout/preserves(S)` are all mutually exclusive, whereas previously it was fine to write `__arm_shared_za __arm_preserves_za`. This case is now represented with `__arm_in("za")`. The current implementation uses the same LLVM attributes under the hood, since `__arm_in/out/inout` are all variations of "shared ZA", so can use the existing `aarch64_pstate_za_shared` attribute in LLVM. llvm#77941 will add support for the new "zt0" state as introduced with SME2.
Since ARM-software/acle#276 the ACLE defines attributes to better describe the use of a given SME state. Previously the attributes merely described the possibility of it being 'shared' or 'preserved', whereas the new attributes have more semantics and also describe how the data flows through the program. For ZT0 we already had to add new LLVM IR attributes: * aarch64_new_zt0 * aarch64_in_zt0 * aarch64_out_zt0 * aarch64_inout_zt0 * aarch64_preserves_zt0 We have now done the same for ZA, such that we add: * aarch64_new_za (previously `aarch64_pstate_za_new`) * aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_inout_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_preserves_za (previously `aarch64_pstate_za_shared, aarch64_pstate_za_preserved`) This explicitly removes 'pstate' from the name, because with SME2 and the new ACLE attributes there is a difference between "sharing ZA" (sharing the ZA matrix register with the caller) and "sharing PSTATE.ZA" (sharing either the ZA or ZT0 register, both part of PSTATE.ZA with the caller).
main/acle.md
Outdated
@@ -362,6 +362,9 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin | |||
* Added description of SVE reinterpret intrinsics. | |||
* Changes for [Function Multi Versioning](#function-multi-versioning): | |||
* Added [MOPS](#memcpy-family-of-operations-intrinsics---mops). | |||
* Added a [State management](#state-management) section, replacing the | |||
`__arm_shared_za` and `__arm_new_za` attributes in the previous Alpha |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`__arm_shared_za` and `__arm_new_za` attributes in the previous Alpha | |
`__arm_shared_za`, `__arm_preserves_za` and `__arm_new_za` attributes in the previous Alpha |
|
||
However, there are some pieces of architectural state for which this | ||
approach is not appropriate. For example, [SME's ZA](#za-storage) is a | ||
single piece of storage: there are not multiple ZAs, and so it does |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
single piece of storage: there are not multiple ZAs, and so it does | |
single piece of storage: there are no multiple ZAs, and so it does |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this change is correct.
f_private(); // ZA is not live, no save necessary | ||
|
||
setup_za(); // ZA is live after this call | ||
f_shares_zt0(); // The compiler should save and restore ZA | ||
// around the call ("caller-save") | ||
f_shares_za(); // ZA is live before and after this call | ||
f_private(); // The compiler should preserve ZA across the call | ||
// It can use the lazy-save mechanism | ||
use_za(); // ZA is no longer live after this call | ||
|
||
f_private(); // ZA is not live, no save necessary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also describe the liveness of ZT0 (and the need to save/restore it) as part of the example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. In the end it seemed simpler to use a separate example, since it became a bit unwieldly otherwise.
main/acle.md
Outdated
|
||
If an asm takes this option for state that is controlled by PSTATE.ZA, | ||
the asm itself is responsible for handling the [[AAPCS64]](#AAPCS64) | ||
lazy save scheme, as described below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this sentence a bit confusing. What exactly is it that is described below? It's not really the lazy-save scheme, as this is referenced but not described.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried saying various alternative things, but I'm not sure they added much, so in the end I just deleted “, as described below”.
ZA is guaranteed to be active on entry to an inline asm A in a | ||
function F if at least one of the following is true: | ||
|
||
* F [uses `"za"`](#uses-state) and A's clobber list includes `"za"`. | ||
|
||
* F [uses `"zt0"`](#uses-state) and A's clobber list includes `"zt0"`. | ||
|
||
Otherwise, ZA can be in any state on entry to A if at least one of the | ||
following is true: | ||
|
||
* F [uses](#uses-state) `"za"` | ||
|
||
* F [uses](#uses-state) `"zt0"` | ||
|
||
Otherwise, ZA can be off or dormant on entry to A, as for what AAPCS64 | ||
calls “private-ZA” functions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be phrased the other way around:
- If F uses state S and A's clobber list includes S, then ZA is guaranteed to be active on entry to A.
- If F uses state S and A's clobber list does not include S, then ZA can be in any state on entry to A.
- If F does not use state S, then ZA can be off or dormant on entry to A.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difficulty with breaking it down like this is that the active/dormant/off states depend on both ZA and ZT0 together. So:
- If F uses state S and A's clobber list does not include S, then ZA can be in any state on entry to A.
- If F does not use state S, then ZA can be off or dormant on entry to A.
is not true for S==ZA if F uses ZT0 and A's clobber list includes ZT0. In particular, the last point would contradict the first point, since the first point would say ZA must be active (for S==ZT0), whereas the last point would say it must dormant or off (for S==ZA).
So I think it needs to be a chain of if-then-elses that deals with all PSTATE.ZA-based state together.
I can simplify the middle one to:
Otherwise, ZA can be in any state on entry to A if F uses
"za"
or"zt0"
if you think that's better. I wrote it the current way partly for consistency with the first “if”, partly to mimic the rules-based language in the new ISA supplements, and partly to avoid ambiguity about whether “or” includes “and” (which would become harder to deal with if there were ever more than two pieces of state controlled by PSTATE.ZA).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah fair point. In that case I'm happy with the way it's written, I don't think combining those two bullet points into one makes a big difference, and at least it's now consistent with the case above.
Since ARM-software/acle#276 the ACLE defines attributes to better describe the use of a given SME state. Previously the attributes merely described the possibility of it being 'shared' or 'preserved', whereas the new attributes have more semantics and also describe how the data flows through the program. For ZT0 we already had to add new LLVM IR attributes: * aarch64_new_zt0 * aarch64_in_zt0 * aarch64_out_zt0 * aarch64_inout_zt0 * aarch64_preserves_zt0 We have now done the same for ZA, such that we add: * aarch64_new_za (previously `aarch64_pstate_za_new`) * aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_inout_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_preserves_za (previously `aarch64_pstate_za_shared, aarch64_pstate_za_preserved`) This explicitly removes 'pstate' from the name, because with SME2 and the new ACLE attributes there is a difference between "sharing ZA" (sharing the ZA matrix register with the caller) and "sharing PSTATE.ZA" (sharing either the ZA or ZT0 register, both part of PSTATE.ZA with the caller).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the comments! This PR looks good to me now.
This patch replaces __arm_shared_za and __arm_new_za with more general attributes. The purpose is twofold: * To allow the same approach to be taken for ZT0, and for any other similar state that is added in future. * To allow the programmer to give the compiler more information about when state is live (if the programmer wants to). __arm_shared_za is directly equivalent to __arm_inout("za"). Any code that is already using __arm_shared_za, or that prefers that syntax for some reason, could just #define one to the other. Similarly, __arm_new_za is directly equivalent to __arm_new("za"). The patch also removes __arm_preserves_za and replaces it with a more restricted __arm_preserves attribute. The old __arm_preserves_za could be attached to both private-ZA and shared-ZA functions. However, the private-ZA version had somewhat dubious semantics: * It made a promise about how ZA would be handled by a C/C++ function that doesn't use ZA directly. It wasn't obvious how the burden of keeping that promise was distributed between the programmer and the compiler. (The feature was always intended to be low-level.) * The semantics for private-ZA functions meant that callers would still need to prepare a lazy save buffer. __arm_preserves_za just meant that they could avoid having to restore from it afterwards. I'm hoping to replace the private-ZA form of __arm_preserves_za with an alternative, optional, extension that avoids the need for the lazy save buffer, and that would be handled entirely by the compiler. In contrast, __arm_preserves("za") makes a function shared-ZA. Co-authored-by: Sander de Smalen <[email protected]>
ba68f64
to
aecd9e0
Compare
Since ARM-software/acle#276 the ACLE defines attributes to better describe the use of a given SME state. Previously the attributes merely described the possibility of it being 'shared' or 'preserved', whereas the new attributes have more semantics and also describe how the data flows through the program. For ZT0 we already had to add new LLVM IR attributes: * aarch64_new_zt0 * aarch64_in_zt0 * aarch64_out_zt0 * aarch64_inout_zt0 * aarch64_preserves_zt0 We have now done the same for ZA, such that we add: * aarch64_new_za (previously `aarch64_pstate_za_new`) * aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_inout_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_preserves_za (previously `aarch64_pstate_za_shared, aarch64_pstate_za_preserved`) This explicitly removes 'pstate' from the name, because with SME2 and the new ACLE attributes there is a difference between "sharing ZA" (sharing the ZA matrix register with the caller) and "sharing PSTATE.ZA" (sharing either the ZA or ZT0 register, both part of PSTATE.ZA with the caller).
) Since ARM-software/acle#276 the ACLE defines attributes to better describe the use of a given SME state. Previously the attributes merely described the possibility of it being 'shared' or 'preserved', whereas the new attributes have more semantics and also describe how the data flows through the program. For ZT0 we already had to add new LLVM IR attributes: * aarch64_new_zt0 * aarch64_in_zt0 * aarch64_out_zt0 * aarch64_inout_zt0 * aarch64_preserves_zt0 We have now done the same for ZA, such that we add: * aarch64_new_za (previously `aarch64_pstate_za_new`) * aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_inout_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_preserves_za (previously `aarch64_pstate_za_shared, aarch64_pstate_za_preserved`) This explicitly removes 'pstate' from the name, because with SME2 and the new ACLE attributes there is a difference between "sharing ZA" (sharing the ZA matrix register with the caller) and "sharing PSTATE.ZA" (sharing either the ZA or ZT0 register, both part of PSTATE.ZA with the caller).
) Since ARM-software/acle#276 the ACLE defines attributes to better describe the use of a given SME state. Previously the attributes merely described the possibility of it being 'shared' or 'preserved', whereas the new attributes have more semantics and also describe how the data flows through the program. For ZT0 we already had to add new LLVM IR attributes: * aarch64_new_zt0 * aarch64_in_zt0 * aarch64_out_zt0 * aarch64_inout_zt0 * aarch64_preserves_zt0 We have now done the same for ZA, such that we add: * aarch64_new_za (previously `aarch64_pstate_za_new`) * aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_inout_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_preserves_za (previously `aarch64_pstate_za_shared, aarch64_pstate_za_preserved`) This explicitly removes 'pstate' from the name, because with SME2 and the new ACLE attributes there is a difference between "sharing ZA" (sharing the ZA matrix register with the caller) and "sharing PSTATE.ZA" (sharing either the ZA or ZT0 register, both part of PSTATE.ZA with the caller).
This enables specifing "za" or "zt0" to the clobber list for inline asm. This complies with the acle SME addition to the asm extension here: ARM-software/acle#276
) Since ARM-software/acle#276 the ACLE defines attributes to better describe the use of a given SME state. Previously the attributes merely described the possibility of it being 'shared' or 'preserved', whereas the new attributes have more semantics and also describe how the data flows through the program. For ZT0 we already had to add new LLVM IR attributes: * aarch64_new_zt0 * aarch64_in_zt0 * aarch64_out_zt0 * aarch64_inout_zt0 * aarch64_preserves_zt0 We have now done the same for ZA, such that we add: * aarch64_new_za (previously `aarch64_pstate_za_new`) * aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_inout_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_preserves_za (previously `aarch64_pstate_za_shared, aarch64_pstate_za_preserved`) This explicitly removes 'pstate' from the name, because with SME2 and the new ACLE attributes there is a difference between "sharing ZA" (sharing the ZA matrix register with the caller) and "sharing PSTATE.ZA" (sharing either the ZA or ZT0 register, both part of PSTATE.ZA with the caller).
This enables specifing "za" or "zt0" to the clobber list for inline asm. This complies with the acle SME addition to the asm extension here: ARM-software/acle#276
This enables specifing "za" or "zt0" to the clobber list for inline asm. This complies with the acle SME addition to the asm extension here: ARM-software/acle#276 (cherry picked from commit d9c20e4)
This enables specifing "za" or "zt0" to the clobber list for inline asm. This complies with the acle SME addition to the asm extension here: ARM-software/acle#276 (cherry picked from commit d9c20e4)
This enables specifing "za" or "zt0" to the clobber list for inline asm. This complies with the acle SME addition to the asm extension here: ARM-software/acle#276 (cherry picked from commit d9c20e4)
This patch replaces __arm_shared_za and __arm_new_za with more general attributes. The purpose is twofold: * To allow the same approach to be taken for ZT0, and for any other similar state that is added in future. * To allow the programmer to give the compiler more information about when state is live (if the programmer wants to). __arm_shared_za is directly equivalent to __arm_inout("za"). Any code that is already using __arm_shared_za, or that prefers that syntax for some reason, could just #define one to the other. Similarly, __arm_new_za is directly equivalent to __arm_new("za"). The patch also removes __arm_preserves_za and replaces it with a more restricted __arm_preserves attribute. The old __arm_preserves_za could be attached to both private-ZA and shared-ZA functions. However, the private-ZA version had somewhat dubious semantics: * It made a promise about how ZA would be handled by a C/C++ function that doesn't use ZA directly. It wasn't obvious how the burden of keeping that promise was distributed between the programmer and the compiler. (The feature was always intended to be low-level.) * The semantics for private-ZA functions meant that callers would still need to prepare a lazy save buffer. __arm_preserves_za just meant that they could avoid having to restore from it afterwards. I'm hoping to replace the private-ZA form of __arm_preserves_za with an alternative, optional, extension that avoids the need for the lazy save buffer, and that would be handled entirely by the compiler. In contrast, __arm_preserves("za") makes a function shared-ZA. Co-authored-by: Sander de Smalen <[email protected]>
} | ||
``` | ||
|
||
ZT0 cannot be lazily saved, so if ZT0 is live before a call to a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
saved lazily - don't split the verb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One tiny change
This patch replaces __arm_shared_za and __arm_new_za with more general attributes. The purpose is twofold:
To allow the same approach to be taken for ZT0, and for any other similar state that is added in future.
To allow the programmer to give the compiler more information about when state is live (if the programmer wants to).
__arm_shared_za is directly equivalent to __arm_inout("za"). Any code that is already using __arm_shared_za, or that prefers that syntax for some reason, could just #define one to the other. Similarly, __arm_new_za is directly equivalent to __arm_new("za").
The patch also removes __arm_preserves_za and replaces it with a more restricted __arm_preserves attribute. The old __arm_preserves_za could be attached to both private-ZA and shared-ZA functions. However, the private-ZA version had somewhat dubious semantics:
It made a promise about how ZA would be handled by a C/C++ function that doesn't use ZA directly. It wasn't obvious how the burden of keeping that promise was distributed between the programmer and the compiler. (The feature was always intended to be low-level.)
The semantics for private-ZA functions meant that callers would still need to prepare a lazy save buffer. __arm_preserves_za just meant that they could avoid having to restore from it afterwards.
I'm hoping to replace the private-ZA form of __arm_preserves_za with an alternative, optional, extension that avoids the need for the lazy save buffer, and that would be handled entirely by the compiler.
In contrast, __arm_preserves("za") makes a function shared-ZA.
name: Pull request
about: Technical issues, document format problems, bugs in scripts or feature proposal.
Thank you for submitting a pull request!
If this PR is about a bugfix:
Please use the bugfix label and make sure to go through the checklist below.
If this PR is about a proposal:
We are looking forward to evaluate your proposal, and if possible to
make it part of the Arm C Language Extension (ACLE) specifications.
We would like to encourage you reading through the contribution
guidelines, in particular the section on submitting
a proposal.
Please use the proposal label.
As for any pull request, please make sure to go through the below
checklist.
Checklist: (mark with
X
those which apply)PR (do not bother creating the issue if all you want to do is
fixing the bug yourself).
SPDX-FileCopyrightText
lines on topof any file I have edited. Format is
SPDX-FileCopyrightText: Copyright {year} {entity or name} <{contact informations}>
(Please update existing copyright lines if applicable. You can
specify year ranges with hyphen , as in
2017-2019
, and usecommas to separate gaps, as in
2018-2020, 2022
).Copyright
section of the sources of thespecification I have edited (this will show up in the text
rendered in the PDF and other output format supported). The
format is the same described in the previous item.
tricky to set up on non-*nix machines). The sequence can be
found in the contribution
guidelines. Don't
worry if you cannot run these scripts on your machine, your
patch will be automatically checked in the Actions of the pull
request.
introduced in this PR in the section Changes for next
release of the section Change Control/Document history
of the document. Create Changes for next release if it does
not exist. Notice that changes that are not modifying the
content and rendering of the specifications (both HTML and PDF)
do not need to be listed.
correctness of the result in the PDF output (please refer to the
instructions on how to build the PDFs
locally).
draftversion
is set totrue
in the YAML headerof the sources of the specifications I have modified.
in the README page of the project.