Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SME] Add ZA-compatible interface and routines to save/restore SME state #264

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

sdesmalen-arm
Copy link

This implements requests to add a new "ZA-compatible" interface which can be called with ZA state being either 'off', 'active' or 'dormant', and which can preserve any and all state enabled under PSTATE.ZA.

…ate.

This implements requests to add a new "ZA-compatible" interface which
can be called with ZA state being either 'off', 'active' or 'dormant',
and which can preserve any and all state enabled under PSTATE.ZA.

.. _`private-ZA`:
.. _`shared-ZA`:
.. _`compatible-ZA`:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call this ZA-agnostic, for compatibility with the ACLE PR?

Subroutines with a `private-ZA`_ interface and subroutines with a `shared-ZA`_
interface can both (at their option) choose to guarantee that they
The compatible-ZA interface is intended to be called from any function
without requiring a change to PSTATE.ZA and is generally used in conjunction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make this stronger, and say that the function must preserve all state associated with PSTATE.ZA.

with the expectation that it `preserves ZA`_.

Subroutines with a `private-ZA`_ interface, `shared-ZA`_ interface or
`compatible-ZA`_ interface can (at their option) choose to guarantee that they
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…and then leave this paragraph as it was originally.

@@ -2069,6 +2077,8 @@ support routines:

* the current values of TPIDR2_EL0, PSTATE.SM and PSTATE.ZA.

* the registers enabled by PSTATE.ZA, if PSTATE.ZA is enabled.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this change referring to? It doesn't look like the patch changes __arm_sme_state.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this should not have been committed. I've removed the change.

The size is guaranteed to be a multiple of 16.

The layout that corresponds to the calculated size is unspecified,
but the assumption is that the size always matches the implementation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a requirement rather than an assumption.

* If bit 1 of ``OPTIONS`` is 0 and ZT0 is available, then the subroutine
stores the full contents of ZT0 to ``PTR->ZT0``.

* If bit 63 of ``OPTIONS`` is 0, then the subroutine disables PSTATE.ZA.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that it isn't compatible-ZA according to the above table, since ZA is in a different state on return.

Also, bit 31 == 0 && bit 2 == 1 would be an invalid combination, since we would potentially have a nonnull (unsaved) TPIDR2_EL0 while PSTATE.ZA==0.

+--------+-----------------------------------------+
| bits | Options |
+========+=========================================+
| 63 | Preserve ZA using lazy save mechanism |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth the complication of supporting two ways of saving ZA? Couldn't we start with the lazy version, and add the non-lazy version as a variation later if necessary?

| 62 - 3 | Zero for this revision of the AAPCS64, |
| | but reserved for future expansion |
+--------+-----------------------------------------+
| 2 | Exclude TPIDR2_EL0 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, I'm not sure it's worth having a separate toggle for this and ZA.


* both bit 0 and bit 63 of ``OPTIONS`` are 1.

* If PSTATE.ZA is 0, then the subroutine enables PSTATE.ZA.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should only do this if PSTATE.ZA was 1 when __arm_sme_save was called. Similarly for the restorations below.

``PTR->TPIDR2_EL0`` at unspecified offsets in the buffer pointed to by PTR:

* If bit 63 of ``OPTIONS`` is 1 and TPIDR2_EL0 is null, then the function
copies ``PTR->BLK`` to X0 and calls ``__arm_tpidr2_restore``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
copies ``PTR->BLK`` to X0 and calls ``__arm_tpidr2_restore``.
points X0 at ``PTR->BLK`` and calls ``__arm_tpidr2_restore``.

* Renamed `compatible ZA` -> `agnostic ZA`

* Changed __arm_sme_save/restore such that the save routine should
  record whether ZA or ZT0 is saved and such that the restore routine
  should check whether ZA or ZT0 was saved. This removes the
  (previously) implicit assumption in `__arm_sme-save` that PSTATE.ZA
  must be 1 if PTR is not nullptr.

* ZA is now always saved/restored using the lazy-save mechanism.

* Changed __arm_sme_save/restore to have a custom ZA interface instead
  of 'agnostic-ZA' which was incorrect.
Copy link
Contributor

@rsandifo-arm rsandifo-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't clear to me whether we expect callers of __arm_sme_state_size to check for zero returns, and skip the save and restore calls for that case. If so (“case A”), I think this makes things unnecessarily complex for the callers. If not (“case B”), the buffer should have nonzero size even if PSTATE.ZA==0, so that it can hold at least the SAVED_ZA and SAVED_ZT0 fields.

And for case B, the save and restore functions should be callable even in threads that don't have access to SME.

In a similar vein, how about making __arm_sme_restore decide what to do based only on the SAVED_* fields, rather than passing OPTIONS again? The caller of the save and restore functions is expected to be the same, so it should know what needs to be preserved.

I wonder how useful the OPTIONS will be in practice, given that __arm_sme_save must leave PSTATE.ZA in a dormant or off state. It seems unlikely that a caller would know enough about its position in the call hierarchy/overall program to know that certain state doesn't need to be preserved, but at the same time not know enough for it to have a specific __arm_in/out/inout/preserves annotation. No objection to keeping OPTIONS if you prefer though.

as well as any other state required for `__arm_sme_save`_ and
`__arm_sme_restore`_.

`__arm_sme_state_size`_ assumes that ZA is saved lazily and will account
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is potentially confusing, since it isn't clear who is assumed to do the lazy saving. I think what this paragraph is saying comes under the previous sentence — “as well as any other state required for __arm_sme_save_ and __arm_sme_restore_” — and so it might be better just to delete this.

``PTR->BLK``, ``PTR->ZA`` and ``PTR->TPIDR2_EL0`` at unspecified offsets
in the buffer pointed to by ``PTR``:

* The full contents of ``TPIDR2_EL0`` are written to ``PTR->TPIDR2_EL0``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* The full contents of ``TPIDR2_EL0`` are written to ``PTR->TPIDR2_EL0``.
* The contents of ``TPIDR2_EL0`` are written to ``PTR->TPIDR2_EL0``.

Using “full” sounded odd in reference to a system register that contains a pointer.


* The value 1 is written to ``PTR->SAVED_ZA``.

* If bit 1 of ``OPTIONS`` is 0 and ZT0 is available, then for the addresses
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that if bit 0 is 1, the function should turn PSTATE.ZA off before returning. In other words, ZA must not be active on return from this function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot, you're right!

@sdesmalen-arm
Copy link
Author

It isn't clear to me whether we expect callers of __arm_sme_state_size to check for zero returns, and skip the save and restore calls for that case. If so (“case A”), I think this makes things unnecessarily complex for the callers. If not (“case B”), the buffer should have nonzero size even if PSTATE.ZA==0, so that it can hold at least the SAVED_ZA and SAVED_ZT0 fields.

And for case B, the save and restore functions should be callable even in threads that don't have access to SME.

The initial thinking was "case A", i.e. the intrinsics are only called when there is something worth saving. But now that I've added SAVED_ZA and SAVED_ZT0, it probably makes sense to remove that assumption.

I wonder how useful the OPTIONS will be in practice, given that __arm_sme_save must leave PSTATE.ZA in a dormant or off state. It seems unlikely that a caller would know enough about its position in the call hierarchy/overall program to know that certain state doesn't need to be preserved, but at the same time not know enough for it to have a specific __arm_in/out/inout/preserves annotation. No objection to keeping OPTIONS if you prefer though.

My reasons for adding OPTIONS was to give an interface that allows the routines to be used more generically, e.g. to support implementing the following case more easily using these intrinsics:

void bar() __arm_inout("zt0");
void foo() __arm_inout("za", "zt0") { bar(); }

where only "za" needs preserving across the call, but not "zt0". This could be done using explicit allocas, calling the __arm_tpidr2_* routines, but this functionality may become more involved in the future should more state be added to PSTATE.ZA.

In a similar vein, how about making __arm_sme_restore decide what to do based only on the SAVED_* fields, rather than passing OPTIONS again? The caller of the save and restore functions is expected to be the same, so it should know what needs to be preserved.

The reasoning for that was that one might only want to restore part of the saved state at one point, and restore the other part at another point. The compiler or user writing asm could use these routines to save/restore any state, e.g.

void bar();
void use_zt0() __arm_inout("zt0");
void use_za() __arm_inout("za");

void foo() __arm_inout("za", "zt0") {
               // preserve both zt0 and za, could use the __arm_sme_save routine for this.
  bar();
               // reload only zt0
  use_zt0();
               // save only zt0 (za was already saved)
  bar();
               // restore za
  use_za();
               // restore zt0
}

@rsandifo-arm
Copy link
Contributor

I wonder how useful the OPTIONS will be in practice, given that __arm_sme_save must leave PSTATE.ZA in a dormant or off state. It seems unlikely that a caller would know enough about its position in the call hierarchy/overall program to know that certain state doesn't need to be preserved, but at the same time not know enough for it to have a specific __arm_in/out/inout/preserves annotation. No objection to keeping OPTIONS if you prefer though.

My reasons for adding OPTIONS was to give an interface that allows the routines to be used more generically, e.g. to support implementing the following case more easily using these intrinsics:

void bar() __arm_inout("zt0");
void foo() __arm_inout("za", "zt0") { bar(); }

where only "za" needs preserving across the call, but not "zt0". This could be done using explicit allocas, calling the __arm_tpidr2_* routines, but this functionality may become more involved in the future should more state be added to PSTATE.ZA.

I don't think the new routines work for that case though. The routines rely on saving ZA lazily, whereas the above needs to save ZA eagerly. bar is entitled to assume that ZA is active rather than dormant on entry.

@sdesmalen-arm
Copy link
Author

I wonder how useful the OPTIONS will be in practice, given that __arm_sme_save must leave PSTATE.ZA in a dormant or off state. It seems unlikely that a caller would know enough about its position in the call hierarchy/overall program to know that certain state doesn't need to be preserved, but at the same time not know enough for it to have a specific __arm_in/out/inout/preserves annotation. No objection to keeping OPTIONS if you prefer though.

My reasons for adding OPTIONS was to give an interface that allows the routines to be used more generically, e.g. to support implementing the following case more easily using these intrinsics:

void bar() __arm_inout("zt0");
void foo() __arm_inout("za", "zt0") { bar(); }

where only "za" needs preserving across the call, but not "zt0". This could be done using explicit allocas, calling the __arm_tpidr2_* routines, but this functionality may become more involved in the future should more state be added to PSTATE.ZA.

I don't think the new routines work for that case though. The routines rely on saving ZA lazily, whereas the above needs to save ZA eagerly. bar is entitled to assume that ZA is active rather than dormant on entry.

Yes, that was the reason I initially added the "lazily" as an option to the struct. I figured it should still be possible to set up the lazy-save using these routines and then commit the lazy-save manually before the call to bar and then restore afterwards, if ZA needs saving eagerly. Or is there some subtlety that I'm missing?

@rsandifo-arm
Copy link
Contributor

Yes, that was the reason I initially added the "lazily" as an option to the struct. I figured it should still be possible to set up the lazy-save using these routines and then commit the lazy-save manually before the call to bar and then restore afterwards, if ZA needs saving eagerly. Or is there some subtlety that I'm missing?

The difference for me is that the new routines are fundamentally about processing the live ZA state (and any future SME state) such that it's safe to call a normal function. In some cases this will involve turning PSTATE.ZA off. This is different from what the mixed-sharing example needs: there the intention is explicitly to keep ZA on (and active).

The new routines are also about providing an interface that can handle unknown future state. Thus the assumption is that extra state must be stored unless explicitly turned off (via OPTIONS). This is different from the example of moving between different sharing types, since there the compiler (by necessity) knows the state that is live in the caller and the state that is shared with the callee. The compiler only wants to store the difference between the two, and not any future unknown state.

When the callee shares some ZA state, but shares less state than the caller, the caller has to do an eager save and restore of the remaining state. I think we should just treat this as being conceptually like any other caller save. Admittedly the save and restore are quite heavy operations for ZA, but in concept they're not much different from a register save and restore. Treating them like that also helps with your later example involving calls to separate __arm_inout("zt0") and __arm_inout("za") functions.

As @rsandifo-arm explained, the same routines cannot be used for
general use of saving/restoring of state enabled by PSTATE.ZA
because of the different expectations on the ZA interface
when partially saving/restoring state enabled by PSTATE.ZA.

Removing the option, drastically simplifies the logic. Note that
I have removed the two booleans (internal to the save/restore
routines) that distinguish between having saved 'ZA' and 'ZT0',
and replaced that with a single 'VALID' bit, because I think we
can assume that the save/restore routines are called by PEs that
have the same SME state.
Copy link
Contributor

@rsandifo-arm rsandifo-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for only raising this now, but I wonder if we should reconsider the case in which ZA is dormant on entry to an agnostic-ZA function. Rather than say that agnostic-ZA functions must preserve the state (and so leave ZA dormant), should we instead say that agnostic-ZA behaves like private-ZA for that case? No other PSTATE.ZA state can be live if ZA is dormant. In particular, if ZA is dormant then ZT0 must be dead.

In other words, agnostic-ZA is like private-ZA except that ZA can be active on entry. If ZA is active on entry, it must be unchanged on return.

(If in future there is any other SME state that needs to be preserved, and that is not controlled by PSTATE.ZA, then that would still need to be saved and restored.)

The dormant-on-entry case isn't too interesting for direct calls between streaming code and agnostic-ZA functions, since the streaming code wouldn't be expected to set up a lazy save in that case. But perhaps it is more relevant if streaming code calls a normal function and that normal function calls a private-ZA function. (Ideally, the normal function would be changed to private-ZA, but that might not always be possible for ABI reasons.)

Other than that, it looks good to me. Some minor comments below.

* For addresses ``PTR->BLK``, ``PTR->ZA`` and ``PTR->TPIDR2_EL0`` at
unspecified offsets in the buffer pointed to by ``PTR``:

* The subroutine aborts in some platform-specific manner if the current
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this, since:

If the thread has access to SME then it must also have access to TPIDR2_EL0.


* The current thread does not have access to SME.

* The current thread does not have access to ``TPIDR2_EL0`` when PSTATE.ZA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, this can be deleted.

* X1-X15, X19-X29 and SP are call-preserved.
* Z0-Z31 are call-preserved.
* P0-P15 are call-preserved.
* the subroutine `preserves ZA`_.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to delete this, since “agnostic-ZA” has its own requirements about the state of ZA on return.


* The custom ``ZA`` interface has the following properties:

* If ZA state is 'off' on entry and SME state needs restoring, then it is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

active or dormant, since if ZA was dormant on entry to __arm_sme_save, and if the lazy save is later committed, we'll end up restoring the previous dormant state. Similarly for the final bullet point.

But I think we move this down immediately before “The subroutine behaves as follows:” and then just say something like:

  • The ZA state on normal return is the same as the ZA state on entry to the __arm_sme_save call that initialized PTR.

(Sentence feels a bit clunky though, so improvements definitely welcome.)

* The subroutine takes the following arguments:

PTR
a 64-bit data pointer passed in X0 that points to a buffer which
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about instead saying:

  • a 64-bit data pointer in X0 that must previously have been passed to __arm_sme_save. The buffer that it points to must still contain the data written by that call.

?

* The subroutine behaves as follows:

* If ``PTR`` does not point to a valid buffer with the required size, the
behaviour of calling this routine is undefined.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

behavior (and similarly throughout).

sdesmalen-arm added a commit to sdesmalen-arm/llvm-project that referenced this pull request Sep 9, 2024
sdesmalen-arm added a commit to sdesmalen-arm/llvm-project that referenced this pull request Sep 9, 2024
This implements the lowering of calls from agnostic-ZA
functions to non-agnostic-ZA functions, using the ABI routines
`__arm_sme_state_size`, `__arm_sme_save` and `__arm_sme_restore`.

This implements the proposal described in the following PRs:
* ARM-software/acle#336
* ARM-software/abi-aa#264
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants