From aeccbc9e3019e967d3d3efa0ea4a361900b857ff Mon Sep 17 00:00:00 2001 From: Sander de Smalen Date: Fri, 7 Jun 2024 11:26:33 +0100 Subject: [PATCH 1/6] [SME] Add ZA-compatible interface and routines to save/restore SME state. This implements requests to add a new "ZA-compatible" interface which can be called with ZA state being either 'off', 'active' or 'dormant', and which can preserve any and all state enabled under PSTATE.ZA. --- aapcs64/aapcs64.rst | 252 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 249 insertions(+), 3 deletions(-) diff --git a/aapcs64/aapcs64.rst b/aapcs64/aapcs64.rst index 1c134f1..759937d 100644 --- a/aapcs64/aapcs64.rst +++ b/aapcs64/aapcs64.rst @@ -1749,10 +1749,11 @@ ZA interfaces As noted in `ZA states`_, there are three possible ZA states: off, dormant, and active. A subroutine's “ZA interface” specifies the possible states of ZA on entry to a subroutine and the possible states of ZA on a -`normal return`_. The AAPCS64 defines two types of ZA interface: +`normal return`_. The AAPCS64 defines three types of ZA interface: .. _`private-ZA`: .. _`shared-ZA`: +.. _`compatible-ZA`: +-------------------+-------------------+---------------------------+ | Type of interface | ZA state on entry | ZA state on normal return | @@ -1761,6 +1762,9 @@ states of ZA on entry to a subroutine and the possible states of ZA on a +-------------------+-------------------+---------------------------+ | shared ZA | active | active | +-------------------+-------------------+---------------------------+ +| compatible ZA | active, dormant | unchanged | +| | or off | | ++-------------------+-------------------+---------------------------+ Every subroutine has exactly one ZA interface. A subroutine's ZA interface is independent of all other aspects of its interface. Callers must know @@ -1776,8 +1780,12 @@ The shared-ZA interface is so called because it allows the subroutine to share ZA contents with its caller. 
This can be useful if an SME operation is split into several cooperating subroutines. -Subroutines with a `private-ZA`_ interface and subroutines with a `shared-ZA`_ -interface can both (at their option) choose to guarantee that they +The compatible-ZA interface is intended to be called from any function +without requiring a change to PSTATE.ZA and is generally used in conjunction +with the expectation that it `preserves ZA`_. + +Subroutines with a `private-ZA`_ interface, `shared-ZA`_ interface or +`compatible-ZA`_ interface can (at their option) choose to guarantee that they `preserve ZA`_. Parameter passing @@ -2081,6 +2089,17 @@ support routines: ``__arm_get_current_vg`` Provides a safe way to detect the current value of VG. +``__arm_sme_state_size`` + Provides a simple way to query the total size required to save the requested + state. + +``__arm_sme_save`` + Provides a safe way to save all state enabled by PSTATE.ZA. + +``__arm_sme_restore`` + Provides a safe way to restore all state enabled by PSTATE.ZA from a buffer. + + ``__arm_sme_state`` ^^^^^^^^^^^^^^^^^^^ @@ -2305,6 +2324,233 @@ value of VG, with the subroutine having the following properties: * Otherwise, the subroutine returns the value 0 in X0. + +``__arm_sme_state_size`` +^^^^^^^^^^^^^^^^^^^^^^^^ + +**(Beta)** + +Platforms that support SME must provide a subroutine that returns a size +that is large enough to represent all state enabled by PSTATE.ZA. + +* The subroutine is called ``__arm_sme_state_size``. + +* The subroutine has a `compatible-ZA`_ `streaming-compatible interface`_ with + the following properties: + + * X1-X15, X19-X29 and SP are call-preserved. + * Z0-Z31 are call-preserved. + * P0-P15 are call-preserved. + * the subroutine `preserves ZA`_. 
+ +* The subroutine takes the following argument: + + OPTIONS + a 64-bit value passed in X0 describing the following options: + + +--------+-----------------------------------------+ + | bits | Options | + +========+=========================================+ + | 63 | Preserve ZA using lazy save mechanism | + +--------+-----------------------------------------+ + | 62 - 3 | Zero for this revision of the AAPCS64, | + | | but reserved for future expansion | + +--------+-----------------------------------------+ + | 2 | Exclude TPIDR2_EL0 | + +--------+-----------------------------------------+ + | 1 | Exclude ZT0 | + +--------+-----------------------------------------+ + | 0 | Exclude ZA | + +--------+-----------------------------------------+ + + If bit 63 is 1, then the size calculation will include the allocation + of a TPIDR2 block. + + A value of 0 means that all SME state will be considered in the size + calculation. + +* The subroutine returns an unsigned double word in X0 that represents + a size in bytes that is large enough to represent all state enabled by + PSTATE.ZA, predicated on the requirements specified in ``OPTIONS``. + The size is guaranteed to be a multiple of 16. + + The layout that corresponds to the calculated size is unspecified, + but the assumption is that the size always matches the implementation + of the `__arm_sme_save`_ and `__arm_sme_restore`_ routines. + +* The subroutine behaves as follows: + + * If both bit 0 and bit 63 of ``OPTIONS`` are 1, then the subroutine aborts + in some platform-specific manner. + + * If the current thread has access to FEAT_SME and PSTATE.ZA is 1, + X0 contains the total size required to represent all SME state enabled + under PSTATE.ZA, with the exception of the state requested to be excluded + as described in ``OPTIONS``. + + * Otherwise, X0 is 0. + + +``__arm_sme_save`` +^^^^^^^^^^^^^^^^^^ + +**(Beta)** + +Platforms that support SME must provide a subroutine to save any state enabled +by PSTATE.ZA. 
+ +* The subroutine is called ``__arm_sme_save``. + +* The subroutine has a `compatible-ZA`_ `streaming-compatible interface`_ with + the following properties: + + * X2-X15, X19-X29 and SP are call-preserved. + * Z0-Z31 are call-preserved. + * P0-P15 are call-preserved. + +* The subroutine takes the following arguments: + + OPTIONS + a 64-bit value passed in X0 describing the following options: + + +--------+-----------------------------------------+ + | bits | Options | + +========+=========================================+ + | 63 | Preserve ZA using lazy save mechanism | + +--------+-----------------------------------------+ + | 62 - 3 | Zero for this revision of the AAPCS64, | + | | but reserved for future expansion | + +--------+-----------------------------------------+ + | 2 | Exclude TPIDR2_EL0 | + +--------+-----------------------------------------+ + | 1 | Exclude ZT0 | + +--------+-----------------------------------------+ + | 0 | Exclude ZA | + +--------+-----------------------------------------+ + + A value of 0 means all SME state will be saved. + + PTR + a 64-bit data pointer passed in X1 that points to a buffer which is + guaranteed to be large enough to represent all SME state for the + requirements specified by ``OPTIONS``. + +* The subroutine does not return a value. + +* The subroutine behaves as follows: + + * If PTR is null, then the subroutine does nothing. + + * The subroutine aborts in some platform-specific manner if either of the + following conditions is true: + + * the current thread does not have access to SME or PSTATE.ZA is 0. + + * the current thread does not have access to TPIDR2_EL0 and bit 2 of + ``OPTIONS`` is 0. + + * both bit 0 and bit 63 of ``OPTIONS`` are 1. 
+ + * For addresses ``PTR->BLK``, ``PTR->ZA_BUFFER``, ``PTR->ZA``, ``PTR->ZT0`` + and ``PTR->TPIDR2_EL0`` at unspecified offsets in the buffer pointed to by + PTR: + + * If bit 2 of ``OPTIONS`` is 0, the subroutine stores the contents of + TPIDR2_EL0 to ``PTR->TPIDR2_EL0``. + + * If bit 63 of ``OPTIONS`` is 1, then the subroutine sets up a lazy-save by + storing the address ``PTR->ZA_BUFFER`` to ``PTR->BLK.za_save_buffer``, + storing the streaming vector length in bytes (``SVL.B``) to + ``PTR->BLK.num_za_save_slices`` and copying the address ``PTR->BLK`` to + ``TPIDR2_EL0``. + + * If bit 0 of ``OPTIONS`` is 0, then the subroutine stores the entire + contents of ZA to ``PTR->ZA_BUFFER``. + + * If bit 1 of ``OPTIONS`` is 0 and ZT0 is available, then the subroutine + stores the full contents of ZT0 to ``PTR->ZT0``. + + * If bit 63 of ``OPTIONS`` is 0, then the subroutine disables PSTATE.ZA. + + +``__arm_sme_restore`` +^^^^^^^^^^^^^^^^^^^^^ + +**(Beta)** + +Platforms that support SME must provide a subroutine to restore any state +enabled by PSTATE.ZA. + +* The subroutine is called ``__arm_sme_restore``. + +* The subroutine has a `compatible-ZA`_ `streaming-compatible interface`_ with + the following properties: + + * X2-X15, X19-X29 and SP are call-preserved. + * Z0-Z31 are call-preserved. + * P0-P15 are call-preserved. 
+ +* The subroutine takes the following arguments: + + OPTIONS + a 64-bit value passed in X0 describing the following options: + + +--------+-----------------------------------------+ + | bits | Options | + +========+=========================================+ + | 63 | Preserve ZA using lazy save mechanism | + +--------+-----------------------------------------+ + | 62 - 3 | Zero for this revision of the AAPCS64, | + | | but reserved for future expansion | + +--------+-----------------------------------------+ + | 2 | Exclude TPIDR2_EL0 | + +--------+-----------------------------------------+ + | 1 | Exclude ZT0 | + +--------+-----------------------------------------+ + | 0 | Exclude ZA | + +--------+-----------------------------------------+ + + A value of 0 means all SME state will be restored. + + PTR + a 64-bit data pointer passed in X1 that points to a buffer which is + guaranteed to be large enough to represent all SME state for the + requirements specified by ``OPTIONS``. + +* The subroutine does not return a value. + +* The subroutine behaves as follows: + + * If PTR is null, then the subroutine does nothing. + + * The subroutine aborts in some platform-specific manner if either of the + following conditions is true: + + * the current thread does not have access to SME or PSTATE.ZA is 0. + + * the current thread does not have access to TPIDR2_EL0 and bit 2 of + ``OPTIONS`` is 0. + + * both bit 0 and bit 63 of ``OPTIONS`` are 1. + + * If PSTATE.ZA is 0, then the subroutine enables PSTATE.ZA. + + * For addresses ``PTR->BLK``, ``PTR->ZA``, ``PTR->ZT0`` and + ``PTR->TPIDR2_EL0`` at unspecified offsets in the buffer pointed to by PTR: + + * If bit 63 of ``OPTIONS`` is 1 and TPIDR2_EL0 is null, then the function + copies ``PTR->BLK`` to X0 and calls ``__arm_tpidr2_restore``. + + * If bit 0 of ``OPTIONS`` is 0, then the subroutine restores the entire + contents of ZA from ``PTR->ZA``. 
+ + * If bit 1 of ``OPTIONS`` is 0 and ZT0 is available, then the subroutine + restores the entire contents of ZT0 from ``PTR->ZT0``. + + * If bit 2 of ``OPTIONS`` is 0, the subroutine restores the contents of + TPIDR2_EL0 from ``PTR->TPIDR2_EL0``. + + Pseudo-code examples ==================== From 6f1bba8834f22d685e6fbc898cd2715c0ad16eb8 Mon Sep 17 00:00:00 2001 From: Sander de Smalen Date: Thu, 22 Aug 2024 09:38:48 +0100 Subject: [PATCH 2/6] Address review comments. * Renamed `compatible ZA` -> `agnostic ZA` * Changed __arm_sme_save/restore such that the save routine should record whether ZA or ZT0 is saved and such that the restore routine should check whether ZA or ZT0 was saved. This removes the (previously) implicit assumption in `__arm_sme_save` that PSTATE.ZA must be 1 if PTR is not nullptr. * ZA is now always saved/restored using the lazy-save mechanism. * Changed __arm_sme_save/restore to have a custom ZA interface instead of 'agnostic-ZA' which was incorrect. --- aapcs64/aapcs64.rst | 145 ++++++++++++++++++++------------------------ 1 file changed, 67 insertions(+), 78 deletions(-) diff --git a/aapcs64/aapcs64.rst b/aapcs64/aapcs64.rst index 759937d..0f6f102 100644 --- a/aapcs64/aapcs64.rst +++ b/aapcs64/aapcs64.rst @@ -1753,7 +1753,8 @@ states of ZA on entry to a subroutine and the possible states of ZA on a .. _`private-ZA`: .. _`shared-ZA`: -.. _`compatible-ZA`: +.. _`agnostic-ZA`: +.. _`restore-ZA`:
+-------------------+-------------------+---------------------------+ | Type of interface | ZA state on entry | ZA state on normal return | @@ -1762,7 +1763,7 @@ states of ZA on entry to a subroutine and the possible states of ZA on a +-------------------+-------------------+---------------------------+ | shared ZA | active | active | +-------------------+-------------------+---------------------------+ -| compatible ZA | active, dormant | unchanged | +| agnostic ZA | active, dormant | unchanged | | | or off | | +-------------------+-------------------+---------------------------+ @@ -1780,12 +1781,12 @@ The shared-ZA interface is so called because it allows the subroutine to share ZA contents with its caller. This can be useful if an SME operation is split into several cooperating subroutines. -The compatible-ZA interface is intended to be called from any function -without requiring a change to PSTATE.ZA and is generally used in conjunction -with the expectation that it `preserves ZA`_. +The agnostic-ZA interface is intended to be called from any function without +requiring a change to PSTATE.ZA and must preserve all state associated with +PSTATE.ZA. -Subroutines with a `private-ZA`_ interface, `shared-ZA`_ interface or -`compatible-ZA`_ interface can (at their option) choose to guarantee that they +Subroutines with a `private-ZA`_ interface and subroutines with a `shared-ZA`_ +interface can both (at their option) choose to guarantee that they `preserve ZA`_. Parameter passing @@ -2094,10 +2095,10 @@ support routines: state. ``__arm_sme_save`` - Provides a safe way to save all state enabled by PSTATE.ZA. + Provides a safe way to save state enabled by PSTATE.ZA to a buffer. ``__arm_sme_restore`` - Provides a safe way to restore all state enabled by PSTATE.ZA from a buffer. + Provides a safe way to restore state enabled by PSTATE.ZA from a buffer.
``__arm_sme_state`` @@ -2335,7 +2336,7 @@ that is large enough to represent all state enabled by PSTATE.ZA. * The subroutine is called ``__arm_sme_state_size``. -* The subroutine has a `compatible-ZA`_ `streaming-compatible interface`_ with +* The subroutine has an `agnostic-ZA`_ `streaming-compatible interface`_ with the following properties: * X1-X15, X19-X29 and SP are call-preserved. @@ -2351,42 +2352,37 @@ that is large enough to represent all state enabled by PSTATE.ZA. +--------+-----------------------------------------+ | bits | Options | +========+=========================================+ - | 63 | Preserve ZA using lazy save mechanism | - +--------+-----------------------------------------+ - | 62 - 3 | Zero for this revision of the AAPCS64, | + | 63 - 2 | Zero for this revision of the AAPCS64, | | | but reserved for future expansion | +--------+-----------------------------------------+ - | 2 | Exclude TPIDR2_EL0 | - +--------+-----------------------------------------+ | 1 | Exclude ZT0 | +--------+-----------------------------------------+ | 0 | Exclude ZA | +--------+-----------------------------------------+ - If bit 63 is 1, then the size calculation will include the allocation - of a TPIDR2 block. - A value of 0 means that all SME state will be considered in the size calculation. * The subroutine returns an unsigned double word in X0 that represents a size in bytes that is large enough to represent all state enabled by - PSTATE.ZA, predicated on the requirements specified in ``OPTIONS``. - The size is guaranteed to be a multiple of 16. + PSTATE.ZA, predicated on the requirements specified in ``OPTIONS``, + as well as any other state required for `__arm_sme_save`_ and + `__arm_sme_restore`_. - The layout that corresponds to the calculated size is unspecified, - but the assumption is that the size always matches the implementation - of the `__arm_sme_save`_ and `__arm_sme_restore`_ routines. 
+ `__arm_sme_state_size`_ assumes that ZA is saved lazily and will account + for the save of ``TPIDR2_EL0``. -* The subroutine behaves as follows: + The exact layout used to calculate the size is unspecified. The + implementations of `__arm_sme_save`_ and `__arm_sme_restore`_ and + `__arm_sme_state_size`_ must all assume the same layout. + + The size is guaranteed to be a multiple of 16. - * If both bit 0 and bit 63 of ``OPTIONS`` are 1, then the subroutine aborts - in some platform-specific manner. +* The subroutine behaves as follows: * If the current thread has access to FEAT_SME and PSTATE.ZA is 1, X0 contains the total size required to represent all SME state enabled - under PSTATE.ZA, with the exception of the state requested to be excluded - as described in ``OPTIONS``. + under PSTATE.ZA predicated on the requirements specified in ``OPTIONS``. * Otherwise, X0 is 0. @@ -2401,7 +2397,7 @@ by PSTATE.ZA. * The subroutine is called ``__arm_sme_save``. -* The subroutine has a `compatible-ZA`_ `streaming-compatible interface`_ with +* The subroutine has a custom ``ZA`` `streaming-compatible interface`_ with the following properties: * X2-X15, X19-X29 and SP are call-preserved. @@ -2416,13 +2412,9 @@ by PSTATE.ZA. +--------+-----------------------------------------+ | bits | Options | +========+=========================================+ - | 63 | Preserve ZA using lazy save mechanism | - +--------+-----------------------------------------+ - | 62 - 3 | Zero for this revision of the AAPCS64, | + | 63 - 2 | Zero for this revision of the AAPCS64, | | | but reserved for future expansion | +--------+-----------------------------------------+ - | 2 | Exclude TPIDR2_EL0 | - +--------+-----------------------------------------+ | 1 | Exclude ZT0 | +--------+-----------------------------------------+ | 0 | Exclude ZA | @@ -2439,39 +2431,39 @@ by PSTATE.ZA. * The subroutine behaves as follows: - * If PTR is null, then the subroutine does nothing. 
- * The subroutine aborts in some platform-specific manner if either of the following conditions is true: - * the current thread does not have access to SME or PSTATE.ZA is 0. + * The current thread does not have access to SME. + + * The current thread does not have access to ``TPIDR2_EL0`` when + PSTATE.ZA is 1. - * the current thread does not have access to TPIDR2_EL0 and bit 2 of - ``OPTIONS`` is 0. + * If ``PTR`` does not point to a valid buffer with the required size, the + behaviour of calling this routine is undefined. - * both bit 0 and bit 63 of ``OPTIONS`` are 1. + * If PSTATE.ZA is 0, the subroutine does nothing. - * For addresses ``PTR->BLK``, ``PTR->ZA_BUFFER``, ``PTR->ZA``, ``PTR->ZT0`` - and ``PTR->TPIDR2_EL0`` at unspecified offsets in the buffer pointed to by - PTR: + * If bit 0 of ``OPTIONS`` is 0, then for addresses ``PTR->SAVED_ZA``, + ``PTR->BLK``, ``PTR->ZA`` and ``PTR->TPIDR2_EL0`` at unspecified offsets + in the buffer pointed to by ``PTR``: - * If bit 2 of ``OPTIONS`` is 0, the subroutine stores the contents of - TPIDR2_EL0 to ``PTR->TPIDR2_EL0``. + * The full contents of ``TPIDR2_EL0`` are written to ``PTR->TPIDR2_EL0``. - * If bit 63 of ``OPTIONS`` is 1, then the subroutine sets up a lazy-save by - storing the address ``PTR->ZA_BUFFER`` to ``PTR->BLK.za_save_buffer``, - storing the streaming vector length in bytes (``SVL.B``) to - ``PTR->BLK.num_za_save_slices`` and copying the address ``PTR->BLK`` to - ``TPIDR2_EL0``. + * The address ``PTR->ZA`` is written to ``PTR->BLK.za_save_buffer``, + the streaming vector length in bytes (``SVL.B``) is written to + ``PTR->BLK.num_za_save_slices`` and the address ``PTR->BLK`` is + written to ``TPIDR2_EL0``, thus setting up a lazy save. - * If bit 0 of ``OPTIONS`` is 0, then the subroutine stores the entire - contents of ZA to ``PTR->ZA_BUFFER``. + * The value 1 is written to ``PTR->SAVED_ZA``. 
- * If bit 1 of ``OPTIONS`` is 0 and ZT0 is available, then the subroutine - stores the full contents of ZT0 to ``PTR->ZT0``. + * If bit 1 of ``OPTIONS`` is 0 and ZT0 is available, then for the addresses + ``PTR->SAVED_ZT0`` and ``PTR->ZT0`` at unspecified offsets in the + buffer pointed to by ``PTR``: - * If bit 63 of ``OPTIONS`` is 0, then the subroutine disables PSTATE.ZA. + * The full contents of ZT0 are written to ``PTR->ZT0``. + * The value 1 is written to ``PTR->SAVED_ZT0``. ``__arm_sme_restore`` ^^^^^^^^^^^^^^^^^^^^^ @@ -2483,7 +2475,7 @@ enabled by PSTATE.ZA. * The subroutine is called ``__arm_sme_restore``. -* The subroutine has a `compatible-ZA`_ `streaming-compatible interface`_ with +* The subroutine has a custom ``ZA`` `streaming-compatible interface`_ with the following properties: * X2-X15, X19-X29 and SP are call-preserved. @@ -2498,13 +2490,9 @@ enabled by PSTATE.ZA. +--------+-----------------------------------------+ | bits | Options | +========+=========================================+ - | 63 | Preserve ZA using lazy save mechanism | - +--------+-----------------------------------------+ - | 62 - 3 | Zero for this revision of the AAPCS64, | + | 63 - 2 | Zero for this revision of the AAPCS64, | | | but reserved for future expansion | +--------+-----------------------------------------+ - | 2 | Exclude TPIDR2_EL0 | - +--------+-----------------------------------------+ | 1 | Exclude ZT0 | +--------+-----------------------------------------+ | 0 | Exclude ZA | @@ -2521,34 +2509,35 @@ enabled by PSTATE.ZA. * The subroutine behaves as follows: - * If PTR is null, then the subroutine does nothing. - * The subroutine aborts in some platform-specific manner if either of the following conditions is true: - * the current thread does not have access to SME or PSTATE.ZA is 0. + * The current thread does not have access to SME. + + * The current thread does not have access to ``TPIDR2_EL0`` when + PSTATE.ZA is 1. 
- * the current thread does not have access to TPIDR2_EL0 and bit 2 of - ``OPTIONS`` is 0. + * If ``PTR`` does not point to a valid buffer with the required size, the + behaviour of calling this routine is undefined. - * both bit 0 and bit 63 of ``OPTIONS`` are 1. + * For addresses ``PTR->SAVED_ZA``, ``PTR->BLK`` and ``PTR->TPIDR2_EL0`` + at unspecified offsets in the buffer pointed to by ``PTR``, if + ``PTR->SAVED_ZA`` is 1 and bit 0 of ``OPTIONS`` is 0, then: - * If PSTATE.ZA is 0, then the subroutine enables PSTATE.ZA. + * If PSTATE.ZA is 0, the subroutine enables PSTATE.ZA. - * For addresses ``PTR->BLK``, ``PTR->ZA``, ``PTR->ZT0`` and - ``PTR->TPIDR2_EL0`` at unspecified offsets in the buffer pointed to by PTR: + * If ``TPIDR2_EL0`` is a NULL pointer, then the subroutine points X0 to + ``PTR->BLK`` and calls ``__arm_tpidr2_restore``. - * If bit 63 of ``OPTIONS`` is 1 and TPIDR2_EL0 is null, then the function - copies ``PTR->BLK`` to X0 and calls ``__arm_tpidr2_restore``. + * The contents of ``PTR->TPIDR2_EL0`` are copied to ``TPIDR2_EL0``. - * If bit 0 of ``OPTIONS`` is 0, then the subroutine restores the entire - contents of ZA from ``PTR->ZA``. + * For addresses ``PTR->SAVED_ZT0`` and ``PTR->ZT0`` at unspecified + offsets in the buffer pointed to by ``PTR``, if ``PTR->SAVED_ZT0`` is 1 + and bit 1 of ``OPTIONS`` is 0, then: - * If bit 1 of ``OPTIONS`` is 0 and ZT0 is available, then the subroutine - restores the entire contents of ZT0 from ``PTR->ZT0``. + * If PSTATE.ZA is 0, the subroutine enables PSTATE.ZA. - * If bit 2 of ``OPTIONS`` is 0, the subroutine restores the contents of - TPIDR2_EL0 from ``PTR->TPIDR2_EL0``. + * The full contents of ``PTR->ZT0`` are copied to ZT0. 
Pseudo-code examples From 34eeeccf03f47e04882c56df38257d403eab330b Mon Sep 17 00:00:00 2001 From: Sander de Smalen Date: Thu, 22 Aug 2024 10:02:29 +0100 Subject: [PATCH 3/6] Remove redundant change --- aapcs64/aapcs64.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/aapcs64/aapcs64.rst b/aapcs64/aapcs64.rst index 0f6f102..9099e11 100644 --- a/aapcs64/aapcs64.rst +++ b/aapcs64/aapcs64.rst @@ -1754,7 +1754,6 @@ states of ZA on entry to a subroutine and the possible states of ZA on a .. _`private-ZA`: .. _`shared-ZA`: .. _`agnostic-ZA`: -.. _`restore-ZA`: +-------------------+-------------------+---------------------------+ | Type of interface | ZA state on entry | ZA state on normal return | From 35aae186ba258e2b8d4b2007721a90fb71c4da10 Mon Sep 17 00:00:00 2001 From: Sander de Smalen Date: Thu, 22 Aug 2024 17:24:05 +0100 Subject: [PATCH 4/6] Removed OPTIONS argument from the subroutines. As @rsandifo-arm explained, the same routines cannot be used for general use of saving/restoring of state enabled by PSTATE.ZA because of the different expectations on the ZA interface when partially saving/restoring state enabled by PSTATE.ZA. Removing the option drastically simplifies the logic. Note that I have removed the two booleans (internal to the save/restore routines) that distinguish between having saved 'ZA' and 'ZT0', and replaced that with a single 'VALID' bit, because I think we can assume that the save/restore routines are called by PEs that have the same SME state. --- aapcs64/aapcs64.rst | 160 ++++++++++++++++---------------------------- 1 file changed, 59 insertions(+), 101 deletions(-) diff --git a/aapcs64/aapcs64.rst b/aapcs64/aapcs64.rst index 9099e11..2286190 100644 --- a/aapcs64/aapcs64.rst +++ b/aapcs64/aapcs64.rst @@ -2343,34 +2343,13 @@ that is large enough to represent all state enabled by PSTATE.ZA. * P0-P15 are call-preserved. * the subroutine `preserves ZA`_.
-* The subroutine takes the following argument: - - OPTIONS - a 64-bit value passed in X0 describing the following options: - - +--------+-----------------------------------------+ - | bits | Options | - +========+=========================================+ - | 63 - 2 | Zero for this revision of the AAPCS64, | - | | but reserved for future expansion | - +--------+-----------------------------------------+ - | 1 | Exclude ZT0 | - +--------+-----------------------------------------+ - | 0 | Exclude ZA | - +--------+-----------------------------------------+ - - A value of 0 means that all SME state will be considered in the size - calculation. +* The subroutine takes no arguments. * The subroutine returns an unsigned double word in X0 that represents a size in bytes that is large enough to represent all state enabled by - PSTATE.ZA, predicated on the requirements specified in ``OPTIONS``, - as well as any other state required for `__arm_sme_save`_ and + PSTATE.ZA as well as any other state required for `__arm_sme_save`_ and `__arm_sme_restore`_. - `__arm_sme_state_size`_ assumes that ZA is saved lazily and will account - for the save of ``TPIDR2_EL0``. - The exact layout used to calculate the size is unspecified. The implementations of `__arm_sme_save`_ and `__arm_sme_restore`_ and `__arm_sme_state_size`_ must all assume the same layout. @@ -2380,10 +2359,11 @@ that is large enough to represent all state enabled by PSTATE.ZA. * The subroutine behaves as follows: * If the current thread has access to FEAT_SME and PSTATE.ZA is 1, - X0 contains the total size required to represent all SME state enabled - under PSTATE.ZA predicated on the requirements specified in ``OPTIONS``. + X0 contains the total size required to save and restore all SME state + enabled under PSTATE.ZA. - * Otherwise, X0 is 0. + * Otherwise, X0 contains a size large enough to represent internal state + required for `__arm_sme_save`_ and `__arm_sme_restore`_. 
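The size contract for ``__arm_sme_state_size`` in this revision can be illustrated with a host-side sketch. The layout it assumes is hypothetical, since the text leaves the layout unspecified and only requires ``__arm_sme_state_size``, ``__arm_sme_save`` and ``__arm_sme_restore`` to agree on it. The sketch assumes an 8-byte VALID field, an 8-byte copy of TPIDR2_EL0, a 16-byte TPIDR2 block, a lazy-save buffer of SVL.B slices of SVL.B bytes for ZA, and 64 bytes for the 512-bit ZT0 register. ``hypothetical_sme_state_size`` and its parameters are illustrative names, not part of the ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field sizes for an implementation-defined layout. */
enum {
  VALID_BYTES = 8,         /* 1 if state was saved, 0 otherwise */
  SAVED_TPIDR2_BYTES = 8,  /* copy of TPIDR2_EL0 taken by the save routine */
  TPIDR2_BLOCK_BYTES = 16, /* za_save_buffer, num_za_save_slices, reserved */
  ZT0_BYTES = 64           /* ZT0 is a 512-bit register */
};

/* Round n up to the next multiple of 16, as the returned size must be. */
static uint64_t round_up_16(uint64_t n) { return (n + 15u) & ~(uint64_t)15u; }

/* Sketch of a value __arm_sme_state_size might return under this layout.
 * has_sme: the thread has access to FEAT_SME; za_on: PSTATE.ZA is 1;
 * has_zt0: ZT0 is available; svl_b: streaming vector length in bytes. */
uint64_t hypothetical_sme_state_size(bool has_sme, bool za_on, bool has_zt0,
                                     uint64_t svl_b) {
  /* The VALID field is always needed so that restore can test it. */
  uint64_t size = VALID_BYTES;
  if (has_sme && za_on) {
    assert(svl_b >= 16 && svl_b % 16 == 0); /* SVL is a multiple of 128 bits */
    size += SAVED_TPIDR2_BYTES + TPIDR2_BLOCK_BYTES;
    size += svl_b * svl_b; /* SVL.B slices of SVL.B bytes each for ZA */
    if (has_zt0)
      size += ZT0_BYTES;
  }
  return round_up_16(size);
}
```

With a 512-bit streaming vector length (SVL.B = 64) and ZT0 available, this layout needs 8 + 8 + 16 + 4096 + 64 = 4192 bytes, which is already a multiple of 16; with PSTATE.ZA clear, only the rounded-up VALID field (16 bytes) is reported.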
``__arm_sme_save`` @@ -2399,70 +2379,53 @@ by PSTATE.ZA. * The subroutine has a custom ``ZA`` `streaming-compatible interface`_ with the following properties: - * X2-X15, X19-X29 and SP are call-preserved. + * X1-X15, X19-X29 and SP are call-preserved. * Z0-Z31 are call-preserved. * P0-P15 are call-preserved. -* The subroutine takes the following arguments: +* The custom ``ZA`` interface has the following properties: - OPTIONS - a 64-bit value passed in X0 describing the following options: + * If ZA state is 'off' or 'dormant' on entry, then it is unchanged on normal + return. + * If ZA state is 'active' on entry, then it is 'dormant' on normal return. - +--------+-----------------------------------------+ - | bits | Options | - +========+=========================================+ - | 63 - 2 | Zero for this revision of the AAPCS64, | - | | but reserved for future expansion | - +--------+-----------------------------------------+ - | 1 | Exclude ZT0 | - +--------+-----------------------------------------+ - | 0 | Exclude ZA | - +--------+-----------------------------------------+ - - A value of 0 means all SME state will be saved. +* The subroutine takes the following arguments: PTR - a 64-bit data pointer passed in X1 that points to a buffer which is - guaranteed to be large enough to represent all SME state for the - requirements specified by ``OPTIONS``. + a 64-bit data pointer passed in X0 that points to a buffer which + is guaranteed to have a size that is equal to or larger than the size + returned by `__arm_sme_state_size`_. * The subroutine does not return a value. * The subroutine behaves as follows: - * The subroutine aborts in some platform-specific manner if either of the - following conditions is true: - - * The current thread does not have access to SME. - - * The current thread does not have access to ``TPIDR2_EL0`` when - PSTATE.ZA is 1. 
- * If ``PTR`` does not point to a valid buffer with the required size, the - behaviour of calling this routine is undefined. + behaviour of calling this subroutine is undefined. - * If PSTATE.ZA is 0, the subroutine does nothing. + * For the address ``PTR->VALID`` at an unspecified offset in the buffer, + if the current thread does not have access to SME or if PSTATE.ZA is 0, + the value 0 is written to ``PTR->VALID`` and the subroutine returns. - * If bit 0 of ``OPTIONS`` is 0, then for addresses ``PTR->SAVED_ZA``, - ``PTR->BLK``, ``PTR->ZA`` and ``PTR->TPIDR2_EL0`` at unspecified offsets - in the buffer pointed to by ``PTR``: + * For addresses ``PTR->BLK``, ``PTR->ZA`` and ``PTR->TPIDR2_EL0`` at + unspecified offsets in the buffer pointed to by ``PTR``: - * The full contents of ``TPIDR2_EL0`` are written to ``PTR->TPIDR2_EL0``. + * The subroutine aborts in some platform-specific manner if the current + thread does not have access to ``TPIDR2_EL0``. + + * The contents of ``TPIDR2_EL0`` are written to ``PTR->TPIDR2_EL0``. * The address ``PTR->ZA`` is written to ``PTR->BLK.za_save_buffer``, the streaming vector length in bytes (``SVL.B``) is written to ``PTR->BLK.num_za_save_slices`` and the address ``PTR->BLK`` is written to ``TPIDR2_EL0``, thus setting up a lazy save. - * The value 1 is written to ``PTR->SAVED_ZA``. - - * If bit 1 of ``OPTIONS`` is 0 and ZT0 is available, then for the addresses - ``PTR->SAVED_ZT0`` and ``PTR->ZT0`` at unspecified offsets in the - buffer pointed to by ``PTR``: + * If ZT0 is available, then for the address ``PTR->ZT0`` at an + unspecified offset in the buffer pointed to by ``PTR``: - * The full contents of ZT0 are written to ``PTR->ZT0``. + * The contents of ZT0 are written to ``PTR->ZT0``. - * The value 1 is written to ``PTR->SAVED_ZT0``. + * The value 1 is written to ``PTR->VALID``. ``__arm_sme_restore`` ^^^^^^^^^^^^^^^^^^^^^ @@ -2477,66 +2440,61 @@ enabled by PSTATE.ZA. 
* The subroutine has a custom ``ZA`` `streaming-compatible interface`_ with the following properties: - * X2-X15, X19-X29 and SP are call-preserved. + * X1-X15, X19-X29 and SP are call-preserved. * Z0-Z31 are call-preserved. * P0-P15 are call-preserved. -* The subroutine takes the following arguments: - - OPTIONS - a 64-bit value passed in X0 describing the following options: +* The custom ``ZA`` interface has the following properties: - +--------+-----------------------------------------+ - | bits | Options | - +========+=========================================+ - | 63 - 2 | Zero for this revision of the AAPCS64, | - | | but reserved for future expansion | - +--------+-----------------------------------------+ - | 1 | Exclude ZT0 | - +--------+-----------------------------------------+ - | 0 | Exclude ZA | - +--------+-----------------------------------------+ + * If ZA state is 'off' on entry and SME state needs restoring, then it is + 'active' on normal return. + * If ZA state is 'off' on entry and SME state does not need restoring, then + it is 'off' on normal return. + * If ZA state is 'dormant' on entry, then it is 'active' on normal return. - A value of 0 means all SME state will be restored. +* The subroutine takes the following arguments: PTR - a 64-bit data pointer passed in X1 that points to a buffer which is - guaranteed to be large enough to represent all SME state for the - requirements specified by ``OPTIONS``. + a 64-bit data pointer passed in X0 that points to a buffer which + is guaranteed to have a size that is equal to or larger than the size + returned by `__arm_sme_state_size`_. * The subroutine does not return a value. * The subroutine behaves as follows: - * The subroutine aborts in some platform-specific manner if either of the - following conditions is true: + * If ``PTR`` does not point to a valid buffer with the required size, the + behaviour of calling this routine is undefined. 
+ + * For the address ``PTR->VALID`` at an unspecified offset in the buffer, + if the value stored at address ``PTR->VALID`` is 0, then the subroutine does + nothing. + + * Otherwise, the subroutine aborts in some platform-specific manner if + either of the following conditions is true: * The current thread does not have access to SME. - * The current thread does not have access to ``TPIDR2_EL0`` when - PSTATE.ZA is 1. + * The current thread does not have access to ``TPIDR2_EL0`` when PSTATE.ZA + is enabled. - * If ``PTR`` does not point to a valid buffer with the required size, the - behaviour of calling this routine is undefined. + * ZA state on entry is 'active', meaning that PSTATE.ZA is enabled and + ``TPIDR2_EL0`` is a NULL pointer. - * For addresses ``PTR->SAVED_ZA``, ``PTR->BLK`` and ``PTR->TPIDR2_EL0`` - at unspecified offsets in the buffer pointed to by ``PTR``, if - ``PTR->SAVED_ZA`` is 1 and bit 0 of ``OPTIONS`` is 0, then: + * If PSTATE.ZA is disabled, the subroutine enables PSTATE.ZA. - * If PSTATE.ZA is 0, the subroutine enables PSTATE.ZA. + * For addresses ``PTR->BLK`` and ``PTR->TPIDR2_EL0`` + at unspecified offsets in the buffer pointed to by ``PTR``: * If ``TPIDR2_EL0`` is a NULL pointer, then the subroutine points X0 to ``PTR->BLK`` and calls ``__arm_tpidr2_restore``. * The contents of ``PTR->TPIDR2_EL0`` are copied to ``TPIDR2_EL0``. - * For addresses ``PTR->SAVED_ZT0`` and ``PTR->ZT0`` at unspecified - offsets in the buffer pointed to by ``PTR``, if ``PTR->SAVED_ZT0`` is 1 - and bit 1 of ``OPTIONS`` is 0, then: - - * If PSTATE.ZA is 0, the subroutine enables PSTATE.ZA. + * If ZT0 is available, then for the address ``PTR->ZT0`` at an + unspecified offset in the buffer pointed to by ``PTR``: - * The full contents of ``PTR->ZT0`` are copied to ZT0. + * The contents of ``PTR->ZT0`` are copied to ZT0. 
Pseudo-code examples From 447378ad68c70d15a09bc3390fe7f55d1a854e03 Mon Sep 17 00:00:00 2001 From: Sander de Smalen Date: Wed, 28 Aug 2024 09:15:01 +0100 Subject: [PATCH 5/6] Address review comments. This changes the 'dormant on entry' case to be 'no change'. --- aapcs64/aapcs64.rst | 99 ++++++++++++++++++++------------------------- 1 file changed, 44 insertions(+), 55 deletions(-) diff --git a/aapcs64/aapcs64.rst b/aapcs64/aapcs64.rst index 2286190..ce20cd3 100644 --- a/aapcs64/aapcs64.rst +++ b/aapcs64/aapcs64.rst @@ -1762,8 +1762,9 @@ states of ZA on entry to a subroutine and the possible states of ZA on a +-------------------+-------------------+---------------------------+ | shared ZA | active | active | +-------------------+-------------------+---------------------------+ -| agnostic ZA | active, dormant | unchanged | -| | or off | | +| | active or off | unchanged | +| agnostic ZA +-------------------+---------------------------+ +| | dormant | unchanged or off | +-------------------+-------------------+---------------------------+ Every subroutine has exactly one ZA interface. A subroutine's ZA interface @@ -1780,9 +1781,12 @@ The shared-ZA interface is so called because it allows the subroutine to share ZA contents with its caller. This can be useful if an SME operation is split into several cooperating subroutines. -The agnostic-ZA interface is intended to be called from any function without -requiring a change to PSTATE.ZA and must preserve all state associated with -PSTATE.ZA. +The `agnostic-ZA`_ interface is intended to be called from any subroutine +without requiring a change to PSTATE.ZA. Subroutines with an `agnostic-ZA`_ +interface behave like subroutines with a `private-ZA`_ interface when ZA is +off or dormant on entry, but must additionally allow ZA to be active on +entry; in this case, the subroutine must preserve all state associated with +PSTATE.ZA when returning normally. 
Subroutines with a `private-ZA`_ interface and subroutines with a `shared-ZA`_ interface can both (at their option) choose to guarantee that they @@ -2341,7 +2345,6 @@ that is large enough to represent all state enabled by PSTATE.ZA. * X1-X15, X19-X29 and SP are call-preserved. * Z0-Z31 are call-preserved. * P0-P15 are call-preserved. - * the subroutine `preserves ZA`_. * The subroutine takes no arguments. @@ -2360,7 +2363,7 @@ that is large enough to represent all state enabled by PSTATE.ZA. * If the current thread has access to FEAT_SME and PSTATE.ZA is 1, X0 contains the total size required to save and restore all SME state - enabled under PSTATE.ZA. + enabled by PSTATE.ZA. * Otherwise, X0 contains a size large enough to represent internal state required for `__arm_sme_save`_ and `__arm_sme_restore`_. @@ -2383,12 +2386,6 @@ by PSTATE.ZA. * Z0-Z31 are call-preserved. * P0-P15 are call-preserved. -* The custom ``ZA`` interface has the following properties: - - * If ZA state is 'off' or 'dormant' on entry, then it is unchanged on normal - return. - * If ZA state is 'active' on entry, then it is 'dormant' on normal return. - * The subroutine takes the following arguments: PTR @@ -2401,29 +2398,31 @@ by PSTATE.ZA. * The subroutine behaves as follows: * If ``PTR`` does not point to a valid buffer with the required size, the - behaviour of calling this subroutine is undefined. + behavior of calling this subroutine is undefined. - * For the address ``PTR->VALID`` at an unspecified offset in the buffer, - if the current thread does not have access to SME or if PSTATE.ZA is 0, - the value 0 is written to ``PTR->VALID`` and the subroutine returns. + * If ZA state is 'active' on entry, then it is 'dormant' on normal return. + Otherwise the ZA state is unchanged. 
- * For addresses ``PTR->BLK``, ``PTR->ZA`` and ``PTR->TPIDR2_EL0`` at - unspecified offsets in the buffer pointed to by ``PTR``: + * For the address ``PTR->VALID`` at an unspecified offset in the buffer, + the value 0 is written to ``PTR->VALID`` and the subroutine returns, if + any of the following conditions is true: - * The subroutine aborts in some platform-specific manner if the current - thread does not have access to ``TPIDR2_EL0``. + * The current thread does not have access to SME. - * The contents of ``TPIDR2_EL0`` are written to ``PTR->TPIDR2_EL0``. + * PSTATE.ZA is 0. - * The address ``PTR->ZA`` is written to ``PTR->BLK.za_save_buffer``, - the streaming vector length in bytes (``SVL.B``) is written to - ``PTR->BLK.num_za_save_slices`` and the address ``PTR->BLK`` is - written to ``TPIDR2_EL0``, thus setting up a lazy save. + * ``TPIDR2_EL0`` is not a NULL pointer. - * If ZT0 is available, then for the address ``PTR->ZT0`` at an - unspecified offset in the buffer pointed to by ``PTR``: + * For addresses ``PTR->BLK`` and ``PTR->ZA`` at unspecified offsets in + the buffer pointed to by ``PTR``, the address ``PTR->ZA`` is written to + ``PTR->BLK.za_save_buffer``, the streaming vector length in bytes + (``SVL.B``) is written to ``PTR->BLK.num_za_save_slices`` and the + address ``PTR->BLK`` is written to ``TPIDR2_EL0``, thus setting up a + lazy save. - * The contents of ZT0 are written to ``PTR->ZT0``. + * If ZT0 is available, then for the address ``PTR->ZT0`` at an unspecified + offset in the buffer pointed to by ``PTR``, the contents of ZT0 are written + to ``PTR->ZT0``. * The value 1 is written to ``PTR->VALID``. @@ -2444,57 +2443,47 @@ enabled by PSTATE.ZA. * Z0-Z31 are call-preserved. * P0-P15 are call-preserved. -* The custom ``ZA`` interface has the following properties: - - * If ZA state is 'off' on entry and SME state needs restoring, then it is - 'active' on normal return.
- * If ZA state is 'off' on entry and SME state does not need restoring, then - it is 'off' on normal return. - * If ZA state is 'dormant' on entry, then it is 'active' on normal return. - * The subroutine takes the following arguments: PTR - a 64-bit data pointer passed in X0 that points to a buffer which - is guaranteed to have a size that is equal to or larger than the size - returned by `__arm_sme_state_size`_. + a 64-bit data pointer passed in X0 that points to a buffer that + is initialized by a call to `__arm_sme_save`_. * The subroutine does not return a value. * The subroutine behaves as follows: * If ``PTR`` does not point to a valid buffer with the required size, the - behaviour of calling this routine is undefined. + behavior of calling this subroutine is undefined. + + * The ZA state on normal return is the same as the ZA state on entry to the + call to `__arm_sme_save`_ that was used to initialize the buffer + pointed to by ``PTR``. * For the address ``PTR->VALID`` at an unspecified offset in the buffer, - if the value stored at address ``PTR->VALID`` is 0, then the subroutine does - nothing. + if the value stored at address ``PTR->VALID`` is 0, then the subroutine + does nothing. * Otherwise, the subroutine aborts in some platform-specific manner if either of the following conditions is true: * The current thread does not have access to SME. - * The current thread does not have access to ``TPIDR2_EL0`` when PSTATE.ZA - is enabled. + * ZA state is 'active' on entry. - * ZA state on entry is 'active', meaning that PSTATE.ZA is enabled and - ``TPIDR2_EL0`` is a NULL pointer. + * If PSTATE.ZA is 0, the subroutine enables PSTATE.ZA. - * If PSTATE.ZA is disabled, the subroutine enables PSTATE.ZA.
- - * For addresses ``PTR->BLK`` and ``PTR->TPIDR2_EL0`` - at unspecified offsets in the buffer pointed to by ``PTR``: + * For the address ``PTR->BLK`` at an unspecified offset in the buffer + pointed to by ``PTR``: * If ``TPIDR2_EL0`` is a NULL pointer, then the subroutine points X0 to ``PTR->BLK`` and calls ``__arm_tpidr2_restore``. - * The contents of ``PTR->TPIDR2_EL0`` are copied to ``TPIDR2_EL0``. + * The value 0 is written to ``TPIDR2_EL0``. * If ZT0 is available, then for the address ``PTR->ZT0`` at an - unspecified offset in the buffer pointed to by ``PTR``: - - * The contents of ``PTR->ZT0`` are copied to ZT0. + unspecified offset in the buffer pointed to by ``PTR``, the contents of + ``PTR->ZT0`` are copied to ZT0. Pseudo-code examples From 82c84bd0549ebad293fa376b34a50476a9b38cea Mon Sep 17 00:00:00 2001 From: Sander de Smalen Date: Mon, 2 Sep 2024 14:27:29 +0100 Subject: [PATCH 6/6] Add section about dynamic symbols to test for ZT0 --- aapcs64/aapcs64.rst | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/aapcs64/aapcs64.rst b/aapcs64/aapcs64.rst index ce20cd3..35341bf 100644 --- a/aapcs64/aapcs64.rst +++ b/aapcs64/aapcs64.rst @@ -2486,6 +2486,28 @@ enabled by PSTATE.ZA. ``PTR->ZT0`` are copied to ZT0. +Dynamic symbols for supported state +----------------------------------- + +A platform that supports SME may provide a set of dynamic symbols. + +The availability of these dynamic symbols indicates whether SME state is +supported by the routines provided by the platform. These symbols +can be used during dynamic linking to verify that SME state used in the +program will be handled correctly by the runtime. + +This is particularly relevant for calls to `agnostic-ZA`_ functions, which +cannot make assumptions about PSTATE.ZA or what state is enabled by it. These +functions rely on the routines defined in `SME support routines`_ to preserve +all SME state that may be live in the caller.
The level of support required +by the program must therefore match the level of support provided by the +runtime, which for dynamically linked executables can only be asserted +during dynamic linking. + +* ``__arm_sme_routines_support_zt0`` is available when the SME support routines + support ZT0. + + Pseudo-code examples ====================
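The save/restore protocol that the final revision specifies for ``__arm_sme_save`` and ``__arm_sme_restore`` can be illustrated with a small host-side model. This is a hypothetical sketch, not an SME implementation. It executes no SME instructions and invents a buffer layout, whereas the specification leaves all offsets unspecified. It also omits ``num_za_save_slices`` and assumes that SME and ZT0 are available. Every name in it (``sme_model``, ``sme_buffer``, ``model_sme_save``, ``model_commit_lazy_save``, ``model_sme_restore``) is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of per-thread SME state (not real hardware state). */
typedef struct {
    bool     pstate_za;  /* PSTATE.ZA */
    void    *tpidr2_el0; /* NULL, or the address of a lazy-save block */
    uint32_t za;         /* stand-in for the contents of ZA */
    uint32_t zt0;        /* stand-in for the contents of ZT0 */
} sme_model;

/* Hypothetical stand-in for PTR->BLK (num_za_save_slices omitted). */
typedef struct { uint32_t *za_save_buffer; } tpidr2_block;

/* Hypothetical buffer layout; the real offsets are unspecified. */
typedef struct {
    uint64_t     valid;  /* PTR->VALID */
    tpidr2_block blk;    /* PTR->BLK */
    uint32_t     za;     /* PTR->ZA */
    uint32_t     zt0;    /* PTR->ZT0 */
} sme_buffer;

typedef enum { ZA_OFF, ZA_DORMANT, ZA_ACTIVE } za_state;

static za_state za_state_of(const sme_model *m) {
    if (!m->pstate_za) return ZA_OFF;
    return m->tpidr2_el0 != NULL ? ZA_DORMANT : ZA_ACTIVE;
}

/* Models __arm_sme_save: only 'active' ZA is saved, and it becomes 'dormant'. */
static void model_sme_save(sme_model *m, sme_buffer *p) {
    assert(p != NULL);               /* an invalid buffer is undefined behavior */
    if (!m->pstate_za || m->tpidr2_el0 != NULL) {
        p->valid = 0;                /* ZA off or dormant: nothing to save */
        return;
    }
    p->blk.za_save_buffer = &p->za;  /* set up a lazy save of ZA */
    m->tpidr2_el0 = &p->blk;
    p->zt0 = m->zt0;                 /* ZT0 is copied eagerly */
    p->valid = 1;
}

/* Models a private-ZA callee committing the pending lazy save, then ZA off. */
static void model_commit_lazy_save(sme_model *m) {
    if (m->pstate_za && m->tpidr2_el0 != NULL) {
        tpidr2_block *blk = m->tpidr2_el0;
        *blk->za_save_buffer = m->za; /* save ZA to PTR->ZA */
        m->tpidr2_el0 = NULL;
        m->pstate_za = false;
        m->za = 0;                    /* ZA and ZT0 contents are lost */
        m->zt0 = 0;
    }
}

/* Models __arm_sme_restore. */
static void model_sme_restore(sme_model *m, sme_buffer *p) {
    assert(p != NULL);
    if (p->valid == 0) return;                  /* nothing was saved */
    if (za_state_of(m) == ZA_ACTIVE) abort();   /* 'active' on entry */
    if (!m->pstate_za) {
        m->pstate_za = true;   /* enabling PSTATE.ZA zeroes ZA and ZT0 */
        m->za = 0;
        m->zt0 = 0;
    }
    if (m->tpidr2_el0 == NULL)                  /* lazy save was committed */
        m->za = *p->blk.za_save_buffer;         /* as __arm_tpidr2_restore would */
    m->tpidr2_el0 = NULL;
    m->zt0 = p->zt0;
}
```

Tracing the model: a save on 'active' ZA leaves ZA 'dormant' with ``VALID`` set to 1. The matching restore then returns ZA to 'active' with the original ZA and ZT0 contents, and this holds whether or not a private-ZA callee commits the lazy save and turns ZA off in between. A save on 'off' or 'dormant' ZA writes 0 to ``VALID``, so the matching restore leaves the state untouched.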