[sysvabi64] document requirement for bti c in more detail #196

nsz-arm · 2023-03-27T14:06:37Z

the text currently has

"An executable or shared library that supports BTI must have a bti c instruction at the start of any entry that might be called indirectly."

but it's not clear whether compilers should account for potential linker-inserted veneers that use an indirect call/jump, or whether the linker should ensure that an inserted veneer does not break BTI compatibility.

(gcc+ld.bfd made a different choice than llvm+lld)

see discussion at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

smithp35 · 2023-03-27T14:46:32Z

It may be worth covering ELF in addition to sysvabi, as this also affects bare-metal (PAC-BTI-M), which would presumably be affected too if GCC were not emitting BTIs for functions that could require a stub.

My reading was that, without a specific exception for linker-created veneers/stubs, code-generators had to assume that one might be created and generate code accordingly. I remember clang always generating BTI instructions, as it couldn't assume that the linker would never generate an indirect branch.

Having fewer BTIs benefits security, so BTI-aware linker stubs are an overall improvement and likely the preferred direction of travel. However, I think this deserves a wider discussion: IMO, for GCC's behaviour not to be a bug, we would have to add a specific requirement in the ABI for linkers to be BTI aware, and no such requirement exists at the moment.

Assuming we can get agreement to add the requirement, I'm wondering whether anything needs doing about the transition. As I understand it:

GCC objects + (BFD prior to the 106671 fix, or LLD) are at risk of an indirect jump to a non-BTI-compatible function.
Clang objects always have BTI, so they are safe with either linker.
I'm not sure there is anything we can do, as a BTI-aware linker will work with both. The only failing case is an older linker combined with objects containing non-BTI-compatible functions.
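The transition cases above can be summarised as a small truth table. This is a hypothetical model (the `at_risk` helper and the table entries are illustrative, not taken from any toolchain): a problem only arises when objects may elide BTI and the linker does not generate BTI-aware thunks.

```python
from itertools import product


def at_risk(objects_may_elide_bti: bool, linker_bti_aware: bool) -> bool:
    """True if a linker-generated indirect veneer could land on a function
    entry that has no BTI (hypothetical model of the cases above)."""
    return objects_may_elide_bti and not linker_bti_aware


# Illustrative entries: does the compiler elide BTI? Is the linker BTI aware?
COMPILERS = {"GCC": True, "Clang": False}
LINKERS = {"BFD before the 106671 fix": False,
           "LLD without BTI-aware thunks": False,
           "BTI-aware linker": True}

if __name__ == "__main__":
    for (cc, elides), (ld, aware) in product(COMPILERS.items(), LINKERS.items()):
        verdict = "at risk" if at_risk(elides, aware) else "safe"
        print(f"{cc} objects + {ld}: {verdict}")
```

Only the two GCC rows with a non-BTI-aware linker come out "at risk", which is why a BTI-aware linker is enough to cover both compilers.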

The other thing we may want to address is whether there is any additional marking we can do to make your optimisation possible without disassembling the binary.

MaskRay · 2024-07-02T19:59:48Z

Functions with LR signing get PACI[AB]SP{,PC}. They have an implicit BTI.
If PACI[AB]SP is absent (leaf functions, or when PAuth is not enabled), Clang adds "bti c" to every candidate function, for compatibility with LLD and with GNU ld before https://sourceware.org/bugzilla/show_bug.cgi?id=30076, in case range extension thunks (aka veneers aka stubs) are needed (https://reviews.llvm.org/D99417).
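A minimal sketch of the entry-instruction choice described above, under a simplified model: `entry_instruction` and its parameters are hypothetical names, and "candidate" is taken to mean any function the compiler treats as a possible indirect-branch target.

```python
from typing import Optional


def entry_instruction(signs_lr: bool, use_b_key: bool,
                      is_candidate: bool) -> Optional[str]:
    """First instruction placed at a function entry in this simplified model."""
    if signs_lr:
        # PACIASP/PACIBSP double as an implicit BTI c landing pad.
        return "pacibsp" if use_b_key else "paciasp"
    if is_candidate:
        # No PAC instruction (leaf function, or PAuth disabled): emit an
        # explicit landing pad in case the linker inserts an indirect thunk.
        return "bti c"
    return None  # nothing needed at the entry
```

In this model a non-leaf function with PAuth enabled never needs a separate "bti c"; only leaf functions (or builds without PAuth) pay for an extra instruction.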

I assume that the LLD work is planned and that Clang will eventually remove the "bti c" (BTW, -fbinutils-version= exists if compatibility with older GNU ld versions is needed).
Is there more information about the double veneer scheme used by GNU ld? Do we need a new relocation to mark "bti c"?
(If there is concern with a new relocation type, NONE with a custom addend might be utilized.)

smithp35 · 2024-07-03T09:22:29Z

We've got an idea of where we want to go with this. I've been wanting to have an implementation in LLD ready before publishing, but have not been able to find the time.

The change that needs making should make clear the requirements for code-generators and static linkers. The prevailing opinion within Arm is that we would like to enable code-generators to omit BTI if they can prove that the function will never be called indirectly (GCC behaviour). A static linker may therefore not assume that all indirect branch targets have a BTI compatible landing pad.
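The code-generator side of that rule could be sketched as below. The `Function` fields and `may_omit_bti` are hypothetical; a real compiler derives the same facts from its own analysis of symbol linkage and address-taken/escape information.

```python
from dataclasses import dataclass


@dataclass
class Function:
    name: str
    is_local: bool         # e.g. a C `static` function (hypothetical field)
    address_escapes: bool  # address taken and may reach an indirect branch


def may_omit_bti(fn: Function) -> bool:
    """A code generator may omit BTI only when it can prove, from the
    information it has, that no indirect branch can reach the entry.
    Anything visible outside the object, or whose address escapes, keeps it."""
    return fn.is_local and not fn.address_escapes
```

The static linker cannot see `address_escapes`, which is why it must not assume that every branch target it redirects through a thunk has a landing pad.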

A "BTI compatible" thunk either doesn't use an indirect branch (a chain of direct branches), or is split into two parts: the indirect branch, and a "header" that starts with a BTI c and ends with a direct branch. Something like:

caller:
  bl thunk_to_foo
  ...
thunk_to_foo:
  adrp x16, foo_bti_header
  add  x16, x16, :lo12:foo_bti_header
  br   x16
  ...
foo_bti_header:
  bti  c
  b    foo
  ...
foo:

The "header" has a range limit (±128 MiB), and is essentially an alternative entry point for indirect calls. The presence of this alternative entry point undoes the compiler's hard work in omitting the BTI, but it is only created when necessary.
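To make the range limits concrete, here is a sketch of the reachability checks a linker might use for this layout. It assumes the standard A64 limits (B/BL: signed 26-bit word offset; ADRP: signed 21-bit offset in 4 KiB pages); the helper names are hypothetical.

```python
MIB = 1 << 20
GIB = 1 << 30


def b_reachable(src: int, dst: int) -> bool:
    """B/BL encode a signed 26-bit word offset: [-128 MiB, +128 MiB - 4]."""
    off = dst - src
    return off % 4 == 0 and -128 * MIB <= off <= 128 * MIB - 4


def adrp_reachable(src: int, dst: int) -> bool:
    """ADRP encodes a signed 21-bit offset in 4 KiB pages: about +-4 GiB."""
    page_delta = (dst >> 12) - (src >> 12)
    return -(1 << 20) <= page_delta <= (1 << 20) - 1


def header_usable(thunk: int, header: int, foo: int) -> bool:
    """thunk_to_foo reaches foo_bti_header via ADRP+ADD+BR x16; the header's
    B (at header + 4, after the BTI c) must then reach foo directly."""
    return adrp_reachable(thunk, header) and b_reachable(header + 4, foo)
```

For example, a call at address 0 cannot reach a `foo` at 250 MiB with a single BL, but a header placed at 200 MiB can, because its B only has to cover the last 50 MiB.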

As these "BTI compatible" thunks are larger and slower than normal we would want to only generate these when necessary. GNU ld has decided to disassemble the code at the destination. While this is an option, and is the most precise solution, if there are a lot of thunks then this could affect linker performance. If there are only a few then it probably doesn't matter.
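Checking the destination does not require a full disassembler: the linker only needs to decode the first instruction word. A sketch, assuming the accepted landing pads are BTI c, BTI jc, PACIASP and PACIBSP (a real linker may accept a different set); the constants are the architectural HINT encodings and `has_landing_pad` is a hypothetical helper.

```python
import struct

# Architectural A64 encodings (all in the HINT space: 0xD503201F | imm << 5).
BTI_C = 0xD503245F    # HINT #34
BTI_JC = 0xD50324DF   # HINT #38
PACIASP = 0xD503233F  # HINT #25, implicit BTI c
PACIBSP = 0xD503237F  # HINT #27, implicit BTI c

# Entry instructions this sketch accepts as a landing pad for a thunk's BR x16.
LANDING_PADS = frozenset({BTI_C, BTI_JC, PACIASP, PACIBSP})


def has_landing_pad(section: bytes, offset: int) -> bool:
    """Decode the instruction at `offset` (A64 instructions are always stored
    little-endian) and report whether it is an accepted landing pad."""
    if offset < 0 or offset + 4 > len(section):
        return False
    (word,) = struct.unpack_from("<I", section, offset)
    return word in LANDING_PADS
```

The per-thunk cost is one 4-byte read, so the expense GNU ld pays is mostly in locating the destination section contents, not in decoding.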

I am hoping that I can find some heuristics that would let a linker decide based on symbol information, so that the need for disassembly is lessened. Assuming GCC's implementation doesn't already break this, it could be possible to say that eliding BTI is only permitted for symbols with STB_LOCAL binding. This would reduce the number of candidates a static linker would need to disassemble to check for a BTI (or simply assume lack one).
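Under the STB_LOCAL rule proposed above, the set of symbols a linker has to inspect shrinks to local ones. A sketch, where `candidates_to_inspect` is a hypothetical helper and the binding values are the ELF gABI ones:

```python
from typing import Iterable, List, Tuple

# ELF symbol binding values (gABI).
STB_LOCAL = 0
STB_GLOBAL = 1
STB_WEAK = 2


def candidates_to_inspect(symbols: Iterable[Tuple[str, int]]) -> List[str]:
    """Names of branch targets that might lack a BTI under the proposed rule.

    Only STB_LOCAL symbols may have had their BTI elided, so only these need
    to be disassembled (or pessimistically given a BTI header); global and
    weak symbols are assumed to start with a landing pad."""
    return [name for name, binding in symbols if binding == STB_LOCAL]
```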

nsz-arm · 2024-07-03T10:09:34Z

additional details: multiple calls can share the same thunk, and multiple thunks may share the same 'header'. sometimes the header is already within reach of a call (even though the call target is not), and then the header is called directly; that call would not even need a bti c unless the header is also shared with an indirect thunk (bfd ld does not avoid the bti c in this case). iirc the veneers are aligned up to an 8-byte boundary so that branches and branch targets are not too close, so a chain of single branches could take 8 bytes per veneer instead of just 4. such a design would avoid any bti, so it could be safer and still use less code if the distances are not too big (<= 3 direct jumps away). this was not tried in bfd ld.

Wilco1 · 2024-07-03T13:50:27Z

Yes, if veneer insertion were a bit smarter, it could handle all ranges up to ±256MB using a single direct branch, or ±384MB using 2 direct branches. For even larger binaries it isn't worth worrying about avoiding the BTI header (since the extra size is negligible), and you could delay the final choice of the indirect branch's target until late in relocation, when disassembly is cheaply available.
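The distance arithmetic behind those figures, as an idealised sketch: it assumes each veneer can be placed exactly at the limit of the previous branch, which real linkers cannot always do because of placement restrictions. `choose_veneer` and its `max_chain` cutoff are hypothetical.

```python
MIB = 1 << 20
BRANCH_REACH = 128 * MIB  # approximate reach of one B/BL


def direct_veneers_needed(distance: int) -> int:
    """Number of direct-branch veneers needed between a BL and a target
    `distance` bytes away, assuming ideal veneer placement and treating
    each branch's reach as exactly 128 MiB."""
    branches = -(-abs(distance) // BRANCH_REACH)  # ceiling division
    return max(branches - 1, 0)


def choose_veneer(distance: int, max_chain: int = 2) -> str:
    """Prefer a BTI-free chain of direct branches up to `max_chain` veneers,
    otherwise fall back to an indirect thunk plus a BTI header."""
    n = direct_veneers_needed(distance)
    if n == 0:
        return "direct"
    if n <= max_chain:
        return f"chain of {n} direct veneer(s)"
    return "indirect thunk + BTI header"
```

With `max_chain=2` this matches the figures above: one veneer covers up to 256 MiB, two cover up to 384 MiB (three direct jumps in total, counting the original BL), and anything further falls back to the indirect thunk.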

smithp35 · 2024-07-03T15:00:32Z

LLD can do a limited form of inserting 1 direct branch, but due to restrictions on the placement of the branch it doesn't get the full 128 MiB extra range.

Inserting a chain of branches could be possible but it would add quite a bit of complexity to the existing implementation as there are limited points where the linker can insert the branch, as well as needing to insert thunks across output section boundaries.

The additional, unneeded BTI headers could be used as a landing pad by an attacker, but it would still be fewer landing pads than if the compiler always added BTI. I'll have a think about that when doing the LLD implementation.

Add requirements for when a tool must generate a BTI instruction. This permits tools to elide BTI instructions when they can prove, from the local information available to the tool, that no indirect branch to that location is possible. Static linkers are not allowed to assume that all direct branch targets have a BTI instruction. If a veneer is required, the static linker must generate additional BTI instructions if needed. A static linker is allowed to assume that a symbol exported to the dynamic symbol table has a BTI instruction. In practice, this will permit compilers to remove BTI instructions from static functions, unless their address is taken and that address escapes the function. This matches the behavior of the GNU toolchain. Fixes ARM-software#196

When Branch Target Identification (BTI) is enabled, all indirect branches must target a BTI instruction. A long branch thunk is a source of indirect branches. To date, LLD has assumed that the object producer is responsible for putting a BTI instruction at every place the linker might generate an indirect branch to. This is true for clang, but not for GCC. GCC will elide the BTI instruction when it can prove that there are no indirect branches from outside the translation unit(s). GNU ld was fixed to generate a landing pad stub (GNU ld terminology for a thunk) for the destination when a long-range stub was needed [1]. This means that using GCC-compiled objects with LLD may lead to LLD generating an indirect branch to a location without a BTI. The ABI [2] has also been clarified to say that it is the static linker's responsibility to generate a landing pad when the target does not have a BTI.

This patch implements the same mechanism as GNU ld. When the output ELF file sets the GNU_PROPERTY_AARCH64_FEATURE_1_BTI property, we check the destination to see if it has a BTI instruction. If it does not, we generate a landing pad consisting of:

  BTI c
  B <destination>

The B <destination> can be elided if the thunk can be placed so that control flow drops through. For example:

  BTI c
  <destination>:

This will be common when -ffunction-sections is used.

The landing pad thunks are effectively alternative entry points for the function. Direct branches are unaffected, but any linker-generated indirect branch needs to use the alternative. We place these as close as possible to the destination section.

There is some further optimization possible. Consider the case:

  .text
  fn1
  ...
  fn2
  ...

If we need landing pad thunks for both fn1 and fn2, we could order them so that the thunk for fn1 immediately precedes fn1. This could save a single branch. However, I didn't think that would be worth the additional complexity.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671
[2] ARM-software/abi-aa#196
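The landing-pad emission described in the LLD patch, including the fall-through case, could look roughly like this. A sketch only: `encode_b` and `landing_pad` are hypothetical helpers, not LLD code; the encodings are the standard A64 ones (BTI c is HINT #34, B is `0x14000000 | imm26`).

```python
import struct

BTI_C = 0xD503245F  # HINT #34


def encode_b(src: int, dst: int) -> int:
    """Encode `B dst` placed at address `src` (opcode 0b000101, imm26 words)."""
    off = dst - src
    if off % 4 or not -(1 << 27) <= off < (1 << 27):
        raise ValueError("destination not reachable with B")
    return 0x14000000 | ((off >> 2) & 0x03FFFFFF)


def landing_pad(pad_addr: int, destination: int) -> bytes:
    """Bytes of a landing-pad thunk at `pad_addr`: BTI c, then B <destination>,
    except when the destination directly follows the BTI c (fall-through)."""
    words = [BTI_C]
    if destination != pad_addr + 4:
        words.append(encode_b(pad_addr + 4, destination))
    return b"".join(struct.pack("<I", w) for w in words)
```

Placing the pad immediately before the destination section, which is common with -ffunction-sections, is what makes the single-instruction form possible.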

smithp35 mentioned this issue Apr 14, 2023

LLD AArch64 range extension thunk to PLT entry is not generating BTI (when BTI enabled) llvm/llvm-project#62140

Closed

smithp35 mentioned this issue Dec 11, 2023

endbr64 removal llvm/llvm-project#74400

Open

smithp35 linked a pull request Sep 17, 2024 that will close this issue

[sysvabi64] Document requirements for tools wrt BTI #282

Open

smithp35 mentioned this issue Sep 17, 2024

[LLD][ELF][AArch64] Add BTI Aware long branch thunks llvm/llvm-project#108989

Open
