[sysvabi64] document requirement for bti c in more detail #196
Comments
Maybe worth adding ELF in addition to sysvabi, as this would also affect bare-metal (pac-bti M), which would presumably have the same problem if GCC was not emitting BTIs for functions that could require a stub.

My reading was that, without a specific exception for linker-created veneers/stubs, code-generators had to assume that one might be created and generate code as if one could be inserted. I can remember clang always generating BTI instructions because it couldn't assume that the linker would never generate an indirect branch.

I think it is to the benefit of security to have fewer BTIs, so having linker stubs that are BTI aware is an overall improvement and is likely the preferred direction of travel.

I think it is worth a wider discussion. IMO, to make the GCC behaviour not a bug, we would have to add a specific requirement to the ABI for linkers to be BTI aware, and no such requirement exists at the moment. Assuming we can get agreement to add the requirement, I'm wondering whether anything needs doing about the transition. As I understand it:
The other thing we may want to address is whether there is any additional marking we can do to make your optimisation possible without disassembling the binary.
Functions with LR signing get PACI[AB]SP{,PC}, which has an implicit BTI. I assume that the LLD work is planned and Clang will eventually remove the "bti c" (BTW
We've got an idea of where we want to go with this. I've been wanting to have an implementation in LLD ready before publishing and have not been able to find time to do this. The change should make clear the requirements for code-generators and static linkers.

The prevailing opinion within Arm is that we would like to enable code-generators to omit BTI if they can prove that the function will never be called indirectly (the GCC behaviour). A static linker may therefore not assume that all indirect branch targets have a BTI compatible landing pad.

A "BTI compatible" thunk either doesn't use an indirect branch (a chain of direct branches) or is split into two parts: the indirect branch, and a "header" that contains a BTI c and ends with a direct branch. Something like:
The "header" has a range limit (+-128 MiB), and is essentially an alternative entry point for indirect calls. The presence of this alternative entry point undoes the compiler's hard work in omitting the BTI, but it will only be done if necessary. As these "BTI compatible" thunks are larger and slower than normal ones, we would want to generate them only when necessary.

GNU ld has decided to disassemble the code at the destination. While this is an option, and is the most precise solution, it could affect linker performance if there are a lot of thunks. If there are only a few then it probably doesn't matter. I am hoping that I can find some heuristics that would let a linker decide based on symbol information, so that the need for disassembly is lessened.

Assuming GCC's implementation doesn't already break this, it could be possible to say that eliding BTI is only permitted for symbols with STB_LOCAL binding. This would reduce the number of candidates a static linker would need to disassemble to check for a BTI (or it could just assume such a symbol doesn't have one).
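The symbol-binding heuristic floated in this comment can be sketched as a filter over ELF symbol table entries. This is only a sketch, assuming the hypothetical rule that BTI may be elided only for STB_LOCAL symbols; the function names are made up for illustration, and only the `st_info` layout (binding in the upper four bits) and the binding values are taken from the ELF specification.

```python
# ELF symbol binding values (from the ELF specification).
STB_LOCAL = 0
STB_GLOBAL = 1
STB_WEAK = 2


def binding(st_info):
    """Extract the binding from an ELF symbol's st_info byte (upper 4 bits)."""
    return (st_info & 0xFF) >> 4


def may_lack_bti(st_info):
    """Hypothetical rule: only STB_LOCAL symbols may have had their BTI elided.

    A linker following this rule would need to disassemble (or pessimistically
    add a landing pad for) local symbols only, and could skip the check for
    global and weak symbols.
    """
    return binding(st_info) == STB_LOCAL
```

With this filter, a local function symbol (`st_info == 0x02`) is a candidate for disassembly, while a global function symbol (`st_info == 0x12`) is not.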
Additional details: multiple calls can share the same thunk, and multiple thunks may share the same 'header'. Sometimes the header is already within reach of a call (even though the call target is not), and then the header is called directly. That case would not actually need a bti c unless the header is shared with an indirect thunk, but bfd ld does not avoid the bti c here.

IIRC the veneers are aligned up to an 8-byte boundary so that branches and branch targets are not too close, and thus a chain of single branches could take 8 bytes per veneer instead of just 4. Such a design would avoid any bti, though, so it could be safer and still less code if the distances are not too big (<= 3 direct jumps away). This was not tried in bfd ld.
Yes, if veneer insertion was a bit smarter, it could handle all ranges up to +-256MB using a single direct branch, or +-384MB using 2 direct branches. For even larger binaries it isn't worth worrying about avoiding the BTI header (since the extra size is negligible), and you could delay the final decision on the target of the indirect branch until late in relocation, when disassembly is cheaply available.
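The range arithmetic in this comment can be checked with a short sketch. It is an idealized model, assuming each direct `B` reaches about 128 MiB in either direction and that an intermediate branch can be placed exactly where it is needed, which a real linker cannot always do. The function name is hypothetical.

```python
MIB = 1024 * 1024
# Assumed reach of one A64 direct branch (B/BL), treated as symmetric here.
B_RANGE = 128 * MIB


def extra_direct_branches(distance):
    """Intermediate direct-branch veneers needed to cover `distance` bytes.

    0 means the original branch reaches the target on its own; 1 means one
    extra direct branch (up to about 256 MiB), 2 means two (about 384 MiB).
    """
    d = abs(distance)
    if d <= B_RANGE:
        return 0
    hops = -(-d // B_RANGE)  # ceiling division: total branches in the chain
    return hops - 1
```

Under this model a target 200 MiB away needs one extra direct branch and a target 300 MiB away needs two, matching the figures in the comment.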
LLD can do a limited form of inserting 1 direct branch, but due to restrictions on the placement of the branch it doesn't get the full 128 MiB extra range. Inserting a chain of branches could be possible but it would add quite a bit of complexity to the existing implementation as there are limited points where the linker can insert the branch, as well as needing to insert thunks across output section boundaries. The additional, unneeded BTI headers could be used as a landing pad by an attacker, but it would still be fewer landing pads than if the compiler always added BTI. I'll have a think about that when doing the LLD implementation.
Add requirements for when a tool must generate a BTI instruction.

This permits tools to elide BTI instructions when they can prove, from local information available to the tool, that no indirect branch to that location is possible.

Static linkers are not allowed to assume that all direct branch targets have a BTI instruction. If a veneer is required then the static linker must generate additional BTI instructions if needed. A static linker is allowed to assume that a symbol that is exported to the dynamic symbol table has a BTI instruction.

In practice this will permit compilers to remove BTI instructions from static functions, unless the function has its address taken and that address escapes. This matches the behavior of the GNU toolchain.

Fixes ARM-software#196
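The rules in this commit message can be read as a small decision procedure for a static linker. The following is only a sketch of that reading; the `Target` record, its field names, and the function name are hypothetical and do not correspond to any linker's actual data structures.

```python
from dataclasses import dataclass


@dataclass
class Target:
    # Hypothetical summary of what a linker knows about a veneer destination.
    in_dynamic_symtab: bool   # symbol is exported to the dynamic symbol table
    starts_with_bti: bool     # result of inspecting the first instruction


def veneer_needs_extra_bti(output_uses_bti, veneer_is_indirect, target):
    """Decide whether the linker must add a BTI landing pad for a veneer."""
    if not output_uses_bti:
        return False          # BTI is not enabled for this output
    if not veneer_is_indirect:
        return False          # direct branches do not need a landing pad
    if target.in_dynamic_symtab:
        return False          # the linker may assume a BTI is present
    return not target.starts_with_bti
```

For example, a static function whose BTI was elided and which is reached through an indirect veneer yields `True`, while the same veneer to a dynamically exported symbol yields `False` without any inspection.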
When Branch Target Identification (BTI) is enabled, all indirect branches must target a BTI instruction. A long branch thunk is a source of indirect branches. To date LLD has been assuming that the object producer is responsible for putting a BTI instruction at all places the linker might generate an indirect branch to. This is true for clang, but not for GCC. GCC will elide the BTI instruction when it can prove that there are no indirect branches from outside the translation unit(s). GNU ld was fixed to generate a landing pad stub (GNU ld speak for thunk) for the destination when a long range stub was needed [1]. This means that using GCC compiled objects with LLD may lead to LLD generating an indirect branch to a location without a BTI. The ABI [2] has also been clarified to say that it is a static linker's responsibility to generate a landing pad when the target does not have a BTI.

This patch implements the same mechanism as GNU ld. When the output ELF file is setting the GNU_PROPERTY_AARCH64_FEATURE_1_BTI property, we check the destination to see if it has a BTI instruction. If it does not, we generate a landing pad consisting of:

    BTI c
    B <destination>

The B <destination> can be elided if the thunk can be placed so that control flow drops through. For example:

    BTI c
    <destination>:

This will be common when -ffunction-sections is used.

The landing pad thunks are effectively alternative entry points for the function. Direct branches are unaffected, but any linker generated indirect branch needs to use the alternative. We place these as close as possible to the destination section.

There is some further optimization possible. Consider the case:

    .text
    fn1
    ...
    fn2
    ...

If we need landing pad thunks for both fn1 and fn2, we could order them so that the thunk for fn1 immediately precedes fn1. This could save a single branch. However I didn't think that would be worth the additional complexity.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671
[2] ARM-software/abi-aa#196
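The destination check and landing pad described in this commit message can be illustrated with a small sketch. It is not LLD's code: the function names are hypothetical, and the set of accepted first instructions is an assumption based on the A64 HINT-space encodings of `BTI c`, `BTI jc`, `PACIASP`, and `PACIBSP`. The `PACI[AB]SPPC` forms mentioned earlier in the thread are not covered.

```python
import struct

# Assumed A64 encodings of instructions that act as a landing pad for a call.
PACIASP = 0xD503233F
PACIBSP = 0xD503237F
BTI_C = 0xD503245F
BTI_JC = 0xD50324DF
CALL_LANDING_PADS = {PACIASP, PACIBSP, BTI_C, BTI_JC}


def has_call_landing_pad(code):
    """True if the first instruction of `code` is one of the pads above.

    A64 instructions are 32-bit words stored little-endian.
    """
    if len(code) < 4:
        return False
    (first,) = struct.unpack_from("<I", code, 0)
    return first in CALL_LANDING_PADS


def landing_pad_thunk(destination, falls_through):
    """Assembly lines for the landing pad; the branch is dropped when the
    thunk sits immediately before the destination."""
    lines = ["BTI c"]
    if not falls_through:
        lines.append("B " + destination)
    return lines
```

A destination that starts with a `NOP` (`0xD503201F`) fails the check, so a linker following this scheme would emit `landing_pad_thunk("fn1", falls_through=False)` and point its indirect branch at that thunk instead of at `fn1`.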
the text currently has
"An executable or shared library that supports BTI must have a bti c instruction at the start of any entry that might be called indirectly."
but it's not clear whether compilers should account for potential linker-inserted veneers that use an indirect call/jump, or whether the linker should ensure that inserting a veneer does not break BTI compatibility.
(gcc+ld.bfd made a different choice than llvm+lld)
see discussion at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671