Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLD AArch64 range extension thunk to PLT entry is not generating BTI (when BTI enabled) #62140

Closed
smithp35 opened this issue Apr 14, 2023 · 7 comments
Assignees
Labels

Comments

@smithp35
Copy link
Collaborator

smithp35 commented Apr 14, 2023

LLD will produce BTI compliant PLT entries when -zforce-bti is used or when all input objects are marked with the BTI property.

Currently LLD will only put a BTI at the top of a PLT[N] entry in limited circumstances, the comment from the source file says:

// A BTI (Branch Target Indicator) Plt Entry is only required if the
  // address of the PLT entry can be taken by the program, which permits an
  // indirect jump to the PLT entry. This can happen when the address
  // of the PLT entry for a function is canonicalised due to the address of
  // the function in an executable being taken by a shared library, or
  // non-preemptible ifunc referenced by non-GOT-generating, non-PLT-generating
  // relocations.
  // The PAC PLT entries require dynamic loader support and this isn't known
  // from properties in the objects, so we use the command line flag.

There is also the rare, but can happen with large programs like Chrome built with instrumentation, case of a range extension thunk to the PLT entry. A contrived example:

        .text
        .global foo
func:
        .type func, %function
        bl foo
        ret

With linker script:

SECTIONS {
  .plt : { *(.plt) }
  .text 0xf0000000 : AT(0xf0000000) { *(.text) }
}

Using:

clang --target=aarch64-linux-gnu btiplt.s -c
ld.lld --shared btiplt.o -T btiplt.lds -zforce-bti -o btiplt

Has the following disassembly:

Disassembly of section .plt:

00000000000000b0 <.plt>:
      b0: d503245f      bti     c
      b4: a9bf7bf0      stp     x16, x30, [sp, #-0x10]!
      b8: 90780010      adrp    x16, 0xf0000000 <func>
      bc: f9407611      ldr     x17, [x16, #0xe8]
      c0: 9103a210      add     x16, x16, #0xe8
      c4: d61f0220      br      x17
      c8: d503201f      nop
      cc: d503201f      nop

00000000000000d0 <foo@plt>:
      d0: 90780010      adrp    x16, 0xf0000000 <func>
      d4: f9407a11      ldr     x17, [x16, #0xf0]
      d8: 9103c210      add     x16, x16, #0xf0
      dc: d61f0220      br      x17
      e0: d503201f      nop
      e4: d503201f      nop

Disassembly of section .text:

00000000f0000000 <func>:
f0000000: 94000002      bl      0xf0000008 <__AArch64ADRPThunk_foo>
f0000004: d65f03c0      ret

00000000f0000008 <__AArch64ADRPThunk_foo>:
f0000008: 90880010      adrp    x16, 0x0 <foo>
f000000c: 91034210      add     x16, x16, #0xd0
f0000010: d61f0200      br      x16

Note the indirect jumpt to the PLT entry, which doesn't have a BTI c landing pad so the program will crash if BTI is enabled.

The simplest way to fix this is to track which symbols with isInPlt() == true are the targets of range extension thunks. We can then extend the existing canonical PLT check to include thunks.

Other possibilities include BTI aware range extension thunks see AArch64 ABI issue for more details ARM-software/abi-aa#196

@smithp35 smithp35 added the lld label Apr 14, 2023
@DanielKristofKiss DanielKristofKiss self-assigned this Apr 14, 2023
@pcc
Copy link
Contributor

pcc commented Apr 14, 2023

@eugenis

Doesn't this also affect direct branches to local functions whose address cannot be taken?

I think the most straightforward fix would be use ret x16 in the range extension thunk instead of br x16.

@smithp35
Copy link
Collaborator Author

It does affect direct branches to functions without a landing pad. Clang adds BTI even for local functions whose address cannot be taken, but GCC does not (ARM-software/abi-aa#196). We hope to update the ABI on what the requirements for static-linkers and code-generators are shortly. Current plans are to put up a design doc with the options and their consequences.

using ret x16 is possible today, although it may not be the best long term option due to interactions with the forthcoming Guarded Control Stacks https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-2022 I don't think the LR is a special case in the implicit GCS processing for ret outlined in https://developer.arm.com/documentation/ddi0602/2022-12/Base-Instructions/RET--Return-from-subroutine-

Another adjacent change that could help is to implement short Thunks for AArch64. This would use a B, and would extend the branch range by another 128 MiB which should handle almost all user-space programs without needing an indirect branch.

@smithp35
Copy link
Collaborator Author

I've submitted a patch to add short range thunk support https://reviews.llvm.org/D148701 for AArch64. This should lower the probability that we get an indirect branch to either the PLT or a function without a BTI from a thunk.

Not a fix for this problem, but an improvement that is worth making that should help.

@DanielKristofKiss
Copy link
Member

@smithp35 I come up with this one: D148704 which works with Chromium.

@smithp35
Copy link
Collaborator Author

Thank you. That should cover us for clang compiled objects.

@MaskRay
Copy link
Member

MaskRay commented Apr 19, 2023

I left a comment on https://reviews.llvm.org/D148701#4281776 that short range thunks may not work well with the current thunk selection code. D148704 looks close. Thank you!

@EugeneZelenko EugeneZelenko added lld:ELF and removed lld labels Apr 19, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 19, 2023

@llvm/issue-subscribers-lld-elf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants