Preserve symlink structure of remote execution inputs #23620
Labels: P3 (We're not considering working on this, but happy to review a PR. No assignee), team-Remote-Exec (Issues and PRs for the Execution (Remote) team), type: feature request
Description of the feature request:
If source artifact inputs of build actions include symlinks, these symlinks are represented as regular files when the build action is executed remotely. This can break certain inputs, in particular LLVM built in the "busybox" configuration. The FR is to preserve the symlink structure instead.
Let me unpack this a little bit.
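To make the distinction concrete, here is a purely local sketch. It does not involve Bazel or a remote executor at all; `cp -RL` (which follows symlinks while copying) is used only as an analogy for inputs being staged as regular files. The directory layout and the `describe` helper are assumptions made up for the illustration.

```shell
#!/usr/bin/env bash
# Local analogy only: cp -RL stands in for inputs being staged as regular files.

src="$(mktemp -d)"
staged="$(mktemp -d)"

# A tiny stand-in for a "busybox"-style toolchain: one real file, one symlink.
mkdir -p "$src/bin"
printf 'stand-in for the multicall binary\n' > "$src/bin/llvm"
ln -s llvm "$src/bin/clang"

# Copy the tree while dereferencing symlinks.
cp -RL "$src/bin" "$staged/bin"

# Hypothetical helper: report whether a path is a symlink or a regular file.
describe() {
  if [ -L "$1" ]; then
    echo symlink
  elif [ -f "$1" ]; then
    echo file
  else
    echo other
  fi
}

src_kind="$(describe "$src/bin/clang")"
staged_kind="$(describe "$staged/bin/clang")"
echo "source bin/clang: $src_kind"
echo "staged bin/clang: $staged_kind"

rm -rf "$src" "$staged"
```

In the source tree `bin/clang` is a symlink, while in the dereferenced copy it is an independent regular file whose relationship to `bin/llvm` is no longer visible.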
Current Bazel behavior
Let me quote @tjgq from an internal conversation we've had about this:
How this breaks LLVM toolchains
We use a hermetic LLVM toolchain, and that toolchain is part of the build inputs. The toolchain includes a bunch of "binaries" like `bin/clang`, `bin/clang++`, `bin/lld`, etc. But in fact, the LLVM version we use employs a "busybox" architecture, where these binaries are all symlinks to `bin/llvm`. However! Invoking `bin/clang` is not actually equivalent to invoking `bin/llvm`: the binary examines its `argv[0]`, and behaves differently when invoked via symlink.

What is more, in some situations llvm will re-invoke itself. On the first invocation, we need the `argv[0]` to be `clang`. On the re-invocation, llvm will use the path from `/proc/self/exe`, which needs to end in `llvm`. If we merely have a copy, `/proc/self/exe` also ends in `clang`, so `argv[0]` is `clang` both times, producing errors like https://pwbug.dev/issues/364781685. I am not a toolchain expert, but I discussed this with some, and they assure me this behavior (i.e., reading `/proc/self/exe` and assuming it points to `llvm` and not e.g. `clang`, rather than just setting it to `llvm`) is unfortunately necessary due to the treatment of Clang reproducers and `-canonical-prefixes` (although I confess I could not follow their explanation).

Workarounds
There are workarounds for this issue:
- Replace `bin/clang` (etc) with symlinks created by `ctx.actions.declare_symlink`. Such Bazel-created symlinks will be faithfully sent to RBE.
- Wrap `bin/clang` (etc) in bash scripts that invoke the real binary via `exec -a clang`. This has the advantage that no custom rules are required; you just genrule the wrapper scripts into existence. These wrapper scripts (thanks to the `exec -a clang`) have the same magic property as the symlinks, i.e. that `argv[0]` is different from the actual executed binary basename. However, this requires bash (`/bin/sh` doesn't support the `-a` flag).

Still, this is definitely a sharp edge and it would be nice to remove it.
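For illustration, the wrapper-script workaround might look roughly like the sketch below. This is not the exact script from our build: the temporary directory, the use of a copy of bash as a stand-in for `bin/llvm`, and the assumption that the wrapper sits next to the real binary are all made up for the demo. It is Linux-only because it reads `/proc`.

```shell
#!/usr/bin/env bash
# Linux-only sketch (relies on /proc). All paths are hypothetical stand-ins.

demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/bin"

# Stand-in for the multicall bin/llvm: a copy of bash, chosen only because it
# is a native binary that can report its own argv[0] and /proc/<pid>/exe.
cp "$(command -v bash)" "$demo_dir/bin/llvm"

# The wrapper that takes the place of the bin/clang symlink.
cat > "$demo_dir/bin/clang" <<'EOF'
#!/usr/bin/env bash
# Run the sibling llvm binary, but present "clang" as argv[0].
exec -a clang "$(dirname "$0")/llvm" "$@"
EOF
chmod +x "$demo_dir/bin/clang"

# Ask the stand-in what it sees: $0 is its argv[0], and /proc/$$/exe points at
# the binary that is actually running.
report="$("$demo_dir/bin/clang" -c 'echo "$0 $(basename "$(readlink "/proc/$$/exe")")"')"
argv0="${report%% *}"
exe_base="${report##* }"
echo "argv[0]=$argv0 exe=$exe_base"

rm -rf "$demo_dir"
```

The point of the demo is that the process started through the wrapper should see `clang` as its `argv[0]` while its executable path still ends in `llvm`, which is the same property the original symlinks provide. In a real Bazel setup, how the wrapper locates the real binary (here, `dirname "$0"`) depends on how the toolchain files are laid out.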
Further reading for Googlers
See internal discussions of this problem for more details:
Which category does this issue belong to?
Remote Execution
What underlying problem are you trying to solve with this feature?
No response
Which operating system are you running Bazel on?
No response
What is the output of `bazel info release`?
development version
If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.
I'm on d62e0a0, fetched by Bazelisk (so, Bazel 8 pre-release).
What's the output of `git remote get-url origin; git rev-parse HEAD`?
No response
Have you found anything relevant by searching the web?
Remarkably, not really, this seems to be a pretty edge-case issue!
Any other information, logs, or outputs that you want to share?
For folks who run into similar issues in the future to find this: the cryptic errors produced by clang are,
To make progress debugging this issue, you need to run clang under `strace`.
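A possible way to do that is sketched below. The helper names `trace_command` and `relevant_trace_lines` are invented for this example, and the choice of what to filter for (the `execve` calls and any access to `/proc/self/exe`) is my guess at what is most useful here, based on the re-invocation behavior described above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; nothing here is specific to Bazel.

# Run a command under strace, following child processes (-f), recording only
# system calls that take a file name (-e trace=file), into the given file (-o).
trace_command() {
  out="$1"
  shift
  strace -f -e trace=file -o "$out" "$@"
}

# Show the lines most relevant to this issue: which binaries were exec'ed
# (and with which argv), and any lookups of /proc/self/exe.
relevant_trace_lines() {
  grep -E 'execve\(|/proc/self/exe' "$1"
}

# Example usage (paths and arguments are placeholders):
#   trace_command /tmp/clang.trace bin/clang -c hello.c
#   relevant_trace_lines /tmp/clang.trace
```

Comparing the `argv[0]` in each `execve` line with the target that `/proc/self/exe` resolves to should make it visible whether the re-invocation is going through `clang` or `llvm`.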