-
-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-119726: generate and patch AArch64 trampolines #123872
base: main
Are you sure you want to change the base?
gh-119726: generate and patch AArch64 trampolines #123872
Conversation
a7a04fc
to
8cbca05
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! I'd like to see this mechanism used across all platforms for symbols in the "global offset table" too (but that can be a separate project).
It's a clear improvement for the affected platforms:
aarch64-unknown-linux-gnu
: 0.8% faster, 0.6% less memory.aarch64-apple-darwin
: 1.5% faster, 0.8% less memory.
Here's an initial review of the Python code. I'll dig into the C code tomorrow.
Tools/jit/_stencils.py
Outdated
def as_c(self, opname: str) -> str: | ||
"""Dump this hole as a StencilGroup initializer.""" | ||
return f"{{emit_{opname}, {len(self.code.body)}, {len(self.data.body)}}}" | ||
return f"{{emit_{opname}, {len(self.code.body)}, {len(self.data.body)}, {{{self._get_trampoline_mask()}}}}}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a lot of braces, haha. Maybe move the braces to _get_trampoline_mask
?
return f"{{emit_{opname}, {len(self.code.body)}, {len(self.data.body)}, {{{self._get_trampoline_mask()}}}}}" | |
return f"{{emit_{opname}, {len(self.code.body)}, {len(self.data.body)}, {self._get_trampoline_mask()}}}" |
...and then turn ", ".join(trampoline_mask)
to "{" + ", ".join(trampoline_mask) + "}"
above. That makes it a bit clearer that it's returning valid C code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Tools/jit/_stencils.py
Outdated
if bitmask: | ||
trampoline_mask = [ | ||
f"0x{(bitmask >> i*32) & ((1 << 32) - 1):x}" | ||
for i in range(0, SYMBOL_MASK_SIZE) | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a bit of trouble following this, so I tried rewriting it. What do you think? This way we don't need to hardcode SYMBOL_MASK_SIZE
, and can just delete it.
if bitmask: | |
trampoline_mask = [ | |
f"0x{(bitmask >> i*32) & ((1 << 32) - 1):x}" | |
for i in range(0, SYMBOL_MASK_SIZE) | |
] | |
while bitmask: | |
word = bitmask & ((1 << 32) - 1) | |
trampoline_mask.append(f"{word:#04x}") | |
bitmask >>= 32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Tools/jit/_stencils.py
Outdated
@@ -282,6 +252,7 @@ class StencilGroup: | |||
default_factory=dict, init=False | |||
) | |||
_got: dict[str, int] = dataclasses.field(default_factory=dict, init=False) | |||
trampolines: set[int] = dataclasses.field(default_factory=set, init=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this accessed from outside of the class itself? If not, let's add an underscore:
trampolines: set[int] = dataclasses.field(default_factory=set, init=False) | |
_trampolines: set[int] = dataclasses.field(default_factory=set, init=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Tools/jit/_stencils.py
Outdated
# Number of 32-bit words needed to store the bit mask of external symbols | ||
SYMBOL_MASK_SIZE: int = 4 | ||
|
||
known_symbols: dict[str | None, int] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try to get rid of this global cache, since I would ideally like the classes to be able to used multiple times in the same process without sharing state like this. I'd need to think it through more, but maybe it should be moved to the _Target
class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me see what I can place it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put it in the Target class. Have a look.
Tools/jit/_stencils.py
Outdated
# Number of 32-bit words needed to store the bit mask of external symbols | ||
SYMBOL_MASK_SIZE: int = 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also not a huge fan of this magic constant, since it'll almost certainly need to be updated in the future (and may even be platform-dependent). This should be the same as math.ceil(len(known_symbols) / 32)
once known_symbols
is populated, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. With time this changes (likely increases) and we need to keep updating this number. Also I agree, this could be platform dependent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
Well better than nothing :) |
AArch64 trampolines are now generated at runtime at the end of every trace.
8cbca05
to
7bdf9f0
Compare
7bdf9f0
to
ffd74bd
Compare
AArch64 trampolines are now generated at runtime at the end of every trace.
In the PR there is a TODO which I'd like to have an opinion of.