Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-119726: generate and patch AArch64 trampolines #123872

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

diegorusso
Copy link
Contributor

@diegorusso diegorusso commented Sep 9, 2024

AArch64 trampolines are now generated at runtime at the end of every trace.

In the PR there is a TODO which I'd like to have an opinion of.

@diegorusso diegorusso marked this pull request as draft September 9, 2024 16:16
@diegorusso diegorusso force-pushed the gh-119726-generate-and-patch-AArch64-trampolines branch 2 times, most recently from a7a04fc to 8cbca05 Compare September 10, 2024 15:37
@brandtbucher brandtbucher self-assigned this Sep 18, 2024
@brandtbucher brandtbucher added performance Performance or resource usage interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-JIT labels Sep 18, 2024
Copy link
Member

@brandtbucher brandtbucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I'd like to see this mechanism used across all platforms for symbols in the "global offset table" too (but that can be a separate project).

It's a clear improvement for the affected platforms:

Here's an initial review of the Python code. I'll dig into the C code tomorrow.

def as_c(self, opname: str) -> str:
"""Dump this hole as a StencilGroup initializer."""
return f"{{emit_{opname}, {len(self.code.body)}, {len(self.data.body)}}}"
return f"{{emit_{opname}, {len(self.code.body)}, {len(self.data.body)}, {{{self._get_trampoline_mask()}}}}}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of braces, haha. Maybe move the braces to _get_trampoline_mask?

Suggested change
return f"{{emit_{opname}, {len(self.code.body)}, {len(self.data.body)}, {{{self._get_trampoline_mask()}}}}}"
return f"{{emit_{opname}, {len(self.code.body)}, {len(self.data.body)}, {self._get_trampoline_mask()}}}"

...and then turn ", ".join(trampoline_mask) to "{" + ", ".join(trampoline_mask) + "}" above. That makes it a bit clearer that it's returning valid C code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 333 to 337
if bitmask:
trampoline_mask = [
f"0x{(bitmask >> i*32) & ((1 << 32) - 1):x}"
for i in range(0, SYMBOL_MASK_SIZE)
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a bit of trouble following this, so I tried rewriting it. What do you think? This way we don't need to hardcode SYMBOL_MASK_SIZE, and can just delete it.

Suggested change
if bitmask:
trampoline_mask = [
f"0x{(bitmask >> i*32) & ((1 << 32) - 1):x}"
for i in range(0, SYMBOL_MASK_SIZE)
]
while bitmask:
word = bitmask & ((1 << 32) - 1)
trampoline_mask.append(f"{word:#04x}")
bitmask >>= 32

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -282,6 +252,7 @@ class StencilGroup:
default_factory=dict, init=False
)
_got: dict[str, int] = dataclasses.field(default_factory=dict, init=False)
trampolines: set[int] = dataclasses.field(default_factory=set, init=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this accessed from outside of the class itself? If not, let's add an underscore:

Suggested change
trampolines: set[int] = dataclasses.field(default_factory=set, init=False)
_trampolines: set[int] = dataclasses.field(default_factory=set, init=False)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Tools/jit/_stencils.py Show resolved Hide resolved
# Number of 32-bit words needed to store the bit mask of external symbols
SYMBOL_MASK_SIZE: int = 4

known_symbols: dict[str | None, int] = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to get rid of this global cache, since I would ideally like the classes to be able to used multiple times in the same process without sharing state like this. I'd need to think it through more, but maybe it should be moved to the _Target class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me see what I can place it...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it in the Target class. Have a look.

Comment on lines 9 to 10
# Number of 32-bit words needed to store the bit mask of external symbols
SYMBOL_MASK_SIZE: int = 4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not a huge fan of this magic constant, since it'll almost certainly need to be updated in the future (and may even be platform-dependent). This should be the same as math.ceil(len(known_symbols) / 32) once known_symbols is populated, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. With time this changes (likely increases) and we need to keep updating this number. Also I agree, this could be platform dependent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Tools/jit/_stencils.py Show resolved Hide resolved
@diegorusso
Copy link
Contributor Author

It's a clear improvement for the affected platforms:

Well better than nothing :)

@diegorusso diegorusso force-pushed the gh-119726-generate-and-patch-AArch64-trampolines branch from 8cbca05 to 7bdf9f0 Compare September 19, 2024 16:29
@diegorusso diegorusso force-pushed the gh-119726-generate-and-patch-AArch64-trampolines branch from 7bdf9f0 to ffd74bd Compare September 19, 2024 16:37
@diegorusso diegorusso marked this pull request as ready for review September 19, 2024 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting review interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage topic-JIT
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants