-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
krun: set the vRAM size for the microVM #69
Conversation
this seems potentially dicey, although I guess it's not really worse than the non-VM case |
This is breaking Control on a 32GB machine, kcompactd gets very busy trying to get in game. |
Dropping to 40% lets Control get in game and still exposes 12GB of VRAM. BTW, how can the guest know the VRAM size? Honeykrisp currently assumes all RAM is VRAM, which is wrong for the VM apparently. so that's going to cause problems if it isn't already. |
I'm confused, why does this need to be a heuristic? Isn't this just a window? It's not like VRAM is pre-allocated. Something is fishy here if we need to set a conservative value to make things work. It should be fine to set this to like, a terabyte, as long as the physical AS exists. There's no reason to "limit" VRAM via krun just like the host makes no attempt to "limit" VRAM for host apps. Once you run out of physical RAM + non-VRAM swap things will start to fail either way. Also I expect there will be fragmentation problems here, so it's not strictly a VRAM limit, we should oversize it anyway to reduce the impact of fragmentation. |
Yeah, I was giving the VMM the role of restricting VRAM consumption, and that's probably not its role, as the host wouldn't do that for other GPU userspace apps running bare metal. Now the question is who should be taking that role, since both my experience on the M1 and @alyssarosenzweig tests with Control seem to point that some applications may attempt to use as much VRAM as its available (which seems like the right thing for them to do). Should the Mesa driver somehow try to limit VRAM ram usage (actual usage, not the size of SHM window) to some value? How is this done in bare-metal to avoid running out of physical RAM to be used as actual RAM? |
That's why I'm saying something is fishy, this isn't supposed to happen. I guess some games might try to dynamically control VRAM usage, but I'm not sure how common that is, and if anything that should be based on whatever some userspace API (GL/Vulkan) reports as "VRAM size", not anything under the hood. I'm pretty sure they wouldn't just allocate until allocations fail, that could cause everything to crash for other reasons since the game itself isn't the only thing doing VRAM allocations. Given what I saw with Stray yesterday, where it exhausted the 8GB VRAM window before even going in-game while the game itself claimed it was using 0.2GB of VRAM, I suspect we have a leak or bloat issue somewhere...
It just isn't...
This tells me that either something is broken and having a larger window is somehow consuming resources on its own (beyond the overhead of page tables), or Control is doing something crazy and trying to allocate all VRAM to exhaustion by trial and error. I cannot think of a reasonable scenario where reducing the VRAM guest window size is the "correct" thing to do here. If it's fixing things then something else is broken. |
It looks like this it's actually done on macOS via the We can add logic in libkrun to keep track of memory allocations in the SHM region and cap it to some limit, independently of the window size, but probably userspace should also be made aware of this limit. Perhaps we could return the adjusted VRAM value as part of the DRM native context capabilities? |
If we want to limit global VRAM size, this should be done in the host driver (in fact, probably in shared drm-shmem infrastructure), not in the guest. Though I'm wary of the whole concept because it sounds like it will strictly make things worse in most cases (if there is a global VRAM limit, then running out is likely to crash the compositor and random apps). This needs to be carefully considered before we introduce such a concept. Something like "90% of total RAM" makes sense IMO as a final safety net, anything less and I wouldn't be surprised if it hurts more than helps some use cases. A per app VRAM limit would make more sense as a safety net, but then it should rarely be reached. That might make sense inside the VM too, though it's less important because the VM will usually be running a single game anyway (plus Steam which is a memory hog, but you don't care if the whole VM crashes if the game crashes most of the time anyway). Part of the issue is GPU memory is not swappable right now, and that is a known limitation of the host driver. It'll be fixed at some point (that's one thing we need the non-shared BO fix for!) but we shouldn't paper over this with arbitrary VRAM limits. But all that aside... whatever is happening with Control isn't something we should fix by reducing the window size here. We need to figure out exactly what is going on there. |
From the fact that Control was working fine without this PR (and thus, with libkrun's default limit of 8GB for the SHM window), we can infer that:
With this in mind, I'd said the most straightforward approach would be:
With the default's krun behavior, which sets guest RAM to be 80% of host RAM, and with a VRAM of, let's say, 70% of the guest RAM, we would be overcommitting to the host RAM to ~140%, which is pretty reasonable having some swap configured in the system. |
I've adjusted things on the honeykrisp side, control seems to work again. No objection to merging this. |
Starting v1.9.5, libkrun allows configuring the size of the SHM window for virtio-gpu, which acts as a pseudo-vram (previously it was hardcoded to a conservative 8GB). Let's use that feature to configure the vRAM of the microVM to a reasonable value. Since we expect userspace drivers in the guest to be conservative for us, telling appliations that vRAM is just a percentage of the guest's RAM, let's just set vRAM to be the same as the total amount of RAM in the system, leaving some room for potential fragmentation. Also, introduce the "--vram" command flag to allow users to override the default value. While there, also remove the 32 GB limit, since libkrun does support guests larger than that. Signed-off-by: Sergio Lopez <[email protected]>
Leaving less than 3072MiB for the host can compromise the system's stability, so ensure we do that unless instructed by the user. This mainly applies to 8GB systems. Signed-off-by: Sergio Lopez <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
Starting v1.9.5, libkrun allows configuring the size of the SHM window
for virtio-gpu, which acts as a pseudo-vram (previously it was hardcoded
to a conservative 8GB).
Let's use that feature to configure the vRAM of the microVM to a
reasonable value. Since we expect userspace drivers in the guest to be
conservative for us, telling appliations that vRAM is just a percentage
of the guest's RAM, let's just set vRAM to be the same as the total
amount of RAM in the system, leaving some room for potential
fragmentation.
Also, introduce the "--vram" command flag to allow users to override the
default value.
While there, also remove the 32 GB limit, since libkrun does support
guests larger than that.