krun: set the vRAM size for the microVM #69

Merged: 2 commits into AsahiLinux:main on Oct 1, 2024

Conversation

@slp (Collaborator) commented Sep 28, 2024

Starting with v1.9.5, libkrun allows configuring the size of the SHM
window for virtio-gpu, which acts as pseudo-vRAM (previously it was
hardcoded to a conservative 8GB).

Let's use that feature to configure the vRAM of the microVM to a
reasonable value. Since we expect userspace drivers in the guest to be
conservative for us, telling applications that vRAM is just a
percentage of the guest's RAM, let's set vRAM to the same as the total
amount of RAM in the system, leaving some room for potential
fragmentation.

Also, introduce the "--vram" command line flag to allow users to
override the default value.

While there, also remove the 32 GB limit, since libkrun does support
guests larger than that.
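
For illustration, a minimal sketch of how this default could be wired up on the VMM side. It assumes libkrun's krun_set_gpu_options2 entry point (the v1.9.5 API that takes the SHM window size; signature assumed here) and uses hypothetical helper names and /proc/meminfo parsing rather than the actual krun code:

```rust
use std::fs;

// Read MemTotal from /proc/meminfo, in MiB (hypothetical helper).
fn host_ram_mib() -> Option<u64> {
    let meminfo = fs::read_to_string("/proc/meminfo").ok()?;
    let kib: u64 = meminfo
        .lines()
        .find(|l| l.starts_with("MemTotal:"))?
        .split_whitespace()
        .nth(1)?
        .parse()
        .ok()?;
    Some(kib / 1024)
}

extern "C" {
    // libkrun >= 1.9.5; signature assumed from libkrun.h.
    fn krun_set_gpu_options2(ctx_id: u32, virgl_flags: u32, shm_size: u64) -> i32;
}

// Default vRAM: the --vram override if given, otherwise the total host RAM
// (falling back to the old 8GB value if /proc/meminfo can't be read).
fn vram_mib(cli_override: Option<u64>) -> u64 {
    cli_override.unwrap_or_else(|| host_ram_mib().unwrap_or(8 * 1024))
}

fn configure_gpu(ctx_id: u32, virgl_flags: u32, cli_override: Option<u64>) -> i32 {
    let shm_size_bytes = vram_mib(cli_override) * 1024 * 1024;
    unsafe { krun_set_gpu_options2(ctx_id, virgl_flags, shm_size_bytes) }
}
```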

@alyssarosenzweig (Member) commented

Instead of trying to be clever, just configure the vRAM aperture to be 80% of the total amount of RAM in the host, independently of the amount of guest RAM configured, hoping the latter can be swapped out by the host if we run out of memory.

this seems potentially dicey, although I guess it's not really worse than the non-VM case

@alyssarosenzweig (Member) commented

This is breaking Control on a 32GB machine; kcompactd gets very busy while trying to get in-game.

@alyssarosenzweig (Member) commented

This is breaking Control on a 32GB machine; kcompactd gets very busy while trying to get in-game.

Dropping to 40% lets Control get in-game and still exposes 12GB of VRAM.

BTW, how can the guest know the VRAM size? Honeykrisp currently assumes all RAM is VRAM, which is wrong for the VM apparently, so that's going to cause problems if it isn't already.

@asahilina (Member) commented Sep 29, 2024

I'm confused, why does this need to be a heuristic? Isn't this just a window? It's not like VRAM is pre-allocated. Something is fishy here if we need to set a conservative value to make things work. It should be fine to set this to like, a terabyte, as long as the physical AS exists. There's no reason to "limit" VRAM via krun just like the host makes no attempt to "limit" VRAM for host apps. Once you run out of physical RAM + non-VRAM swap things will start to fail either way.

Also I expect there will be fragmentation problems here, so it's not strictly a VRAM limit, we should oversize it anyway to reduce the impact of fragmentation.

@slp (Collaborator, Author) commented Sep 29, 2024

I'm confused, why does this need to be a heuristic? Isn't this just a window? It's not like VRAM is pre-allocated. Something is fishy here if we need to set a conservative value to make things work. It should be fine to set this to like, a terabyte, as long as the physical AS exists. There's no reason to "limit" VRAM via krun just like the host makes no attempt to "limit" VRAM for host apps. Once you run out of physical RAM + non-VRAM swap things will start to fail either way.

Also I expect there will be fragmentation problems here, so it's not strictly a VRAM limit, we should oversize it anyway to reduce the impact of fragmentation.

Yeah, I was giving the VMM the role of restricting VRAM consumption, and that's probably not its role, as the host wouldn't do that for other GPU userspace apps running on bare metal.

Now the question is who should take that role, since both my experience on the M1 and @alyssarosenzweig's tests with Control seem to indicate that some applications will attempt to use as much VRAM as is available (which seems like the right thing for them to do). Should the Mesa driver somehow try to limit VRAM usage (actual usage, not the size of the SHM window) to some value? How is this done on bare metal to avoid running out of physical RAM to be used as actual RAM?

@asahilina (Member) commented Sep 29, 2024

That's why I'm saying something is fishy; this isn't supposed to happen. I guess some games might try to dynamically control VRAM usage, but I'm not sure how common that is, and if anything that should be based on whatever some userspace API (GL/Vulkan) reports as "VRAM size", not anything under the hood. I'm pretty sure they wouldn't just allocate until allocations fail, since that could cause everything to crash for other reasons, as the game itself isn't the only thing doing VRAM allocations.

Given what I saw with Stray yesterday, where it exhausted the 8GB VRAM window before even going in-game while the game itself claimed it was using 0.2GB of VRAM, I suspect we have a leak or bloat issue somewhere...

How is this done on bare metal to avoid running out of physical RAM to be used as actual RAM?

It just isn't...

Dropping to 40% lets Control get in game and still exposes 12GB of VRAM.

This tells me that either something is broken and having a larger window is somehow consuming resources on its own (beyond the overhead of page tables), or Control is doing something crazy and trying to allocate all VRAM to exhaustion by trial and error. I cannot think of a reasonable scenario where reducing the VRAM guest window size is the "correct" thing to do here. If it's fixing things then something else is broken.

@slp (Collaborator, Author) commented Sep 29, 2024

How is this done on bare metal to avoid running out of physical RAM to be used as actual RAM?

It just isn't...

It looks like this is actually done on macOS via the iogpu.wired_limit_mb sysctl tunable, which, if left untouched, will cap VRAM to ~65% of RAM. I don't know if the way this works is by returning the adjusted value to Metal userspace apps, or by denying allocations after reaching the limit (or both).

We can add logic in libkrun to keep track of memory allocations in the SHM region and cap them to some limit, independently of the window size, but userspace should probably also be made aware of this limit. Perhaps we could return the adjusted VRAM value as part of the DRM native context capabilities?
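
A rough sketch of what that tracking could look like inside the VMM. The ShmBudget type and its hook points are hypothetical, purely to illustrate capping actual SHM usage independently of the window size:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Tracks how much of the SHM window is actually backed by host memory,
/// independently of how large the window itself is.
pub struct ShmBudget {
    limit_bytes: u64,
    used_bytes: AtomicU64,
}

impl ShmBudget {
    pub fn new(limit_bytes: u64) -> Self {
        Self { limit_bytes, used_bytes: AtomicU64::new(0) }
    }

    /// Would be called before servicing a virtio-gpu blob allocation;
    /// refuses the allocation (so the guest driver can back off) once the
    /// cap is reached.
    pub fn try_reserve(&self, size: u64) -> bool {
        let mut current = self.used_bytes.load(Ordering::Relaxed);
        loop {
            if current + size > self.limit_bytes {
                return false;
            }
            match self.used_bytes.compare_exchange_weak(
                current,
                current + size,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return true,
                Err(actual) => current = actual,
            }
        }
    }

    /// Would be called when a blob is unmapped or destroyed.
    pub fn release(&self, size: u64) {
        self.used_bytes.fetch_sub(size, Ordering::AcqRel);
    }
}
```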

@asahilina (Member) commented Sep 29, 2024

If we want to limit global VRAM size, this should be done in the host driver (in fact, probably in shared drm-shmem infrastructure), not in the guest. Though I'm wary of the whole concept, because it sounds like it will strictly make things worse in most cases (if there is a global VRAM limit, then running out is likely to crash the compositor and random apps). This needs to be carefully considered before we introduce such a concept. Something like "90% of total RAM" makes sense IMO as a final safety net; anything less and I wouldn't be surprised if it hurts more than it helps in some use cases.

A per app VRAM limit would make more sense as a safety net, but then it should rarely be reached. That might make sense inside the VM too, though it's less important because the VM will usually be running a single game anyway (plus Steam which is a memory hog, but you don't care if the whole VM crashes if the game crashes most of the time anyway).

Part of the issue is GPU memory is not swappable right now, and that is a known limitation of the host driver. It'll be fixed at some point (that's one thing we need the non-shared BO fix for!) but we shouldn't paper over this with arbitrary VRAM limits.

But all that aside... whatever is happening with Control isn't something we should fix by reducing the window size here. We need to figure out exactly what is going on there.

@slp (Collaborator, Author) commented Sep 29, 2024

From the fact that Control was working fine without this PR (and thus, with libkrun's default limit of 8GB for the SHM window), we can infer that:

  • Control's engine will try to use as much VRAM as is available (and per @alyssarosenzweig's comment, that's now the same as the amount of RAM in the system).
  • Control's engine is clever enough to backtrack if an allocation fails.

With this in mind, I'd say the most straightforward approach would be:

  1. In krun set the SHM window size to an arbitrarily large value.
  2. In Mesa, adjust VRAM to be a percentage of the RAM (which would actually be guest RAM, not the actual RAM of the system).

With krun's default behavior, which sets guest RAM to 80% of host RAM, and with VRAM at, let's say, 70% of the guest RAM, we would be overcommitting host RAM to ~140%, which is pretty reasonable with some swap configured in the system.
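
For reference, a back-of-the-envelope version of that estimate, using the 80% and 70% figures quoted above (illustrative only):

```rust
// Worst case: guest RAM and VRAM allocations both come out of host memory.
fn worst_case_commit_ratio(host_ram_mib: u64) -> f64 {
    let guest_ram = 0.80 * host_ram_mib as f64;  // krun default: 80% of host RAM
    let advertised_vram = 0.70 * guest_ram;      // Mesa would report 70% of guest RAM
    (guest_ram + advertised_vram) / host_ram_mib as f64
}

fn main() {
    // 0.80 + 0.70 * 0.80 = 1.36, i.e. the ~140% overcommit mentioned above.
    println!("{:.2}", worst_case_commit_ratio(32 * 1024)); // prints 1.36
}
```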

@alyssarosenzweig (Member) commented

I've adjusted things on the honeykrisp side; Control seems to work again. No objection to merging this.

Leaving less than 3072MiB for the host can compromise the system's
stability, so ensure we don't do that unless instructed otherwise by
the user. This mainly applies to 8GB systems.

Signed-off-by: Sergio Lopez <[email protected]>
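
A minimal sketch of that guard, assuming it applies to the automatically computed default; the names and the 80% figure (quoted earlier in this thread) are illustrative, not the actual krun code:

```rust
const MIN_HOST_HEADROOM_MIB: u64 = 3072;

/// Clamp the automatically computed default so the host always keeps at
/// least 3072 MiB; an explicit user-provided value bypasses the clamp.
fn clamp_default_mib(host_ram_mib: u64, user_override: Option<u64>) -> u64 {
    if let Some(mib) = user_override {
        return mib;
    }
    let desired = host_ram_mib * 80 / 100;
    desired.min(host_ram_mib.saturating_sub(MIN_HOST_HEADROOM_MIB))
}

fn main() {
    // 8 GiB host: 80% would be 6553 MiB, but only 5120 MiB keeps 3072 MiB free.
    assert_eq!(clamp_default_mib(8 * 1024, None), 5120);
    // 32 GiB host: the 80% default already leaves plenty of headroom.
    assert_eq!(clamp_default_mib(32 * 1024, None), 26214);
}
```
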
@alyssarosenzweig (Member) left a comment

ack

@alyssarosenzweig merged commit 572f276 into AsahiLinux:main on Oct 1, 2024
2 checks passed