setfiles: Could not set context on SELinux-enforcing systems #6

Closed
vrothberg opened this issue Nov 21, 2023 · 17 comments · Fixed by #8

@vrothberg
Contributor

Tried with quay.io/centos-boot/fedora-tier-1:eln and quay.io/centos-boot/centos-tier-1:stream9.

sudo podman run --rm -it --privileged -v $(pwd)/images:/output ghcr.io/osbuild/osbuild-deploy-container -imageref quay.io/centos-boot/centos-tier-1:stream9
[...]
org.osbuild.selinux: c73ddc1b46d5d88c144b1b185cf2559477ea8bcb72f87365ce5fbc02d4625ef3 {
  "file_contexts": "etc/selinux/targeted/contexts/files/file_contexts",
  "labels": {
    "/usr/bin/cp": "system_u:object_r:install_exec_t:s0"
  }
}
/usr/lib/tmpfiles.d/journal-nocow.conf:26: Failed to resolve specifier: uninitialized /etc/ detected, skipping.
All rules containing unresolvable specifiers will be skipped.
setfiles: Could not set context for /run/osbuild/tree/usr/lib/systemd/system-generators/systemd-fstab-generator:  Invalid argument
setfiles: Could not set context for /run/osbuild/tree/usr/lib/systemd/system-generators/systemd-rc-local-generator:  Invalid argument
setfiles: Could not set context for /run/osbuild/tree/usr/lib/systemd/system-generators/systemd-sysv-generator:  Invalid argument
Traceback (most recent call last):
  File "/run/osbuild/bin/org.osbuild.selinux", line 75, in <module>
    r = main(args["tree"], args["options"])
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run/osbuild/bin/org.osbuild.selinux", line 62, in main
    subprocess.run(["setfiles", "-F", "-r", f"{tree}", f"{file_contexts}", f"{tree}"], check=True)
  File "/usr/lib64/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['setfiles', '-F', '-r', '/run/osbuild/tree', '/run/osbuild/tree/etc/selinux/targeted/contexts/files/file_contexts', '/run/osbuild/tree']' returned non-zero exit status 255.

⏱  Duration: 4s

Failed
running osbuild failed: exit status 1
@ondrejbudai
Member

Yep, we accidentally tested this tool only on selinux-disabled systems. :( Looking into this is my highest priority.

@ondrejbudai ondrejbudai self-assigned this Nov 21, 2023
@vrothberg
Contributor Author

Thanks for checking, @ondrejbudai!

@vrothberg
Contributor Author

@rhatdan may be able to help :)

@rhatdan
Contributor

rhatdan commented Nov 21, 2023

Any AVCs?

@rhatdan
Contributor

rhatdan commented Nov 21, 2023

It would be nice if the context in the messages below were actually displayed. Most likely, the context does not exist on the host system.

setfiles: Could not set context for /run/osbuild/tree/usr/lib/systemd/system-generators/systemd-fstab-generator:  Invalid argument
setfiles: Could not set context for /run/osbuild/tree/usr/lib/systemd/system-generators/systemd-rc-local-generator:  Invalid argument
setfiles: Could not set context for /run/osbuild/tree/usr/lib/systemd/system-generators/systemd-sysv-generator:  Invalid argument
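
To see which context setfiles was trying to write for each failing path, one can look it up in the target tree's file_contexts. The sketch below is a simplified, hypothetical helper (parse_file_contexts, lookup_context and explain are not osbuild functions, and the sample entries are made up): it strips the -r root (/run/osbuild/tree) from the reported path and returns the context of the last entry whose regex matches the whole path. libselinux applies its own ordering rules and honours the file-type field, so treat the result as an approximation.

```python
import re

# Hypothetical sample entries; real ones come from the target tree's
# etc/selinux/targeted/contexts/files/file_contexts.
SAMPLE = r"""
/usr/lib/systemd/system-generators(/.*)?  system_u:object_r:bin_t:s0
/usr/lib/systemd/system-generators/systemd-sysv-generator  --  system_u:object_r:systemd_sysv_generator_exec_t:s0
"""


def parse_file_contexts(text):
    """Parse file_contexts lines into (compiled regex, file type, context) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) == 2:
            pattern, file_type, context = fields[0], None, fields[1]
        elif len(fields) == 3:
            pattern, file_type, context = fields
        else:
            continue  # skip anything this sketch does not understand
        entries.append((re.compile(pattern), file_type, context))
    return entries


def lookup_context(entries, path):
    """Return the context of the last entry whose regex matches the whole path.

    Simplification: plain file order, and the file-type field is ignored.
    """
    found = None
    for regex, _file_type, context in entries:
        if regex.fullmatch(path):
            found = context
    return found


def explain(entries, failing_path, root="/run/osbuild/tree"):
    """Map a path from a setfiles error (under the -r root) to its intended context."""
    rel = failing_path
    if failing_path.startswith(root + "/"):
        rel = failing_path[len(root):]
    return rel, lookup_context(entries, rel)


if __name__ == "__main__":
    entries = parse_file_contexts(SAMPLE)
    for p in (
        "/run/osbuild/tree/usr/lib/systemd/system-generators/systemd-sysv-generator",
        "/run/osbuild/tree/usr/lib/systemd/system-generators/systemd-fstab-generator",
    ):
        print(*explain(entries, p))
```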

@rhatdan
Contributor

rhatdan commented Nov 21, 2023

From the AVCs, I see that the script is attempting to set the unconfined_u:unconfined_r:spc_t:s0 label on these files, as I understand it.

time->Tue Nov 21 07:24:17 2023
type=AVC msg=audit(1700569457.111:7872): avc: denied { mac_admin } for pid=2428594 comm="setfiles" capability=33 scontext=unconfined_u:unconfined_r:spc_t:s0 tcontext=unconfined_u:unconfined_r:spc_t:s0 tclass=capability2 permissive=0

The issue here is that this is a process label, not a file label, so the script is rightfully blocked. I think there is a bug in osbuild when it attempts to set this label.

@ondrejbudai ondrejbudai changed the title from "example does not work" to "setfiles: Could not set context on SELinux-enforcing systems" Nov 22, 2023
@ondrejbudai
Member

These are the messages from audit:

Nov 22 09:15:32 dalaran audit[175162]: AVC avc:  denied  { mac_admin } for  pid=175162 comm="setfiles" capability=33  scontext=unconfined_u:unconfined_r:spc_t:s0 tcontext=unconfined_u:unconfined_r:spc_t:s0 tclass=capability2 permissive=0
Nov 22 09:15:32 dalaran audit: SELINUX_ERR op=setxattr invalid_context="system_u:object_r:systemd_sysv_generator_exec_t:s0"
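
The two records above can be told apart mechanically. A minimal sketch, assuming the audit line format shown here (parse_audit_fields and classify are hypothetical helpers): the AVC is a capability check, and for those SELinux uses the process's own context as both scontext and tcontext, so the spc_t context is the process asking for mac_admin rather than a label being written. The SELINUX_ERR record names the file context the host policy rejected.

```python
import re

# key=value pairs; values may be double-quoted (e.g. comm="setfiles").
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')
# The permission list inside "{ ... }" of an AVC record.
PERMS_RE = re.compile(r"\{\s*([^}]*?)\s*\}")


def parse_audit_fields(line):
    """Extract key=value fields from an audit record, dropping surrounding quotes."""
    return {key: value.strip('"') for key, value in FIELD_RE.findall(line)}


def classify(line):
    """Summarize an AVC denial or an SELINUX_ERR record."""
    fields = parse_audit_fields(line)
    if "invalid_context" in fields:
        # The host policy does not know this *file* context.
        return {"kind": "invalid_file_context", "context": fields["invalid_context"]}
    perms = PERMS_RE.search(line)
    tclass = fields.get("tclass", "")
    return {
        "kind": "avc",
        "permissions": perms.group(1).split() if perms else [],
        "tclass": tclass,
        "scontext": fields.get("scontext"),
        # Capability checks use the process's own context as the target,
        # so scontext == tcontext and neither is a label written to disk.
        "self_check": tclass in ("capability", "capability2"),
    }
```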

This issue is not new to osbuild. We were able to resolve these issues by shipping a custom policy. See osbuild/osbuild#400, osbuild/osbuild#442, osbuild/osbuild#495 and https://github.com/osbuild/osbuild/blob/016407284a307e23f0b2c86b946cf24c85babb36/selinux/README.md for reference.

The root cause lies in the fact that osbuild needs to apply SELinux labels on the filesystem being built. This wouldn't be a big issue if it built the same distribution as the one it runs on. However, when osbuild does cross-distribution building, it may need to apply a label that the host doesn't know. In this particular case, we build Fedora ELN on Fedora 38/39. Fedora ELN uses systemd_sysv_generator_exec_t, but neither Fedora 38 nor 39 knows this label. To apply such a label, the process doing the labeling needs CAP_MAC_ADMIN.
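
The cross-distribution problem boils down to a set difference between the types the target's labels use and the types the host policy defines. A hypothetical sketch (context_type and unknown_types are illustrative names); how the host's type list is obtained is left open, and the inputs below are made up:

```python
def context_type(context):
    """Return the type field of a user:role:type[:level] context string."""
    parts = context.split(":")
    if len(parts) < 3:
        raise ValueError(f"not an SELinux context: {context!r}")
    return parts[2]


def unknown_types(target_contexts, host_types):
    """Types used by the target image's labels that the host policy does not define."""
    return sorted({context_type(c) for c in target_contexts} - set(host_types))


if __name__ == "__main__":
    # Hypothetical inputs
    target = [
        "system_u:object_r:systemd_sysv_generator_exec_t:s0",
        "system_u:object_r:bin_t:s0",
    ]
    host = {"bin_t", "usr_t", "install_exec_t"}
    print(unknown_types(target, host))  # ['systemd_sysv_generator_exec_t']
```

Any type this returns is one whose label only a process with CAP_MAC_ADMIN can write on an enforcing host.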

CAP_MAC_ADMIN is a tricky one: not even unconfined_t has this capability. By default, not even setfiles (which osbuild uses) has it, because its default domain is setfiles_t. To give setfiles CAP_MAC_ADMIN, it somehow needs to run in the setfiles_mac_t domain. (Another example of a domain with CAP_MAC_ADMIN is install_t.)
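
The Linux capability bit and the SELinux permission are checked separately. The sketch below (hypothetical helpers reading the CapEff line of /proc/self/status) reports whether capability 33, CAP_MAC_ADMIN, is in the effective set. In a --privileged container it typically is; the AVC above shows that SELinux still denies capability2 { mac_admin } for the process's domain, which is why the domain is what matters here.

```python
CAP_MAC_ADMIN = 33  # from <linux/capability.h>; matches capability=33 in the AVC


def effective_caps(status_text):
    """Return the CapEff bitmask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(status_text, cap):
    """True if capability number `cap` is in the effective set."""
    return bool((effective_caps(status_text) >> cap) & 1)


if __name__ == "__main__":
    with open("/proc/self/status") as f:
        print("CAP_MAC_ADMIN in CapEff:", has_cap(f.read(), CAP_MAC_ADMIN))
```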

Let's take a look at how osbuild running directly on the host handles this:

osbuild itself runs in its own domain, osbuild_t (provided by osbuild-selinux). Currently, basically all the operations that affect the resulting image run in a bwrap container to improve security and isolation from the host. When osbuild spawns bwrap, bwrap sets nosuid and no_new_privs and drops certain capabilities (CAP_MAC_ADMIN is not dropped, though, at least for the selinux stage. This is weird, because I don't think that osbuild_t has CAP_MAC_ADMIN). Then an osbuild stage is run inside bwrap. To recap: we are still in the osbuild_t domain, but with certain caps dropped and with nosuid and no_new_privs in effect. However, org.osbuild.selinux needs to run setfiles with CAP_MAC_ADMIN, so how can we achieve this? The custom osbuild SELinux policy allows transitioning from osbuild_t to setfiles_mac_t. Additionally, the policy explicitly allows the process to gain more privileges during the transition (which is directly against no_new_privs).
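
The interplay between no_new_privs, nosuid and domain transitions can be modelled in a few lines. This is a toy model, not policy code: it assumes a kernel and policy that support the process2 nnp_transition and nosuid_transition permissions, ignores typebounds, and the transition entries in the usage (including a setfiles_exec_t entrypoint for osbuild_t) are hypothetical stand-ins for the rules described above. As far as I understand the kernel's behaviour, an automatic transition that is not permitted under these restrictions falls back to the old domain rather than failing the exec.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Toy policy: only the pieces needed to decide an exec-time transition."""
    # (current domain, executable's file type) -> new domain (type_transition)
    type_transitions: dict = field(default_factory=dict)
    # (old domain, new domain) pairs granted process2 { nnp_transition }
    nnp_transition: set = field(default_factory=set)
    # (old domain, new domain) pairs granted process2 { nosuid_transition }
    nosuid_transition: set = field(default_factory=set)


def domain_after_exec(policy, domain, exec_type, no_new_privs, nosuid_mount):
    """Domain a process ends up in after exec'ing a file of type exec_type."""
    new = policy.type_transitions.get((domain, exec_type), domain)
    if new == domain:
        return domain
    pair = (domain, new)
    if no_new_privs and pair not in policy.nnp_transition:
        return domain  # automatic transition falls back to the old domain
    if nosuid_mount and pair not in policy.nosuid_transition:
        return domain
    return new


if __name__ == "__main__":
    # Hypothetical rules mirroring the discussion above
    policy = Policy(
        type_transitions={
            ("osbuild_t", "setfiles_exec_t"): "setfiles_mac_t",
            ("unconfined_t", "install_exec_t"): "install_t",
        },
        nnp_transition={("osbuild_t", "setfiles_mac_t")},
        nosuid_transition={("osbuild_t", "setfiles_mac_t")},
    )
    print(domain_after_exec(policy, "osbuild_t", "setfiles_exec_t", True, True))
    print(domain_after_exec(policy, "unconfined_t", "install_exec_t", True, True))
```

With bwrap's no_new_privs in effect, the unconfined_t process stays in unconfined_t, which matches the in-container analysis that follows.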

To sum it up, even though osbuild does quite a lot to run in a well-confined environment, it's able to grant CAP_MAC_ADMIN to setfiles, which is what matters in this case. Note that the policy also allows transitioning from osbuild_t to install_t, so ostree can run with CAP_MAC_ADMIN as well.


Now let's look at our options inside a --privileged container. Note that the default domain for a privileged container is spc_t, but let's say we are in unconfined_t (--security-opt label=type:unconfined_t) to make our lives simpler.

If the host has osbuild-selinux installed, I can just label all osbuild files as osbuild_exec_t, label setfiles as setfiles_exec_t and everything would work as outside the container (I tested this). However, I don't think we want to rely on the fact that everyone has osbuild-selinux installed.

Without osbuild-selinux I can definitely label setfiles inside the container as setfiles_exec_t, or even install_exec_t. I should be able to transition from unconfined_t to install_t, right? The issue is in bwrap, though. Since it sets nosuid and no_new_privs, the transition from unconfined_t to install_t is not allowed because getting extra privileges is AFAIK not explicitly allowed (unlike for the transition from osbuild_t to install_t).

I was thinking that a simple fix would be to run bwrap in such a way that it doesn't set no_new_privs. Then we could label setfiles as install_exec_t, and a transition from unconfined_t to install_t should give setfiles CAP_MAC_ADMIN. Note that setfiles_exec_t would certainly be better for setfiles, but I'm not sure exactly what the rules are for transitioning from unconfined_t to setfiles_mac_t (instead of the default setfiles_t). However, this plan fails because bwrap hardcodes setting no_new_privs. Thus, I don't think this is possible.

Another option for labeling the filesystem without our custom selinux policy is to move the labeling process out of bwrap. My rough plan would be to introduce a labeling service in osbuild. The org.osbuild.selinux would call the service from inside bwrap. The labeling itself would be done using a child process of osbuild itself running in install_t. This might have one major flaw, though. I think that rpm-ostree itself is sometimes setting SELinux labels. Firstly, we need to investigate whether this is true. If it is, would it be possible to disable this behaviour and instead use the new labeling service? If yes, this might be the safest option.
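
The labeling-service idea could look roughly like this: the stage inside bwrap sends a request, and a process outside bwrap validates it and writes the labels. Everything here is an assumption for illustration (the JSON shape, plan_label_request, apply_plan). The path handling is lexical only, so a real service would also have to avoid following symlinks inside the tree while resolving paths.

```python
import json
import os
import posixpath


def plan_label_request(raw, tree):
    """Validate a hypothetical JSON labeling request; return (abs_path, context) pairs.

    Assumed request shape: {"labels": {"/usr/bin/cp": "user:role:type:level", ...}}
    """
    request = json.loads(raw)
    plan = []
    for path, context in request["labels"].items():
        if len(context.split(":")) < 3:
            raise ValueError(f"malformed context for {path!r}: {context!r}")
        # Normalize against "/" so ".." cannot climb out of the tree (lexically).
        inside = posixpath.normpath("/" + path.lstrip("/"))
        plan.append((tree.rstrip("/") + inside, context))
    return plan


def apply_plan(plan):
    """Meant to run outside bwrap, in a domain with CAP_MAC_ADMIN (e.g. install_t)."""
    for path, context in plan:
        # follow_symlinks=False: label the link itself, never its target.
        os.setxattr(path, "security.selinux", context.encode(), follow_symlinks=False)
```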

The last-resort option is AFAIK to run certain stages without bwrap. We can introduce a different buildroot implementation (systemd-nspawn? We had it before.) that doesn't force no_new_privs. This feels like a lot of work and duplication, though.

I must admit though, that I might be missing something obvious and that I might have completely misunderstood the relationship between SELinux domain transitions and no_new_privs. I'm still an SELinux noob.

@cgwalters
Contributor

> I think that rpm-ostree itself is sometimes setting SELinux labels.

The hyperlink seems wrong?

> Firstly, we need to investigate whether this is true. If it is, would it be possible to disable this behaviour and instead use the new labeling service?

Please, no. I've spent sooooo much time ensuring SELinux labeling in ostree works well; having something else try to override/recompute all the labels is really unappealing.

It's also really important to note that the use case here is deploying a pre-labeled (ostree) container image; we don't want to go in and potentially change those labels. (Also with composefs enabled, one can't)

That said... even today we have a "bootstrap" problem for the labels on all the files/directories leading up to the ostree root. In coreos-assembler's create_disk.sh, we basically cheat because the labels come from the policy of the build container.

The way bootc works, we have the target OS image running as a container, so it's more natural to use that for labeling. But... doing this here does require fetching and unpacking the container (or at least /etc/selinux) and then using its content to compute labels for that "bootstrap on-disk structure" before deploying it.
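
The "bootstrap" part amounts to labeling each directory from the root down to the deployment, using a lookup against the target image's policy. A hypothetical sketch (bootstrap_dirs and label_bootstrap are illustrative names); the deployment path and the constant lookup in the usage are made up, and the usr_t value echoes the /ostree label mentioned later in the thread:

```python
def bootstrap_dirs(deploy_root):
    """Directories from just below "/" down to deploy_root, outermost first."""
    parts = [p for p in deploy_root.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def label_bootstrap(deploy_root, lookup):
    """Pair each bootstrap directory with the context `lookup` computes for it.

    `lookup` stands in for a query against the *target* image's policy
    (its /etc/selinux), not the host's.
    """
    return [(d, lookup(d)) for d in bootstrap_dirs(deploy_root)]


if __name__ == "__main__":
    # Hypothetical deployment path and lookup
    for d, ctx in label_bootstrap("/ostree/deploy/default", lambda d: "system_u:object_r:usr_t:s0"):
        print(d, ctx)
```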


> Another option for labeling the filesystem without our custom selinux policy is to move the labeling process out of bwrap. My rough plan would be to introduce a labeling service in osbuild.

Well, yes. Ultimately, because there's no namespacing for SELinux, anything writing arbitrary labels needs full privileges.

That said, what bootc tries today is to ensure the entire process runs as install_t from the start: https://github.com/containers/bootc/blob/1db8a4c18e8d01a0b45edba053d9bfca769f24f4/lib/src/install.rs#L741

So we could try that here too. Note that I think doing this today requires --security-opt label=type:unconfined_t, which is why the bootc install docs include it. That sub-thread does, however, point to larger questions around whether we should really focus on an architecture in which osbuild creates disk partitions and targets bootc install-to-filesystem, where bootc owns everything related to the filesystem, including SELinux labeling concerns. If we go down that route (which I find pretty appealing), it would lead towards something like this bootc issue for injecting extra filesystem state from blueprints (users, SSH keys), passed to bootc as a container image, instead of osbuild writing files directly to the filesystem itself.

@achilleas-k achilleas-k pinned this issue Nov 22, 2023
@cgwalters
Contributor

cgwalters commented Nov 22, 2023

> In coreos-assembler's create_disk.sh we basically cheat because the labels come from the policy of the build container.

Also with Anaconda today it's similar; these labels use the policy embedded in the Anaconda ISO. In general this doesn't really matter because the labels for the "bootstrap state" haven't changed in forever. Looks like /ostree is usr_t and that's pretty standard.

@rhatdan
Contributor

rhatdan commented Nov 22, 2023

The process does not need MAC_ADMIN; this is happening because it is trying to apply a process's label (its current label) to the file system. SELinux does not allow process labels to be applied to the file system, because they make no sense there.
The key issue is that setfiles is somehow attempting to set a process label on /run/osbuild/tree/usr/lib/systemd/system-generators/*. I don't see where in the code this comes from. I see no place where setfiles is run on the /run directory other than perhaps through /var/run.

I ran the centos-bootc image and searched for spc_t in the /etc/selinux/targeted/contexts/files/* directory, and there is nothing there, so I have no idea what is going on here.
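
A Python equivalent of that search, as hypothetical helpers: it looks for the type as a whole context field, assuming contexts carry an MLS/MCS level (as the :s0 suffixes in this thread do), so that spc_t does not also match a longer type name that merely starts with it.

```python
import os


def lines_with_type(lines, selinux_type):
    """Yield (line number, line) for lines whose contexts use selinux_type as the type."""
    needle = f":{selinux_type}:"  # assumes a level follows the type, e.g. ":spc_t:s0"
    for lineno, line in enumerate(lines, 1):
        if needle in line:
            yield lineno, line.rstrip("\n")


def find_type(contexts_dir, selinux_type):
    """Walk e.g. etc/selinux/targeted/contexts/files and collect matching lines."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(contexts_dir):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # errors="replace" tolerates compiled *.bin files in the same directory.
            with open(path, encoding="utf-8", errors="replace") as f:
                hits.extend((path, n, text) for n, text in lines_with_type(f, selinux_type))
    return hits


if __name__ == "__main__":
    for hit in find_type("/etc/selinux/targeted/contexts/files", "spc_t"):
        print(*hit)
```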

@rhatdan
Contributor

rhatdan commented Nov 22, 2023

We could try running

--security-opt label=type:unconfined_t or --security-opt label=type:install_t to run with a type other than spc_t, BUT if the spc_t, unconfined_t, or install_t type gets put on disk, then something is broken.

@achilleas-k
Member

> I see no place where setfiles is run on /run directory other than perhaps through /var/run.

The org.osbuild.selinux stage is running setfiles on the tree to ensure the system policies of the target system are correct.

> Could not set context for /run/osbuild/tree/usr/lib/systemd/system-generators/systemd-sysv-generator

is coming from https://github.com/osbuild/osbuild/blob/f982b1f61af012dfdf7addf78c363f8d293638de/stages/org.osbuild.selinux#L62
where we do two things:

  1. Run setfiles on the build root using the contexts of the build root itself, so that all tools have the appropriate labels.
  2. Add extra labels to certain binaries that will need them later in the pipeline, in this case "/usr/bin/cp": "system_u:object_r:install_exec_t:s0", so that it can copy files from the deployment to the final image without encountering issues with unknown (to the host) labels.
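
The two steps can be sketched as a dry-run plan. The setfiles arguments mirror the command in the traceback at the top of the issue; how the stage actually applies the extra labels is not shown in this thread, so writing the security.selinux extended attribute directly is an assumption, and plan_selinux_stage/run_selinux_stage are hypothetical names. On an enforcing host, writing a context the host policy doesn't know without CAP_MAC_ADMIN is what fails with "Invalid argument".

```python
import os
import subprocess


def plan_selinux_stage(tree, options):
    """Dry run: the setfiles argv and explicit label writes for one stage run."""
    # Step 1: relabel the whole tree using the tree's own file_contexts.
    # options["file_contexts"] must be relative, or os.path.join drops `tree`.
    file_contexts = os.path.join(tree, options["file_contexts"])
    argv = ["setfiles", "-F", "-r", tree, file_contexts, tree]
    # Step 2: extra labels for specific binaries (e.g. cp as install_exec_t).
    writes = [
        (os.path.join(tree, path.lstrip("/")), context)
        for path, context in options.get("labels", {}).items()
    ]
    return argv, writes


def run_selinux_stage(tree, options):
    argv, writes = plan_selinux_stage(tree, options)
    subprocess.run(argv, check=True)
    for path, context in writes:
        # Assumption: explicit labels applied as the security.selinux xattr.
        os.setxattr(path, "security.selinux", context.encode(), follow_symlinks=False)
```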

@achilleas-k
Member

> It's also really important to note that the use case here is deploying a pre-labeled (ostree) container image; we don't want to go in and potentially change those labels.

But we do want to ensure that the labels are retained and so our tools inside the build root need to be correctly labelled and have the appropriate policy to write those labels (see previous comment about cp).

@rhatdan
Contributor

rhatdan commented Nov 22, 2023

I am looking for where that main is exec'd. Somewhere, tree and file_contexts are being set; the content of file_contexts looks wrong, at least in pulling in the spc_t label.

subprocess.run(["setfiles", "-F", "-r", f"{tree}", f"{file_contexts}", f"{tree}"], check=True)

@achilleas-k
Member

achilleas-k commented Nov 22, 2023

tree comes from the osbuild pipeline runner: https://github.com/osbuild/osbuild/blob/4b69d2e1c46c26812425b0b604d52be9e13aef65/osbuild/pipeline.py#L178

file_contexts is set through the stage options. We always set it to tree + "etc/selinux/targeted/contexts/files/file_contexts" for the build root (for example in this manifest: https://github.com/osbuild/osbuild/blob/9d7bbd674fd1cd98243830280b8a865e83efb1dc/test/data/manifests/fedora-coreos-container.json#L435).

@rhatdan
Contributor

rhatdan commented Nov 22, 2023

What exactly is osbuild doing with the skopeo command? Does it just assemble a rootfs? If so, we might be able to speed this up by pulling the image once and then using it from local containers/storage, rather than pulling it every time.

@achilleas-k
Member

It uses skopeo to download the container to a local cache, which inside the container is at /store, so -v osbuild-store:/store should speed up rebuilds.
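
Combining the cache volume with the original invocation (and, optionally, the label type suggested earlier in the thread), a hypothetical helper that builds the podman command line might look like this. The image names and flags are the ones from the report; the function name and defaults are assumptions.

```python
import shlex


def build_run_argv(image_ref, output_dir, store_volume="osbuild-store", label_type=None):
    """Hypothetical builder for the osbuild-deploy-container podman command."""
    argv = [
        "sudo", "podman", "run", "--rm", "-it", "--privileged",
        "-v", f"{output_dir}:/output",
        # Named volume so the skopeo download cache under /store survives between runs.
        "-v", f"{store_volume}:/store",
    ]
    if label_type:
        # e.g. "unconfined_t" instead of the default spc_t
        argv += ["--security-opt", f"label=type:{label_type}"]
    argv += ["ghcr.io/osbuild/osbuild-deploy-container", "-imageref", image_ref]
    return argv


if __name__ == "__main__":
    print(shlex.join(build_run_argv(
        "quay.io/centos-boot/centos-tier-1:stream9", "./images", label_type="unconfined_t")))
```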

ondrejbudai added a commit that referenced this issue Nov 24, 2023
When building an image, we need to make sure that the target system is
correctly labeled. This becomes challenging if the target system contains
labels that are unknown to the host because the process setting the label
needs to have CAP_MAC_ADMIN if the host is SELinux-enforcing.

CAP_MAC_ADMIN isn't a common capability on a SELinux-enforcing system.
Even unconfined_t doesn't have it (same for spc_t - label used by
--privileged containers). Thus, we need to ensure that we transition to
a domain that actually has it.

This commit relabels osbuild as install_t, a domain that has CAP_MAC_ADMIN.
A bit of mount-dancing is needed in order to achieve that, see prepare.sh.

I decided to make prepare.sh a separate script. This is useful for debugging:

host # podman run -it \
  --privileged \
  --security-opt label=type:unconfined_t \
  --entrypoint bash \
  localhost/osbuild-deploy-container
container # ./prepare.sh

This way, you get the same environment as if you run the container the
default way.

See #6 (comment)
and links in this comment for further information.
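
A quick sanity check for the debugging workflow above is to print the SELinux domain of the current process from /proc/self/attr/current. A hypothetical sketch (current_domain is an illustrative name); note that it reports the domain of whatever process runs it, so it only says something about osbuild if run from the same context osbuild will be started from:

```python
import sys


def current_domain(attr_text):
    """Return the type (domain) field of the text in /proc/self/attr/current."""
    context = attr_text.rstrip("\x00\n")
    parts = context.split(":")
    if len(parts) < 3:
        raise ValueError(f"unexpected context: {context!r}")
    return parts[2]


if __name__ == "__main__":
    with open("/proc/self/attr/current") as f:
        domain = current_domain(f.read())
    print("running in", domain)
    if domain != "install_t":
        sys.exit("expected install_t; labeling that needs CAP_MAC_ADMIN will likely fail")
```
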
achilleas-k pushed a commit that referenced this issue Nov 24, 2023
@achilleas-k achilleas-k unpinned this issue Nov 24, 2023
5 participants