From 50f6638ff9a14248cca46843c194c239b28536a5 Mon Sep 17 00:00:00 2001
From: Marcin Franczyk
Date: Mon, 19 Aug 2024 14:01:50 +0200
Subject: [PATCH 1/3] NFD image compatibility proposal

Signed-off-by: Marcin Franczyk
---
 .../1845-nfd-image-compatibility/README.md | 184 ++++++++++++++++++
 1 file changed, 184 insertions(+)
 create mode 100644 enhancements/1845-nfd-image-compatibility/README.md

diff --git a/enhancements/1845-nfd-image-compatibility/README.md b/enhancements/1845-nfd-image-compatibility/README.md
new file mode 100644
index 0000000000..9a4cbbbdf7
--- /dev/null
+++ b/enhancements/1845-nfd-image-compatibility/README.md
@@ -0,0 +1,184 @@
+# KEP-1845: Image Compatibility with NFD
+
+## Summary
+
+Today, there is no standard way to describe container image requirements against hardware or operating systems.
+Cloud-native technology is being adopted by highly demanding industries, where container compatibility is critical to services in terms of cluster preparation and overall performance.
+This proposal introduces the idea of NFD image compatibility metadata.
+NFD labels can be added to an image to describe its requirements against a host or OS.
+
+The document has been prepared based on the experience and progress of the [OCI Image Compatibility working group](https://github.com/opencontainers/wg-image-compatibility/tree/main/docs/proposals).
+
+## Motivation
+
+Image compatibility metadata will help container image authors describe compatibility requirements in a standard way.
+Metadata will be uploaded with the image to the image registry.
+This makes hard container compatibility requirements discoverable, programmable, and will support different consumers and cover use cases where the application requires a specific compatible environment.
+
+### Goals
+
+#### Phase 1
+
+- Use existing NFD labels (keys) to describe container image requirements.
+- Create a notation that will describe values including list of ranges and literal.
+- Create a new OCI artifact type for compatibility metadata.
+Allow to verify node compatibility - including nodes that are not part of k8s cluster yet.
+
+#### Phase 2
+
+Phase 2 is a forecast that shows the general direction.
+After phase 1 is completed, either this document should be updated or a new proposal should be created that considers the following points:
+
+- Dynamically create a family of node feature CRs (NodeFeature, NodeFeatureRule etc.) based on compatibility metadata.
+- Update/generate pods with appropriate node selectors based on node feature CRs or by mutating the pod with NFD labels directly.
+
+### Non-Goals
+
+- Make image compatibility a hard requirement for the NFD installation/usage.
+- Cover application ABI compatibility.
+
+## Proposal
+
+Build a new nfd client tool with the following initial scope:
+
+- CRUD OCI artifact.
+- Validate nodes based on provided metadata.
+- Run directly on a host which is not part of the Kubernetes cluster, or run as a Kubernetes job on a Kubernetes node.
+
+Create a notation for NFD key value on the image side.
+The notation should allow describing ranges and literal values.
+
+### Design Details
+
+#### OCI Artifact
+
+[An OCI artifact](https://github.com/opencontainers/image-spec/blob/main/manifest.md#guidelines-for-artifact-usage) should be created to store image compatibility metadata on the image side.
+The artifact can be connected with an image over [the subject field](https://github.com/opencontainers/distribution-spec/blob/11b8e3fba7d2d7329513d0cff53058243c334858/spec.md#pushing-manifests-with-subject).
+
+##### Manifest
+
+```json
+{
+  "schemaVersion": 2,
+  "mediaType": "application/vnd.oci.image.manifest.v1+json",
+  "artifactType": "application/vnd.k8s.nfd.image-compatibility.v1",
+  "config": {
+    "mediaType": "application/vnd.oci.empty.v1+json",
+    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
+    "size": 2
+  },
+  "layers": [
+    {
+      "mediaType": "application/vnd.k8s.nfd.image-compatibility.spec.v1+json",
+      "digest": "sha256:4a47f8ae4c713906618413cb9795824d09eeadf948729e213a1ba11a1e31d052",
+      "size": 1710
+    }
+  ],
+  "subject": {
+    "mediaType": "application/vnd.oci.image.manifest.v1+json",
+    "digest": "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270",
+    "size": 7682
+  },
+  "annotations": {
+    "org.opencontainers.image.created": "2024-03-27T08:08:08Z"
+  }
+}
+```
+
+##### Artifact Payload (Schema)
+
+- **compatibilities** - *array of object*
+This REQUIRED property is a list of compatibility sets.
+
+  - **labels** - *object*
+  This REQUIRED property contains NFD labels.
+
+    - **\** - *string key value pair*.
+    This REQUIRED property describes an image requirement against the running host OS and hardware.
+    The value accepts a range or a literal value.
+
+  - **tags** - *array of string*
+  This OPTIONAL property allows grouping compatibility sets by tags.
+
+  - **description** - *string*
+  This OPTIONAL property is used for a short description of a compatibility set.
+
+Example
+
+```json
+{
+  "compatibilities": [
+    {
+      "labels": {
+        "cpu-model.vendor_id": "GenuineIntel",
+        "cpu-cpuid.AVX512": true,
+        "kernel-selinux.enabled": true,
+        "kernel-config.PREEMPT": true,
+        "kernel-version.full": ">=4.19,<5.16; >=5.23",
+        "custom.glibc": ">=2.31,<=2.37"
+      },
+      "tags": ["intel"]
+    },
+    {
+      "labels": {
+        "cpu-model.vendor_id": "AuthenticAMD",
+        "cpu-cpuid.FPHP": true,
+        "kernel-selinux.enabled": true,
+        "kernel-config.PREEMPT": true,
+        "kernel-version.full": ">=4.19,<5.16; >=5.23",
+        "pci-1002_67ff.present": true,
+        "custom.glibc": ">=2.31,<=2.37"
+      },
+      "tags": ["amd"],
+      "description": "works only with AMD CPU and AMD GPU"
+    }
+  ]
+}
+```
+
+##### Discovery
+
+The subject field shall be used to associate the compatibility artifact with the target image.
+The referrers API should be used to discover artifacts.
+If one image has multiple artifacts, it is up to a client to choose the correct one.
+By default, it is recommended to pick the most recent artifact according to the "created" timestamp.
+
+#### NFD Key Value Notation
+
+NFD keys on the image side should accept ranges and literal values. The following notation should be implemented (standard comparison operators must be used to describe ranges):
+
+- `>, <, <=, >=`
+- `string`
+- ranges must be separated by semicolon character `;`
+
+Examples:
+
+- `"kernel-version.full": ">=4.15,<5.30"` - greater than or equal to 4.15 and lower than 5.30.
+- `"kernel-version.full": ">=4.15,<5.25; >5.30"` - greater than or equal to 4.15 and lower than 5.25, or greater than 5.30.
+- `"kernel-version.full": ">=5.15.x-x-generic"` - greater than or equal to 5.15 generic kernel flavor.
+- `"kernel-version.full": ">=5.15.x-x-generic; >=5.30"` - greater than or equal to 5.15 generic kernel flavor, or greater than or equal to any 5.30 kernel version.
+- `"kernel-version.full": "5.15"` - must be equal to 5.15.
+
+#### NFD client
+
+A new standalone command-line utility should be implemented for the NFD project that shares the same functionality as [nfd kubectl plugin](https://nfd.sigs.k8s.io/usage/kubectl-plugin).
+
+Both clients should implement the following commands:
+
+- `validate` - validate a NodeFeatureRule object (implemented in kubectl plugin).
+- `test` - test a NodeFeatureRule object against a node (implemented in kubectl plugin).
+- `dryrun` - DryRun a NodeFeatureRule object against a NodeFeature file (implemented in kubectl plugin).
+- `compat` - compatibility command with the following subcommands:
+  - `create-artifact` - create artifact manifest with spec and optionally attach to image (initially users have to create the spec by hand).
+  - `attach-artifact` - attach artifact to image.
+  - `push-artifact` - push artifact to OCI repo.
+  - `delete-artifact` - delete artifact from OCI repo.
+  - `validate-artifact` - validate the artifact and spec.
+  - `validate-node` - validate artifact against a node.
+
+### Test Plan
+
+To ensure the proper functioning of the nfd client, the following test plan should be executed:
+
+- **Unit Tests:** Write unit tests for the client.
+- **Manual e2e Tests:** Run nfd client with sample data to CRUD artifact and validate a local host.
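
The range notation from the "NFD Key Value Notation" section of this patch can be illustrated with a small matcher, which is also the kind of logic the unit tests in the Test Plan would exercise. The sketch below is in Python for brevity and is not NFD code. The names `parse_version` and `matches` are hypothetical, and the semantics are assumptions: `;` separates alternatives (OR), `,` joins conditions within one range (AND), versions are dotted integers compared component by component with missing components treated as 0, and a value without an operator is a literal compared for exact equality. Flavored values such as `5.15.x-x-generic` are deliberately not handled.

```python
import re

# One condition inside a range: an operator followed by a dotted numeric version.
_CONDITION = re.compile(r"^(>=|<=|>|<)\s*(\d+(?:\.\d+)*)$")

_OPERATORS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def parse_version(text):
    """Return '5.15.0' as (5, 15, 0), or None if text is not dotted integers."""
    if not re.fullmatch(r"\d+(?:\.\d+)*", text):
        return None
    return tuple(int(part) for part in text.split("."))


def _pad(version, width):
    # Assumption: missing components count as 0, so 5.15 equals 5.15.0.
    return version + (0,) * (width - len(version))


def matches(requirement, actual):
    """Check a discovered value against a requirement string (hypothetical helper)."""
    for alternative in requirement.split(";"):  # alternatives are OR-ed
        alternative = alternative.strip()
        if not alternative:
            continue
        conditions = [_CONDITION.match(c.strip()) for c in alternative.split(",")]
        if not all(conditions):
            # No comparison operators: treat the alternative as a literal.
            if alternative == actual:
                return True
            continue
        version = parse_version(actual)
        if version is None:
            continue  # a range cannot match a non-numeric value in this sketch
        satisfied = True
        for cond in conditions:  # conditions within one range are AND-ed
            bound = parse_version(cond.group(2))
            width = max(len(version), len(bound))
            if not _OPERATORS[cond.group(1)](_pad(version, width), _pad(bound, width)):
                satisfied = False
                break
        if satisfied:
            return True
    return False
```

Under these assumptions, `matches(">=4.15,<5.25; >5.30", "5.27")` is false and `matches(">=2.31,<=2.37", "2.35")` is true. How values carrying a flavor suffix should be ordered is left open by the proposal and by this sketch.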
From 49149bc5bb728c61d61d8cc48bdda6df385cfacd Mon Sep 17 00:00:00 2001
From: Marcin Franczyk
Date: Fri, 6 Sep 2024 09:21:18 +0200
Subject: [PATCH 2/3] Update enhancements/1845-nfd-image-compatibility/README.md

Co-authored-by: Carlos Eduardo Arango Gutierrez
---
 enhancements/1845-nfd-image-compatibility/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/enhancements/1845-nfd-image-compatibility/README.md b/enhancements/1845-nfd-image-compatibility/README.md
index 9a4cbbbdf7..30904c7d5e 100644
--- a/enhancements/1845-nfd-image-compatibility/README.md
+++ b/enhancements/1845-nfd-image-compatibility/README.md
@@ -22,7 +22,7 @@ This makes hard container compatibility requirements discoverable, programmable,
 - Use existing NFD labels (keys) to describe container image requirements.
 - Create a notation that will describe values including list of ranges and literal.
 - Create a new OCI artifact type for compatibility metadata.
-Allow to verify node compatibility - including nodes that are not part of k8s cluster yet.
+- Allow to verify node compatibility - including nodes that are not part of k8s cluster yet.
 
 #### Phase 2
 
From 5b8206aeae1fa22dedee64f502b7d3b8378afdff Mon Sep 17 00:00:00 2001
From: Marcin Franczyk
Date: Fri, 6 Sep 2024 14:58:54 +0200
Subject: [PATCH 3/3] Change the scope of commands and improve their description

Signed-off-by: Marcin Franczyk
---
 enhancements/1845-nfd-image-compatibility/README.md | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/enhancements/1845-nfd-image-compatibility/README.md b/enhancements/1845-nfd-image-compatibility/README.md
index 30904c7d5e..cbaaf1e148 100644
--- a/enhancements/1845-nfd-image-compatibility/README.md
+++ b/enhancements/1845-nfd-image-compatibility/README.md
@@ -167,14 +167,12 @@ Both clients should implemented the following commands:
 
 - `validate` - validate a NodeFeatureRule object (implemented in kubectl plugin).
 - `test` - test a NodeFeatureRule object against a node (implemented in kubectl plugin).
-- `dryrun` - DryRun a NodeFeatureRule object against a NodeFeature file (implemented in kubectl plugin).
+- `dryrun` - process a NodeFeatureRule file against a local NodeFeature file to dry run the rule against a node before applying it to a cluster (implemented in kubectl plugin).
 - `compat` - compatibility command with the following subcommands:
-  - `create-artifact` - create artifact manifest with spec and optionally attach to image (initially users have to create the spec by hand).
-  - `attach-artifact` - attach artifact to image.
-  - `push-artifact` - push artifact to OCI repo.
-  - `delete-artifact` - delete artifact from OCI repo.
-  - `validate-artifact` - validate the artifact and spec.
-  - `validate-node` - validate artifact against a node.
+  - `attach-spec` - create an artifact with image compatibility specification and attach to the image (initially users have to create the spec by hand).
+  - `remove-spec` - remove an artifact with image compatibility specification from the image.
+  - `validate-spec` - validate an artifact and image compatibility specification.
+  - `validate-node` - validate image compatibility against a node.
 
 ### Test Plan
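
The `validate-spec` subcommand introduced in this patch is described in a single line, so a rough idea of what checking a payload against the "Artifact Payload (Schema)" section might involve is sketched here. This is Python for brevity and not the NFD client. The function name `validate_spec` and the error strings are hypothetical, and accepting booleans as label values is an assumption drawn from the example payload, which uses `true` even though the schema says "string".

```python
def validate_spec(payload):
    """Return a list of problems; an empty list means the payload fits the schema."""
    if not isinstance(payload, dict) or not isinstance(payload.get("compatibilities"), list):
        return ["'compatibilities' is REQUIRED and must be an array"]

    problems = []
    for index, entry in enumerate(payload["compatibilities"]):
        where = f"compatibilities[{index}]"
        if not isinstance(entry, dict):
            problems.append(f"{where} must be an object")
            continue

        # labels: REQUIRED object of NFD label name -> requirement value.
        labels = entry.get("labels")
        if not isinstance(labels, dict):
            problems.append(f"{where}.labels is REQUIRED and must be an object")
        else:
            for name, value in labels.items():
                # Assumption: strings (ranges or literals) and booleans are accepted.
                if not isinstance(value, (str, bool)):
                    problems.append(f"{where}.labels[{name!r}] must be a string or boolean")

        # tags: OPTIONAL array of strings.
        tags = entry.get("tags", [])
        if not isinstance(tags, list) or not all(isinstance(tag, str) for tag in tags):
            problems.append(f"{where}.tags must be an array of strings")

        # description: OPTIONAL string.
        if not isinstance(entry.get("description", ""), str):
            problems.append(f"{where}.description must be a string")

    return problems
```

A payload shaped like the example in the README yields an empty list, while an entry without `labels` or with a non-array `tags` is reported. Checking that each label value follows the range notation would be a natural further step, which this sketch leaves out.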