NFD image compatibility proposal #1845

182 changes: 182 additions & 0 deletions enhancements/1845-nfd-image-compatibility/README.md
@@ -0,0 +1,182 @@
# KEP-1845: Image Compatibility with NFD

## Summary

Today, there is no standard way to describe container image requirements with respect to hardware or operating systems.
Cloud-native technology is being adopted by demanding industries where container compatibility is critical to cluster preparation and overall service performance.
This proposal introduces the idea of NFD image compatibility metadata.
NFD labels can be attached to an image to describe its requirements for the host or OS.

The document has been prepared based on the experience and progress of the [OCI Image Compatibility working group](https://github.com/opencontainers/wg-image-compatibility/tree/main/docs/proposals).

## Motivation

Image compatibility metadata will help container image authors describe compatibility requirements in a standard way.
Metadata will be uploaded with the image to the image registry.
This makes hard container compatibility requirements discoverable and programmable, supports different consumers, and covers use cases where the application requires a specific compatible environment.

### Goals

#### Phase 1

- Use existing NFD labels (keys) to describe container image requirements.
- Create a notation that describes values, including lists of ranges and literals.
- Create a new OCI artifact type for compatibility metadata.
- Allow verifying node compatibility, including for nodes that are not yet part of a k8s cluster.

#### Phase 2

Phase 2 is a forward-looking outline that shows the general direction.
Once Phase 1 is complete, either this document should be updated or a new proposal should be created that considers the following points:

- Dynamically create a family of node feature CRs (NodeFeature, NodeFeatureRule etc.) based on compatibility metadata.
- Update/generate pods with appropriate node selectors based on node feature CRs or by mutating the pod with NFD labels directly.

### Non-Goals

- Make image compatibility a hard requirement for the NFD installation/usage.
- Cover application ABI compatibility.

## Proposal

Build a new nfd client tool with the following initial scope:

- Create, read, update, and delete (CRUD) the OCI artifact.
- Validate nodes based on provided metadata.
- Run directly on a host which is not part of the Kubernetes cluster, or run as a Kubernetes job on a Kubernetes node.

Create a notation for NFD key values on the image side.
The notation should allow describing ranges and literal values.

### Design Details

#### OCI Artifact

[An OCI artifact](https://github.com/opencontainers/image-spec/blob/main/manifest.md#guidelines-for-artifact-usage) should be created to store image compatibility metadata on the image side.
The artifact can be connected with an image over [the subject field](https://github.com/opencontainers/distribution-spec/blob/11b8e3fba7d2d7329513d0cff53058243c334858/spec.md#pushing-manifests-with-subject).

##### Manifest

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.k8s.nfd.image-compatibility.v1",
  "config": {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2
  },
  "layers": [
    {
      "mediaType": "application/vnd.k8s.nfd.image-compatibility.spec.v1+json",
      "digest": "sha256:4a47f8ae4c713906618413cb9795824d09eeadf948729e213a1ba11a1e31d052",
      "size": 1710
    }
  ],
  "subject": {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "digest": "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270",
    "size": 7682
  },
  "annotations": {
    "oci.opencontainers.image.created": "2024-03-27T08:08:08Z"
  }
}
```
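
As a rough sketch of how such a manifest could be assembled, the Python snippet below computes the descriptor (digest and size) of a payload and fills in the fields shown above. The helper names `descriptor` and `build_manifest` are hypothetical and not part of NFD; the media types and the annotation key are copied from the example, and the payload serialization (`json.dumps` with sorted keys) is an assumption.

```python
import hashlib
import json

EMPTY_CONFIG = b"{}"  # content of the OCI empty descriptor


def descriptor(media_type, blob):
    """Return an OCI descriptor (mediaType, digest, size) for a blob."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


def build_manifest(spec, subject, created):
    """Assemble a compatibility artifact manifest (hypothetical helper).

    spec    -- compatibility payload as a dict
    subject -- descriptor of the image manifest the artifact refers to
    created -- RFC 3339 timestamp string
    """
    payload = json.dumps(spec, sort_keys=True).encode("utf-8")
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "artifactType": "application/vnd.k8s.nfd.image-compatibility.v1",
        "config": descriptor("application/vnd.oci.empty.v1+json", EMPTY_CONFIG),
        "layers": [
            descriptor(
                "application/vnd.k8s.nfd.image-compatibility.spec.v1+json",
                payload,
            )
        ],
        "subject": subject,
        "annotations": {"oci.opencontainers.image.created": created},
    }
```

The `config` descriptor produced this way has the same digest and size as in the manifest above, because both describe the two-byte empty JSON object `{}`.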

##### Artifact Payload (Schema)

- **compatibilities** - *array of object*
  This REQUIRED property is a list of compatibility sets.

  - **labels** - *object*
    This REQUIRED property contains NFD labels.

    - **\<NFD label\>** - *string key value pair*.
      This REQUIRED property describes an image requirement against the running host OS and hardware.
      The value accepts a range or a literal value.

  - **tags** - *array of string*
    This OPTIONAL property allows grouping compatibility sets by tags.

  - **description** - *string*
    This OPTIONAL property is used for a short description of a compatibility set.

Example

```json
{
  "compatibilities": [
    {
      "labels": {
        "cpu-model.vendor_id": "GenuineIntel",
        "cpu-cpuid.AVX512": true,
        "kernel-selinux.enabled": true,
        "kernel-config.PREEMPT": true,
        "kernel-version.full": ">=4.19,<5.16; >=5.23",
        "custom.glibc": ">=2.31,<=2.37"
      },
      "tags": ["intel"]
    },
    {
      "labels": {
        "cpu-model.vendor_id": "AuthenticAMD",
        "cpu-cpuid.FPHP": true,
        "kernel-selinux.enabled": true,
        "kernel-config.PREEMPT": true,
        "kernel-version.full": ">=4.19,<5.16; >=5.23",
        "pci-1002_67ff.present": true,
        "custom.glibc": ">=2.31,<=2.37"
      },
      "tags": ["amd"],
      "description": "works only with AMD CPU and AMD GPU"
    }
  ]
}
```
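
To illustrate the REQUIRED and OPTIONAL rules of the schema, here is a minimal structural check in Python. It is only a sketch: `validate_spec` is a hypothetical name, and treating an empty `labels` object as an error is one interpretation of "the NFD label property is REQUIRED", not something the proposal states explicitly.

```python
def validate_spec(spec):
    """Return a list of human-readable schema violations (empty if valid)."""
    if not isinstance(spec, dict) or not isinstance(spec.get("compatibilities"), list):
        return ["'compatibilities' is REQUIRED and must be an array"]

    errors = []
    for i, item in enumerate(spec["compatibilities"]):
        where = "compatibilities[%d]" % i
        if not isinstance(item, dict):
            errors.append(where + " must be an object")
            continue

        # labels: REQUIRED object with at least one NFD label (assumption).
        labels = item.get("labels")
        if not isinstance(labels, dict) or not labels:
            errors.append(where + ".labels is REQUIRED and must be a non-empty object")

        # tags: OPTIONAL array of strings.
        tags = item.get("tags", [])
        if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
            errors.append(where + ".tags must be an array of strings")

        # description: OPTIONAL string.
        if not isinstance(item.get("description", ""), str):
            errors.append(where + ".description must be a string")
    return errors
```

The function only checks structure; whether each label value is a well-formed range or literal would be a separate step.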

##### Discovery

The subject field shall be used to associate the compatibility artifact with the target image.
The referrers API should be used to discover artifacts.
If one image has multiple artifacts, it is up to a client to choose the correct one.
By default, it is recommended to pick the most recent artifact according to its "created" timestamp.
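
The default selection rule can be sketched as follows, assuming the client already holds the referrers response as a parsed image index whose `manifests` entries carry `artifactType` and `annotations`. The function name `pick_latest` is hypothetical, and the annotation key is the one used in the manifest example of this document.

```python
from datetime import datetime, timezone

ARTIFACT_TYPE = "application/vnd.k8s.nfd.image-compatibility.v1"
CREATED_KEY = "oci.opencontainers.image.created"  # key from the manifest example


def _created(desc):
    """Parse the 'created' annotation of a descriptor, or return None."""
    raw = (desc.get("annotations") or {}).get(CREATED_KEY)
    if not isinstance(raw, str):
        return None
    try:
        ts = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    except ValueError:
        return None
    # Treat timestamps without an offset as UTC so all values are comparable.
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def pick_latest(referrers_index):
    """Return the newest compatibility artifact descriptor, or None."""
    dated = []
    for desc in referrers_index.get("manifests", []):
        if desc.get("artifactType") != ARTIFACT_TYPE:
            continue  # unrelated artifact attached to the same image
        ts = _created(desc)
        if ts is not None:
            dated.append((ts, desc))
    if not dated:
        return None
    return max(dated, key=lambda pair: pair[0])[1]
```

Descriptors without a parsable timestamp are skipped in this sketch; a real client might prefer to report them instead.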

#### NFD Key Value Notation

An NFD key on the image side should accept both ranges and literal values. The following notation should be implemented (ranges must be described with standard comparison operators):

- `>, <, <=, >=`
- `string`
- conditions within a single range are separated by a comma `,` and must all hold
- ranges must be separated by semicolon character `;`

Examples:

- `"kernel-version.full": ">=4.15,<5.30"` - greater than or equal to 4.15 and lower than 5.30.
- `"kernel-version.full": ">=4.15,<5.25; >5.30"` - greater than or equal to 4.15 and lower than 5.25, or greater than 5.30.
- `"kernel-version.full": ">=5.15.x-x-generic"` - greater than or equal to the 5.15 generic kernel flavor.
- `"kernel-version.full": ">=5.15.x-x-generic; >=5.30"` - greater than or equal to the 5.15 generic kernel flavor, or greater than or equal to any 5.30 kernel version.
- `"kernel-version.full": "5.15"` - must be equal to 5.15.
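
A possible evaluation of this notation is sketched below in Python. It is a simplified illustration under stated assumptions: only dotted numeric versions are compared (missing components count as zero), a value such as `5.15.0-91-generic` is compared by its numeric prefix, and flavored forms like `5.15.x-x-generic` are not interpreted and fall back to literal string comparison. The name `matches` is hypothetical.

```python
import re

_CONDITION = re.compile(r"^(>=|<=|>|<)\s*(\d+(?:\.\d+)*)$")
_NUMERIC_PREFIX = re.compile(r"^\d+(?:\.\d+)*")

_OPS = {
    ">": lambda c: c > 0,
    ">=": lambda c: c >= 0,
    "<": lambda c: c < 0,
    "<=": lambda c: c <= 0,
}


def _version(text):
    return tuple(int(part) for part in text.split("."))


def _compare(a, b):
    """Three-way compare of version tuples, padding the shorter with zeros."""
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return (a > b) - (a < b)


def _range_matches(rng, actual):
    parsed = [_CONDITION.match(cond.strip()) for cond in rng.split(",")]
    if not all(parsed):
        return rng == actual  # not a range: treat as a literal value
    prefix = _NUMERIC_PREFIX.match(actual)
    if prefix is None:
        return False
    have = _version(prefix.group(0))
    # All comma-separated conditions of one range must hold.
    return all(_OPS[m.group(1)](_compare(have, _version(m.group(2)))) for m in parsed)


def matches(expression, actual):
    """True if `actual` satisfies at least one `;`-separated range."""
    actual = actual.strip()
    return any(
        _range_matches(rng.strip(), actual)
        for rng in expression.split(";")
        if rng.strip()
    )
```

For instance, `matches(">=4.15,<5.25; >5.30", "5.27")` is false because 5.27 falls in the gap between the two ranges, while the same expression accepts `"5.31"`.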

#### NFD client

A new standalone command-line utility should be implemented for the NFD project that shares the same functionality as the [nfd kubectl plugin](https://nfd.sigs.k8s.io/usage/kubectl-plugin).

Both clients should implement the following commands:

- `validate` - validate a NodeFeatureRule object (implemented in kubectl plugin).
- `test` - test a NodeFeatureRule object against a node (implemented in kubectl plugin).
- `dryrun` - process a NodeFeatureRule file against a local NodeFeature file to dry run the rule against a node before applying it to a cluster (implemented in kubectl plugin).
- `compat` - compatibility command with the following subcommands:
  - `attach-spec` - create an artifact with image compatibility specification and attach to the image (initially users have to create the spec by hand).
  - `remove-spec` - remove an artifact with image compatibility specification from the image.
  - `validate-spec` - validate an artifact and image compatibility specification.
  - `validate-node` - validate image compatibility against a node.
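
The core check behind `validate-node` might look like the Python sketch below. It assumes that a node is compatible when it satisfies every label of at least one compatibility set, which is how the Intel/AMD example reads but is not spelled out in this proposal. The function `compatible_sets`, the `node_features` dictionary, and the pluggable `match` callback are all hypothetical; by default values are compared for plain equality, so range expressions would need a dedicated matcher.

```python
def compatible_sets(spec, node_features, match=None):
    """Return the compatibility sets that the node satisfies.

    spec          -- parsed artifact payload (dict with "compatibilities")
    node_features -- mapping of NFD label name to the value found on the node
    match         -- optional callable(required, actual) -> bool
    """
    if match is None:
        # Default: literal equality only (no range support in this sketch).
        match = lambda required, actual: required == actual

    satisfied = []
    for item in spec.get("compatibilities", []):
        labels = item.get("labels", {})
        if all(
            name in node_features and match(required, node_features[name])
            for name, required in labels.items()
        ):
            satisfied.append(item)
    return satisfied
```

An empty result means no compatibility set matched, which a client could report as the node being incompatible with the image.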

### Test Plan

To ensure the proper functioning of the nfd client, the following test plan should be executed:

- **Unit Tests:** Write unit tests for the client.
- **Manual e2e Tests:** Run the nfd client with sample data to create, read, update, and delete the artifact and to validate a local host.