Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-4816: DRA Prioritized Alternatives in Device Requests #4871

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

johnbelamaric
Copy link
Member

@johnbelamaric johnbelamaric commented Sep 24, 2024

  • One-line PR description: adding new KEP for Prioritized Alternatives in Device Requests
  • Other comments:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 24, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 24, 2024
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 24, 2024
@johnbelamaric
Copy link
Member Author

/hold for all approvals

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 24, 2024
@johnbelamaric
Copy link
Member Author

cc @klueska @pohly

/assign @mrunalp

@sftim
Copy link
Contributor

sftim commented Sep 25, 2024

@johnbelamaric can you redo the one-line PR description? It's a bit unclear.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 26, 2024
The proposal adds a new type, called `RankedDeviceRequest`, which allows the
user to list `DeviceRequest`s, exactly one of which must be satisfied. The
`DeviceClaim` then gets a new field listing all of these such requests that must
be satisfied. There is no change to the existing `DeviceRequest` type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach is the opposite of how we envisioned to add this feature. Adding a new field means that old consumers acting on a DeviceClaim will not see that new field. Instead of detecting that they are out-dated and cannot handle the claim, they will treat it like an empty claim.

Instead, the plan was to extend DeviceRequest with a OneOf slice and leave the rest of the fields unset, thus treating it like a "one-of": either the other fields are set or the new OneOf, but never both. This depends on DeviceClassName being required and current consumers checking that it is set, which they should.

We could reuse DeviceRequest and define that the OneOf slice must be empty when the DeviceRequest is used inside a OneOf (no nesting). The DeviceRequest.Name is meaningful the same way as you describe it in the KEP for RankedDeviceRequest.Name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I had a similar train of thought but wasn't sure how we could handle it because of the current request fields being required. Are you saying we can relax that in validation? Then I think we can nest the current DeviceRequest fields all under a DeviceRequestDetail which we inline. Or just duplicate the fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying we can relax that in validation?

I think we can. It's the "organic" evolution of the API that we discussed previously.

Then I think we can nest the current DeviceRequest fields all under a DeviceRequestDetail which we inline. Or just duplicate the fields.

Of just use DeviceRequest with the constraint that the nested instances must use the fields, not the OneOf slice:

type DeviceRequest {
    ...
    // +optional
    // +oneOf=deviceRequestType
    DeviceClassName string
    
    ...
    //
    // OneOf may only be set in the entries of DeviceClaim.Requests. It must not be set
    // in DeviceRequest instances that themselves are part of a OneOf.
    //
    // +optional
    // +oneOf=deviceRequestType
    OneOf []DeviceRequest
}

Reusing DeviceRequest might make some of the implementation simpler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manual validation code will have to ensure that either DeviceClassName or OneOf are set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this change, I like it. It makes the change even smaller. The one thing I did differently is use the field name FirstOf instead of OneOf, which I think is more descriptive and also won't be confused with our other use of the term "one of".

@k8s-ci-robot
Copy link
Contributor

@johnbelamaric: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-enhancements-verify 338dde9 link true /test pull-enhancements-verify

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

CRI or CNI may require updating that component before the kubelet.
-->

## Production Readiness Review Questionnaire
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnbelamaric Would you ping me when this is ready for review?

`DeviceClassName` was required, were requested to include this logic, and the
in-tree components have been built in this way.

```go
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the diff, if it's useful during review:

diff --git a/pkg/apis/resource/types.go b/pkg/apis/resource/types.go
index 2c03a304fdf..93b96c3e4c8 100644
--- a/pkg/apis/resource/types.go
+++ b/pkg/apis/resource/types.go
@@ -397,7 +397,11 @@ type DeviceRequest struct {
        // additional configuration and selectors to be inherited by this
        // request.
        //
-       // A class is required. Which classes are available depends on the cluster.
+       // Either a class or FirstOf requests are required in DeviceClaim.Requests. When
+       // this request is part of the FirstOf list, a class is required. Nested FirstOf
+       // requests are not allowed
+       //
+       // Which classes are available depends on the cluster.
        //
        // Administrators may use this to restrict which devices may get
        // requested by only installing classes with selectors for permitted
@@ -405,9 +409,19 @@ type DeviceRequest struct {
        // then administrators can create an empty DeviceClass for users
        // to reference.
        //
-       // +required
+       // +optional
+       // +oneOf=deviceRequestType
        DeviceClassName string
 
+       // FirstOf contains subrequests, exactly one of which must be satisfied
+       // in order to satisfy this request. This field may only be set in the
+       // entries of DeviceClaim.Requests. It must not be set in DeviceRequest
+       // instances that themselves are part of a FirstOf.
+       //
+       // +optional
+       // +oneOf=deviceRequestType
+       FirstOf []DeviceRequest
+
        // Selectors define criteria which must be satisfied by a specific
        // device in order for that device to be considered for this
        // request. All selectors must be satisfied for a device to be
@@ -457,7 +471,8 @@ type DeviceRequest struct {
 }
 
 const (
-       DeviceSelectorsMaxSize = 32
+       DeviceSelectorsMaxSize      = 32
+       FirstOfDeviceRequestMaxSize = 8
 )

//
// +optional
// +oneOf=deviceRequestType
FirstOf []DeviceRequest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this call for a dedicated type that doesn't have this field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're enforcing that in validation. We could do it as a separate type, if that's the consensus.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested reusing the type because ... I am lazy. I expect that there will be code which needs to work with both DeviceRequest and such a new type (for example, validation). Go generics doesn't help here because it doesn't support accessing common fields.

But I agree that a separate type is better because then the OpenAPI spec will not include an inner FirstOf which can never be set.

used to circumvent quota. This reduces the usefulness of the feature, as it
means it will not serve as a quota management feature. However, the primary goal
of the feature is about flexibility across clusters and obtainability of
underlying devices, not quota management.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully kueue can help with that.
cc @kannon92

Consider including folks who also work outside the SIG or subproject.
-->

## Design Details
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs some implementation details for scheduler.

cc @dom4ha

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
Status: Needs Reviewer
Development

Successfully merging this pull request may close these issues.

9 participants