Add network topology information to nodes #1025

wwvela · 2024-09-19T00:32:10Z

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR adds network topology information to nodes, addressing issue #998.

Which issue(s) this PR fixes:
Fixes #998

Does this PR introduce a user-facing change?:
NONE

k8s-ci-robot · 2024-09-19T00:32:12Z

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot · 2024-09-19T00:32:17Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kishorj for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2024-09-19T00:32:18Z

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.


k8s-ci-robot · 2024-09-19T00:32:19Z

Hi @wwvela. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


cartermckinnon · 2024-09-19T04:49:14Z

pkg/providers/v1/instances_v2.go

+		topology, err := c.ec2.DescribeInstanceTopology(topologyRequest)
+		if err != nil || topology == nil {
+			klog.Infof("topology api not supported for instance type: %s or region: %s", instanceType, region)
+			return additionalLabels, nil


you shouldn't return here, you should wrap the following block in an else

Since we now check against the supported instance types instead of the instance prefix, an error returned here should no longer be caused by an unsupported instance type. In the new version I return the error directly.

cartermckinnon · 2024-09-19T04:49:32Z

pkg/providers/v1/instances_v2.go

+	if c.topologySupportedInstance(instanceType) && slices.Contains(c.cfg.Global.TopologySupportedRegions, region) {
+		topologyRequest := &ec2.DescribeInstanceTopologyInput{InstanceIds: []*string{&instanceID}}
+		topology, err := c.ec2.DescribeInstanceTopology(topologyRequest)
+		if err != nil || topology == nil {


we need to check what the err is before we say that the instance isn't supported

I revised topologySupportedInstance to check the supported instance types instead of the instance prefix, so the DescribeInstanceTopology API should be supported for any instance that reaches this call. I will return the error directly.
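A minimal sketch of what "check what the err is" could look like, for illustration only. The `codedError` interface, the `apiError` type, the `verdict` values and `classifyTopologyError` are all hypothetical names, and treating the `UnsupportedOperation` code as "topology not available" is an assumption that would need to be verified against the real API behaviour.

```go
package main

import (
	"errors"
	"fmt"
)

// codedError is a local stand-in for an SDK error that exposes an error code.
// Assumption: the real client's errors can be inspected in a similar way.
type codedError interface {
	error
	Code() string
}

// apiError is a hypothetical concrete error used only in this example.
type apiError struct{ code, msg string }

func (e *apiError) Error() string { return e.code + ": " + e.msg }
func (e *apiError) Code() string  { return e.code }

type verdict int

const (
	verdictOK          verdict = iota // no error, use the response
	verdictUnsupported                // skip topology labels, do not fail the node
	verdictFailed                     // genuine failure, return the error
)

// classifyTopologyError separates "topology is not available here" from
// every other failure. The code string checked is an assumption.
func classifyTopologyError(err error) verdict {
	if err == nil {
		return verdictOK
	}
	var ce codedError
	if errors.As(err, &ce) && ce.Code() == "UnsupportedOperation" {
		return verdictUnsupported
	}
	return verdictFailed
}

func main() {
	wrapped := fmt.Errorf("describe topology: %w", &apiError{"UnsupportedOperation", "not available"})
	fmt.Println(classifyTopologyError(wrapped) == verdictUnsupported)
	fmt.Println(classifyTopologyError(errors.New("connection reset")) == verdictFailed)
}
```

With a split like this, only the "unsupported" case would be logged and skipped; anything else would be returned to the caller, which matches the intent of returning the error directly.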

cartermckinnon · 2024-09-19T04:50:06Z

pkg/providers/v1/instances_v2.go

 	return additionalLabels, nil
 }

+// TopologySupportedInstancePrefixes is help to filter instance types supported by topology API
+func (c *Cloud) topologySupportedInstance(instanceType string) bool {


Suggested change

-func (c *Cloud) topologySupportedInstance(instanceType string) bool {
+func (c *Cloud) isTopologySupportedInstance(instanceType string) bool {

cartermckinnon · 2024-09-19T04:55:07Z

pkg/providers/v1/instances_v2.go

+// TopologySupportedInstancePrefixes is help to filter instance types supported by topology API
+func (c *Cloud) topologySupportedInstance(instanceType string) bool {
+	for _, prefix := range c.cfg.Global.TopologySupportedInstancePrefixes {
+		if strings.HasPrefix(instanceType, prefix) {


I think we probably just want to work with a list of fully-formed instance type names, prefix matching has some ambiguity and configuration footguns ("p4" would match all p4dn instances, for example). The list isn't going to change very often and we don't need the benefit of shorthand IMO

Do we want to be updating the cloud-provider each time there's a new AWS instance type, though?

would it be a bad idea to do some dynamic discovery based on a DescribeInstanceTopology result, then cache each instance type's support with a TTL? from what i can tell the API just returns an empty list when the instance ID's type is not supported

I discussed this with Carter before. We don't want to call this API for all nodes: some customers have hundreds or thousands of nodes in their cluster, and calling it for every node would very likely trigger throttling.

Also, DescribeInstanceTopology is only supported for a very limited set of instance types, which many customers might not use. See the supported instance types and regions here: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeInstanceTopology.html

> Do we want to be updating the cloud-provider each time there's a new AWS instance type, though?

We don't need to update cloud-provider. TopologySupportedInstanceTypes and TopologySupportedRegions are added to the cloud-config, which takes its input from the KCP CCM manifest. We only need to update the manifest with the new cloud-config input.
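To make the prefix-matching concern in this thread concrete, here is a small sketch comparing the two checks. `matchesPrefix` follows the loop in the diff; `matchesExact` is a hypothetical helper for the list of fully-formed instance type names, and the instance types used are only sample inputs.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// matchesPrefix mirrors the prefix-based check from the diff under review.
func matchesPrefix(prefixes []string, instanceType string) bool {
	for _, prefix := range prefixes {
		if strings.HasPrefix(instanceType, prefix) {
			return true
		}
	}
	return false
}

// matchesExact checks against a list of fully-formed instance type names.
// (Hypothetical helper name, not taken from the PR.)
func matchesExact(instanceTypes []string, instanceType string) bool {
	return slices.Contains(instanceTypes, instanceType)
}

func main() {
	// A short prefix also matches instance families the operator may not
	// have intended to include.
	fmt.Println(matchesPrefix([]string{"p4"}, "p4de.24xlarge"))

	// An exact list only matches the names that were configured.
	fmt.Println(matchesExact([]string{"p4d.24xlarge"}, "p4de.24xlarge"))
}
```

The exact list is more verbose to configure, but each entry means only what it says, which is the trade-off argued for above.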

cartermckinnon · 2024-09-19T05:02:34Z

pkg/providers/v1/instances_v2.go

+		for index, networkNode := range topology.NetworkNodes {
+			label := LabelNetworkNode + strconv.Itoa(index)
+			additionalLabels[label] = *networkNode
+		}


I'm not clear how you would use this. How would you "query" the labels using a pod's nodeSelector, or a telemetry system, if the index of the networkNode appears in the label key?

after reading through the instance topology docs i think i have an idea how this works?

Assuming the goal is to run workloads that share the same topology, you'd need to know the network nodes that your capacity sits on first, then use that in a nodeSelector for the corresponding layer index. AFAIK there's no way to apply a nodeSelector based on some heuristic like "schedule pods to nodes where the label topology.k8s.aws/network-node-layer-1 matches", so this logic would have to be handled by a higher-level system. the only guarantee is that there are always at least 3 layers.

However, this scheme breaks down if network nodes can be part of different layers, which i couldn't immediately grok from the docs but i assumed wasn't possible.

> this scheme breaks down if network nodes can be part of different layers

yep, if trees with different roots can overlap (share network nodes).

The long form doc is helpful, the API doc doesn’t even say that order is meaningful in the networkNodeSet. 😄

I’m wondering if the “layer” number needs to be in the labels. Is it enough to just say “schedule this pod within this networkNode” instead of “within this networkNode which is the layer 2 networkNode for the instance”?

---

A primary motivation for this feature was observability, being able to correlate issues with something more granular than AZ. If you want to slice and dice your instance metrics using network nodes, should you care about the layer of the network node?

If we inverted this, so that the label key contains the networkNode and the value is the layer number, that would allow you to use the Exists operator in a nodeAffinity to place the pod within a networkNode, without needing the layer to also match: https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/node-selector-requirement/#NodeSelectorRequirement

> yep, if trees with different roots can overlap (share network nodes).

We can assume that's not the case for now.

> A primary motivation for this feature was observability, being able to correlate issues with something more granular than AZ

Do you mean the feature to apply arbitrary labels from within the cloud provider? I think observability at the zone-id level is where we started, but I'd like to think we can help with scheduling here too. The next thing I'm thinking is a label that describes whether an instance has a GPU or not, so we don't need to schedule GPU pods (Device Plugin, EFA etc) on nodes that don't need it etc.

> Assuming the goal is to run workloads that share the same topology, you'd need to know the network nodes that your capacity sits on first, then use that in a nodeSelector for the corresponding layer index.

Right that's what I was expecting you'd need to do. You don't know the topology until you launch the instance first, so it forces this ordering of sorts where you launch the instance first, you realize what your network nodes are, and you should ideally see a common network node across your instances. I think you could maybe define weighted node affinities across the network node layers so you pack it as close as possible?
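For comparison, the two label layouts discussed in this thread can be sketched as follows. `layerIndexedLabels` follows the loop in the diff, using the `network-node-layer-` key prefix proposed elsewhere in this review; `invertedLabels` and its key prefix are hypothetical and only illustrate the "key contains the networkNode, value is the layer" idea. The network node IDs are placeholders.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	// Key prefix for the layer-indexed scheme (from the suggested rename).
	layerLabelPrefix = "topology.k8s.aws/network-node-layer-"
	// Key prefix for the inverted scheme. Hypothetical, not part of the PR.
	nodeLabelPrefix = "topology.k8s.aws/network-node-"
)

// layerIndexedLabels puts the layer index in the key and the network node
// ID in the value, as the diff does.
func layerIndexedLabels(networkNodes []string) map[string]string {
	labels := make(map[string]string, len(networkNodes))
	for index, networkNode := range networkNodes {
		labels[layerLabelPrefix+strconv.Itoa(index)] = networkNode
	}
	return labels
}

// invertedLabels puts the network node ID in the key and the layer index in
// the value, so a selector can test for the key's existence alone.
func invertedLabels(networkNodes []string) map[string]string {
	labels := make(map[string]string, len(networkNodes))
	for index, networkNode := range networkNodes {
		labels[nodeLabelPrefix+networkNode] = strconv.Itoa(index)
	}
	return labels
}

func main() {
	nodes := []string{"nn-aaa", "nn-bbb", "nn-ccc"} // placeholder IDs
	fmt.Println(layerIndexedLabels(nodes))
	fmt.Println(invertedLabels(nodes))
}
```

With the first layout a selector has to name both the layer and the network node; with the second, an Exists match on the key is enough to say "within this network node", at the cost of a label key per network node.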

cartermckinnon · 2024-09-19T05:05:15Z

/ok-to-test

cartermckinnon · 2024-09-19T05:05:57Z

/kind feature

ndbaker1 · 2024-09-19T07:19:53Z

pkg/providers/v1/aws_ec2.go

+	} else if len(resp.Instances) == 0 {
+		return nil, nil
+	}
+	return resp.Instances[0], err


this wrapper's logic seems strange, shouldn't we return the list of []*ec2.InstanceTopology instead of just the first item?

DescribeInstanceTopology accepts a list of instance IDs. If we pass only one instance ID, it returns the one corresponding topology entry, something like the following:

"Instances": [
  {
    "InstanceId": "i-xxx",
    "InstanceType": "p5e.48xlarge",
    "NetworkNodes": ["nn-xxxxx", "nn-xxxxx", "nn-xxxxx"],
    "AvailabilityZone": "us-west-2c",
    "ZoneId": "usw2-az3"
  }
]

But yeah, it seems more reasonable for the wrapper to return all instances and for the caller to pick the first one from the result. Will revise in the next version.

agreed, because we still allow the full ec2.DescribeInstanceTopologyInput payload which could specify more than one instance id
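The shape agreed on here could look roughly like the sketch below. It is self-contained and uses local stand-ins rather than the real SDK: `instanceTopology`, `topologyAPI`, `fakeAPI`, `describeTopologies` and `topologyFor` are all hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"slices"
)

// instanceTopology is a simplified local stand-in for the SDK's type.
type instanceTopology struct {
	InstanceID   string
	NetworkNodes []string
}

// topologyAPI is a hypothetical minimal client interface for this example.
type topologyAPI interface {
	DescribeInstanceTopology(instanceIDs []string) ([]instanceTopology, error)
}

// describeTopologies is the wrapper: it returns every instance in the
// response instead of silently keeping only the first one.
func describeTopologies(api topologyAPI, instanceIDs []string) ([]instanceTopology, error) {
	all, err := api.DescribeInstanceTopology(instanceIDs)
	if err != nil {
		return nil, fmt.Errorf("describing instance topology: %w", err)
	}
	return all, nil
}

// topologyFor is the call site: it asks for one instance and picks the
// matching entry, returning nil when no topology is reported for it.
func topologyFor(api topologyAPI, instanceID string) (*instanceTopology, error) {
	all, err := describeTopologies(api, []string{instanceID})
	if err != nil {
		return nil, err
	}
	for i := range all {
		if all[i].InstanceID == instanceID {
			return &all[i], nil
		}
	}
	return nil, nil
}

// fakeAPI serves canned data so the example runs without network access.
type fakeAPI struct{ data []instanceTopology }

func (f fakeAPI) DescribeInstanceTopology(instanceIDs []string) ([]instanceTopology, error) {
	var out []instanceTopology
	for _, t := range f.data {
		if slices.Contains(instanceIDs, t.InstanceID) {
			out = append(out, t)
		}
	}
	return out, nil
}

func main() {
	api := fakeAPI{data: []instanceTopology{
		{InstanceID: "i-xxx", NetworkNodes: []string{"nn-1", "nn-2", "nn-3"}},
	}}
	topology, err := topologyFor(api, "i-xxx")
	fmt.Println(topology != nil, err)
}
```

Keeping the "take the single result" decision in the caller means the wrapper stays correct even when the input names more than one instance ID.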

ndbaker1 · 2024-09-19T07:37:25Z

pkg/providers/v1/well_known_labels.go

@@ -20,4 +20,5 @@ const (
 	// LabelZoneID is a topology label that can be applied to any resource
 	// but will be initially applied to nodes.
 	LabelZoneID = "topology.k8s.aws/zone-id"
+	LabelNetworkNode = "topology.k8s.aws/network-node-"


Suggested change

-LabelNetworkNode = "topology.k8s.aws/network-node-"
+LabelNetworkNode = "topology.k8s.aws/network-node-layer-"

IIUC, the info you're embedding into the key is about the layer of the network node in the node set

Revised in the next version.

ndbaker1 · 2024-09-19T07:47:30Z

pkg/providers/v1/instances_v2.go

+		for index, networkNode := range topology.NetworkNodes {
+			label := LabelNetworkNode + strconv.Itoa(index)
+			additionalLabels[label] = *networkNode
+		}


after reading through the instance topology docs i think i have an idea how this works?

Assuming the goal is to run workloads that share the same topology, you'd need to know the network nodes that your capacity sits on first, then use that in a nodeSelector for the corresponding layer index. AFAIK there's no way to apply a nodeSelector based on some heuristic like "schedule pods to nodes where the label topology.k8s.aws/network-node-layer-1 matches", so this logic would have to be handled by a higher-level system. the only guarantee is that there are always at least 3 layers.

However, this scheme breaks down if network nodes can be part of different layers, which i couldn't immediately grok from the docs but i assumed wasn't possible.

k8s-ci-robot · 2024-09-19T19:37:53Z

@wwvela: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-cloud-provider-aws-check	`9e72a88`	link	true	`/test pull-cloud-provider-aws-check`
pull-cloud-provider-aws-test	`9e72a88`	link	true	`/test pull-cloud-provider-aws-test`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


add topology labels

374e420

k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 19, 2024

k8s-ci-robot requested review from kmala and mmerkes September 19, 2024 00:32

k8s-ci-robot added the needs-kind Indicates a PR lacks a `kind/foo` label and requires one. label Sep 19, 2024

k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 19, 2024

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 19, 2024

cartermckinnon reviewed Sep 19, 2024

View reviewed changes

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 19, 2024

k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Sep 19, 2024

ndbaker1 reviewed Sep 19, 2024

View reviewed changes

Add network topology information to nodes

9e72a88
