Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

topologymanager: document topology manager policy options #37668

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -196,6 +196,9 @@ For a reference to old feature gates that are removed, please refer to
| `TopologyAwareHints` | `true` | Beta | 1.24 | |
| `TopologyManager` | `false` | Alpha | 1.16 | 1.17 |
| `TopologyManager` | `true` | Beta | 1.18 | |
| `TopologyManagerPolicyAlphaOptions` | `false` | Alpha | 1.26 | |
| `TopologyManagerPolicyBetaOptions` | `false` | Beta | 1.26 | |
| `TopologyManagerPolicyOptions` | `false` | Alpha | 1.26 | |
| `UserNamespacesStatelessPodsSupport` | `false` | Alpha | 1.25 | |
| `VolumeCapacityPriority` | `false` | Alpha | 1.21 | - |
| `WinDSR` | `false` | Alpha | 1.14 | |
Expand Down Expand Up @@ -727,6 +730,15 @@ Each feature gate is designed for enabling/disabling a specific feature:
- `TopologyManager`: Enable a mechanism to coordinate fine-grained hardware resource
assignments for different components in Kubernetes. See
[Control Topology Management Policies on a node](/docs/tasks/administer-cluster/topology-manager/).
- `TopologyManagerPolicyAlphaOptions`: Allow fine-tuning of topology manager policies,
experimental, Alpha-quality options.
This feature gate guards *a group* of topology manager options whose quality level is alpha.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the features behind this gate are designed to be a mystery, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I've contribute a near-identical feature for cpumanager). The intention here is AFACT not to hide the specific names nor to confuse the users. However we need to deal with the fact that the set of features guarded by these FGs are designed to rotate. We describe the FG in detail and their maturity level in the component-specific docs: https://github.com/kubernetes/website/blob/main/content/en/docs/tasks/administer-cluster/cpu-management-policies.md#static-policy-options

We can replicate the details here if this is less confusing for users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now the only implemented policy option is prefer-closest-numa-nodes and it is an alpha option. I think the list of options and its stage should be documented in topology-manager specific docs not here. I will add a section which feature-gates has to be enabled to test out this option

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

section added, please let me know if it helps 😄

This feature gate will never graduate to beta or stable.
- `TopologyManagerPolicyBetaOptions`: Allow fine-tuning of topology manager policies,
experimental, Beta-quality options.
This feature gate guards *a group* of topology manager options whose quality level is alpha.
This feature gate will never graduate to stable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. We don't want users to know what are behind the gate, and we want them to try them out?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specific options gated by this feature gate is described in the relevant documentation for the TopologyManager itself, e.g.
content/en/docs/tasks/administer-cluster/topology-manager.md.

There is no precedent for listing out the specific options here (if that is what you are suggesting).

- `TopologyManagerPolicyOptions`: Allow fine-tuning of topology manager policies,
- `UserNamespacesStatelessPodsSupport`: Enable user namespace support for stateless Pods.
- `VolumeCapacityPriority`: Enable support for prioritizing nodes in different
topologies based on available PV capacity.
Expand Down
23 changes: 22 additions & 1 deletion content/en/docs/tasks/administer-cluster/topology-manager.md
Original file line number Diff line number Diff line change
Expand Up @@ -213,6 +213,28 @@ reschedule the pod. It is recommended to use a Deployment with replicas to trigg
the Pod.An external control loop could be also implemented to trigger a redeployment of pods
that have the `Topology Affinity` error.

### Topology manager policy options

Support for the Topology Manager policy options requires `TopologyManagerPolicyOptions`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to be enabled.

You can toggle groups of options on and off based upon their maturity level using the following feature gates:
* `TopologyManagerPolicyBetaOptions` default disabled. Enable to show beta-level options. Currently there are no beta-level options.
* `TopologyManagerPolicyAlphaOptions` default disabled. Enable to show alpha-level options. You will still have to enable each option using the `TopologyManagerPolicyOptions` kubelet option.

The following policy options exists:
* `prefer-closest-numa-nodes` (alpha, invisible by default, `TopologyManagerPolicyOptions` and `TopologyManagerPolicyAlphaOptions` feature gates have to be enabled)(1.26 or higher)

If the `prefer-closest-numa-nodes` policy option is specified, the `best-effort` and `restricted`
policies will favor sets of NUMA nodes with shorter distance between them when making admission decisions.
You can enable this option by adding `prefer-closest-numa-nodes=true` to the Topology Manager policy options.
By default, without this option, Topology Manager aligns resources on either a single NUMA node or
the minimum number of NUMA nodes (in cases where more than one NUMA node is required). However,
the `TopologyManager` is not aware of NUMA distances and does not take them into account when making admission decisions.
This limitation surfaces in multi-socket, as well as single-socket multi NUMA systems,
and can cause significant performance degradation in latency-critical execution and high-throughput applications if the
Topology Manager decides to align resources on non-adjacent NUMA nodes.

### Pod Interactions with Topology Manager Policies

Consider the containers in the following pod specs:
Expand Down Expand Up @@ -330,4 +352,3 @@ assignments.

2. The scheduler is not topology-aware, so it is possible to be scheduled on a node and then fail
on the node due to the Topology Manager.