
🐛 Expose NodePorts on cluster network instead of 0.0.0.0/0 #2128

Open
huxcrux wants to merge 11 commits into main

Conversation

@huxcrux (Contributor) commented Jun 18, 2024

What this PR does / why we need it:

This PR changes NodePorts to only be open from cluster network.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2127

Special notes for your reviewer:

I assume this is something that would be in 0.11 or 1.0? If so where do we put the changelog?

I also need to confirm which IP ranges we potentially want to use, since there are multiple ways of specifying them depending on whether they are operator-managed or not.

TODOs:

  • squashed commits
  • if necessary:
    • includes documentation
    • adds unit tests

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 18, 2024
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign cecilerobertmichon for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 18, 2024

netlify bot commented Jun 18, 2024

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

🔨 Latest commit: 8230a26
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/66e00e8b3dd9ab00083552aa
😎 Deploy Preview: https://deploy-preview-2128--kubernetes-sigs-cluster-api-openstack.netlify.app

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 19, 2024
Comment on lines 207 to 211
if openStackCluster.Spec.ManagedSubnets != nil && len(openStackCluster.Spec.ManagedSubnets) > 0 {
	SubnetCIDR = openStackCluster.Spec.ManagedSubnets[0].CIDR
} else {
	SubnetCIDR = "0.0.0.0/0"
}
Contributor

Can we not use remote group for this? This won't work on deployments which aren't using managed subnets.

Contributor Author

I just changed the behaviour to use status.Network.Subnet instead, to support bring-your-own-network/subnet use cases.

I do not think we can use a remote group, because load balancers also talk to the NodePorts.
Load balancers are created under the "service" project in a default OpenStack deployment, meaning they are not added to any security group inside the project where the cluster is running.
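
To make the distinction concrete, here is a minimal sketch of a CIDR-scoped node-port rule using the Gophercloud rules API, purely for illustration (CAPO builds its rules through its own resolvedSecurityGroupRuleSpec type; the node-port range and names below are assumptions, not taken from the PR):

import "github.com/gophercloud/gophercloud/openstack/networking/v2/extensions/security/rules"

// nodePortIngressFromSubnet allows node-port traffic from every address in the
// cluster subnet, which includes the Octavia amphorae that the cloud provider
// creates in the "service" project. A RemoteGroupID rule would only match
// instances that are members of that security group, which the amphorae are not.
func nodePortIngressFromSubnet(secGroupID, subnetCIDR string) rules.CreateOpts {
	return rules.CreateOpts{
		Direction:      rules.DirIngress,
		EtherType:      rules.EtherType4,
		Protocol:       rules.ProtocolTCP,
		PortRangeMin:   30000,
		PortRangeMax:   32767,
		SecGroupID:     secGroupID,
		RemoteIPPrefix: subnetCIDR,
	}
}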

Contributor Author

I also left a question in the code that may be good to bring up here.

If no IPv4 subnet is found, the code currently defaults to 0.0.0.0/0 to preserve the current behaviour.

We could, however, return an error if no IPv4 subnet is found, or potentially fall back to only allowing the nodes themselves via remote groups (that would break Octavia load balancers created by the cloud provider, though, which feels kinda bad).
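
A sketch of the error alternative raised here, i.e. failing the reconcile instead of falling back to 0.0.0.0/0 (not part of the PR; SubnetCIDR is the variable used in the diff further down):

if SubnetCIDR == "" {
	// No IPv4 subnet found: fail instead of opening node ports to the world.
	return nil, fmt.Errorf("no IPv4 subnet found for node port security group rules")
}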

Contributor

In other places we just add the rule twice with 2 remote groups, one for the control plane and one for workers. This covers machines even when they aren't using the default cluster network. Is there a reason we can't do that here?

I'd like to reduce our reliance on the cluster's status.Network over time. There are new, interesting deployments that it doesn't play nicely with, e.g. external control planes and spine/leaf.
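
For reference, a rough sketch of the "add the rule twice with two remote groups" pattern mentioned here, again using Gophercloud's rule options purely for illustration (CAPO builds its rules through its own spec types; the group ID parameters and node-port range are placeholders):

import "github.com/gophercloud/gophercloud/openstack/networking/v2/extensions/security/rules"

// nodePortRulesFromGroups allows node-port traffic only from instances that are
// members of the worker or control-plane security groups.
func nodePortRulesFromGroups(workerGroupID, controlPlaneGroupID string) []rules.CreateOpts {
	var opts []rules.CreateOpts
	for _, remote := range []string{workerGroupID, controlPlaneGroupID} {
		opts = append(opts, rules.CreateOpts{
			Direction:     rules.DirIngress,
			EtherType:     rules.EtherType4,
			Protocol:      rules.ProtocolTCP,
			PortRangeMin:  30000,
			PortRangeMax:  32767,
			SecGroupID:    workerGroupID,
			RemoteGroupID: remote,
		})
	}
	return opts
}

As the reply below notes, this covers the machines themselves but not the Octavia amphorae created by the cloud provider, since those are not members of either group.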

Contributor Author

> In other places we just add the rule twice with 2 remote groups, one for the control plane and one for workers. This covers machines even when they aren't using the default cluster network. Is there a reason we can't do that here?
>
> I'd like to reduce our reliance on the cluster's status.Network over time. There are new, interesting deployments that it doesn't play nicely with, e.g. external control planes and spine/leaf.

It would break load balancers deployed using the cloud provider (Octavia), since they are not in any security group we control (or know the ID of). I think this is a really common use case, and supporting it out of the box would be nice.

Contributor Author

Just wanted to follow up on this.

We could add security group rules so that all cluster nodes can contact each other, which would make the conformance tests pass. However, it would cause LBs created with the cloud provider to fail. At this point I am not aware of any way of allowing LBs other than allowing their IPs directly towards the nodes.

I also think that a default where LBs work out of the box is preferable.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 27, 2024
@jichenjc (Contributor) commented Jul 3, 2024

/test pull-cluster-api-provider-openstack-test


linux-foundation-easycla bot commented Sep 10, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Sep 10, 2024
@mikaelgron

/test pull-cluster-api-provider-openstack-test

@@ -169,6 +170,7 @@ func (s *Service) generateDesiredSecGroups(openStackCluster *infrav1.OpenStackCl
var secControlPlaneGroupID string
var secWorkerGroupID string
var secBastionGroupID string
var SubnetCIDR string
Contributor

Suggested change
- var SubnetCIDR string
+ var subnetCIDR string

}
}

// Allow all traffic, including from outside the cluster, to access node port services.
Contributor

Suggested change
- // Allow all traffic, including from outside the cluster, to access node port services.
+ // Allow all traffic from a specific CIDR to access node port services.

}

// Allow all traffic, including from outside the cluster, to access node port services.
func getSGWorkerNodePortCidr(cidr string) []resolvedSecurityGroupRuleSpec {
Contributor

Suggested change
- func getSGWorkerNodePortCidr(cidr string) []resolvedSecurityGroupRuleSpec {
+ func getSGWorkerNodePortCIDR(cidr string) []resolvedSecurityGroupRuleSpec {

workerRules = append(workerRules, getSGWorkerNodePort()...)

// Fetch subnet to use for worker node port rules
// In the future IPv6 support need to be added here
Contributor

IPv4-only is icky, but everything else is still IPv4 only and it's not worth fixing it piecemeal so this is fine.


// Fetch subnet to use for worker node port rules
// In the future IPv6 support need to be added here
if openStackCluster.Status.Network != nil && len(openStackCluster.Status.Network.Subnets) > 0 {
Contributor

Iterating over a nil or empty list is safe and idiomatic. The loop below performs this check implicitly anyway.

Suggested change
- if openStackCluster.Status.Network != nil && len(openStackCluster.Status.Network.Subnets) > 0 {
+ if openStackCluster.Status.Network != nil {
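
A tiny illustration of the point about nil slices (not from the PR; the helper name is hypothetical):

// cidrsOf collects CIDRs from a possibly-nil slice. The range loop simply runs
// zero times when subnets is nil or empty, so no explicit length check is needed.
func cidrsOf(subnets []string) []string {
	var out []string
	for _, cidr := range subnets {
		out = append(out, cidr)
	}
	return out
}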

// In the future IPv6 support need to be added here
if openStackCluster.Status.Network != nil && len(openStackCluster.Status.Network.Subnets) > 0 {
for _, subnet := range openStackCluster.Status.Network.Subnets {
// Check uf subnet.CIDR is ipv4 subnet
Contributor

Suggested change
- // Check uf subnet.CIDR is ipv4 subnet
+ // Check if subnet.CIDR is ipv4 subnet

Comment on lines +211 to +218
_, ipnet, err := net.ParseCIDR(subnet.CIDR)
if err != nil {
	return nil, fmt.Errorf("invalid subnet found during security groups reconcile: %v", err)
}
if ipnet.IP.To4() != nil {
	SubnetCIDR = subnet.CIDR
	break
}
Contributor

This whole block can be replaced with netutils.IsIPv4CIDRString(subnet.CIDR)
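
Roughly what that replacement could look like, reusing the identifiers from the diff above (a sketch, not a verified patch; netutils is assumed to be the usual alias for k8s.io/utils/net):

for _, subnet := range openStackCluster.Status.Network.Subnets {
	// IsIPv4CIDRString replaces the manual net.ParseCIDR + To4 check above.
	if netutils.IsIPv4CIDRString(subnet.CIDR) {
		SubnetCIDR = subnet.CIDR
		break
	}
}

One behavioural difference: IsIPv4CIDRString simply returns false for a malformed CIDR, so the explicit error currently returned from net.ParseCIDR would be dropped.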

}
if SubnetCIDR != "" {
// If SubnetCIDR is found we allow tcp and udp traffic from said SubnetCIDR, this in order to allow octavia Loadbalancers created by CCM to function properly
workerRules = append(workerRules, getSGWorkerNodePortCidr(SubnetCIDR)...)
Contributor

You could move this inside the block above and also remove the break so you simply add any listed IPv4 subnet. Any other restrictions we have on subnets aren't relevant here, so might as well make it handle whatever's there.
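
A sketch of that restructuring, reusing the identifiers from the diff above and the netutils suggestion from earlier (not a verified patch): node-port rules are added for every IPv4 subnet rather than only the first one found.

for _, subnet := range openStackCluster.Status.Network.Subnets {
	if netutils.IsIPv4CIDRString(subnet.CIDR) {
		workerRules = append(workerRules, getSGWorkerNodePortCidr(subnet.CIDR)...)
	}
}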

@mdbooth (Contributor) commented Sep 26, 2024

Please can you also add a code comment in there about why we're explicitly adding this subnet? Namely because this is the default subnet used by CPO as the origin of traffic from Octavia load balancers to load balancer members? Otherwise it risks being deleted by somebody because it looks redundant, even me in 6 months when I've forgotten this discussion again 🙄
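
Something along these lines would address that (the wording is only a suggestion, not text from the PR):

// Node ports are additionally opened to the cluster subnet CIDR, not just to the
// control-plane and worker security groups. The subnet is the default source of
// traffic from Octavia load balancer members created by cloud-provider-openstack,
// and those amphorae are not members of any security group managed by this
// provider, so a remote-group rule alone would block them. Do not remove this
// rule as redundant.
workerRules = append(workerRules, getSGWorkerNodePortCidr(SubnetCIDR)...)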

Labels
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: Inbox
Development

Successfully merging this pull request may close these issues.

Nodeports open from 0.0.0.0/0 by default, causing unexpected behaviour together with Floating IPs
5 participants