cni migration and the current docs state #16766

Open
rstoermer opened this issue Aug 22, 2024 · 2 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@rstoermer

rstoermer commented Aug 22, 2024

/kind feature

Already checked #1071 and a few other issues.

It would be great if kOps handled changing the CNI in a safe way. The docs are not quite clear about which options are safe and what kind of disruption to expect: https://github.com/kubernetes/kops/blob/master/docs/networking.md#switching-between-networking-providers. So either the docs need to be updated, or the feature really is missing and needs to be implemented. If kOps does not handle the migration automatically, it would at least be great to add a table of common migration paths with pre-migration, migration, and post-migration steps.

In the current case I want to migrate from a vanilla Calico installation:

  networking:
    calico: {}

to

  networking:
    cilium: {}

Of course, following the docs, I could deactivate the kOps-managed CNI altogether and install a vanilla Cilium by setting

  networking:
    cni: {}

but I would then have to manually ensure that Cilium's requirements are met by the kOps-created cluster.
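
For context, the manual route would probably look roughly like the sketch below, assuming a Helm-based Cilium install (the chart values here are only illustrative placeholders; kOps would not configure any of this for me):

  # Assumed Helm-based Cilium install on a cluster where kOps networking is "cni: {}"
  helm repo add cilium https://helm.cilium.io/
  helm repo update
  # Version and values would have to match Cilium's requirements for the cluster;
  # ipam.mode=kubernetes is just one possible choice.
  helm install cilium cilium/cilium \
    --namespace kube-system \
    --set ipam.mode=kubernetes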

It would be much better to keep CNI management in kOps, especially as Cilium is now the default CNI. I am especially unclear about the meaning of the following sentence in the docs: "It is also possible to switch between CNI providers, but this usually is a disruptive change. kOps will also not clean up any resources left behind by the previous CNI, including the CNI daemonset." Does "disruptive" mean a short downtime, but the change is otherwise safe? Will kOps, on an already existing cluster, handle all the needed config changes, such as mounting the eBPF filesystem or populating the etcd certificates when selecting a kv store? Or do I have to apply those configurations myself, because they are only done when a cluster is first created?

For my current case: if the Calico-to-Cilium migration is not safe, but I still want to keep CNI management in kOps, would it also be possible to follow the Cilium migration guide (https://docs.cilium.io/en/latest/installation/k8s-install-migration.html) node by node (I have already tried this on a non-kOps cluster)? What would happen if I then set "networking: cilium: {}"? Would a second Cilium installation be deployed, or could I deploy my initial installation in a way that kOps understands and recognizes?

If somebody can give me a hint about the current state, I am also more than happy to open a PR for the docs after my migration :)

@k8s-ci-robot added the kind/feature label on Aug 22, 2024
@rstoermer changed the title from "kOps cni migration and the current docs state" to "cni migration and the current docs state" on Aug 22, 2024
@hakman
Member

hakman commented Aug 23, 2024

Hi @rstoermer! CNI migration is difficult and would require complex orchestration to happen without downtime.
At the moment there is no support for any kind of CNI migration in kOps.
That being said, you can try to switch to "cni", delete the Calico components and enable Cilium, but you need to do a rolling-update with --cloudonly to clean everything up. There will be downtime and maybe some unexpected surprises, so I suggest trying it first on a test cluster (or a few).
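
Roughly, the sequence would look something like this (an untested sketch just to illustrate the steps, with $CLUSTER_NAME standing in for your cluster name; the Calico resource names below are the usual ones, so verify what actually exists in kube-system before deleting anything):

  # 1. Switch the cluster spec to "networking: cni: {}" and apply it
  kops edit cluster $CLUSTER_NAME
  kops update cluster $CLUSTER_NAME --yes

  # 2. Remove the Calico components that kOps leaves behind
  kubectl -n kube-system delete daemonset calico-node
  kubectl -n kube-system delete deployment calico-kube-controllers

  # 3. Switch the cluster spec to "networking: cilium: {}" and apply it
  kops edit cluster $CLUSTER_NAME
  kops update cluster $CLUSTER_NAME --yes

  # 4. Roll all nodes without validation, since pods cannot become ready
  #    while the CNI is being swapped
  kops rolling-update cluster $CLUSTER_NAME --yes --cloudonly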

@rstoermer
Author


Hi @hakman, alright, that's what I suspected. Thanks for replying so quickly! I will see what I learn and update the issue then.
