This repository has been archived by the owner on Mar 31, 2023. It is now read-only.

If 'kubeadm reset' fails to remove an etcd member, then that node will not recover automatically #284

Open
bboreham opened this issue Jul 30, 2020 · 0 comments
Labels
chore Related to fix/refinement/improvement of end user or new/existing developer functionality

Comments

@bboreham
Contributor

The symptom is that kubeadm join repeatedly fails like this:

time="2020-07-28T02:45:41Z" level=info msg=Applying resource="kubeadm:join"
time="2020-07-28T02:45:41Z" level=info msg="joining Kubernetes cluster"
time="2020-07-28T02:45:41Z" level=debug msg="running command: ..."
[preflight] Running pre-flight checks
...
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: dial tcp 172.31.71.248:2379: connect: connection refused
time="2020-07-28T02:45:50Z" level=error msg="failed to join cluster" stdouterr="..."

The join fails because etcd still lists three members, but only two of them are alive. The missing member was running on this node until wks-controller shut it down via kubeadm reset.
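
To make that state concrete, here is a minimal sketch in Python. The member names are hypothetical, and the liveness flags are filled in by hand rather than queried from etcd. It only illustrates the arithmetic: etcd's quorum is a majority of the registered members, so two live members out of three still have quorum, yet the dead member stays registered, which is consistent with the connection-refused error in the log above.

```python
from typing import Dict, List


def quorum_size(member_count: int) -> int:
    """Majority of registered members, which is what etcd needs for quorum."""
    return member_count // 2 + 1


def summarize(members: Dict[str, bool]) -> Dict[str, object]:
    """Summarize a cluster given a map of member name -> is it alive?"""
    alive: List[str] = sorted(name for name, up in members.items() if up)
    stale: List[str] = sorted(name for name, up in members.items() if not up)
    needed = quorum_size(len(members))
    return {
        "registered": len(members),
        "quorum": needed,
        "has_quorum": len(alive) >= needed,
        "stale": stale,
    }


# Hypothetical names; node-c stands for the node that was reset.
members = {"node-a": True, "node-b": True, "node-c": False}
state = summarize(members)
print(state)
```

The cluster in this sketch can still serve requests, but any check that insists on reaching every registered member will keep failing until the stale entry is removed.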

kubeadm reset does have code to remove the node's member from etcd, but on this occasion it seems to have failed, possibly because we had a problem earlier and the kubeadm certs had expired:

time="2020-07-28T02:38:46Z" level=debug msg="running command: sudo -n -- sh -c 'kubeadm reset --force'"
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W0728 02:38:46.733486   22923 reset.go:73] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get node registration: failed to get node name from kubelet config: open /etc/kubernetes/kubelet.conf: no such file or directory
W0728 02:38:46.733675   22923 reset.go:234] [reset] No kubeadm config, using etcd pod spec to get data directory
[preflight] Running pre-flight checks
[reset] No etcd config found. Assuming external etcd
[reset] Please manually reset etcd to prevent further issues
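
The manual cleanup that the last log line asks for could be sketched roughly as follows. This is an illustration under stated assumptions, not what wksctl or kubeadm does. It assumes the address in the check-etcd error belongs to the stale member, and that `etcdctl member list` prints its default simple format (comma-separated ID, status, name, peer URLs, client URLs, learner flag). The member IDs, names, and the two healthy addresses in the sample are made up. The endpoint and certificate flags that a real `etcdctl` call would need are left out.

```python
import re
from typing import List, Optional
from urllib.parse import urlparse

# Matches the "dial tcp <ip>:<port>" part of the kubeadm error message.
DIAL_RE = re.compile(r"dial tcp (\d{1,3}(?:\.\d{1,3}){3}):\d+")


def unreachable_host(error_line: str) -> Optional[str]:
    """Extract the IP address that the health check failed to reach."""
    match = DIAL_RE.search(error_line)
    return match.group(1) if match else None


def find_member_id(member_list: str, host: str) -> Optional[str]:
    """Find the member whose peer or client URL points at the given host."""
    for line in member_list.strip().splitlines():
        fields = line.split(", ")
        if len(fields) < 5:
            continue  # not a member row in the assumed format
        member_id, peer_urls, client_urls = fields[0], fields[3], fields[4]
        urls = peer_urls.split(",") + client_urls.split(",")
        if any(urlparse(url).hostname == host for url in urls):
            return member_id
    return None


def remove_command(member_id: str) -> List[str]:
    """Build the etcdctl invocation (endpoint and TLS flags omitted)."""
    return ["etcdctl", "member", "remove", member_id]


ERROR = ("error execution phase check-etcd: etcd cluster is not healthy: "
         "dial tcp 172.31.71.248:2379: connect: connection refused")

# Hypothetical output; only 172.31.71.248 comes from the log above.
SAMPLE = """
1a2b3c4d5e6f7a8b, started, node-a, https://172.31.71.10:2380, https://172.31.71.10:2379, false
2b3c4d5e6f7a8b9c, started, node-b, https://172.31.71.11:2380, https://172.31.71.11:2379, false
3c4d5e6f7a8b9c0d, started, node-c, https://172.31.71.248:2380, https://172.31.71.248:2379, false
"""

host = unreachable_host(ERROR)
stale_id = find_member_id(SAMPLE, host) if host else None
if stale_id:
    print(" ".join(remove_command(stale_id)))
```

The command would have to be run against one of the two healthy members, and the ID should be checked by hand before removal, since removing the wrong member from a three-member cluster with one member already down would cost it quorum.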
@bboreham bboreham added the chore Related to fix/refinement/improvement of end user or new/existing developer functionality label Jul 30, 2020