
CAPD Control Plane machines fail because they have no IP Address available #252

Open
ron1 opened this issue Jan 25, 2024 · 4 comments
Labels: kind/bug, priority/important-soon, triage/accepted

Comments


ron1 commented Jan 25, 2024

What happened:
The CAPD control plane Machine gets stuck in the Provisioning phase and never finishes, because the RKE2 control plane controller reports that it has no IP address available.

What did you expect to happen:
The CAPD control plane Machine provisions successfully.

How to reproduce it:
Execute the following steps to provision the CAPD cluster:

```shell
# Create a kind management cluster with the host Docker socket mounted,
# which CAPD needs in order to create machine containers.
cat > kind-cluster-with-extramounts.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: capi-test
nodes:
- role: control-plane
  image: kindest/node:v1.24.15
  extraMounts:
    - hostPath: /var/run/docker.sock
      containerPath: /var/run/docker.sock
EOF

kind create cluster --config kind-cluster-with-extramounts.yaml

# Install Cluster API with the RKE2 bootstrap and control plane providers and CAPD.
clusterctl init --bootstrap rke2 --control-plane rke2 --infrastructure docker

# Variables substituted into the sample cluster template by clusterctl.
export CABPR_NAMESPACE=example
export CLUSTER_NAME=capd-rke2-test
export CABPR_CP_REPLICAS=1
export CABPR_WK_REPLICAS=1
export KUBERNETES_VERSION=v1.24.15

# Download the v0.2.3 Docker sample, render it, and apply it.
export YAML_URL=https://raw.githubusercontent.com/rancher-sandbox/cluster-api-provider-rke2/v0.2.3/samples/docker/online-default/rke2-sample.yaml

curl -sL "${YAML_URL}" > rke2-sample.yaml
cat rke2-sample.yaml | clusterctl generate yaml > rke2-docker-example.yaml

kubectl apply -f rke2-docker-example.yaml
```

Note that the CAPD control plane Machine stays stuck in the Provisioning phase, as shown below:

```
$ kubectl get machine -A
NAMESPACE   NAME                                 CLUSTER          NODENAME   PROVIDERID   PHASE          AGE   VERSION
example     capd-rke2-test-control-plane-kd59v   capd-rke2-test                           Provisioning   23m   v1.24.15+rke2r1
example     worker-md-0-lt6dw-lqpml              capd-rke2-test                           Pending        23m   v1.24.15
```
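Since the controller complains that the control plane Machine has no IP address, it can help to list each Machine's phase next to its reported addresses. The following is a minimal sketch, assuming the Cluster API v1beta1 Machine status fields `.status.phase` and `.status.addresses`; the helper name `flag_machines_without_ip` is hypothetical and not part of any provider.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads tab-separated lines of
#   namespace<TAB>name<TAB>phase<TAB>addresses
# on stdin and prints every Machine whose address list is empty.
flag_machines_without_ip() {
  awk -F'\t' '$4 == "" { printf "%s/%s (phase: %s) has no address\n", $1, $2, $3 }'
}

# Live usage against the management cluster (assumes kubectl access and the
# Cluster API v1beta1 Machine status fields named above):
#   kubectl get machine -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.status.addresses[*].address}{"\n"}{end}' \
#     | flag_machines_without_ip
```

Keeping the parsing in a function that reads stdin separates the check from the cluster access, so it can be tried on saved output as well as live output.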

Anything else you would like to add:
Note the following errors that are consistently repeated in the rke2controlplane_controller log:

```
I0125 19:03:19.102640       1 rke2controlplane_controller.go:387]  "msg"="Reconcile RKE2 Control Plane" "RKE2ControlPlane"={"name":"capd-rke2-test-control-plane","namespace":"example"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "name"="capd-rke2-test-control-plane" "namespace"="example" "reconcileID"="ee66ae3a-4fa2-4fc2-a9a4-ea8f6dab1f4c"
E0125 19:03:29.172494       1 rke2controlplane_controller.go:698]  "msg"="Unable to initialize workload cluster" "error"="failed to get API group resources: unable to retrieve the complete list of server APIs: v1: Get \"https://www.xx.y.z:6443/api/v1?timeout=30s\": EOF" "RKE2ControlPlane"={"name":"capd-rke2-test-control-plane","namespace":"example"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "name"="capd-rke2-test-control-plane" "namespace"="example" "reconcileID"="ee66ae3a-4fa2-4fc2-a9a4-ea8f6dab1f4c"
E0125 19:03:29.173086       1 rke2controlplane_controller.go:463]  "msg"="failed to reconcile Control Plane conditions" "error"="failed to get API group resources: unable to retrieve the complete list of server APIs: v1: Get \"https://www.xx.y.z:6443/api/v1?timeout=30s\": EOF" "RKE2ControlPlane"={"name":"capd-rke2-test-control-plane","namespace":"example"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "name"="capd-rke2-test-control-plane" "namespace"="example" "reconcileID"="ee66ae3a-4fa2-4fc2-a9a4-ea8f6dab1f4c"
E0125 19:03:29.195876       1 rke2controlplane_controller.go:153]  "msg"="Failed to update RKE2ControlPlane Status" "error"="some Control Plane machines exist and are ready but they have no IP Address available" "RKE2ControlPlane"={"name":"capd-rke2-test-control-plane","namespace":"example"} "cluster"="capd-rke2-test" "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "name"="capd-rke2-test-control-plane" "namespace"="example" "reconcileID"="ee66ae3a-4fa2-4fc2-a9a4-ea8f6dab1f4c"
E0125 19:03:29.196841       1 controller.go:324]  "msg"="Reconciler error" "error"="[failed to get API group resources: unable to retrieve the complete list of server APIs: v1: Get \"https://www.xx.y.z:6443/api/v1?timeout=30s\": EOF, some Control Plane machines exist and are ready but they have no IP Address available]" "RKE2ControlPlane"={"name":"capd-rke2-test-control-plane","namespace":"example"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "name"="capd-rke2-test-control-plane" "namespace"="example" "reconcileID"="ee66ae3a-4fa2-4fc2-a9a4-ea8f6dab1f4c"
```
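Because these errors repeat on every reconcile, a per-message count of the error-level lines can make the pattern easier to see. This is a minimal sketch; the helper name `count_error_msgs` is hypothetical, and the namespace and deployment names in the usage comment are assumptions about a default `clusterctl init` install (check them with `kubectl get deploy -A`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: keeps only klog error lines (those starting with "E"
# followed by a digit), extracts their "msg"="..." field, and counts each
# distinct message, most frequent first.
count_error_msgs() {
  grep '^E[0-9]' \
    | grep -o '"msg"="[^"]*"' \
    | sort | uniq -c | sort -rn
}

# Live usage (namespace and deployment names are assumptions):
#   kubectl logs -n rke2-control-plane-system deployment/rke2-control-plane-controller-manager \
#     | count_error_msgs
```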

Environment:

  • rke provider version: 0.2.3
  • OS (e.g. from /etc/os-release): RHEL 8.9
ron1 added the kind/bug, needs-priority, and needs-triage labels on Jan 25, 2024

This issue is stale because it has been open 90 days with no activity.

github-actions bot added the lifecycle/stale label on Apr 25, 2024

ron1 commented Apr 25, 2024

This is still an issue.

github-actions bot removed the lifecycle/stale label on Apr 26, 2024

This issue is stale because it has been open 90 days with no activity.

github-actions bot added the lifecycle/stale label on Jul 25, 2024
alexander-demicev added the priority/important-soon and triage/accepted labels and removed the needs-priority, needs-triage, and lifecycle/stale labels on Aug 27, 2024

alexander-demicev (Member) commented:

@ron1 Hi, can you try the newer versions of CAPD and CAPRKE2?
