CAPG: Upstream CCM manifest doesn't work #666

Open
jayesh-srivastava opened this issue Apr 19, 2024 · 11 comments
Labels
  • kind/support: Categorizes issue or PR as a support question.
  • lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@jayesh-srivastava
Member

Tried deploying the CCM in a CAPG cluster using the provided CCM manifest from https://github.com/kubernetes/cloud-provider-gcp/blob/master/deploy/packages/default/manifest.yaml.
The CCM pod is stuck in CrashLoopBackOff with this error:

unable to load configmap based request-header-client-ca-file: Get "https://127.0.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication": dial tcp 127.0.0.1:443: connect: connection refused
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If the repository maintainers determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 19, 2024
@mcbenjemaa
Member

Please use:

  command: ['/usr/local/bin/cloud-controller-manager']
  args:
  - --cloud-provider=gce
  - --leader-elect=true
  - --use-service-account-credentials

and remove the env.
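For context, a minimal sketch of how this fix slots into the upstream DaemonSet container spec. This assumes the stock manifest's env block was what directed API requests at 127.0.0.1:443 (consistent with the connection-refused error above, but an inference, not confirmed here):

```yaml
# Sketch only, not the full upstream manifest.
containers:
  - name: cloud-controller-manager
    image: k8scloudprovidergcp/cloud-controller-manager:latest
    command: ['/usr/local/bin/cloud-controller-manager']
    args:
      - --cloud-provider=gce
      - --leader-elect=true
      - --use-service-account-credentials
    # env block removed: it pointed the client at 127.0.0.1:443,
    # which only resolves when CCM runs on the API server host.
```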

@mcbenjemaa
Member

/kind support

@k8s-ci-robot k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label Apr 26, 2024
@jayesh-srivastava
Member Author

Hi @mcbenjemaa, thanks for the help. The CCM pod is up now with these args:

  - args:
    - --cloud-provider=gce
    - --leader-elect=true
    - --use-service-account-credentials
    - --allocate-node-cidrs=true
    - --cluster-cidr=192.168.0.0/16
    - --configure-cloud-routes=false

One more question: I see the cloud-controller-manager image being used is k8scloudprovidergcp/cloud-controller-manager:latest. How can I use Kubernetes-version-specific images for the CCM?

@BenTheElder
Member

You may have to build the image yourself while the release process is being revamped; there are instructions in the README.

The :latest tag is aimed at CI / testing of the project itself, I think.
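If you do build your own image per the README, pinning it in the manifest might look like the sketch below. The registry and tag here are placeholders, not real published images:

```yaml
# Hypothetical: replace :latest with a self-built, version-specific tag.
containers:
  - name: cloud-controller-manager
    image: registry.example.com/cloud-controller-manager:v1.29.0  # placeholder
    imagePullPolicy: IfNotPresent
```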

/retitle CAPG: Upstream CCM manifest doesn't work

I don't think the manifest is necessarily meant to work with CAPG; I would expect CAPG to handle deploying everything?

Otherwise this may be in scope for #686

@k8s-ci-robot k8s-ci-robot changed the title Upstream CCM manifest doesn't work CAPG: Upstream CCM manifest doesn't work May 7, 2024
@mcbenjemaa
Member

Self-deployed CCM: I got this error:

message="Error syncing load balancer: failed to ensure load balancer: instance not found"

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 12, 2024
@esierra-stratio

Something similar here: I'm trying to deploy the Cloud Controller Manager (CCM) and I'm encountering the following error:

I0823 08:10:42.838284       1 node_controller.go:391] Initializing node minplus0-md-2-vbvmr-856l7 with cloud provider
I0823 08:10:42.920926       1 gen.go:15649] GCEInstances.Get(context.Background.WithDeadline(2024-08-23 09:10:42.83965981 +0000 UTC m=+3629.567729051 [59m59.918720336s]), Key{"minplus0-md-2-vbvmr-856l7", zone: "europe-west4-b"}) = <nil>, googleapi: Error 404: The resource 'projects/clusterapi-369611/zones/europe-west4-b/instances/minplus0-md-2-vbvmr-856l7' was not found, notFound
E0823 08:10:42.921062       1 node_controller.go:213] error syncing 'minplus0-md-2-vbvmr-856l7': failed to get instance metadata for node minplus0-md-2-vbvmr-856l7: failed to get instance ID from cloud provider: instance not found, requeuing

I don't understand why the CCM is adding the zone label as:

I0823 08:10:41.974944       1 node_controller.go:493] Adding node label from cloud provider: beta.kubernetes.io/instance-type=n2-standard-2
I0823 08:10:41.974950       1 node_controller.go:494] Adding node label from cloud provider: node.kubernetes.io/instance-type=n2-standard-2
I0823 08:10:41.974954       1 node_controller.go:505] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/zone=europe-west4-b
I0823 08:10:41.974958       1 node_controller.go:506] Adding node label from cloud provider: topology.kubernetes.io/zone=europe-west4-b
I0823 08:10:41.974963       1 node_controller.go:516] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/region=europe-west4
I0823 08:10:41.974968       1 node_controller.go:517] Adding node label from cloud provider: topology.kubernetes.io/region=europe-west4

The correct zone should be gce://clusterapi-369611/europe-west4-c/minplus0-md-2-vbvmr-856l7.
This is how I'm deploying CCM:

        - name: cloud-controller-manager
          image: k8scloudprovidergcp/cloud-controller-manager:latest
          imagePullPolicy: IfNotPresent
          # ko puts it somewhere else... command: ['/usr/local/bin/cloud-controller-manager']
          command: ['/usr/local/bin/cloud-controller-manager']
          args:
            - --cloud-provider=gce  # Add your own cloud provider here!
            - --leader-elect=true
            - --use-service-account-credentials
            # these flags will vary for every cloud provider
            - --allocate-node-cidrs=true
            - --configure-cloud-routes=true
            - --cluster-cidr=192.168.0.0/16
            - --v=4
          livenessProbe:
            failureThreshold: 3
            httpGet:
              host: 127.0.0.1
              path: /healthz
              port: 10258
              scheme: HTTPS
            initialDelaySeconds: 15
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 15
          resources:
            requests:
              cpu: "200m"
          volumeMounts:
            - mountPath: /etc/kubernetes/cloud.config
              name: cloudconfig
              readOnly: true
      hostNetwork: true
      priorityClassName: system-cluster-critical
      volumes:
        - hostPath:
            path: /etc/kubernetes/cloud.config
            type: ""
          name: cloudconfig

@aojea
Member

aojea commented Aug 23, 2024

The correct zone should be gce://clusterapi-369611/europe-west4-c/minplus0-md-2-vbvmr-856l7.

what do you mean by correct zone there?

the instance url is https://www.googleapis.com/compute/v1/projects/{PROJECT}/zones/{ZONE}/instances/{VM_INSTANCE}

that is the providerId, isn't it?

@esierra-stratio

esierra-stratio commented Aug 26, 2024

The issue is that the GCEInstances.Get function constructs the provider ID with the wrong zone. It assumes the zone must match where the master CCM is deployed (in this case, europe-west4-b), instead of the correct one, which is europe-west4-c. That's why the CCM couldn't find the instance.

Is there any way to make the CCM check every single zone? Maybe a multizone option or something similar?

@esierra-stratio

esierra-stratio commented Aug 26, 2024

Solved!

          args:
            - --cloud-provider=gce  # Add your own cloud provider here!
            - --leader-elect=true
            - --use-service-account-credentials
            # these flags will vary for every cloud provider
            - --allocate-node-cidrs=true
            - --cluster-cidr=192.168.0.0/16
            - --v=4
            - --cloud-config=/etc/kubernetes/gce.conf
          livenessProbe:
            failureThreshold: 3
            httpGet:
              host: 127.0.0.1
              path: /healthz
              port: 10258
              scheme: HTTPS
            initialDelaySeconds: 15
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 15
          resources:
            requests:
              cpu: "200m"
          volumeMounts:
            - mountPath: /etc/kubernetes/gce.conf
              name: cloudconfig
              readOnly: true
      hostNetwork: true
      priorityClassName: system-cluster-critical
      volumes:
        - hostPath:
            path: /etc/kubernetes/gce.conf
            type: FileOrCreate
          name: cloudconfig

where gce.conf:

[Global]
multizone=true
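For reference, multizone=true tells the GCE cloud provider to search every zone in the region when looking up instances, rather than only the zone the CCM itself runs in, which is why the instance-not-found errors above go away. A slightly fuller gce.conf sketch (the project-id line is an assumption for illustration; it is normally inferred from instance metadata):

```ini
[Global]
# Search all zones in the region for instances, so nodes outside
# the control plane's zone are found.
multizone = true
# Hypothetical: set explicitly only if metadata-based detection
# is unavailable in your environment.
project-id = my-gcp-project
```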

7 participants