AmazonVPC CNI broken in Kops #16734

Open
lukasmrtvy opened this issue Aug 4, 2024 · 0 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@lukasmrtvy

/kind bug

1. What kops version are you running? The command kops version will display
this information.

1.29.2

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running; otherwise provide the Kubernetes version
specified as a kops flag.

1.29.7

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

export AWS_ACCESS_KEY_ID=XXXX
export AWS_SECRET_ACCESS_KEY=XXXX
export AWS_REGION=eu-central-1
export KOPS_STATE_STORE=s3://example-kops-state-store

kops create -f kops.yaml
kops update cluster --name test.example.com --yes --admin
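For reference, cluster readiness can be checked with kops validate before digging into the CNI; the exact invocation below is a sketch and was not part of the original reproduction:

# Waits up to 10 minutes for all nodes and system pods to report healthy
kops validate cluster --name test.example.com --wait 10m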

5. What happened after the commands executed?

The cluster is created, but the CNI (AmazonVPC) is broken: Pod->Pod and Pod->Service traffic fails with i/o timeouts. The cluster runs a mixed topology: control plane and workload nodes in private subnets, gateway nodes in public subnets. A minimal probe of the symptom is sketched after the pod listing below.

NAMESPACE     NAME                                            READY   STATUS             RESTARTS        AGE
default       test678aa                                       1/1     Running            0               6m7s
kube-system   aws-cloud-controller-manager-2sk5s              1/1     Running            0               11m
kube-system   aws-node-8jw57                                  2/2     Running            0               8m49s
kube-system   aws-node-nftng                                  2/2     Running            0               9m17s
kube-system   aws-node-phkdf                                  2/2     Running            0               11m
kube-system   aws-node-termination-handler-5b988d67cd-2hjlb   0/1     CrashLoopBackOff   6 (45s ago)     11m
kube-system   coredns-78ccb5b8c5-4rq4c                        1/1     Running            0               8m16s
kube-system   coredns-78ccb5b8c5-gmzx5                        0/1     Running            4 (60s ago)     11m
kube-system   coredns-autoscaler-55c99b49b7-pffqc             1/1     Running            0               11m
kube-system   ebs-csi-controller-65676964b6-7vx7d             5/6     CrashLoopBackOff   9 (15s ago)     11m
kube-system   ebs-csi-node-4ldz5                              3/3     Running            7 (2m51s ago)   10m
kube-system   ebs-csi-node-5rl4v                              2/3     CrashLoopBackOff   6 (50s ago)     9m17s
kube-system   ebs-csi-node-zddbk                              2/3     CrashLoopBackOff   6 (41s ago)     8m49s
kube-system   etcd-manager-events-i-0fe4d8007f51c493b         1/1     Running            0               10m
kube-system   etcd-manager-main-i-0fe4d8007f51c493b           1/1     Running            0               9m44s
kube-system   kops-controller-7cbpn                           1/1     Running            0               11m
kube-system   kube-apiserver-i-0fe4d8007f51c493b              2/2     Running            2 (11m ago)     10m
kube-system   kube-controller-manager-i-0fe4d8007f51c493b     1/1     Running            3 (11m ago)     10m
kube-system   kube-proxy-i-001e89332beaa4ab7                  1/1     Running            0               9m17s
kube-system   kube-proxy-i-0182e11c841a4f31b                  1/1     Running            0               8m49s
kube-system   kube-proxy-i-0fe4d8007f51c493b                  1/1     Running            0               10m
kube-system   kube-scheduler-i-0fe4d8007f51c493b              1/1     Running            0               10m
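A minimal in-cluster probe of the reported timeouts could look like the following; the pod name, image, and targets are illustrative and not taken from the cluster above:

# Hypothetical probe pod (name and image are examples)
kubectl run nettest --image=busybox:1.36 --restart=Never -- sleep 3600
# Pod->Service: resolving through the kube-dns ClusterIP fails with an i/o timeout per the report
kubectl exec nettest -- nslookup kubernetes.default.svc.cluster.local
# Pod->Pod: replace <pod-ip> with the IP of any running pod (kubectl get pods -o wide)
kubectl exec nettest -- ping -c 3 <pod-ip>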

6. What did you expect to happen?

The CNI works: Pod->Pod and Pod->Service traffic succeeds without timeouts.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: test.example.com
spec:
  api:
    loadBalancer:
      class: Network
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://example-kops-state-store/test.example.com
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-eu-central-1a
      name: a
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-eu-central-1a
      name: a
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
    useServiceAccountExternalPermissions: true
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  - ::/0
  kubernetesVersion: 1.29.7
  networkCIDR: 172.20.0.0/16
  networking:
    amazonvpc: {}
  nonMasqueradeCIDR: 172.20.0.0/16
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://example-kops-oidc-store/test.example.com/discovery/test.example.com
    enableAWSOIDCProvider: true
  sshAccess:
  - 0.0.0.0/0
  - ::/0
  subnets:
  - cidr: 172.20.0.0/19
    name: eu-central-1a-public
    type: Public
    zone: eu-central-1a
  - cidr: 172.20.32.0/19
    name: eu-central-1b-public
    type: Public
    zone: eu-central-1b
  - cidr: 172.20.64.0/19
    name: eu-central-1a-private
    type: Private
    zone: eu-central-1a
  - cidr: 172.20.96.0/19
    name: eu-central-1b-private
    type: Private
    zone: eu-central-1b
  - cidr: 172.20.128.0/19
    name: eu-central-1a-Utility
    type: Utility
    zone: eu-central-1a
  - cidr: 172.20.160.0/19
    name: eu-central-1b-Utility
    type: Utility
    zone: eu-central-1b
  topology:
    dns:
      type: None
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: test.example.com
  name: control-plane-eu-central-1a
spec:
  image: 137112412989/al2023-ami-2023.5.20240722.0-kernel-6.1-arm64
  machineType: t4g.medium
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - eu-central-1a-private
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: test.example.com
  name: workload-eu-central-1a
spec:
  image: 137112412989/al2023-ami-2023.5.20240722.0-kernel-6.1-x86_64
  machineType: t3a.xlarge
  maxSize: 3
  minSize: 1
  role: Node
  subnets:
  - eu-central-1a-private
  nodeLabels:
    role: "workload"
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: test.example.com
  name: gateway-eu-central-1a
spec:
  image: 137112412989/al2023-ami-2023.5.20240722.0-kernel-6.1-x86_64
  machineType: t3a.xlarge
  maxSize: 3
  minSize: 1
  role: Node
  subnets:
  - eu-central-1a-public
  nodeLabels:
    role: "gateway"
  taints:
  - node.com/type=gateway:NoSchedule

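The aws-node daemonset is usually the first place to check when the AmazonVPC CNI misbehaves; the container name and log path below follow the amazon-vpc-cni-k8s defaults and are assumptions about the version kops 1.29 deploys:

# Logs from the CNI plugin itself (container name per amazon-vpc-cni-k8s defaults)
kubectl -n kube-system logs daemonset/aws-node -c aws-node --tail=100
# ipamd also writes a log on each node, by default at /var/log/aws-routed-eni/ipamd.log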
8. Please run the commands with the most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or into a gist and provide the gist link here.
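The verbose logs have not been captured yet; if needed, the update step can be re-run roughly as follows (flag placement is a sketch):

kops update cluster --name test.example.com --yes -v 10 2>&1 | tee kops-update-v10.log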

9. Anything else we need to know?

@k8s-ci-robot added the kind/bug label on Aug 4, 2024