
pod cluster network does not work when number of worker nodes > 1 #194

doschkinow opened this issue Apr 10, 2018

Terraform Version

[pdos@ol7 terraform-kubernetes-installer]$ terraform -v
Terraform v0.11.3

  • provider.null v1.0.0
  • provider.oci v2.1.4
  • provider.random v1.2.0
  • provider.template v1.0.0
  • provider.tls v1.1.0

OCI Provider Version

[pdos@ol7 terraform-kubernetes-installer]$ ls -l terraform.d/plugins/linux_amd64/terraform-provider-oci_v2.1.4
-rwxr-xr-x. 1 pdos pdos 28835846 Apr 10 09:33 terraform.d/plugins/linux_amd64/terraform-provider-oci_v2.1.4

Terraform Installer for Kubernetes Version

v1.3.0

Input Variables

[pdos@ol7 terraform-kubernetes-installer]$ cat terraform.tfvars

# OCI authentication

region = "us-ashburn-1"
tenancy_ocid = "ocid1.tenancy.oc1..aaaaaaaa4jaw55rds22u6yaiy5fxt5qxjr2ja4l5fzkv4hci7kwmexv3hpqq"
compartment_ocid = "ocid1.compartment.oc1..aaaaaaaakvhehb5u7nrupwuunhefoedpbegvbnysvz5pdfluxt5wxl5aquwa"
fingerprint = "15:fd:5a:0f:7b:f7:c8:d0:82:f5:20:f8:97:07:42:02"
private_key_path = "/home/pdos/.oci/oci_api_key.pem"
user_ocid = "ocid1.user.oc1..aaaaaaaai3a6zzhjw23wncjhk5ogvjmk4x22zsws6xn4ydmzzlxoo6rthxya"

#tenancy_ocid = "ocid1.tenancy.oc1..aaaaaaaa763cu5f3m7qpzwnvr2shs3o26ftrn7fkgz55cpzgxmglgtui3v7q"
#compartment_ocid = "ocid1.compartment.oc1..aaaaaaaaidy3jl7bdmiwfryo6myhdnujcuug5zxzoclsz7vpfzw4bggng7iq"
#fingerprint = "ed:51:83:3b:d2:04:f4:af:9d:7b:17:96:dd:8a:99:bc"
#private_key_path = "/tmp/oci_api_key.pem"
#user_ocid = "ocid1.user.oc1..aaaaaaaa5fy2l5aki6z2bzff5yrrmlahiif44vzodeetygxmpulq3mbnckya"

# CCM user

#cloud_controller_user_ocid = "ocid1.tenancy.oc1..aaaaaaaa763cu5f3m7qpzwnvr2shs3o26ftrn7fkgz55cpzgxmglgtui3v7q"
#cloud_controller_user_fingerprint = "ed:51:83:3b:d2:04:f4:af:9d:7b:17:96:dd:8a:99:bc"
#cloud_controller_user_private_key_path = "/tmp/oci_api_key.pem"

etcdShape = "VM.Standard1.1"
k8sMasterShape = "VM.Standard1.1"
k8sWorkerShape = "VM.Standard2.1"

etcdAd1Count = "0"
etcdAd2Count = "0"
etcdAd3Count = "1"

k8sMasterAd1Count = "0"
k8sMasterAd2Count = "0"
k8sMasterAd3Count = "1"

k8sWorkerAd1Count = "0"
k8sWorkerAd2Count = "1"
k8sWorkerAd3Count = "1"

etcdLBShape = "400Mbps"
k8sMasterLBShape = "400Mbps"
#etcd_ssh_ingress = "10.0.0.0/16"
etcd_ssh_ingress = "0.0.0.0/0"
#etcd_cluster_ingress = "10.0.0.0/16"
master_ssh_ingress = "0.0.0.0/0"
worker_ssh_ingress = "0.0.0.0/0"
master_https_ingress = "0.0.0.0/0"
worker_nodeport_ingress = "0.0.0.0/0"
#worker_nodeport_ingress = "10.0.0.0/16"

control_plane_subnet_access = "public"
k8s_master_lb_access = "public"
#natInstanceShape = "VM.Standard1.2"
#nat_instance_ad1_enabled = "true"
#nat_instance_ad2_enabled = "false"
#nat_instance_ad3_enabled = "true"
#nat_ssh_ingress = "0.0.0.0/0"
public_subnet_http_ingress = "0.0.0.0/0"
public_subnet_https_ingress = "0.0.0.0/0"

#worker_iscsi_volume_create is a bool not a string
#worker_iscsi_volume_create = true
#worker_iscsi_volume_size = 100

#etcd_iscsi_volume_create = true
#etcd_iscsi_volume_size = 50

Description of issue:

Pods deployed to a worker node different from the node where the kube-dns pod is running are unable to resolve kubernetes.default. This happens even when the worker nodes are in the same availability domain.
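Since kubernetes.default is resolved via the kube-dns ClusterIP, which kube-proxy forwards to the kube-dns pod running on another node, a failing lookup from a remote node points at cross-node pod traffic rather than DNS configuration. As a first sanity check (pod name is a placeholder), one can confirm the failing pod's resolver does point at the kube-dns service IP:

kubectl exec -it <failing-busybox-pod> -- cat /etc/resolv.conf
# expected: nameserver 10.21.21.21 (the kube-dns ClusterIP seen in the log below)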

Steps to reproduce:

  • terraform apply (with a terraform.tfvars similar to the one above)
  • deploy a busybox pod in the cluster and scale it to 2 replicas, so there is a pod on each worker node (see the command sketch after this list)
  • note which node the kube-dns pod is running on
  • go inside the busybox pod on that node: here "nslookup kubernetes.default" works
  • go inside the busybox pod on the other worker node: here "nslookup kubernetes.default" fails
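A minimal command sketch for the reproduction, assuming the kubectl 1.9-era behavior where kubectl run creates a deployment; the angle-bracket pod names are placeholders to be replaced with the actual names from kubectl get pod:

kubectl run busybox --image=busybox:1.28 --command -- sleep 3600
kubectl scale deployment busybox --replicas=2
kubectl -n kube-system get pod -o wide    # note the node running kube-dns
kubectl get pod -o wide                   # note the node of each busybox pod
kubectl exec -it <busybox-pod-on-dns-node> nslookup kubernetes.default     # succeeds
kubectl exec -it <busybox-pod-on-other-node> nslookup kubernetes.default   # fails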

Below is a log of kubectl commands that illustrates this:
[opc@k8s-master-ad3-0 ~]$ kubectl -n kube-system get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
kube-apiserver-k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com 1/1 Running 0 4m 10.0.32.2 k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com
kube-controller-manager-k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com 1/1 Running 0 4m 10.0.32.2 k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com
kube-dns-596797cd48-lghdb 3/3 Running 0 5m 10.99.78.2 k8s-worker-ad3-0.k8sworkerad3.k8sbmcs.oraclevcn.com
kube-proxy-k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com 1/1 Running 0 4m 10.0.32.2 k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com
kube-proxy-k8s-worker-ad3-0.k8sworkerad3.k8sbmcs.oraclevcn.com 1/1 Running 0 3m 10.0.42.3 k8s-worker-ad3-0.k8sworkerad3.k8sbmcs.oraclevcn.com
kube-proxy-k8s-worker-ad3-1.k8sworkerad3.k8sbmcs.oraclevcn.com 1/1 Running 0 3m 10.0.42.2 k8s-worker-ad3-1.k8sworkerad3.k8sbmcs.oraclevcn.com
kube-scheduler-k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com 1/1 Running 0 4m 10.0.32.2 k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com
kubernetes-dashboard-796487df76-d8q7f 1/1 Running 0 5m 10.99.69.2 k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com
oci-cloud-controller-manager-sdqv5 1/1 Running 0 5m 10.0.32.2 k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com
oci-volume-provisioner-66f47d7fcf-ks6pk 1/1 Running 0 5m 10.99.17.2 k8s-worker-ad3-1.k8sworkerad3.k8sbmcs.oraclevcn.com

[opc@k8s-master-ad3-0 ~]$ kubectl scale deployment busybox --replicas=2
deployment "busybox" scaled
[opc@k8s-master-ad3-0 ~]$ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
busybox-56b5f5cd9d-6brvj 1/1 Running 0 6s 10.99.78.4 k8s-worker-ad3-0.k8sworkerad3.k8sbmcs.oraclevcn.com
busybox-56b5f5cd9d-lvz4z 1/1 Running 1 13m 10.99.17.5 k8s-worker-ad3-1.k8sworkerad3.k8sbmcs.oraclevcn.com
nginx-7cbc4b4d9c-7z772 1/1 Running 0 15m 10.99.17.3 k8s-worker-ad3-1.k8sworkerad3.k8sbmcs.oraclevcn.com
nginx-7cbc4b4d9c-c6lrj 1/1 Running 0 15m 10.99.78.3 k8s-worker-ad3-0.k8sworkerad3.k8sbmcs.oraclevcn.com
nginx-7cbc4b4d9c-k2kjr 1/1 Running 0 15m 10.99.17.4 k8s-worker-ad3-1.k8sworkerad3.k8sbmcs.oraclevcn.com
[opc@k8s-master-ad3-0 ~]$ kubectl exec -it busybox-56b5f5cd9d-6brvj nslookup kubernetes.default
Server: 10.21.21.21
Address 1: 10.21.21.21 kube-dns.kube-system.svc.cluster.local

Name: kubernetes.default
Address 1: 10.21.0.1 kubernetes.default.svc.cluster.local
[opc@k8s-master-ad3-0 ~]$ kubectl exec -it busybox-56b5f5cd9d-lvz4z nslookup kubernetes.default
Server: 10.21.21.21
Address 1: 10.21.21.21

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
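
To separate a DNS-only failure from a broken pod network, a useful follow-up (using the pod names and IPs from the log above) is to test raw cross-node pod-to-pod connectivity, e.g. by pinging the kube-dns pod IP from the failing busybox pod:

kubectl exec -it busybox-56b5f5cd9d-lvz4z -- ping -c 3 10.99.78.2
# if this also times out, cross-node pod networking itself is broken,
# not just DNS resolution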
