Pods unable to communicate between each other in single-node cluster when node goes offline #10785

Open
NGTOne opened this issue Aug 30, 2024 · 2 comments

NGTOne commented Aug 30, 2024

Environmental Info:
K3s Version:

k3s version 1.29.5+k3s1 (4e53a323)
go version go1.21.9

Node(s) CPU architecture, OS, and Version:

Linux nvidia-desktop 5.10.104-tegra #1 SMP PREEMPT Tue May 16 10:43:59 CEST 2023 aarch64 aarch64 aarch64 GNU/Linux

Cluster Configuration:
Single server node, running in a "semi-airgapped" configuration. Has both 4G and WiFi capability, but is deployed aboard a vehicle in remote locations, meaning frequent network outages and changes are expected.

Describe the bug:
When the node has no Internet connection, network requests between Pods stop working for no apparent reason. For example, gRPC requests time out instead of succeeding. Reconnecting the device to a network makes requests succeed again, with no other apparent changes.
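
To tell a real timeout apart from a refused connection while collecting evidence, a small TCP probe run from inside one Pod against another may help. This is only a sketch: `probe` is a hypothetical helper, and the Pod IP and port in the usage comment are placeholders, not values from this cluster.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify one TCP connection attempt.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply within `timeout` seconds: packets are likely dropped or misrouted.
        return "timeout"
    except ConnectionRefusedError:
        # The peer answered with a reset: routing works, nothing is listening.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. "Network is unreachable" or "No route to host".
        return f"error: {exc}"


# Hypothetical usage from inside a Pod (placeholder Pod IP and gRPC port):
# print(probe("10.42.0.15", 50051))
```

Running it once while online and once while offline, and comparing the two results, would show whether the failure is a silent drop ("timeout") or an explicit routing error.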

Steps To Reproduce:

  • Installed K3s with the following ExecStart in the systemd unit:
ExecStart=/usr/local/bin/k3s \
    server \
	'--tls-san' \
	'10.43.0.1' \
	'--resolv-conf' \
	'/run/systemd/resolve/resolv.conf' \
	'--prefer-bundled-bin' \

Dummy network:

dummy0: flags=195<UP,BROADCAST,RUNNING,NOARP>  mtu 1500
        inet 192.168.255.254  netmask 255.255.255.254  broadcast 0.0.0.0
        inet6 fe80::86e:a8ff:fe8e:91ed  prefixlen 64  scopeid 0x20<link>
        ether 0a:6e:a8:8e:91:ed  txqueuelen 1000  (Ethernet)

ip route output:

default via 192.168.20.1 dev wlan0 proto dhcp metric 600 
default via 192.168.255.254 dev dummy0 metric 50000 
10.42.0.0/24 dev cni0 proto kernel scope link src 10.42.0.1 
169.254.0.0/16 dev dummy0 scope link metric 1000 
192.168.20.0/24 dev wlan0 proto kernel scope link src 192.168.20.82 metric 600 
192.168.255.254/31 dev dummy0 proto kernel scope link src 192.168.255.254 
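
For reference, a minimal sketch of how the two default routes above compete, assuming the usual Linux rule that among equally specific routes the lowest metric wins. The helper names `parse_default_routes` and `pick_default_route` are hypothetical, and the parser only handles the `ip route` line shapes shown in this issue. It also assumes the wlan0 route is withdrawn when the link drops, which has not been confirmed here.

```python
def parse_default_routes(ip_route_output):
    """Return a (device, gateway, metric) tuple for each 'default' line."""
    routes = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "default":
            continue
        # Each keyword is followed by its value, e.g. "via 192.168.20.1".
        gateway = fields[fields.index("via") + 1] if "via" in fields else None
        device = fields[fields.index("dev") + 1]
        # A route printed without a metric has metric 0.
        metric = int(fields[fields.index("metric") + 1]) if "metric" in fields else 0
        routes.append((device, gateway, metric))
    return routes


def pick_default_route(routes, down_devices=()):
    """Pick the lowest-metric default route whose device is not down."""
    candidates = [r for r in routes if r[0] not in down_devices]
    return min(candidates, key=lambda r: r[2]) if candidates else None


IP_ROUTE = """\
default via 192.168.20.1 dev wlan0 proto dhcp metric 600
default via 192.168.255.254 dev dummy0 metric 50000
10.42.0.0/24 dev cni0 proto kernel scope link src 10.42.0.1
"""

routes = parse_default_routes(IP_ROUTE)
print("online :", pick_default_route(routes))
print("offline:", pick_default_route(routes, down_devices={"wlan0"}))
```

Under those assumptions, the dummy0 route (metric 50000) becomes the only default route once wlan0 is gone, so the node should still have a default route while offline.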

Expected behavior:
Pods remain able to communicate when device goes offline.

Actual behavior:
Pods are unable to communicate when device goes offline.

Additional context / logs:
Not sure what to provide.

dereknola (Contributor) commented

You are advertising your server with '--tls-san' '10.43.0.1', but that address is in the default CIDR for K8s services. This is likely conflicting with your Pods' communication.
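
A quick way to check this kind of overlap, sketched with Python's standard `ipaddress` module. The ranges are an assumption: they are the K3s defaults (`--cluster-cidr 10.42.0.0/16`, `--service-cidr 10.43.0.0/16`), which apply here only if neither flag was overridden. The `classify` helper is hypothetical.

```python
import ipaddress

# Assumption: K3s default ranges; adjust if --cluster-cidr or --service-cidr is set.
CLUSTER_CIDR = ipaddress.ip_network("10.42.0.0/16")
SERVICE_CIDR = ipaddress.ip_network("10.43.0.0/16")


def classify(address):
    """Report which cluster range, if any, an IP address falls into."""
    ip = ipaddress.ip_address(address)
    if ip in SERVICE_CIDR:
        return "service"
    if ip in CLUSTER_CIDR:
        return "pod"
    return "other"


# Addresses taken from this issue: the --tls-san value, the cni0 bridge, dummy0.
for addr in ("10.43.0.1", "10.42.0.1", "192.168.255.254"):
    print(addr, "->", classify(addr))
```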

NGTOne (Author) commented Sep 2, 2024

I removed --tls-san from the startup arguments. No change. The Pods are still unable to communicate with each other when the device is offline.
