Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use TCP keepalive to detect broken sockets #681

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

joeyoravec
Copy link

@joeyoravec joeyoravec commented Apr 25, 2024

fixes #674

vsomeip does not enforce a single socket between endpoints; it does not actively close an existing socket after a new one is established.

It is possible that:

  • socket is established from nodeA to nodeB
  • ethernet gets unplugged
  • nodeA tries to transmit, times out, closes the socket
  • ethernet is reconnected
  • nodeA establishes a new socket to node B

nodeB never received an RST while the cable was unplugged, still thinks the old socket is established, and effectively "leaks" until TCP keepalive detects the break. At default values this takes 2 hr 10 min. This can be mitigated by reducing the keepalive settings.

The implementation of TCP keepalive is different between OS:

  • Linux can configure all 3 settings on a per-socket basis
  • QNX can configure 1 setting per-socket and 2 settings globally

QNX would need to set the global values using sysctl, perhaps at startup.

vsomeip does not enforce a single socket between endpoints; it does not
actively close an existing socket after a new one is established.

It is possible that:
- socket is established from nodeA to nodeB
- ethernet gets unplugged
- nodeA tries to transmit, times out, closes the socket
- ethernet is reconnected
- nodeA establishes a new socket to node B

nodeB never received an RST while the cable was unplugged, still thinks
the old socket is established, and effectively "leaks" until TCP keepalive
detects the break. At default values this takes 2 hr 10 min. This can be
mitigated by reducing the keepalive settings.

The implementation of TCP keepalive is different between OS:
- Linux can configure all 3 settings on a per-socket basis
- QNX can configure 1 setting per-socket and 2 settings globally

QNX would need to set the global values using sysctl, perhaps at startup.
@goncaloalmeida
Copy link
Contributor

@joeyoravec is this ready to review or are you still working on this?

@joeyoravec
Copy link
Author

@joeyoravec is this ready to review or are you still working on this?

I've left this in draft so it's clear that this is not ready for merge. Yes this "works" and is worth reviewing.

if (rc != 0) {
VSOMEIP_WARNING << "tcp_server_endpoint::connect: couldn't enable keep_alive(10)" ;
}
#ifndef __QNX__
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @joeyoravec

Thanks for this Pull Request.

One suggestion to maybe avoid "#ifdef platform" and the hardcoded values, the tcp keep alive options could be added as new configurations in the vsomeip json files.
If no configurations would be present, the default from the platform would be used (this would be preferable for linux based systems, where sysctl is used)
If configurations for the keep alive would be present, those would be used instead, for all platforms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]: vsomeip leaks TCP sockets
3 participants