fix: Ensure SSH connections are closed after each command execution #859

Open

@jfanals jfanals commented Jun 24, 2024

This commit addresses an issue where SSH connections were not being properly closed after each command execution, leading to timeout errors during the kamal deploy process.

Problem

When commands are run through the execute method in SSHKit::Runner::Parallel, SSH connections are left open, and subsequent commands fail with timeout errors. The issue is most evident when kamal build deliver runs after kamal registry login during a normal kamal deploy.

Solution

The execute method in the SSHKit::Runner::Parallel::CompleteAll module has been modified to ensure that all SSH connections are closed after each command execution. This is achieved by adding an ensure block to the thread creation logic, which calls SSHKit::Backend::Netssh.pool.close_connections after each command, regardless of whether an exception occurs.

Changes

  • Added an ensure block to the execute method in SSHKit::Runner::Parallel::CompleteAll to close SSH connections after each command execution.
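
As an illustration of the pattern described above, here is a minimal, self-contained sketch of per-thread cleanup with ensure. It is not the actual SSHKit or Kamal code: FakePool and run_on_hosts are hypothetical names, and FakePool only counts calls where the real patch would call SSHKit::Backend::Netssh.pool.close_connections.

```ruby
# Hypothetical stand-in for SSHKit::Backend::Netssh.pool (assumption, not the real class).
class FakePool
  attr_reader :close_calls

  def initialize
    @close_calls = 0
    @mutex = Mutex.new
  end

  # Counts invocations; the real pool would close its cached connections here.
  def close_connections
    @mutex.synchronize { @close_calls += 1 }
  end
end

# Runs the given block once per host, each in its own thread.
# The ensure clause runs whether the block returns normally or raises.
# Returns the list of exceptions raised by the per-host blocks.
def run_on_hosts(hosts, pool)
  threads = hosts.map do |host|
    Thread.new(host) do |h|
      Thread.current.report_on_exception = false
      begin
        yield h
      ensure
        pool.close_connections
      end
    end
  end

  # Join every thread and collect failures instead of stopping at the first one.
  threads.map do |thread|
    begin
      thread.value
      nil
    rescue StandardError => e
      e
    end
  end.compact
end
```

Calling run_on_hosts(%w[a b c], FakePool.new) { |h| ... } invokes close_connections once per host, including for hosts whose block raised. The trade-off is that every later command has to open a fresh connection.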

Impact

This change ensures that SSH connections are properly closed, preventing timeout errors and improving the reliability of the kamal deploy process.

Testing

Tested the changes by running kamal deploy commands. Verified that the timeout errors no longer occur and the deployment process completes successfully.

Closes #857

@jfanals
Author

jfanals commented Jun 25, 2024

For some reason the test on Ruby 3.1 failed. It looks like a glitch in the test run, since this change should affect all Ruby versions equally rather than just one.

        ensure
          SSHKit::Backend::Netssh.pool.close_connections

Would it be possible to run the tests again?

@djmb
Collaborator

djmb commented Jun 25, 2024

I've kicked that off. I'm not sure about this change, though. Creating SSH connections can be expensive, especially when deploying to a large number of servers.

We configure keepalives with an interval of 30 seconds on the connections, so that generally should stop them from timing out. Maybe you could try reducing the keepalive interval and see if that makes any difference?

@jfanals
Author

jfanals commented Jun 25, 2024

Thanks for the suggestion. I tried decreasing ServerAliveInterval in .ssh/config to 10, but unfortunately that did not fix the issue.

@ruyrocha

👍 on this one. It's very similar to delano/rye#63, which I ran into a while ago, and I think delano/rye#38 has more context on it.

cc @djmb @jfanals

@djmb
Collaborator

djmb commented Jul 16, 2024

@jfanals - we overwrite the keepalive_interval in the Kamal config, so I'm not sure it will pick up that from .ssh/config.

Could you try:

ssh:
  keepalive_interval: 10
  log_level: debug

Setting the log_level to debug might also give us some useful feedback.

@plattenschieber

I had a similar problem, which brought me to this issue. Unfortunately setting the keepalive_interval and keepalive parameters is not possible from the configuration file, as they are hardcoded.

❯ bin/kamal config 
  ERROR (Kamal::ConfigurationError): ssh: unknown key: keepalive_interval

The fix for me was to raise the ufw-imposed rate limit of 6 connections per 30 seconds.
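
To see why such a limit can look like an SSH timeout, here is a rough model of a "6 connection attempts per 30 seconds" rule. It is only a sketch under assumptions: ConnectionLimiter is a hypothetical class, not ufw's implementation, and it assumes every attempt from one source address, accepted or not, counts toward the window.

```ruby
# Simplified sliding-window model of a per-source connection limit (assumption).
class ConnectionLimiter
  def initialize(max_attempts: 6, window_seconds: 30)
    @max_attempts = max_attempts
    @window_seconds = window_seconds
    @attempts = []
  end

  # Records a connection attempt at time `now` (in seconds) and returns
  # true if it is accepted, false once the window holds max_attempts or more.
  def allow?(now)
    # Forget attempts that have fallen out of the window.
    @attempts.reject! { |t| now - t >= @window_seconds }
    @attempts << now
    @attempts.size < @max_attempts
  end
end

limiter = ConnectionLimiter.new
results = (0..5).map { |second| limiter.allow?(second) }
puts results.inspect
```

In this model a deploy that opens a new connection per command reaches the limit within a few commands, whereas a reused connection counts only once.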

Successfully merging this pull request may close these issues.

SSH Connection Timeouts During Full Deployment