Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #666 handle SIGTERM #712

Open
wants to merge 7 commits into
base: rolling
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 13 additions & 23 deletions launch/launch/launch_service.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,6 @@
import collections.abc
import contextlib
import logging
import platform
import signal
import threading
import traceback
Expand Down Expand Up @@ -185,37 +184,28 @@ def _prepare_run_loop(self):
this_task = asyncio.Task.current_task(this_loop)

self.__this_task = this_task
# Setup custom signal handlers for SIGINT, SIGTERM and maybe SIGQUIT.
sigint_received = False

def _on_sigint(signum):
nonlocal sigint_received
base_msg = 'user interrupted with ctrl-c (SIGINT)'
if not sigint_received:
# Setup custom signal handlers for SIGINT & SIGTERM
signals_received = set()

def _on_signal(signum):
nonlocal signals_received
signal_name = signal.Signals(signum).name
base_msg = 'caught ' + signal_name
if signal_name not in signals_received:
signals_received.add(signal_name)
due_to_sigint = True if signal_name == 'SIGINT' else False
self.__logger.warning(base_msg)
ret = self._shutdown(
reason='ctrl-c (SIGINT)', due_to_sigint=True, force_sync=True
reason=base_msg, due_to_sigint=due_to_sigint, force_sync=True
Copy link
Contributor

@hidmic hidmic May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ciandonovan meta: sending a shutdown event is not the same as pulling the brakes on the event loop, but I can see how the previous implementation can cause problems when managing launch itself. Most process managers use SIGTERM. I guess you could argue that SIGKILL escalation is appropriate if a shutdown isn't enough.

)
assert ret is None, ret
sigint_received = True
else:
self.__logger.warning('{} again, ignoring...'.format(base_msg))

def _on_sigterm(signum):
signame = signal.Signals(signum).name
self.__logger.error(
'user interrupted with ctrl-\\ ({}), terminating...'.format(signame))
# TODO(wjwwood): try to terminate running subprocesses before exiting.
self.__logger.error('using {} can result in orphaned processes'.format(signame))
self.__logger.error('make sure no processes launched are still running')
this_loop.call_soon(this_task.cancel)

with AsyncSafeSignalManager(this_loop) as manager:
# Setup signal handlers
manager.handle(signal.SIGINT, _on_sigint)
manager.handle(signal.SIGTERM, _on_sigterm)
if platform.system() != 'Windows':
manager.handle(signal.SIGQUIT, _on_sigterm)
manager.handle(signal.SIGINT, _on_signal)
manager.handle(signal.SIGTERM, _on_signal)
# Yield asyncio loop and current task.
yield this_loop, this_task
finally:
Expand Down
1 change: 1 addition & 0 deletions launch/launch/utilities/signal_management.py
Original file line number Diff line number Diff line change
Expand Up @@ -235,6 +235,7 @@ def handle(
:return: previous handler if any, otherwise None
"""
signum = signal.Signals(signum)
signal.signal(signum, signal.default_int_handler)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ciandonovan thanks for catching this! I dropped the ball here, my bad.

That said, I don't think this is the complete fix. We should only override default signal handlers (chaining non-default ones), and we should restore whatever was in place on AsyncSafeSignalManager exit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I'm not overriding any default handlers, as SIGINT is already handled by signal.default_int_handler, and SIGTERM isn't handled at all, so I'm just setting it to be handled by the same function.

Ideally both would be handled by some global _noop function that just passes, letting signal.set_wakeup_fd pick it up and do its job. Not sure the best way to implement that though. There was a _noop function in test_signal_management.py previously, but the assert fails if a signal handler was installed for SIGTERM as it's pointed to a different handler address.

Copy link
Contributor

@hidmic hidmic May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I'm not overriding any default handlers, as SIGINT is already handled by signal.default_int_handler, and SIGTERM isn't handled at all, so I'm just setting it to be handled by the same function.

Right, but we don't really know which signal number we will be required to handle, and even if we did (or assumed so or constrained it to be such and such), there is also the possibility that some other library, used by launch or using launch, may have set their own as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what should we do here? Is it ok to just override any existing signal handler with when this is called? How does this interact with signal.set_wakeup_fd? What if someone sets their own signal handler after initializing launch, will our handling still function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only existing signal handler at this point is the one the Python binary sets itself for SIGINT to generate the KeyboardInterrupt exception, which I don't change as I point all handlers towards that.
The only other possibility is that someone could have created a handler within their launchfile? Not sure if that's even possible though, and would be terrible practice anyway.
Signal handlers they install in their nodes are of course separate, as signals are a per-process property, and launch is but the parent of the node children.
In terms of signal.set_wakeup_fd, nothing extra needs to be done. Installing the signal handlers here simply prevents the kernel from immediately terminating the process as is the common default disposition, and allows Python to then forward the signal through the fd.

if handler is not None:
if not callable(handler):
raise ValueError('signal handler must be a callable')
Expand Down
10 changes: 5 additions & 5 deletions launch/test/launch/utilities/test_signal_management.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,19 +25,19 @@


def cap_signals(*signals):
def _noop(*args):
pass

def _decorator(func):
@functools.wraps(func)
def _wrapper(*args, **kwargs):
handlers = {}
try:
for s in signals:
handlers[s] = signal.signal(s, _noop)
handlers[s] = signal.signal(s, signal.default_int_handler)
return func(*args, **kwargs)
except KeyboardInterrupt:
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ciandonovan hmm, this will skip the entire test with a green check. The asyncio loop won't deliver any handlers if it is interrupted.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify why, or what it's interrupting? My understanding is that the assert in the finally section will always get run no matter what exceptions occur so long as they're handled.

Copy link
Contributor

@hidmic hidmic May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, those assert will be executed (and likely succeed), but all the assert in the test_async_safe_signal_manager body after the first signal is raised won't because a KeyboardInterrupt exception will have been raised during asyncio.run_until_complete execution.

finally:
assert all(signal.signal(s, h) is _noop for s, h in handlers.items())
assert all(signal.signal(s, h) is signal.default_int_handler
for s, h in handlers.items())
return _wrapper

return _decorator
Expand Down