Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Soften POSIX signal for termination of perf #384

Merged
merged 1 commit into from
May 15, 2019

Conversation

ptosi
Copy link
Contributor

@ptosi ptosi commented May 3, 2019

Context

A SIGKILL is, by definition, not allowed to be handled by a process. Therefore, a process receiving this signal isn't able to "clean up" before termination. This is particularly problematic in the case of programs that have "buffered" data (to limit filesystem I/O) and/or that are supposed to output their result when returning.

Typically, perf is used with a target command (e.g. sleep in the COMMAND_TEMPLATES). It collects samples (or does something else depending on the command and flags) until that command returns at which point, it writes back its result (perf.data, output to stdout, ...) before its own termination.

However, if a SIGKILL (or other similar signal) is sent to perf, this can't happen for the reason detailed above and its results are (at least partially) lost. Furthermore, in this case, the target command is still alive after this has happened as it is not a child of the perf process.

Finally, if a SIGINT (or other similar signal) is sent to perf, it handles it by "properly" outputting its results (AFAIK) before returning; it ignores the target command, which stays alive.

Therefore, we can trigger the "clean" reporting from perf and have a "clean exit" (no target command alive) by sending SIGKILL to the target command but nothing prevents us from sending SIGINT to perf before that! By doing so, we guarantee "clean perf results" which adds robustness to the code.

Issue

I am worried that the current implementation might send SIGKILL to perf, depending on the output format of ps (typically native to the device and varying between models):

  • .killall() (which sends SIGKILL, by default, to a list of processes) calls .get_pids_of() which finds the list by running ps and keeping the lines in which the passed keyword (sleep, here) can be found [*];
  • Depending on the output format of ps, its COMMAND/CMD column could output the arguments of the call (along with the command itself): in such a case, the previously described "grep" would pick all lines containing 'sleep' which includes the ones such as 'perf stat -a sleep 100';
  • If the arguments are in COMMAND/CMD, .killall() ends up sending SIGKILL to perf;

This PR is motivated by comments on #382 and I believe that what I'm addressing here might be the cause. But, as this is an idea that came up from reading the code, I would really appreciate if it was tested on devices that currently report "corrupted" outputs from perf.

ps

The lack of (single) standard behaviour for ps is pretty clear in its Ubuntu man page:

DESCRIPTION
       ps displays information about a selection of the active processes.
[...]
       This version of ps accepts several kinds of options:

       1   UNIX options, which may be grouped and must be preceded by a dash.
       2   BSD options, which may be grouped and must not be used with a dash.
       3   GNU long options, which are preceded by two dashes.

and formats exist with and without arguments:

STANDARD FORMAT SPECIFIERS
       Here are the different keywords that may be used to control the output format
[...]
       CODE        HEADER    DESCRIPTION
[...]
       args        COMMAND   command with all its arguments as a string.
[...]
       cmd         CMD       see args.  (alias args, command).

       comm        COMMAND   command name (only the executable name).
[...]
       command     COMMAND   See args.  (alias args, command).
[...]
       ucomm       COMMAND   see comm.  (alias comm, ucmd).

I particularly like the following acknowledgement ("most"):

This version of ps tries to recognize most of the keywords used in other implementations of ps.

So, depending on what the version of ps available on the device implements (and on which defaults it uses), its output is likely to vary between devices. For comparison:

OnePlus6:/ # ps --help
usage: ps [-AadefLlnwZ] [-gG GROUP,] [-k FIELD,] [-o FIELD,] [-p PID,] [-t TTY,] [-uU USER,]

List processes.
[...]
Which FIELDs to show. (Default = -o PID,TTY,TIME,CMD)

-f      Full listing (-o USER:12=UID,PID,PPID,C,STIME,TTY,TIME,ARGS=CMD)
-l      Long listing (-o F,S,UID,PID,PPID,C,PRI,NI,ADDR,SZ,WCHAN,TTY,TIME,CMD)
-o      Output FIELDs instead of defaults, each with optional :size and =title
[...]
Command line -o fields:

  ARGS     CMDLINE minus initial path     CMD  Command (thread) name (stat[2])
  CMDLINE  Command line (argv[])          COMM Command filename (/proc/$PID/exe)
  COMMAND  Command file (/proc/$PID/exe)  NAME Process name (argv[0] of $PID)

Comments

On top of this PR, could .ps() be re-implemented to use ps from the pushed busybox? This would allow a somewhat more standard behaviour or are there reasons why we don't want to do that? From a log, I believe this is not what is being done:

2019-05-03 17:54:34,671 DEBUG       android: adb -s 10.1.16.130:5555 push /home/pietos01/src/devlib/devlib/bin/arm64/busybox /sdcard/devlib-target/busybox
[...]
2019-05-03 17:54:35,068 DEBUG       android: adb -s 10.1.16.130:5555 shell cp /sdcard/devlib-target/busybox /data/local/tmp/bin/busybox
2019-05-03 17:54:35,124 DEBUG       android: adb -s 10.1.16.130:5555 shell rm -rf /sdcard/devlib-target/busybox
2019-05-03 17:54:35,160 DEBUG       android: adb -s 10.1.16.130:5555 shell chmod 0777 /data/local/tmp/bin/busybox
[...(busybox never touched)...]
2019-05-03 17:54:38,150 DEBUG       android:         adb -s 10.1.16.130:5555 shell ps
2019-05-03 17:54:38,322 DEBUG       android:         adb -s 10.1.16.130:5555 shell kill  493
2019-05-03 17:54:38,368 DEBUG       android:         adb -s 10.1.16.130:5555 shell kill  753
2019-05-03 17:54:38,407 DEBUG       android:         adb -s 10.1.16.130:5555 shell kill  975

[*] : Note that, in the case of LinuxTarget, .get_pids_of() seems to be somewhat more robust as it uses -C (which uses the command name) and will simply fail if that flag is not supported. Obviously, this is based on the hope that there are no implementations of ps where -C uses args ...

Replace the default SIGKILL signal sent to perf to "request" its
termination by a SIGINT, allowing it to handle the signal by cleaning up
before exit. This should address issues regarding corrupted perf.data
output files.
devlib/trace/perf.py Show resolved Hide resolved
devlib/trace/perf.py Show resolved Hide resolved
devlib/trace/perf.py Show resolved Hide resolved
@setrofim
Copy link
Collaborator

setrofim commented May 7, 2019

On top of this PR, could .ps() be re-implemented to use ps from the pushed busybox? This would allow a somewhat more standard behaviour or are there reasons why we don't want to do that?

I have this feeling that there may have some reason. But at the moment, I cannot recall the specific issue. Using the busybox version certainly makes sense.

One thing that would need to be checked as part of the switch is that the busybox version reports all the fields needed to populate PsEntry, and if not, that the fields that are to be removed are not used in existing code (including in WA and LISA).

@marcbonnici marcbonnici merged commit b5f3661 into ARM-software:master May 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants