Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for 'perf record and 'perf report' to devlib #382

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
54 changes: 52 additions & 2 deletions devlib/trace/perf.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@

# Copyright 2018 ARM Limited
#
# Licensed under the Apache License, Version 2.0 (the "License");
Expand All @@ -23,7 +24,8 @@
from devlib.utils.misc import ensure_file_directory_exists as _f


PERF_COMMAND_TEMPLATE = '{} stat {} {} sleep 1000 > {} 2>&1 '
PERF_COMMAND_TEMPLATE = '{} {} {} {} sleep 1000 > {} 2>&1 '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could take the opportunity here to introduce named replacement format fields:

Suggested change
PERF_COMMAND_TEMPLATE = '{} {} {} {} sleep 1000 > {} 2>&1 '
PERF_COMMAND_TEMPLATE = '{perf} {command} {args} {events} sleep 1000 > {outfile} 2>&1 '

allowing more readable and reliable calls to PERF_COMMAND_TEMPLATE.format() as they don't use position-dependant inputs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@setrofim & @marcbonnici : This might be a standard approach in devlib but, as we're capturing the output to a file, why are we merging the 2 streams? Why not '{perf_command} > {stdout} 2> {stderr}' and extract '{label}.stdout' and '{label}.stderr which would make parsing the WA output on the host easier (e.g. check the stderr output file first, if it's alright, use stdout or other outputs instead of having to untangle the 2 streams)?

If we don't want that, isn't > {outfile} 2>&1 simply &> {outfile} or is there a subtle difference?

Second question (tiny detail, out of curiosity), what is the final space in the string for?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, this is to save having to pull two files instead of one. Given that the output is generally parsed with a line parser looking for specific patterns, mixing in stderr lines usually don't impact things. The question is -- is there specific benefit to splitting stdout and stderr in this case?

If we don't want that, isn't > {outfile} 2>&1 simply &> {outfile} or is there a subtle difference?

The bash man page specifically states they're semantically equivalent. I just end to write 2>&1 out if habit and because I keep forgetting that &> exists.

Second question (tiny detail, out of curiosity), what is the final space in the string for?

Erm. Aesthetics? A subtle reference to the TBS animated show? </joke> Seriously, I have no idea. It might be a hold-over form an earlier WA2 implementation where it might have mattered. But I don't see it serving a purpose in the current code. Well spotted.

PERF_REPORT_COMMAND_TEMPLATE = '{} report {} -i {} > {} 2>&1 '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we modify PERF_COMMAND_TEMPLATE so that it can also be used by perf report (and, eventually, other perf subcommands) instead of having to create a new constant everytime?


PERF_COUNT_REGEX = re.compile(r'^(CPU\d+)?\s*(\d+)\s*(.*?)\s*(\[\s*\d+\.\d+%\s*\])?\s*$')

Expand Down Expand Up @@ -69,6 +71,8 @@ def __init__(self, target,
events=None,
optionstring=None,
labels=None,
perf_mode='stat',
report_optionstring=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@setrofim & @marcbonnici : could we consider extending optionstring, instead of adding even more YAML parameters? I have some suggestions for this but I feel like they go beyond this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on what you mean (maybe raise a separate issue, if inappropriate for this PR)?

force_install=False):
super(PerfCollector, self).__init__(target)
self.events = events if events else DEFAULT_EVENTS
Expand All @@ -82,10 +86,20 @@ def __init__(self, target,
self.optionstrings = [optionstring]
if self.events and isinstance(self.events, basestring):
self.events = [self.events]
if isinstance(report_optionstring, list):
self.report_optionstrings = report_optionstring
else:
self.report_optionstrings = [report_optionstring]
if not self.labels:
self.labels = ['perf_{}'.format(i) for i in range(len(self.optionstrings))]
if len(self.labels) != len(self.optionstrings):
raise ValueError('The number of labels must match the number of optstrings provided for perf.')
if len(self.optionstrings) != len(self.report_optionstrings):
raise ValueError('The number of report_optionstrings must match the number of optionstrings provided for perf.')
if perf_mode == 'stat' or perf_mode == 'record':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, along with the elif clause can be converted to just

if perf_mode not in ['stat', 'record']:
    raise ValueError('Invalid perf_mode: must be either "stat" or "record"')
self.perf_mode = perf_mode

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if perf_mode not in ['stat', 'record']:
    raise ValueError('Invalid perf_mode: must be either "stat" or "record"')

Instead of the hardcoded ['stat', 'record'], we could even have a class attribute (PerfCollector.SUPPORTED_SUBCOMMANDS?) used in the if and error string that would also be easily and cleanly accessible by the "front-end" WA, if required (e.g. allowed_values).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll update this.

self.perf_mode = perf_mode
elif perf_mode != 'stat' or perf_mode != 'record':
raise ValueError('Invalid perf_mode setting, must be stat or record')

self.binary = self.target.get_installed('perf')
if self.force_install or not self.binary:
Expand All @@ -98,6 +112,8 @@ def reset(self):
for label in self.labels:
filepath = self._get_target_outfile(label)
self.target.remove(filepath)
filepath = self._get_target_reportfile(label)
self.target.remove(filepath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We remove a report file event if we're not going to call perf report: this might be an unexpected side effect (or not, depending on the guarantees given devlib on what it's allowed to do with files already existing on the target).


def start(self):
for command in self.commands:
Expand All @@ -108,7 +124,23 @@ def stop(self):

# pylint: disable=arguments-differ
def get_trace(self, outdir):
for label in self.labels:
for label, report_opt in zip(self.labels, self.report_optionstrings):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, as a reader that hasn't written the code, I find the fact that PerfCollector.get_trace relies exclusively on side-effects quite confusing when reading the user code (WA). The self.collector.get_trace(outdir) call doesn't inform (the reader or the calling code) which files were pulled, from where, and how many they were. I believe returning a List[str] giving the paths to the host files that were created would be the simplest way to address this.

Going one step further, I notice that TraceCollector.get_trace (I couldn't find the documentation for that class) names its parameter outfile instead of outdir meaning that the expected behaviour of get_trace in a descendant of TraceCollector might be different from what this overloaded version does; in fact, to be picky, this implementation might be extracting more than one trace to outdir making even its name misleading (as opposed to get_traces).

Another way to do this, would be to implement get_trace so that the caller has to ask what it wants and gets exactly that (as opposed to a method output depending on the internal state of the PerfCollector instance); the function could take, for example, a str giving the extension of the file ('data', 'stdout', 'stderr', 'report', ...) or, even cleaner, a value taken from an enumeration defined by the class (say PerfCollector.TRACE_KINDS). I appreciate the extra effort that this would require but it would allow for more robust execution path, more elegant error handling (or at least detection) and more readable and expressive code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are all excellent points. The "trace" interface hasn't been particularly well thought out and probably needs some serious though put into redesigning it. (And as has been pointed out elsewhere, "trace" is not a good name for what is being handled by this interface at the moment (see #378.)

This a much bigger issue that this PR, and I don't think it should block it. There is already precedent for directories being passed to get_trace (e.g. ftrace implementation expects outfile to be either a directory or a file). I agree that the name outfile is confusing, and you're right, it's not documented -- that's on me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have raised an issue to track this: #383. If you have any suggestions for changes to the collector API, please feel free to add them to the issue.

#in record mode generate report and copy
if self.perf_mode == 'record':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same idea as for the COMMAND_TEMPLATE string: It would be nice to have "subcommand-agnostic" code here and have __init__ (or something else) register which files have to be pulled from the device by get_trace (or, maybe, figure this out at entry of get_trace) instead of having to replicate more "file extracting" code here whenever a new subcommand if added.

# .rpt
command = self._build_perf_report_command(report_opt, label)
self.target.execute(command, as_root=True)
target_file = self._get_target_reportfile(label)
host_relpath = os.path.basename(target_file)
host_file = _f(os.path.join(outdir, host_relpath))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you seem to be doing this process of:

  1. calling _get_target_<...>file
  2. building the path for the corresponding host file

Could we consider instead having a _get_<...>path_pair() which would return (target_path, host_path) (or even better, a namedtuple) allowing 2 cleaner calls to self.target.pull and reducing code duplication?

self.target.pull(target_file, host_file)
# .trace
target_file = self._get_target_tracefile(label)
host_relpath = os.path.basename(target_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a cumbersome way of getting '{}.trace'.format(label) or am I missing something?

host_file = _f(os.path.join(outdir, host_relpath))
target_file = self._get_target_perfdatafile()
self.target.pull(target_file, host_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous 5 lines seem to be pulling perf.data and renaming it in the process; isn't there a simpler and cleaner way to do that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be the cause of the permission errors: perf record seems to generate a perf.data file with permissions 600 (owned by root as it's running through su) so that self.target.pull (which ends up doing an adb pull which is seen, from the target filesystem, as an access from a non-root user) would fail. The solution might be to use self.target.pull(..., as_root=True)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, what happens to the file pointed to by self._get_target_tracefile(label)? As target_file is overwritten, it's never pulled from the device or is it done elsewhere? Should we even care?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file never exists on the device because a mixture of perf behavior and how it is executed by WA. WA executes perf to run sleep and then kills it after a certain time. This seems to break the perf -o option and perf just outputs to a default place/file name ({current directory}/perf.data). So the -o option isn't used to run perf record and we do a rename of perf.data. Seemed easiest just to do it as part of the copy rather than directly on the device.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[..] then kills it after a certain time. This seems to break the perf -o option [...]

Interesting, I didn't know that. It's probably because perf has buffered some data and, as .kill() calls kill which sends SIGKILL, perf can't handle the signal and, therefore, can't write back what it's holding to the file. Sending a signal it's allowed to handle would probably solve that; this is something to keep in mind. Thanks

# .out
target_file = self._get_target_outfile(label)
host_relpath = os.path.basename(target_file)
host_file = _f(os.path.join(outdir, host_relpath))
Expand All @@ -131,7 +163,25 @@ def _get_target_outfile(self, label):
def _build_perf_command(self, options, events, label):
event_string = ' '.join(['-e {}'.format(e) for e in events])
command = PERF_COMMAND_TEMPLATE.format(self.binary,
self.perf_mode,
options or '',
event_string,
self._get_target_outfile(label))
return command

def _get_target_perfdatafile(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these functions really necessary? self.target.get_workpath('perf.data') seems perfectly readable and more straightforward, IMO.

return self.target.get_workpath('perf.data')

def _get_target_reportfile(self, label):
return self.target.get_workpath('{}.rpt'.format(label))

def _get_target_tracefile(self, label):
return self.target.get_workpath('{}.trace'.format(label))

def _build_perf_report_command(self, reportoptions, label):
reportoptions_string = reportoptions
command = PERF_REPORT_COMMAND_TEMPLATE.format(self.binary,
reportoptions_string,
self._get_target_perfdatafile(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When testing this on my linux device the perf.data file is generated in the default login location due to the automatic generation rather than the workpath that this expects. If there is no way to control where this file is generated could I suggest explicitly changing to the desired output folder before running the record command to ensure that the output file is in the correct location?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no way to control where this file is generated

perf record has -o, --output= for doing exactly that (see perf help record for more).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Android it seems to include a cd to the output folder as part of the command execution:
2019-04-29 19:23:04,804 DEBUG android: adb -s 10.1.16.10:5555 shell echo 'cd sdcard/devlib-target && /data/local/tmp/bin/busybox nohup /data/local/tmp/bin/perf record -a -e migrations -e cs -e r11 sleep 1000 > /sdcard/devlib-target/perf_0.out 2>&1 &' | su

I'd assume that it is the same on Linux but haven't confirmed. I can look at running on Linux target but will take me a little time. Do you happen to have the run.log file from your testing to confirm the command used on Linux?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes I see the same behaviour on my android device however on linux it produces:
DEBUG ssh: (sh -c '/usr/bin/perf record -a -e migrations -e cs sleep 1000 > /home/root/devlib-target/perf_0.out 2>&1 ' 1>/dev/null 2>/dev/null &); __devlib_ec=$?; echo; echo $__devlib_ec

For some reason the kick_off implementation on linux diffs to androids and does not have the initial cd command. I'm not sure why this would be the case and it seems like something that should be corrected to be consistent.

self._get_target_reportfile(label))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And when testing this on my android device the fails with error code 244 caused by magic/endian check failed. Have you seen this on your end?

Copy link
Author

@markbellamy markbellamy Apr 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen error code 244 on a few occasions but haven't tracked it down. It has been working reliable for a while for me on the Android device I am using. I was guessing it was related to the perf report command being '{command}' | su rather than '{command} | su' used for perf record but that just a guess. I'll try it on a few devices to see if it reliable reproduces or not.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested on a few Android devices and see error code 244 or just failures on some but not others. It seems to be consistent behavior on the devices, so a device either works or doesn't.

The error message seems to be related to some check of the perf.data file and likely finding an corruption issue with perf.data. So I think this is a result of the approach WA uses for running (aka killing) perf and some platforms behavior writing to the perf.data file to storage when killed.

Apart from a re-write of perf support in WA I think the only thing that could be done here is to try identify the behavior that causes the file corruption and see if we can override it from the device command prompt (e.g. setprop ....)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting, I've tried running this again with the latest version and no longer see this problem on my device.
And I apologies however I think I may have made an error before. In order to stop the perf instrument we're actually just killing the sleep command that perf is monitoring rather than killing perf itself in order to stop the measurements. The killing of perf only happens when the instrument is reset to ensure there are no extra processes running before running the workload and in the teardown stage. So I'm not sure this would explain this behaviour.
I'm instead wondering if it could be down to the speed of the device and if the data hasn't finished being written to disk before we try to process it? Would it be possible to try adding a delay in between the two and then test this on a device that has this issue to see if that could be the case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't notice @marcbonnici had replied to this: could we try rebasing on #384 to see if the error message still appears?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried #384 and sleep(10) and still seeing exit code 244.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I'm wondering why I received this initially and haven't been able to reproduce it since.
Would you be able to run the steps that WA is performing manually to determine if it is something that WA is doing to cause this or whether it's perf on the device itself that is having issues?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case, could you give #386 a try?

return command