Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for 'perf record and 'perf report' to devlib #382

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

markbellamy
Copy link

No description provided.

if not self.labels:
self.labels = ['perf_{}'.format(i) for i in range(len(self.optionstrings))]
if len(self.labels) != len(self.optionstrings):
raise ValueError('The number of labels must match the number of optstrings provided for perf.')
if len(self.optionstrings) != len(self.report_optionstrings):
raise ValueError('The number of report_optionstrings must match the number of optionstrings provided for perf.')
if perf_mode == 'stat' or perf_mode == 'record':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, along with the elif clause can be converted to just

if perf_mode not in ['stat', 'record']:
    raise ValueError('Invalid perf_mode: must be either "stat" or "record"')
self.perf_mode = perf_mode

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if perf_mode not in ['stat', 'record']:
    raise ValueError('Invalid perf_mode: must be either "stat" or "record"')

Instead of the hardcoded ['stat', 'record'], we could even have a class attribute (PerfCollector.SUPPORTED_SUBCOMMANDS?) used in the if and error string that would also be easily and cleanly accessible by the "front-end" WA, if required (e.g. allowed_values).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll update this.

reportoptions_string = reportoptions
command = PERF_REPORT_COMMAND_TEMPLATE.format(self.binary,
reportoptions_string,
self._get_target_perfdatafile(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When testing this on my linux device the perf.data file is generated in the default login location due to the automatic generation rather than the workpath that this expects. If there is no way to control where this file is generated could I suggest explicitly changing to the desired output folder before running the record command to ensure that the output file is in the correct location?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no way to control where this file is generated

perf record has -o, --output= for doing exactly that (see perf help record for more).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Android it seems to include a cd to the output folder as part of the command execution:
2019-04-29 19:23:04,804 DEBUG android: adb -s 10.1.16.10:5555 shell echo 'cd sdcard/devlib-target && /data/local/tmp/bin/busybox nohup /data/local/tmp/bin/perf record -a -e migrations -e cs -e r11 sleep 1000 > /sdcard/devlib-target/perf_0.out 2>&1 &' | su

I'd assume that it is the same on Linux but haven't confirmed. I can look at running on Linux target but will take me a little time. Do you happen to have the run.log file from your testing to confirm the command used on Linux?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes I see the same behaviour on my android device however on linux it produces:
DEBUG ssh: (sh -c '/usr/bin/perf record -a -e migrations -e cs sleep 1000 > /home/root/devlib-target/perf_0.out 2>&1 ' 1>/dev/null 2>/dev/null &); __devlib_ec=$?; echo; echo $__devlib_ec

For some reason the kick_off implementation on linux diffs to androids and does not have the initial cd command. I'm not sure why this would be the case and it seems like something that should be corrected to be consistent.

command = PERF_REPORT_COMMAND_TEMPLATE.format(self.binary,
reportoptions_string,
self._get_target_perfdatafile(),
self._get_target_reportfile(label))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And when testing this on my android device the fails with error code 244 caused by magic/endian check failed. Have you seen this on your end?

Copy link
Author

@markbellamy markbellamy Apr 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen error code 244 on a few occasions but haven't tracked it down. It has been working reliable for a while for me on the Android device I am using. I was guessing it was related to the perf report command being '{command}' | su rather than '{command} | su' used for perf record but that just a guess. I'll try it on a few devices to see if it reliable reproduces or not.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested on a few Android devices and see error code 244 or just failures on some but not others. It seems to be consistent behavior on the devices, so a device either works or doesn't.

The error message seems to be related to some check of the perf.data file and likely finding an corruption issue with perf.data. So I think this is a result of the approach WA uses for running (aka killing) perf and some platforms behavior writing to the perf.data file to storage when killed.

Apart from a re-write of perf support in WA I think the only thing that could be done here is to try identify the behavior that causes the file corruption and see if we can override it from the device command prompt (e.g. setprop ....)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting, I've tried running this again with the latest version and no longer see this problem on my device.
And I apologies however I think I may have made an error before. In order to stop the perf instrument we're actually just killing the sleep command that perf is monitoring rather than killing perf itself in order to stop the measurements. The killing of perf only happens when the instrument is reset to ensure there are no extra processes running before running the workload and in the teardown stage. So I'm not sure this would explain this behaviour.
I'm instead wondering if it could be down to the speed of the device and if the data hasn't finished being written to disk before we try to process it? Would it be possible to try adding a delay in between the two and then test this on a device that has this issue to see if that could be the case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't notice @marcbonnici had replied to this: could we try rebasing on #384 to see if the error message still appears?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried #384 and sleep(10) and still seeing exit code 244.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I'm wondering why I received this initially and haven't been able to reproduce it since.
Would you be able to run the steps that WA is performing manually to determine if it is something that WA is doing to cause this or whether it's perf on the device itself that is having issues?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case, could you give #386 a try?

Copy link
Contributor

@ptosi ptosi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I am not a reviewer of devlib, this "review" is posted as comments and should only be seen as so.

These include comments about details but also general comments about the architecture of this instrument and the way its eventual growth seems to be shaped. As an end user, I would personally prefer the perf instrument to be as generic and flexible as possible (in terms of running perf commands, not only stat or record nor having to manually patch/inject redundant code for each subcommand).

However, I believe that this should not influence this PR. Instead, it could be merged for now and another (heavier) PR could be open later on to refactor the perf instrument. Therefore, I leave it to the author of the PR and the maintainers to decide which of my comments add value to this PR and which ones may be ignored.

@@ -23,7 +24,8 @@
from devlib.utils.misc import ensure_file_directory_exists as _f


PERF_COMMAND_TEMPLATE = '{} stat {} {} sleep 1000 > {} 2>&1 '
PERF_COMMAND_TEMPLATE = '{} {} {} {} sleep 1000 > {} 2>&1 '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could take the opportunity here to introduce named replacement format fields:

Suggested change
PERF_COMMAND_TEMPLATE = '{} {} {} {} sleep 1000 > {} 2>&1 '
PERF_COMMAND_TEMPLATE = '{perf} {command} {args} {events} sleep 1000 > {outfile} 2>&1 '

allowing more readable and reliable calls to PERF_COMMAND_TEMPLATE.format() as they don't use position-dependant inputs.

@@ -23,7 +24,8 @@
from devlib.utils.misc import ensure_file_directory_exists as _f


PERF_COMMAND_TEMPLATE = '{} stat {} {} sleep 1000 > {} 2>&1 '
PERF_COMMAND_TEMPLATE = '{} {} {} {} sleep 1000 > {} 2>&1 '
PERF_REPORT_COMMAND_TEMPLATE = '{} report {} -i {} > {} 2>&1 '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we modify PERF_COMMAND_TEMPLATE so that it can also be used by perf report (and, eventually, other perf subcommands) instead of having to create a new constant everytime?

@@ -69,6 +71,8 @@ def __init__(self, target,
events=None,
optionstring=None,
labels=None,
perf_mode='stat',
report_optionstring=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@setrofim & @marcbonnici : could we consider extending optionstring, instead of adding even more YAML parameters? I have some suggestions for this but I feel like they go beyond this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on what you mean (maybe raise a separate issue, if inappropriate for this PR)?

self.target.pull(target_file, host_file)
# .trace
target_file = self._get_target_tracefile(label)
host_relpath = os.path.basename(target_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a cumbersome way of getting '{}.trace'.format(label) or am I missing something?

host_relpath = os.path.basename(target_file)
host_file = _f(os.path.join(outdir, host_relpath))
target_file = self._get_target_perfdatafile()
self.target.pull(target_file, host_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous 5 lines seem to be pulling perf.data and renaming it in the process; isn't there a simpler and cleaner way to do that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be the cause of the permission errors: perf record seems to generate a perf.data file with permissions 600 (owned by root as it's running through su) so that self.target.pull (which ends up doing an adb pull which is seen, from the target filesystem, as an access from a non-root user) would fail. The solution might be to use self.target.pull(..., as_root=True)?

@@ -23,7 +24,8 @@
from devlib.utils.misc import ensure_file_directory_exists as _f


PERF_COMMAND_TEMPLATE = '{} stat {} {} sleep 1000 > {} 2>&1 '
PERF_COMMAND_TEMPLATE = '{} {} {} {} sleep 1000 > {} 2>&1 '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@setrofim & @marcbonnici : This might be a standard approach in devlib but, as we're capturing the output to a file, why are we merging the 2 streams? Why not '{perf_command} > {stdout} 2> {stderr}' and extract '{label}.stdout' and '{label}.stderr which would make parsing the WA output on the host easier (e.g. check the stderr output file first, if it's alright, use stdout or other outputs instead of having to untangle the 2 streams)?

If we don't want that, isn't > {outfile} 2>&1 simply &> {outfile} or is there a subtle difference?

Second question (tiny detail, out of curiosity), what is the final space in the string for?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, this is to save having to pull two files instead of one. Given that the output is generally parsed with a line parser looking for specific patterns, mixing in stderr lines usually don't impact things. The question is -- is there specific benefit to splitting stdout and stderr in this case?

If we don't want that, isn't > {outfile} 2>&1 simply &> {outfile} or is there a subtle difference?

The bash man page specifically states they're semantically equivalent. I just end to write 2>&1 out if habit and because I keep forgetting that &> exists.

Second question (tiny detail, out of curiosity), what is the final space in the string for?

Erm. Aesthetics? A subtle reference to the TBS animated show? </joke> Seriously, I have no idea. It might be a hold-over form an earlier WA2 implementation where it might have mattered. But I don't see it serving a purpose in the current code. Well spotted.

for label in self.labels:
for label, report_opt in zip(self.labels, self.report_optionstrings):
#in record mode generate report and copy
if self.perf_mode == 'record':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same idea as for the COMMAND_TEMPLATE string: It would be nice to have "subcommand-agnostic" code here and have __init__ (or something else) register which files have to be pulled from the device by get_trace (or, maybe, figure this out at entry of get_trace) instead of having to replicate more "file extracting" code here whenever a new subcommand if added.

self.target.execute(command, as_root=True)
target_file = self._get_target_reportfile(label)
host_relpath = os.path.basename(target_file)
host_file = _f(os.path.join(outdir, host_relpath))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you seem to be doing this process of:

  1. calling _get_target_<...>file
  2. building the path for the corresponding host file

Could we consider instead having a _get_<...>path_pair() which would return (target_path, host_path) (or even better, a namedtuple) allowing 2 cleaner calls to self.target.pull and reducing code duplication?

@@ -108,7 +123,23 @@ def stop(self):

# pylint: disable=arguments-differ
def get_trace(self, outdir):
for label in self.labels:
for label, report_opt in zip(self.labels, self.report_optionstrings):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, as a reader that hasn't written the code, I find the fact that PerfCollector.get_trace relies exclusively on side-effects quite confusing when reading the user code (WA). The self.collector.get_trace(outdir) call doesn't inform (the reader or the calling code) which files were pulled, from where, and how many they were. I believe returning a List[str] giving the paths to the host files that were created would be the simplest way to address this.

Going one step further, I notice that TraceCollector.get_trace (I couldn't find the documentation for that class) names its parameter outfile instead of outdir meaning that the expected behaviour of get_trace in a descendant of TraceCollector might be different from what this overloaded version does; in fact, to be picky, this implementation might be extracting more than one trace to outdir making even its name misleading (as opposed to get_traces).

Another way to do this, would be to implement get_trace so that the caller has to ask what it wants and gets exactly that (as opposed to a method output depending on the internal state of the PerfCollector instance); the function could take, for example, a str giving the extension of the file ('data', 'stdout', 'stderr', 'report', ...) or, even cleaner, a value taken from an enumeration defined by the class (say PerfCollector.TRACE_KINDS). I appreciate the extra effort that this would require but it would allow for more robust execution path, more elegant error handling (or at least detection) and more readable and expressive code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are all excellent points. The "trace" interface hasn't been particularly well thought out and probably needs some serious though put into redesigning it. (And as has been pointed out elsewhere, "trace" is not a good name for what is being handled by this interface at the moment (see #378.)

This a much bigger issue that this PR, and I don't think it should block it. There is already precedent for directories being passed to get_trace (e.g. ftrace implementation expects outfile to be either a directory or a file). I agree that the name outfile is confusing, and you're right, it's not documented -- that's on me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have raised an issue to track this: #383. If you have any suggestions for changes to the collector API, please feel free to add them to the issue.

@@ -98,6 +111,8 @@ def reset(self):
for label in self.labels:
filepath = self._get_target_outfile(label)
self.target.remove(filepath)
filepath = self._get_target_reportfile(label)
self.target.remove(filepath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We remove a report file event if we're not going to call perf report: this might be an unexpected side effect (or not, depending on the guarantees given devlib on what it's allowed to do with files already existing on the target).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants