This repository has been archived by the owner on May 17, 2023. It is now read-only.

Performance monitoring and debug

Dmitry Rogozhkin edited this page Oct 9, 2019 · 7 revisions


Overview

There are two distinct kinds of usage around collecting performance data:

  1. Performance monitoring (which might happen in a production environment)
  2. Performance debug (which happens in a development/debug environment)

Available tools usually focus on one of these usages, sometimes on both.
| Tool | OS | Monitoring | Debug | uAPI | Notes |
|---|---|---|---|---|---|
| gputop | Linux | Possible (w/ care) | Yes | No | Performance monitoring possible (without high impact on the system) for some counters |
| IGT trace.pl | Linux | No | Yes | No | |
| Linux perf | Linux | Yes | Yes | Yes | |
| metrics monitor | Linux | Yes | High level only | Yes | Sample for Linux perf |
| UMDPerfProfiler | Linux | No | Yes | No | |
| VTune | Linux/Windows | No | Yes | No | |

Linux perf

Quick start:

  • For more information see: Linux perf.
  • To install on Linux:
    • Either install through a package manager, e.g.: yum install perf
    • Or build from the sources inside the kernel tree: cd tools && make perf
  • API documentation is available via: man 2 perf_event_open
Key available tools:

| Tool | Description |
|---|---|
| perf stat | Obtain event counts |
| perf record | Record events for later reporting |
| perf report | Break down events by process, function, etc. |
| perf annotate | Annotate assembly or source code with event counts |
| perf top | See live event counts |
| perf bench | Run different kernel microbenchmarks |
Mind that appropriate permissions are required to get some metrics:

1. The /proc/sys/kernel/perf_event_paranoid system file specifies the kernel action globally for all users:

| Value | Meaning | Notes |
|---|---|---|
| -1 | Allow use of (almost) all events by all users | |
| >= 0 | Disallow raw tracepoint access by users without CAP_SYS_ADMIN | |
| >= 1 | Disallow CPU event access by users without CAP_SYS_ADMIN | Starting from this level non-privileged users will not be able to query global statistics |
| >= 2 | Disallow kernel profiling by users without CAP_SYS_ADMIN | |
| >= 3 | Disallow event access by non-privileged users | Distribution-dependent! Not all distributions support this setting; known ones are Debian and Android. |
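For example, to inspect the current level before running any of the tools below (reading the file needs no privileges; changing it requires root — the sysctl command shown in the comment is the standard way to change it):

```shell
# Read the current perf_event_paranoid level (no privileges needed)
level=$(cat /proc/sys/kernel/perf_event_paranoid)
echo "perf_event_paranoid = $level"

# To relax the restriction for all users until reboot (root required):
#   sudo sysctl -w kernel.perf_event_paranoid=-1
```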

2. Alternatively, a privileged user can grant capabilities to the application (see the previous table for which capabilities are needed). Such an application can then request event statistics when run by any user who has permission to execute it. For example, to grant CAP_SYS_ADMIN to the metrics_monitor application:

$ sudo setcap cap_sys_admin+ep metrics_monitor
$ sudo getcap metrics_monitor
metrics_monitor = cap_sys_admin+ep

$ sudo setcap -r metrics_monitor  # this command will remove all the caps from the metrics_monitor

Pay attention: as soon as capabilities are set on an application, the shared libraries it depends on are searched for only in the system paths, i.e. LD_LIBRARY_PATH adjustments no longer take effect. Make sure that all dependencies of applications with capabilities are properly installed.
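A quick way to verify this before granting capabilities is to check that every dependency already resolves without LD_LIBRARY_PATH (the helper name and the metrics_monitor path below are illustrative, not part of any tool):

```shell
# Succeeds only if the dynamic loader can resolve every shared library
# of the given binary from the system paths (ldd reports unresolved
# dependencies as "not found").
check_deps() {
  ! ldd "$1" | grep -q "not found"
}

# Example (path is an assumption; adjust to your install):
#   check_deps $MFX_INSTALL/share/mfx/samples/metrics_monitor || echo "missing libraries"
```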

Quick start commands:

| Command | Description |
|---|---|
| perf list | Get the list of available perf events |
| perf stat -e cpu-cycles,power/energy-cores/ ls /sys | Collect per-task events |
| perf stat -e cpu-cycles,power/energy-cores/ -a ls /sys | Collect global events (mind the -a option) |
| perf stat -e i915/rcs0-busy/,i915/vcs0-busy/ -a <workload.sh> | Collect busy times for RCS0 and VCS0 (i915 events are global) |
| perf stat -e i915/rcs0-busy/,i915/vcs0-busy/ -a -I 100 <workload.sh> | Sample busy metrics every 100 ms |
| perf record -g <workload.sh> | Collect CPU% metrics |
| perf report -G | View metrics collected by perf record |

To get data similar to that reported by metrics monitor (you may wish to tweak the reporting interval):

perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/actual-frequency/ -a -I 100 <workload.sh>
Example of collecting these metrics from the command line:

# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/ -a <workload.sh>
 
 
 Performance counter stats for 'system wide':
 
     7,946,514,387 ns   i915/rcs0-busy/
     3,206,574,844 ns   i915/vcs0-busy/
     3,206,763,484 ns   i915/vcs1-busy/
     6,842,922,734 ns   i915/vecs0-busy/
 
       8.436751418 seconds time elapsed
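The busy counters above are reported as nanoseconds of engine busy time over the run; dividing by the elapsed wall time gives a utilization percentage. A minimal awk sketch (the helper name is hypothetical; the parsing is keyed to the sample output format above):

```shell
# Convert i915 busy-time counters (ns) from 'perf stat' output into
# utilization percentages of the elapsed wall time.
perf_to_util() {
  awk '
    /i915\/.*-busy\// { gsub(/,/, "", $1); busy[$3] = $1 }
    /seconds time elapsed/ { elapsed_ns = $1 * 1e9 }
    END {
      for (e in busy)
        printf "%s %.1f%%\n", e, 100 * busy[e] / elapsed_ns
    }'
}

# Usage (perf stat writes its statistics to stderr):
#   perf stat -e i915/rcs0-busy/ -a workload.sh 2>&1 | perf_to_util
```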

Linux metrics monitor

Quick start:

  • For more information see: metrics monitor readme
  • Sources: metrics monitor
  • To run:
    • sudo LD_LIBRARY_PATH=$MFX_INSTALL/share/mfx/samples $MFX_INSTALL/share/mfx/samples/metrics_monitor
    • Run workload in parallel shell
See Linux perf notes about required permissions. Metrics monitor reports the following:
| Metric | Corresponding i915 event | Meaning |
|---|---|---|
| RENDER usage | i915/rcs0-busy/ | RCS (Render Engine, GPGPU) utilization, [0-100%] |
| VIDEO usage | i915/vcs0-busy/ | VCS0 (VDBOX0) utilization, [0-100%] |
| VIDEO_E usage | i915/vecs0-busy/ | VECS (VEBOX) utilization, [0-100%] |
| VIDEO2 usage | i915/vcs1-busy/ | VCS1 (VDBOX1) utilization, [0-100%] |
| GT Freq | i915/actual-frequency/ | Actual (granted) average GPU frequency, MHz |

Example:

sudo LD_LIBRARY_PATH=$MFX_INSTALL/share/mfx/samples $MFX_INSTALL/share/mfx/samples/metrics_monitor

RENDER usage: 0.00,   VIDEO usage: 0.00,  VIDEO_E usage: 0.00  VIDEO2 usage: 0.00  GT Freq: 349.95
RENDER usage: 0.00,   VIDEO usage: 0.00,  VIDEO_E usage: 0.00  VIDEO2 usage: 0.00  GT Freq: 349.95
RENDER usage: 1.85,   VIDEO usage: 4.09,  VIDEO_E usage: 7.13  VIDEO2 usage: 4.09  GT Freq: 453.94
RENDER usage: 99.01,  VIDEO usage: 36.88, VIDEO_E usage: 77.94 VIDEO2 usage: 36.90 GT Freq: 949.88
RENDER usage: 100.00, VIDEO usage: 37.34, VIDEO_E usage: 77.80 VIDEO2 usage: 37.39 GT Freq: 949.84
RENDER usage: 100.00, VIDEO usage: 37.86, VIDEO_E usage: 78.84 VIDEO2 usage: 37.91 GT Freq: 949.88
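The monitor prints one line per sampling interval, so a captured log is easy to post-process. For instance, to get peak and average render utilization (the helper name is hypothetical; parsing is keyed to the sample output format above):

```shell
# Report peak and average RENDER usage from a saved metrics_monitor log
summarize_render() {
  awk -F'RENDER usage:' '
    NF > 1 {
      v = $2 + 0            # numeric value right after the label
      sum += v; n++
      if (v > max) max = v
    }
    END { printf "peak %.2f%% avg %.2f%%\n", max, sum / n }'
}

# Usage:
#   summarize_render < monitor.log
```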

gputop

Quick start:

  • For more details see: gputop
  • Permits access to Intel GPU HW counters
  • Server-style application for collecting data
To run server:
sudo gputop

To collect data:

gputop-wrapper -m RenderBasic -c GpuCoreClocks,EuActive,L3Misses,GtiL3Throughput,EuFpuBothActive
Server: localhost:7890
Sampling period: 1 s
Monitoring system wide
Connected
 
System info:
    Kernel release: 4.15.0-rc4+
    Kernel build: #49 SMP Tue Dec 19 12:17:49 GMT 2017
CPU info:
    CPU model: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
    CPU cores: 4
GPU info:
    GT name: Kabylake GT2 (Gen 9, PCI 0x5916)
    Topology: 168 threads, 24 EUs, 1 slices, 3 subslices
    GT frequency range: 0.0MHz / 0.0MHz
    CS timestamp frequency: 12000000 Hz / 83.33 ns
OA info:
    OA Hardware Sampling Exponent: 22
    OA Hardware Period: 699050666 ns / 699.1 ms
    Timestamp      GpuCoreClocks    EuActive    L3Misses       GtiL3Throughput  EuFpuBothActive
    (ns)           (cycles/s)       (%)         (messages/s)   (B)              (%)
    285961912416,  770.9 M cycles,  0.919 %,    1473133.00,    89.91 MiB,       0.256 %
    286992496416,  900.1 M cycles,  1.04 %,     2036968.00,    124.3 MiB,       0.316 %
    288190601500,  521.4 M cycles,  1.81 %,     2030997.00,    124 MiB,         0.537 %
    289519269500,  1.028 G cycles,  11.8 %,     33181879.00,   1.978 GiB,       3.82 %
    290562176250,  1.007 G cycles,  11.1 %,     30115582.00,   1.795 GiB,       3.66 %
    291569408333,  905.9 M cycles,  10 %,       24534419.00,   1.462 GiB,       3.18 %
    292590314500,  762.4 M cycles,  6.89 %,     10934947.00,   667.4 MiB,       2.31 %
    293954678166,  538.5 M cycles,  1.72 %,     2034698.00,    124.2 MiB,       0.543 %
    295323480416,  751.6 M cycles,  1.28 %,     2034477.00,    124.2 MiB,       0.356 %

IGT trace.pl

Quick start:

  • Source: IGT trace.pl
  • Requires:
    • Linux perf to collect data and dump in raw text format: yum install perf
    • VIS to render in HTML: apt-get install npm && npm install vis
    • CONFIG_EXPERT=y
    • CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS=y
  • Permits viewing i915 perf events on a timeline
To dump data in raw text format:
$INSTALL/igt-gpu-tools/scripts/trace.pl --trace <workload.sh>   # will produce perf.data file
perf script > workload.data
To render data in html:
mkdir ~/workdir && cd ~/workdir && npm install vis
$INSTALL/igt-gpu-tools/scripts/trace.pl --html < workload.data > node_modules/workload.html    # mind that < and > are redirections
firefox node_modules/workload.html
Please pay attention that the location of the output workload.html is important: it must be inside the node_modules directory.

UMDPerfProfiler

Quick start:

  • Source: UMDPerfProfiler
  • Permits collecting and profiling media task timings

VTune

A comprehensive performance analysis tool. It permits collecting various HW CPU and GPU counters, profiling applications, and displaying task timelines. For details see VTune.
