This repository has been archived by the owner on May 17, 2023. It is now read-only.

Performance monitoring and debug

Dmitry Rogozhkin edited this page Oct 9, 2019 · 7 revisions


Overview

There are two distinct kinds of usage around collecting performance data:

  1. Performance monitoring (which might happen in a production environment)
  2. Performance debug (which happens in a development/debug environment)

Available tools usually focus on one of these usages, sometimes on both.
| Tool | OS | Monitoring | Debug | uAPI | Notes |
|---|---|---|---|---|---|
| gputop | Linux | Possible (w/ care) | Yes | No | Performance monitoring possible (without high impact on the system) for some counters |
| IGT trace.pl | Linux | No | Yes | No | |
| Linux perf | Linux | Yes | Yes | Yes | |
| metrics monitor | Linux | Yes | High level only | Yes | Sample for Linux perf |
| UMDPerfProfiler | Linux | No | Yes | No | |
| VTune | Linux/Windows | No | Yes | No | |

Linux perf

Quick start:

  • For more information see: Linux perf.
  • To install on Linux:
    • Either install through a package manager, e.g.: yum install perf
    • Or build from the sources inside the kernel tree: cd tools && make perf
  • API documentation is available via: man 2 perf_event_open
Key available tools:

| Tool | Description |
|---|---|
| perf stat | Obtain event counts |
| perf record | Record events for later reporting |
| perf report | Break down events by process, function, etc. |
| perf annotate | Annotate assembly or source code with event counts |
| perf top | See live event counts |
| perf bench | Run different kernel microbenchmarks |
Mind that appropriate permissions are required to get some metrics:

1. The /proc/sys/kernel/perf_event_paranoid system file specifies the kernel action globally for all users:

| Value | Meaning | Notes |
|---|---|---|
| -1 | Allow use of (almost) all events by all users | |
| >= 0 | Disallow raw tracepoint access by users without CAP_SYS_ADMIN | |
| >= 1 | Disallow CPU event access by users without CAP_SYS_ADMIN | Starting from this level non-privileged users will not be able to query global statistics |
| >= 2 | Disallow kernel profiling by users without CAP_SYS_ADMIN | |
| >= 3 | Disallow event access by non-privileged users | Distribution-dependent! Not all distributions support this setting; known ones are Debian and Android. |
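For example, to inspect the current level before running any of the tools below (reading the file needs no privileges; changing it requires root — the sysctl command shown in the comment is the standard way to change it):

```shell
# Read the current perf_event_paranoid level (no privileges needed)
level=$(cat /proc/sys/kernel/perf_event_paranoid)
echo "perf_event_paranoid = $level"

# To relax the restriction for all users until reboot (root required):
#   sudo sysctl -w kernel.perf_event_paranoid=-1
```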

2. Alternatively, a privileged user can grant capabilities to the application (see the previous table for which capabilities are needed). Such an application can then request event statistics when run by any user who has permission to execute it. For example, to grant CAP_SYS_ADMIN to the metrics_monitor application:

$ sudo setcap cap_sys_admin+ep metrics_monitor
$ sudo getcap metrics_monitor
metrics_monitor = cap_sys_admin+ep

$ sudo setcap -r metrics_monitor  # this command will remove all the caps from the metrics_monitor

Pay attention: as soon as capabilities are set on an application, the shared libraries it depends on are searched for only in the system paths, i.e. LD_LIBRARY_PATH adjustments no longer take effect. Make sure that all dependencies of applications with capabilities are properly installed.
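A quick way to verify this before granting capabilities is to check that every dependency already resolves without LD_LIBRARY_PATH (the helper name and the metrics_monitor path below are illustrative, not part of any tool):

```shell
# Succeeds only if the dynamic loader can resolve every shared library
# of the given binary from the system paths (ldd reports unresolved
# dependencies as "not found").
check_deps() {
  ! ldd "$1" | grep -q "not found"
}

# Example (path is an assumption; adjust to your install):
#   check_deps $MFX_INSTALL/share/mfx/samples/metrics_monitor || echo "missing libraries"
```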

Quick start commands:

| Command | Description |
|---|---|
| perf list | Get the list of available perf events |
| perf stat -e cpu-cycles,power/energy-cores/ ls /sys | Collect per-task events |
| perf stat -e cpu-cycles,power/energy-cores/ -a ls /sys | Collect global events (mind the -a option) |
| perf stat -e i915/rcs0-busy/,i915/vcs0-busy/ -a <workload.sh> | Collect busy times for RCS0 and VCS0 (i915 events are global) |
| perf stat -e i915/rcs0-busy/,i915/vcs0-busy/ -a -I 100 <workload.sh> | Sample busy metrics every 100 ms |
| perf record -g <workload.sh> | Collect CPU% metrics |
| perf report -G | View metrics collected by perf record |

To get data similar to that reported by metrics monitor (you may wish to tweak the reporting interval):

perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/actual-frequency/ -a -I 100 <workload.sh>
Example of collecting these metrics from the command line:

# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/ -a <workload.sh>
 
 
 Performance counter stats for 'system wide':
 
     7,946,514,387 ns   i915/rcs0-busy/
     3,206,574,844 ns   i915/vcs0-busy/
     3,206,763,484 ns   i915/vcs1-busy/
     6,842,922,734 ns   i915/vecs0-busy/
 
       8.436751418 seconds time elapsed
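The busy counters above are reported as nanoseconds of engine busy time over the run; dividing by the elapsed wall time gives a utilization percentage. A minimal awk sketch (the helper name is hypothetical; the parsing is keyed to the sample output format above):

```shell
# Convert i915 busy-time counters (ns) from 'perf stat' output into
# utilization percentages of the elapsed wall time.
perf_to_util() {
  awk '
    /i915\/.*-busy\// { gsub(/,/, "", $1); busy[$3] = $1 }
    /seconds time elapsed/ { elapsed_ns = $1 * 1e9 }
    END {
      for (e in busy)
        printf "%s %.1f%%\n", e, 100 * busy[e] / elapsed_ns
    }'
}

# Usage (perf stat writes its statistics to stderr):
#   perf stat -e i915/rcs0-busy/ -a workload.sh 2>&1 | perf_to_util
```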

Linux metrics monitor

Quick start:

  • For more information see: metrics monitor readme
  • Sources: metrics monitor
  • To run:
    • sudo LD_LIBRARY_PATH=$MFX_INSTALL/share/mfx/samples $MFX_INSTALL/share/mfx/samples/metrics_monitor
    • Run workload in parallel shell
See Linux perf notes about required permissions. Metrics monitor reports the following:
| Metric | Corresponding i915 event | Meaning |
|---|---|---|
| RENDER usage | i915/rcs0-busy/ | RCS (Render Engine, GPGPU) utilization, [0-100%] |
| VIDEO usage | i915/vcs0-busy/ | VCS0 (VDBOX0) utilization, [0-100%] |
| VIDEO_E usage | i915/vecs0-busy/ | VECS (VEBOX) utilization, [0-100%] |
| VIDEO2 usage | i915/vcs1-busy/ | VCS1 (VDBOX1) utilization, [0-100%] |
| GT Freq | i915/actual-frequency/ | Actual (granted) average GPU frequency, MHz |

Example:

sudo LD_LIBRARY_PATH=$MFX_INSTALL/share/mfx/samples $MFX_INSTALL/share/mfx/samples/metrics_monitor

RENDER usage: 0.00,   VIDEO usage: 0.00,  VIDEO_E usage: 0.00  VIDEO2 usage: 0.00  GT Freq: 349.95
RENDER usage: 0.00,   VIDEO usage: 0.00,  VIDEO_E usage: 0.00  VIDEO2 usage: 0.00  GT Freq: 349.95
RENDER usage: 1.85,   VIDEO usage: 4.09,  VIDEO_E usage: 7.13  VIDEO2 usage: 4.09  GT Freq: 453.94
RENDER usage: 99.01,  VIDEO usage: 36.88, VIDEO_E usage: 77.94 VIDEO2 usage: 36.90 GT Freq: 949.88
RENDER usage: 100.00, VIDEO usage: 37.34, VIDEO_E usage: 77.80 VIDEO2 usage: 37.39 GT Freq: 949.84
RENDER usage: 100.00, VIDEO usage: 37.86, VIDEO_E usage: 78.84 VIDEO2 usage: 37.91 GT Freq: 949.88
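The monitor prints one line per sampling interval, so a captured log is easy to post-process. For instance, to get peak and average render utilization (the helper name is hypothetical; parsing is keyed to the sample output format above):

```shell
# Report peak and average RENDER usage from a saved metrics_monitor log
summarize_render() {
  awk -F'RENDER usage:' '
    NF > 1 {
      v = $2 + 0            # numeric value right after the label
      sum += v; n++
      if (v > max) max = v
    }
    END { printf "peak %.2f%% avg %.2f%%\n", max, sum / n }'
}

# Usage:
#   summarize_render < monitor.log
```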

gputop

Quick start:

  • For more details see: gputop
  • Permits access to Intel GPU HW counters
  • Server-style application for collecting data
To run server:
sudo gputop

To collect data:

gputop-wrapper -m RenderBasic -c GpuCoreClocks,EuActive,L3Misses,GtiL3Throughput,EuFpuBothActive
Server: localhost:7890
Sampling period: 1 s
Monitoring system wide
Connected
 
System info:
    Kernel release: 4.15.0-rc4+
    Kernel build: #49 SMP Tue Dec 19 12:17:49 GMT 2017
CPU info:
    CPU model: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
    CPU cores: 4
GPU info:
    GT name: Kabylake GT2 (Gen 9, PCI 0x5916)
    Topology: 168 threads, 24 EUs, 1 slices, 3 subslices
    GT frequency range: 0.0MHz / 0.0MHz
    CS timestamp frequency: 12000000 Hz / 83.33 ns
OA info:
    OA Hardware Sampling Exponent: 22
    OA Hardware Period: 699050666 ns / 699.1 ms
    Timestamp      GpuCoreClocks    EuActive    L3Misses       GtiL3Throughput  EuFpuBothActive
    (ns)           (cycles/s)       (%)         (messages/s)   (B)              (%)
    285961912416,  770.9 M cycles,  0.919 %,    1473133.00,    89.91 MiB,       0.256 %
    286992496416,  900.1 M cycles,  1.04 %,     2036968.00,    124.3 MiB,       0.316 %
    288190601500,  521.4 M cycles,  1.81 %,     2030997.00,    124 MiB,         0.537 %
    289519269500,  1.028 G cycles,  11.8 %,     33181879.00,   1.978 GiB,       3.82 %
    290562176250,  1.007 G cycles,  11.1 %,     30115582.00,   1.795 GiB,       3.66 %
    291569408333,  905.9 M cycles,  10 %,       24534419.00,   1.462 GiB,       3.18 %
    292590314500,  762.4 M cycles,  6.89 %,     10934947.00,   667.4 MiB,       2.31 %
    293954678166,  538.5 M cycles,  1.72 %,     2034698.00,    124.2 MiB,       0.543 %
    295323480416,  751.6 M cycles,  1.28 %,     2034477.00,    124.2 MiB,       0.356 %

IGT trace.pl

Quick start:

  • Source: IGT trace.pl
  • Requires:
    • Linux perf to collect data and dump in raw text format: yum install perf
    • VIS to render in HTML: apt-get install npm && npm install vis
    • CONFIG_EXPERT=y
    • CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS=y
  • Permits viewing i915 perf events on a timeline
To dump data in raw text format:
$INSTALL/igt-gpu-tools/scripts/trace.pl --trace <workload.sh>   # will produce perf.data file
perf script > workload.data
To render data in html:
mkdir ~/workdir && cd ~/workdir && npm install vis
$INSTALL/igt-gpu-tools/scripts/trace.pl --html < workload.data > node_modules/workload.html    # mind that < and > are redirections
firefox node_modules/workload.html
Please pay attention that the location of the output workload.html is important: it must be inside the node_modules directory.

UMDPerfProfiler

Quick start:

  • Source: UMDPerfProfiler
  • Permits collecting and profiling media task timings

VTune

A comprehensive performance analysis tool. It permits collecting various HW CPU and GPU counters, profiling applications, and displaying task timelines. For details see VTune.
