
ll_schedule: add a DSP load tracker to ll_tasks_execute #4943

Closed · wants to merge 1 commit

Conversation

kv2019i (Collaborator) commented Nov 1, 2021

Add a lightweight load calculation to ll_tasks_execute() and print out
the average every 2^10 runs (roughly once per second on most systems).

Example log output:
--cut--
[ 6950848.421715] ( 809129.187500) c0 ll-schedule ./schedule/ll_schedule.c:139 INFO ll avg 19720/38400
[ 8868838.449667] ( 679100.000000) c0 ll-schedule ./schedule/ll_schedule.c:139 INFO ll avg 19658/38400
[ 10802857.643650] ( 565147.187500) c0 ll-schedule ./schedule/ll_schedule.c:139 INFO ll avg 19822/38400
--cut--

UPDATE: example trace output in a case where DSP is under >95% load:
https://sof-ci.01.org/sofpr/PR4943/build10955/devicetest/?model=ADLP_RVP_NOCODEC&testcase=multiple-pipeline-all

Signed-off-by: Kai Vehmanen [email protected]
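
For context, here is a minimal sketch of the tracker the description above outlines, reconstructed from the snippets quoted in the review threads below. Only the lines inside the if-block are from the PR; the struct layout, the wrapper function, and the counter reset are illustrative assumptions.

#include <stdint.h>

/* Window of 2^10 runs, i.e. roughly once per second at a 1 ms tick */
#define CHECKS_WINDOW_SIZE 10

/* hypothetical per-scheduler state; the PR's actual layout is not shown here */
struct ll_load {
    uint32_t checks;     /* ll_tasks_execute() runs in the current window */
    uint32_t cycles_sum; /* cycles accumulated over the window */
};

static void ll_load_check(struct ll_load *load, uint32_t cycles_spent)
{
    uint32_t max;

    load->cycles_sum += cycles_spent;

    if (++load->checks == 1 << CHECKS_WINDOW_SIZE) {
        /* shift by the window size, i.e. divide by 2^10, to get the average */
        load->cycles_sum >>= CHECKS_WINDOW_SIZE;
        /* the cycle budget available in one CONFIG_SYSTICK_PERIOD tick */
        max = clock_us_to_ticks(PLATFORM_DEFAULT_CLOCK, CONFIG_SYSTICK_PERIOD);
        tr_info(&ll_tr, "ll avg %u/%u", load->cycles_sum, max);
        load->cycles_sum = 0;
        load->checks = 0;
    }
}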

kv2019i (Collaborator, Author) commented Nov 1, 2021

We have similar trace output in SOF Zephyr builds, and it has proven very useful for DSP algorithm debug and for tracking performance changes in CI and daily tests.

marc-hb (Collaborator) commented Nov 1, 2021

> We have similar trace output in SOF Zephyr builds, and it has proven very useful for DSP algorithm debug and for tracking performance changes in CI and daily tests.

They've also proven very verbose, and they prevent the DSP from going to sleep. I comment them out in my workspace.

kv2019i (Collaborator, Author) commented Nov 1, 2021

@marc-hb wrote:

> They've also proven very verbose, and they prevent the DSP from going to sleep. I comment them out in my workspace.

Why is there no bug filed about this? It sounds like an error if it prevents the DSP from going to sleep.

It's a good question what the default should be for reporting. We can put this under Kconfig, but it should be enabled in CI for this to serve as a useful tool to track DSP load.

marc-hb (Collaborator) commented Nov 1, 2021

Related to the older discussion in thesofproject/sof-test#726, "[FEATURE] sof-test should run without DMA trace - only etrace".

keyonjie (Contributor) commented Nov 2, 2021

@kv2019i thanks for thinking of this.

Basically the idea here is simple enough, though using CONFIG_SYSTICK_PERIOD may be inaccurate at times, as we are not always scheduled at a 1 ms period.

OTOH, we actually already have a similar mechanism (no percentage data though); please check the perf_cnt_xxx() functions. If we select CONFIG_PERFORMANCE_COUNTERS in Kconfig, we get logs like "perf ll_work peak plat %u cpu %u" at info level, but it only tracks the peak, not the average value. To add average or percentage data support, it would be better to do that within the existing mechanism.

If we want to get accurate percentage data, we might need to leverage the new_target_tick value, e.g.

percentage = 100 * (cycles1 - cycles0) / (new_target_tick - cycles0);
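
A sketch of that calculation, assuming cycles0 is sampled when the run starts, cycles1 when it ends, and new_target_tick is the absolute time of the next scheduled tick; the wrapper function, the guard, and the 64-bit widening are illustrative additions:

#include <stdint.h>

static uint32_t ll_load_percentage(uint32_t cycles0, uint32_t cycles1,
                                   uint32_t new_target_tick)
{
    /* guard against a division by zero if the tick has already passed */
    if (new_target_tick <= cycles0)
        return 0;

    /* widen to 64 bits so 100 * delta cannot overflow */
    return (uint32_t)(100ULL * (cycles1 - cycles0) /
                      (new_target_tick - cycles0));
}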

kv2019i (Collaborator, Author) commented Nov 2, 2021

@keyonjie wrote:

> Basically the idea here is simple enough, though using CONFIG_SYSTICK_PERIOD may be inaccurate at times, as we are not always scheduled at a 1 ms period.

Good point. I ported this from the new Zephyr-based scheduler, which has a fixed tick. But indeed, especially if we want to flag errors from the data, the target tick should be taken into account. That doesn't affect the other part:

> OTOH, we actually already have a similar mechanism (no percentage data though); please check the perf_cnt_xxx() functions. If we select CONFIG_PERFORMANCE_COUNTERS in Kconfig, we get logs like "perf ll_work peak plat %u cpu %u" at info level, but it only tracks the peak, not the average value. To add average or percentage data support, it would be better to do that within the existing mechanism.

I looked again at this, and the main problem is that it is designed as a debugging feature.

There is definite value if the DSP load tracking can be made lightweight enough that we can leave it on at least in CI builds, and even in product builds. This does limit what we can do. The existing mechanisms can print out traces at a much higher rate (when problems arise), and they do more calculations per trace.

The peak data is useful for debugging in-progress algorithms, but I'd argue a periodic DSP load measurement is more useful for debugging issues with e.g. tplg changes and/or regressions in CI.

What this PR implements is not zero cost either, but due to the fixed reporting interval, the cost is at least deterministic. That does mean we don't really want to spend cycles pretty-printing the traces, and e.g. the target tick makes this a bit more challenging, but let's have a look. Maybe we can bake it in.

Of course, if the consensus is that even this is too much for non-R&D builds, then we might be better off just adding this functionality to the existing PERFORMANCE_COUNTERS. Looking forward to input.


if (++load->checks == 1 << CHECKS_WINDOW_SIZE) {
    load->cycles_sum >>= CHECKS_WINDOW_SIZE;
    max = clock_us_to_ticks(PLATFORM_DEFAULT_CLOCK, CONFIG_SYSTICK_PERIOD);
Review comment from a Collaborator:
This should be light-weight, but the line above performs run-time divisions and multiplications with constants, even if only every N runs. Since clock_us_to_ticks() is a real function, I don't think this will be pre-calculated and optimised away by the compiler. But since it's always the same calculation (as long as the clock frequency doesn't change), it should be calculated once and stored. To also take frequency changes into account, we can store the frequency (index) that was used to calculate it, and recalculate if it changes.
But yes, this would also somehow have to account for imprecise XTOS ticks.

lgirdwood (Member) replied:

The only part doing the div/mult is clock_us_to_ticks(), and this will only happen once per second (and you are right, we could do this at init since it's all const); all the rest is shifts, which are optimal on xtensa.
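
A sketch of the caching both reviewers suggest: compute the budget once, remember the clock frequency it was computed for, and recalculate only when that changes. All names here are illustrative, and clock_get_freq() stands in for whatever frequency query the platform provides:

#include <stdint.h>

struct ll_load_budget {
    uint32_t max_ticks; /* cached clock_us_to_ticks() result */
    uint32_t freq;      /* clock frequency the cache was computed for */
};

static uint32_t ll_load_max_ticks(struct ll_load_budget *b)
{
    /* assumed frequency query; any change invalidates the cache */
    uint32_t freq = clock_get_freq(PLATFORM_DEFAULT_CLOCK);

    if (!b->max_ticks || b->freq != freq) {
        b->max_ticks = clock_us_to_ticks(PLATFORM_DEFAULT_CLOCK,
                                         CONFIG_SYSTICK_PERIOD);
        b->freq = freq;
    }

    return b->max_ticks;
}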

if (++load->checks == 1 << CHECKS_WINDOW_SIZE) {
    load->cycles_sum >>= CHECKS_WINDOW_SIZE;
    max = clock_us_to_ticks(PLATFORM_DEFAULT_CLOCK, CONFIG_SYSTICK_PERIOD);
    tr_info(&ll_tr, "ll avg %u/%u", load->cycles_sum, max);
Review comment from a Collaborator:

For automatic parsing, using a blank as the separator between the numbers instead of "/" could be nicer?
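
For instance, the trace call quoted above could become (a hypothetical variant, not code from the PR):

tr_info(&ll_tr, "ll avg %u %u", load->cycles_sum, max);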

@lgirdwood (Member) left a review:

LGTM, but we could do the clock-us conversion at init since it is always const.


kv2019i (Collaborator, Author) commented Dec 8, 2021

No time to make progress on this, so closing.

kv2019i closed this on Dec 8, 2021