hm2 read(), write() functions need significant time #70

Open
mhaberler opened this issue Aug 20, 2016 · 22 comments

@mhaberler
Contributor

observed running irqtest.hal on a Terasic DE0; times are in ns:

halcmd: show funct
Exported Functions:
  Comp   Inst CodeAddr         Arg              Users Type    Name                              Runtime    Maxtime
    66        00000000b6d3fe1d 0000000000000000     0 user    delinst                                 0          0
    75     76 00000000b53c06d9 0000000000041bc8     0 xthread hm2_de0n.0.pet_watchdog                 0          0
    75     76 00000000b53a9e39 0000000000041bc8     1 xthread hm2_de0n.0.read                     59193     107665
    75     76 00000000b53a9fc1 0000000000041bc8     1 xthread hm2_de0n.0.read_gpio                11070      34502
    75     76 00000000b53c3291 0000000000041bc8     1 xthread hm2_de0n.0.waitirq                 861718     999784
    75     76 00000000b53a9ef1 0000000000041bc8     1 xthread hm2_de0n.0.write                    57493     133106
    75     76 00000000b53a9ff5 0000000000041bc8     1 xthread hm2_de0n.0.write_gpio               21101      53182
    66        00000000b6d3fcfd 0000000000000000     0 user    newinst                                 0          0
  1099   1100 00000000b53858d1 00000000b6546820     1 xthread rate                                 1310      23361
  1110   1111 00000000b5374891 00000000b6546848     1 xthread servo-period                          970      25661
  1118   1119 00000000b5363969 00000000b6546858     1 xthread sp-stats                             1420      26741
  1085        00000000b5396be1 00000000b65467e0     1 thread  test.update                          2830      24781

read() and write() each take almost 60µs - together that's 120µs, or 12% of the 1ms cycle budget at 1kHz; the waitirq() time basically shows the budget left for motion, kinematics etc.

at 5kHz (200µs period) we have only about 50µs left for actual work:

halcmd: show funct
Exported Functions:
  Comp   Inst CodeAddr         Arg              Users Type    Name                              Runtime    Maxtime
    66        00000000b6d3fe1d 0000000000000000     0 user    delinst                                 0          0
    75     76 00000000b53c06d9 0000000000041bc8     0 xthread hm2_de0n.0.pet_watchdog                 0          0
    75     76 00000000b53a9e39 0000000000041bc8     1 xthread hm2_de0n.0.read                     55762     107665
    75     76 00000000b53a9fc1 0000000000041bc8     1 xthread hm2_de0n.0.read_gpio                10461      41402
    75     76 00000000b53c3291 0000000000041bc8     1 xthread hm2_de0n.0.waitirq                  49953     999784
    75     76 00000000b53a9ef1 0000000000041bc8     1 xthread hm2_de0n.0.write                    53602     133106
    75     76 00000000b53a9ff5 0000000000041bc8     1 xthread hm2_de0n.0.write_gpio               20501      56482
    66        00000000b6d3fcfd 0000000000000000     0 user    newinst                                 0          0
  1099   1100 00000000b53858d1 00000000b6546820     1 xthread rate                                  990      27521
  1110   1111 00000000b5374891 00000000b6546848     1 xthread servo-period                          950      32741
  1118   1119 00000000b5363969 00000000b6546858     1 xthread sp-stats                             1250      26741
  1085        00000000b5396be1 00000000b65467e0     1 thread  test.update                          2321      33982

this suggests looking into where the time is spent

@mhaberler
Contributor Author

hm, the above samples were from the RIP build, which is compiled with -O0 so gdb doesn't stumble across reordered code

with -O2 (as with the package build) things look a bit better (at 5kHz):

halcmd: show funct
Exported Functions:
  Comp   Inst CodeAddr         Arg              Users Type    Name                              Runtime    Maxtime
    66        00000000b66eeba9 0000000000000000     0 user    delinst                                 0          0
    75     76 00000000b562f369 000000000003bbc8     0 xthread hm2_de0n.0.pet_watchdog                 0          0
    75     76 00000000b5620db9 000000000003bbc8     1 xthread hm2_de0n.0.read                     36661      99374
    75     76 00000000b5620ced 000000000003bbc8     1 xthread hm2_de0n.0.read_gpio                 4301      47512
    75     76 00000000b5630f41 000000000003bbc8     1 xthread hm2_de0n.0.waitirq                 125696    1041507
    75     76 00000000b5620d09 000000000003bbc8     1 xthread hm2_de0n.0.write                    33001     114825
    75     76 00000000b5620cc5 000000000003bbc8     1 xthread hm2_de0n.0.write_gpio                7861      52882
    66        00000000b66eedc5 0000000000000000     0 user    newinst                                 0          0
  1099   1100 00000000b54dd651 00000000b6668820     1 xthread rate                                  770      42592
  1110   1111 00000000b54cc651 00000000b6668848     1 xthread servo-period                          850      28551
  1118   1119 00000000b54bb6c1 00000000b6668858     1 xthread sp-stats                             1220      43932
  1085        00000000b54ee8d9 00000000b66687e0     1 thread  test.update                          2050      42902

played with perf top:

  67.44%  [kernel]                  [k] arch_cpu_idle
   3.85%  [kernel]                  [k] _raw_spin_unlock_irq
   2.16%  hm2_soc_ol.so             [.] hm2_soc_read
   2.07%  hostmot2.so               [.] hm2_tram_read
   1.71%  [kernel]                  [k] _raw_spin_unlock_irqrestore
   1.29%  hm2_soc_ol.so             [.] hm2_soc_write
   1.27%  hostmot2.so               [.] hm2_ioport_update
   1.13%  hostmot2.so               [.] hm2_tram_write
   0.80%  hostmot2.so               [.] dummy_queue_read
   0.71%  hostmot2.so               [.] hm2_ioport_gpio_read
   0.70%  hostmot2.so               [.] hm2_encoder_process_tram_read
   0.66%  hostmot2.so               [.] hm2_ioport_gpio_prepare_tram_write
   0.64%  hostmot2.so               [.] hm2_stepgen_prepare_tram_write
   0.50%  [kernel]                  [k] __apbt_read_clocksource
   0.50%  hostmot2.so               [.] hm2_ioport_gpio_write
   0.49%  hostmot2.so               [.] hm2_watchdog_write
   0.44%  hostmot2.so               [.] hm2_sserial_read_pins
   0.43%  hostmot2.so               [.] hm2_sserial_prepare_tram_write
   0.37%  hostmot2.so               [.] hm2_ioport_gpio_process_tram_read
   0.37%  hostmot2.so               [.] dummy_queue_write
   0.31%  hostmot2.so               [.] hm2_stepgen_write
   0.30%  [kernel]                  [k] ktime_get_ts64

I'll have a look at the hm2_soc_read and hm2_soc_write functions tomorrow. I think this byte-wise reading is not necessary.

@dkhughes
Contributor

If you want to burst read, we need to make sure the firmware can handle it. For example, I don't think the AXI lite interface I implemented can do burst reads out of the box.

At the driver level they implement a translation RAM, which I think is an attempt to avoid calling write repeatedly at the driver level. If what's costing us is function-call overhead from hostmot2 down to the hm2 llio layer, we could expose a function that reads the entire range specified by the TRAM instead of doing individual reads and writes.
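
Something like this hypothetical llio hook, for instance (the hook name and the tram_* variables are invented, just to sketch the shape of it):

/* a single burst transfer covering a whole register range,
   instead of one llio->read() call per TRAM entry */
int (*read_region)(hm2_lowlevel_io_t *self, u32 addr,
                   void *buffer, int size);

/* hostmot2 would use it when available, fall back otherwise */
if (llio->read_region != NULL) {
    llio->read_region(llio, tram_base_addr,   /* hypothetical */
                      tram_buffer, tram_size);
} else {
    /* existing per-entry reads */
}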

@dkhughes
Contributor

One other thing to note:

The AXI bus and Avalon bus are clocked much slower than the processor. For instance, I can't go any faster than 100MHz on the bus without timing closure issues. That's probably what is causing the hold-up. I'm not sure what the Avalon bus is clocked at, but I imagine it is something similar.

@mhaberler
Contributor Author

mhaberler commented Aug 20, 2016

batching might help, yes - not sure how invasive that would be

for now I'll concentrate on dumbing things down - replacing these supersmart loops with memcpy (which is heavily optimized anyway) shaves another 10-20% off hm2_soc_read/hm2_soc_write (sketch below)

so with that we're in the 70µs ballpark for all four functs - that's OK for now
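
the change is roughly this (from memory, not the exact diff - hm2_soc_t and its base field stand in for the driver's board struct and the mmap'ed register window):

static int hm2_soc_read(hm2_lowlevel_io_t *self, u32 addr,
                        void *buffer, int size) {
    hm2_soc_t *board = (hm2_soc_t *)self;
    /* one memcpy over the whole region instead of the
       word-by-word alignment-guarding loop */
    memcpy(buffer, (char *)board->base + addr, size);
    return 1;  /* success */
}

static int hm2_soc_write(hm2_lowlevel_io_t *self, u32 addr,
                         void *buffer, int size) {
    hm2_soc_t *board = (hm2_soc_t *)self;
    memcpy((char *)board->base + addr, buffer, size);
    return 1;
}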

@dkhughes
Contributor

Wow, that's a lot. With compiler optimizations turned on? The loops were a guard of some kind against misaligned reads, but I don't think they're entirely necessary. I'm a fan of the memcpy replacement.

@mhaberler
Contributor Author

yes, -O2 (default)

@mhaberler
Contributor Author

perf top shows the candidate list to look at - yes, streamlining hm2_tram_* looks promising:

   2.00%  hostmot2.so               [.] hm2_tram_read
   1.32%  hostmot2.so               [.] hm2_ioport_update
   1.24%  hostmot2.so               [.] hm2_tram_write
   0.70%  hostmot2.so               [.] hm2_encoder_process_tram_read
   0.70%  hostmot2.so               [.] hm2_ioport_gpio_read
   0.65%  hostmot2.so               [.] hm2_stepgen_prepare_tram_write

perf is a great tool, recommended (I use linux-perf-4.6 from jessie-backports on the Altera)

@dkhughes
Contributor

Is it necessary to call both ioport_update and tram_read / write? I think that just duplicates the same access to the hm2 firmware.

@dkhughes
Contributor

This is what I'm driving at https://github.com/machinekit/machinekit/blob/master/src/hal/drivers/mesa-hostmot2/hostmot2.c#L78. The read/write functions automatically pull/push in the GPIO status.

Is this in your config?

addf hm2_de0n.0.read           servo
addf hm2_de0n.0.read_gpio      servo

addf test.update               servo
addf rate              servo

addf hm2_de0n.0.write          servo
addf hm2_de0n.0.write_gpio     servo

Wouldn't that be a duplicate GPIO read/write per cycle to the firmware?

@mhaberler
Contributor Author

yes, here

not sure, maybe @cdsteinkuehler or PCW know more? or Michael @micges for sure?

@dkhughes
Contributor

Yeah, I'm willing to bet that's a duplicate interaction. I don't call the gpio functions directly in my configs because the pins are updated automatically. I think the gpio update functions are there in case you don't need all of the other stuff hm2 provides and want to save the read time, because of the slower bus speeds I mentioned (or, even worse, bus speed across slow ethernet).

Maybe try removing those two calls. The other thing I do is remove any extra modules from the config firmware, which reduces the amount of data passed back and forth across the slower buses. E.g., instead of 10 stepgens, only instantiate 4, 5, or 6.

@mhaberler
Contributor Author

well, our website has the answer..

hm2_..read_gpio

Read the GPIO input pins. Note that the effect of this function is a subset of the effect of the .read() function described above. Normally only .read() is used. The only reason to call this function is if you want to do GPIO things in a faster-than-servo thread. (This function is not available on the 7i43 due to limitations of the EPP bus.)

hm2_..write_gpio

Write the GPIO control registers and output pins. Note that the effect of this function is a subset of the effect of the .write() function described above. Normally only .write() is used. The only reason to call this function is if you want to do GPIO things in a faster-than-servo thread. (This function is not available on the 7i43 due to limitations of the EPP bus.)

summary: if one calls hm2...read, hm2...read_gpio is not needed; same for write/write_gpio - so the config above trims down to the snippet below
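
i.e. for the config above, this is all that's needed:

addf hm2_de0n.0.read           servo

addf test.update               servo
addf rate                      servo

addf hm2_de0n.0.write          servo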

@mhaberler
Contributor Author

it's interesting to compare the IRQ and non-IRQ versions of the perf output

perf_4.6 top -d 10, 5kHz: current hm2 funct usage is at ca 60µs, so doable I'd say

nevertheless, a couple of things can be read out of the stats below:

  • there is significant overhead in the uio driver: I think the _raw_spin_unlock_irq and _raw_spin_unlock_irqrestore entries stem from the uio_read/uio_write functions - if one could waste a core on an RT thread, a polling solution would incur less overhead
  • hm2_tram_read/write are the prime candidates - it looks like they are O(n) in the number of pins+instances, so cutting down on unneeded I/O would help
  • the good news is that using rtapi_get_time() in thread_task has no significant impact on the budget (_rtapi_get_time + sys_clock_gettime + ktime_get_ts64 + __apbt_read_clocksource)

IRQ version:

Samples: 41K of event 'cpu-clock', Event count (approx.): 10370343750
Overhead  Shared Object       Symbol
  51.85%  [kernel]            [k] arch_cpu_idle
   5.89%  [kernel]            [k] _raw_spin_unlock_irq
   4.10%  hostmot2.so         [.] hm2_tram_read
   3.77%  [kernel]            [k] _raw_spin_unlock_irqrestore
   2.87%  hostmot2.so         [.] hm2_tram_write
   1.38%  hostmot2.so         [.] hm2_ioport_update
   1.27%  hostmot2.so         [.] hm2_ioport_gpio_prepare_tram_write
   1.20%  libc-2.19.so        [.] 0x0005f0ec
   1.14%  hostmot2.so         [.] hm2_encoder_process_tram_read
   1.13%  libc-2.19.so        [.] 0x0005f0e4
   1.08%  hm2_soc_ol.so       [.] hm2_soc_read
   1.06%  hm2_soc_ol.so       [.] hm2_soc_write
   0.90%  hostmot2.so         [.] hm2_stepgen_prepare_tram_write
   0.89%  [kernel]            [k] __apbt_read_clocksource
   0.86%  hostmot2.so         [.] hm2_sserial_prepare_tram_write
   0.80%  hostmot2.so         [.] hm2_ioport_gpio_process_tram_read
   0.79%  libc-2.19.so        [.] 0x0005f0d4
   0.75%  hostmot2.so         [.] hm2_stepgen_write
   0.67%  hostmot2.so         [.] hm2_sserial_read_pins
   0.53%  [kernel]            [k] ktime_get_ts64
   0.52%  hostmot2.so         [.] hm2_encoder_update_control_register.isra.0
   0.50%  hostmot2.so         [.] hm2_stepgen_process_tram_read
   0.47%  [kernel]            [k] preempt_count_sub
   0.45%  [kernel]            [k] vector_swi
   0.45%  [kernel]            [k] preempt_count_add
   0.42%  hal_lib.so          [.] thread_task
   0.30%  libc-2.19.so        [.] 0x0005f0c8
   0.30%  [kernel]            [k] migrate_enable
   0.29%  [kernel]            [k] sys_clock_gettime
   0.29%  [kernel]            [k] check_preemption_disabled
   0.29%  hostmot2.so         [.] hm2_encoder_instance_update_position.isra.1
   0.28%  hostmot2.so         [.] hm2_write
   0.27%  [kernel]            [k] rcu_note_context_switch
   0.27%  hostmot2.so         [.] hm2_dpll_write
   0.23%  hostmot2.so         [.] dummy_queue_read
   0.23%  rtapi.so            [.] _rtapi_get_time
   0.22%  [kernel]            [k] rt_spin_lock
   0.21%  [kernel]            [k] schedule
   0.21%  libc-2.19.so        [.] 0x000178e6
   0.19%  hostmot2.so         [.] hm2_encoder_write
   0.19%  [uio]               [k] uio_read
   0.19%  [kernel]            [k] migrate_disable
   0.19%  [kernel]            [k] rcu_idle_exit
For a higher level overview, try: perf top --sort comm,dso

for comparison, without IRQ - task at 5kHz:

Samples: 725K of event 'cpu-clock', Event count (approx.): 112470974030
Overhead  Shared Object             Symbol
  84.83%  [kernel]                  [k] arch_cpu_idle
   6.69%  hostmot2.so               [.] hm2_encoder_process_tram_read
   1.92%  hostmot2.so               [.] hm2_encoder_instance_update_position.isra.1
   1.67%  hostmot2.so               [.] hm2_stepgen_process_tram_read
   0.99%  hostmot2.so               [.] hm2_resolver_process_tram_read
   0.36%  hostmot2.so               [.] hm2_read
   0.28%  hostmot2.so               [.] 0x000017e4
   0.21%  librtapi_math.so.0        [.] rtapi_fabs
   0.18%  hostmot2.so               [.] 0x000017e8
   0.10%  libc-2.19.so              [.] strcmp
   0.10%  hostmot2.so               [.] __floatdidf
   0.08%  hostmot2.so               [.] __aeabi_dadd
   0.08%  libc-2.19.so              [.] memset
   0.05%  libpthread-2.19.so        [.] __pthread_rwlock_unlock
   0.05%  perf_4.6                  [.] 0x00090708
   0.05%  perf_4.6                  [.] 0x000b7f2c
   0.04%  perf_4.6                  [.] 0x00093f00
   0.03%  perf_4.6                  [.] 0x00093f02
   0.03%  [kernel]                  [k] _raw_spin_unlock_irq
   0.03%  perf_4.6                  [.] 0x000918fc
   0.03%  libpthread-2.19.so        [.] __pthread_rwlock_wrlock
   0.03%  perf_4.6                  [.] 0x00091d2c
   0.03%  libpthread-2.19.so        [.] pthread_mutex_trylock
   0.03%  libpthread-2.19.so        [.] __pthread_rwlock_rdlock
   0.03%  perf_4.6                  [.] 0x00096a6e
   0.03%  perf_4.6                  [.] 0x00006a5c
   0.02%  [kernel]                  [k] stmmac_mdio_read
   0.02%  perf_4.6                  [.] 0x00096a88
   0.02%  libpthread-2.19.so        [.] pthread_mutex_lock
   0.02%  perf_4.6                  [.] 0x00093efc
   0.02%  perf_4.6                  [.] 0x00096a52
   0.02%  perf_4.6                  [.] 0x00096a6a
   0.02%  perf_4.6                  [.] 0x00093ef0
   0.02%  perf_4.6                  [.] 0x00096a62
   0.02%  hostmot2.so               [.] hm2_ioport_gpio_process_tram_read
   0.02%  perf_4.6                  [.] 0x00096a74
   0.02%  perf_4.6                  [.] 0x0001d35e
   0.02%  libslang.so.2.3.0         [.] SLsmg_write_chars
   0.01%  libavahi-common.so.3.5.3  [.] avahi_unescape_label
   0.01%  perf_4.6                  [.] 0x00091d72
   0.01%  perf_4.6                  [.] 0x00091d78
   0.01%  libpthread-2.19.so        [.] __pthread_mutex_unlock_usercnt
   0.01%  perf_4.6                  [.] 0x00096a50

@dkhughes
Contributor

hm2_tram_read/write are the prime candidates - it looks like they are O(n) in the number of pins+instances, so cutting down on unneeded I/O would help

This aligns with what I've seen in the firmware and the driver code. It makes sense intuitively: the more pins/modules you add, the more data you need to swap back and forth, in a linearly increasing fashion.

there is significant overhead in the uio driver: I think the _raw_spin_unlock_irq and _raw_spin_unlock_irqrestore entries stem from the uio_read/uio_write functions - if one could waste a core on an RT thread, a polling solution would incur less overhead

I wonder if this comes from the driver or from some interaction with the interrupt controller. For instance, all of these interrupts are shared. If more than one handler is registered then, per that document from Linus you linked, every registered handler gets notified and checks whether the interrupt is its own to handle. Perhaps turning on our uio interrupt exposes the additional overhead of that notification framework on top of the overhead of the uio driver itself. That leads me to wonder:

What are the debug options in your kernel config? Lots of debugging turned on?

From your testing and my testing, it looks like we sacrifice some processing time for less interrupt jitter - a fair trade in my mind.

@mhaberler
Contributor Author

@dkhughes - that is the stock cv kernel - I think perf should work just the same on the zynq

the tradeoff is ok - I'm just exploring options

assume we were to do an alternative polling solution for external timing - let's think through if/how that would be possible. A naive approach:

  • do NOT enable the IRQ in the DTBO
  • IRQStatus: set Bit1 "Irq mask" to 1
  • ClearIRQ: write to clear the IRQ bit
  • in the wait function, loop until ClearIRQ becomes non-zero, then write to ClearIRQ to reset

does that make sense? would we cause an unclaimed IRQ, making the kernel complain "... nobody cared"?

@dkhughes
Contributor

dkhughes commented Aug 21, 2016

does that make sense? would we cause an unclaimed IRQ, making the kernel complain "... nobody cared"?

No, that should be totally fine - just don't register the IRQ with the controller. If the interrupt isn't registered, it shouldn't cause any trouble. That said, we probably need to go back to the polling version of the hm2_uio driver that was dkms-built, just to be safe.

Then we need to poll bit 0 of the IRQ status register (hm2's status register, not the ARM interrupt controller) in waitirq in a tight spin loop. When it goes true (or times out) we break out and write clear-interrupt to the hm2 firmware, just like we are doing now. It will eat an entire thread, but with the 2 cores I think we're fine. We might have our cake and eat it too in that case: since it would skip the uio_irq portion entirely, we'd deal directly with the firmware.
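
In rough pseudo-C (the register offset and bit constants are placeholders, not the real hm2 addresses):

#define HM2_IRQ_STATUS_ADDR 0x0A00   /* placeholder offset */
#define HM2_IRQ_PENDING     0x1      /* bit 0: timer interrupt pending */

/* polled replacement for waitirq: spin on the status bit until it is
   set or the timeout expires, then acknowledge by writing the clear
   bit back, just like the IRQ path does today */
static void hm2_waitirq_polled(hostmot2_t *hm2, long long timeout_ns) {
    u32 status = 0;
    long long deadline = rtapi_get_time() + timeout_ns;

    do {
        hm2->llio->read(hm2->llio, HM2_IRQ_STATUS_ADDR,
                        &status, sizeof(status));
    } while (!(status & HM2_IRQ_PENDING) && rtapi_get_time() < deadline);

    u32 clear = HM2_IRQ_PENDING;     /* write-to-clear acknowledge */
    hm2->llio->write(hm2->llio, HM2_IRQ_STATUS_ADDR,
                     &clear, sizeof(clear));
}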

@dkhughes
Contributor

CONFIG_DEBUG_INFO=y

Might be worth a shot turning down the debug level, and maybe turning on the quiet option.

@mhaberler
Contributor Author

The picture is a bit different on the zynq; I'm not sure I see a common pattern yet which warrants action.

enable jessie-backports and install linux-perf

irqtest.hal/5kHz, perf_4.6 top --cpu=0 -d 10

Samples: 1M of event 'cycles:ppp', Event count (approx.): 27839070423
Overhead  Shared Object              Symbol
  19.99%  [kernel]                   [k] cpuidle_enter_state
  15.57%  [kernel]                   [k] _raw_spin_unlock_irq
   2.27%  hm2_soc_ol.so              [.] hm2_soc_write
   2.16%  [kernel]                   [k] tick_nohz_idle_enter
   1.75%  [kernel]                   [k] _raw_spin_unlock_irqrestore
   1.62%  hostmot2.so                [.] hm2_stepgen_instance_prepare_tram_write
   1.52%  hostmot2.so                [.] hm2_ioport_gpio_prepare_tram_write
   1.51%  hostmot2.so                [.] hm2_ioport_gpio_process_tram_read
   1.45%  hostmot2.so                [.] hm2_ioport_update
   1.15%  hm2_soc_ol.so              [.] hm2_soc_read
   1.09%  hostmot2.so                [.] hm2_stepgen_instance_position_control
   1.07%  [kernel]                   [k] run_ksoftirqd
   1.07%  [kernel]                   [k] vector_swi
   1.02%  [kernel]                   [k] walk_stackframe
   0.93%  [kernel]                   [k] preempt_count_add
   0.87%  [kernel]                   [k] sys_clock_gettime
   0.79%  [kernel]                   [k] check_preemption_disabled
   0.76%  hostmot2.so                [.] hm2_stepgen_process_tram_read
   0.76%  [kernel]                   [k] unwind_frame
   0.75%  hostmot2.so                [.] hm2_stepgen_prepare_tram_write
   0.72%  [kernel]                   [k] schedule
   0.71%  [kernel]                   [k] preempt_count_sub
   0.70%  [kernel]                   [k] ktime_get_ts64

@dkhughes
Contributor

That aligns with what I mentioned before about the slow bus speed at the processor -> FPGA interface. The zynq side is configured with a 100MHz AXI interface. I'd have to double-check, but I think the cv interface is only clocked at 50MHz, which would double your transfer time before any (if there is any) extra overhead incurred by an Avalon->AXI merge.

@mhaberler
Copy link
Contributor Author

re the tight spin loop: starvation is always an issue with an RT thread spinning, and the cycles are lost for good

one way could be to adopt an old idea of PCW's and do the following (a rough sketch follows the list):

  • after executing the last funct, calculate how much of the cycle budget is left - we have the current timestamps already (for example's sake: leftover=300µs)
  • use a safe upper limit for latency (for example margin=100µs)
  • execute a wait for (leftover - margin = 200µs) - those would be cycles gained for other threads
  • when the wait returns, start spinning on the IRQ bit
  • if the margin is set to -1, never wait (always spin) - so we can test how effective this is
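
in code, roughly - everything here is hypothetical, wait_ns() stands in for whatever nanosleep-style primitive we'd use and read_irq_status() for the status-register read:

/* hybrid wait-then-spin; margin_ns < 0 selects pure spinning so the
   two variants can be compared against each other */
static void wait_for_irq_bit(hostmot2_t *hm2, long long cycle_end,
                             long long margin_ns) {
    long long leftover = cycle_end - rtapi_get_time();  /* e.g. ~300us */

    if (margin_ns >= 0 && leftover > margin_ns)
        wait_ns(leftover - margin_ns);  /* cycles gained for other threads */

    while (!(read_irq_status(hm2) & 0x1)) {  /* spin on the IRQ bit */
        if (rtapi_get_time() >= cycle_end)
            break;                           /* timed out - give up */
    }
}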

@cdsteinkuehler
Contributor

On 8/22/2016 10:23 AM, Michael Haberler wrote:

re the tight spin loop: starvation is always an issue with an RT thread spinning, and
the cycles are lost for good

The way to make things fast is to avoid reads.

If it seriously becomes an issue to read the required data from the
FPGA (which is not a big stretch, considering how fast processors run
these days, and that the processor is pretty much stalled for the
duration of the required reads), the hardware needs to initiate writes
instead of the CPU doing reads:

  • Periodic hardware timer tick happens
  • Hardware latches all required values at this point in time
  • Hardware performs bus-mastering DMA into main memory, typically via
    a circular buffer or similar mechanism
  • Hardware generates an interrupt, letting the CPU and real-time
    thread know there is new work to be done

The overhead of doing bus-mastering DMA is a pain, but it provides
much better performance. Writes are pipelined throughout modern
interconnect fabrics, can happen in large bursts, and generally help
keep system interconnect throughput high. Reads generally bring
everything to a screeching halt for the full bus turn-around time
(read requested to data returned).

There are some memory semantics you need to be careful with when doing
DMA, but nothing much more than what we already have to do to use
circular buffers in HAL on multi-core machines.
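
To illustrate the circular-buffer side of it, a minimal consumer sketch (layout and names invented for the example; the real thing would live behind the HAL ring API, and overrun handling is omitted):

#include <stdint.h>
#include <string.h>

#define RING_SLOTS   16   /* power of two */
#define SAMPLE_WORDS 32   /* one latched snapshot */

typedef struct {
    volatile uint32_t head;   /* advanced by the FPGA after each DMA burst */
    uint32_t tail;            /* advanced by the CPU consumer */
    uint32_t slots[RING_SLOTS][SAMPLE_WORDS];
} hw_ring_t;

/* returns 1 if a new snapshot was copied to 'out', 0 if the ring is empty */
static int ring_consume(hw_ring_t *r, uint32_t *out) {
    uint32_t head = r->head;
    if (head == r->tail)
        return 0;
    __sync_synchronize();     /* read barrier: observe the data the DMA
                                 wrote before trusting the updated head */
    memcpy(out, r->slots[r->tail % RING_SLOTS],
           SAMPLE_WORDS * sizeof(uint32_t));
    r->tail++;
    return 1;
}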

For now, switching to memcpy seems like the low-hanging fruit. :)

Charles Steinkuehler
[email protected]

@dkhughes
Contributor

I don't need the extra performance yet, but I have to say, full-blown DMA between the host and the FPGA sounds like fun. I'd have to read the TRM a lot more for the zynq first...

Maybe after some of the other wishlist things get ticked off... like allowing aux GPIO when cutter comp is turned on...
