SIMD Conversion Functions #386

Merged
merged 2 commits into main from simdConverter
Aug 6, 2024

RalphSteinhagen
Member

This PR introduces a set of conversion functions that, for production use, should be inlined or merged at compile time.
These functions leverage SIMD capabilities to improve the performance of complex data transformations. Original vectorisation example by @mattkretz: https://godbolt.org/z/7MT71MbGz

New Conversion Blocks:

  • ComplexToInterleaved: Converts a stream of complex numbers to an interleaved stream of real and imaginary components.
  • InterleavedToComplex: Converts an interleaved stream of real and imaginary components back to a stream of complex numbers.
  • Abs: Computes the magnitude of complex or arithmetic input streams.
  • Real: Extracts the real component of complex input streams.
  • Imag: Extracts the imaginary component of complex input streams.
  • Arg: Computes the argument (phase angle) of complex input streams.
  • RadiansToDegree: Converts radians to degrees.
  • DegreeToRadians: Converts degrees to radians.
  • ToRealImag: Decomposes complex numbers into their real and imaginary components.
  • RealImagToComplex: Combines real and imaginary components into complex numbers.
  • ToMagPhase: Decomposes complex numbers into their magnitude and phase components.
  • MagPhaseToComplex: Combines magnitude and phase components into complex numbers.
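
To make the first two conversions in the list above concrete, here is a minimal scalar sketch of the interleaving round trip. The free functions `complexToInterleaved` and `interleavedToComplex` are hypothetical names chosen for illustration; they are not the block API added by this PR, and they operate on whole `std::vector`s rather than on streams.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the PR's block API):
// {re0 + i*im0, re1 + i*im1, ...} -> {re0, im0, re1, im1, ...}
template <typename T>
std::vector<T> complexToInterleaved(const std::vector<std::complex<T>>& in) {
    std::vector<T> out;
    out.reserve(2 * in.size());
    for (const auto& c : in) {
        out.push_back(c.real());
        out.push_back(c.imag());
    }
    return out;
}

// Hypothetical inverse: consumes the input in (re, im) pairs.
// A trailing unpaired sample is ignored in this sketch.
template <typename T>
std::vector<std::complex<T>> interleavedToComplex(const std::vector<T>& in) {
    std::vector<std::complex<T>> out;
    out.reserve(in.size() / 2);
    for (std::size_t i = 0; i + 1 < in.size(); i += 2) {
        out.emplace_back(in[i], in[i + 1]);
    }
    return out;
}
```

Feeding the output of `complexToInterleaved` into `interleavedToComplex` should reproduce the original samples, which is a convenient property to check in a unit test.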

Performance Assessment:

The following table summarizes the nominal clock cycles (latency and throughput) of the generated assembly for each function, illustrating the efficiency improvements achieved through SIMD vectorization. Some functions do not (yet) have a SIMD implementation because the SIMD-permute API is missing. This should be improved once this API becomes available.

An illustrative example for assessing the asm-related efficiency of various possible implementations: https://godbolt.org/z/obexWhjMq

| Function | x86_64 (Zen3) Latency | x86_64 (Zen3) Throughput | ARM (Raspberry Pi 5) Latency | ARM (Raspberry Pi 5) Throughput |
|---|---|---|---|---|
| ProcessOne | 4 cycles | 1 cycle | 4 cycles | 1 cycle |
| ProcessOneScalling | 10 cycles | 3 cycles | 9 cycles | 4 cycles |
| ProcessOneClamp | 16 cycles | 8 cycles | 12 cycles | 4 cycles |
| complexToComponents | 8 cycles | 4 cycles | 6 cycles | 3 cycles |
| complexToArg | 60 cycles | 15 cycles | 80 cycles | 20 cycles |
| complexToArg2 | 60 cycles | 15 cycles | 80 cycles | 20 cycles |
| complexToArgGeneric | 45 cycles | 11 cycles | 60 cycles | 15 cycles |
| complexToArgGeneric2 | 15 cycles | 3 cycles | 5 cycles | 2 cycles |
| complexToAbsGeneric | 45 cycles | 11 cycles | 60 cycles | 15 cycles |
| complexToAbsGeneric2 | 10 cycles | 2 cycles | 3 cycles | 1 cycle |
| complexToAbs | 12 cycles | 6 cycles | 14 cycles | 7 cycles |
| complexToAbs2 | 12 cycles | 6 cycles | 14 cycles | 7 cycles |
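
As a rough illustration of why a permute matters here: `std::complex` stores real and imaginary parts interleaved in memory, so an element-wise operation such as `Abs` first has to separate the two components. The sketch below does this in plain scalar C++. The `SplitComplex` type and the `split` and `magnitude` functions are hypothetical names for illustration only, and the assumption that the de-interleaving loop corresponds to the missing SIMD permute is an interpretation of the PR text, not something taken from its code.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper type: real and imaginary parts in separate contiguous arrays.
struct SplitComplex {
    std::vector<float> re;
    std::vector<float> im;
};

// De-interleaving step. With SIMD registers this would be a shuffle/permute
// of the lanes (assumed here to be the operation the PR text refers to).
inline SplitComplex split(const std::vector<std::complex<float>>& in) {
    SplitComplex out;
    out.re.reserve(in.size());
    out.im.reserve(in.size());
    for (const auto& c : in) {
        out.re.push_back(c.real());
        out.im.push_back(c.imag());
    }
    return out;
}

// Element-wise magnitude over contiguous arrays: every iteration is independent,
// which is the access pattern that maps directly onto SIMD lanes.
// Note: sqrt(re*re + im*im) can overflow for very large inputs, unlike std::hypot.
inline std::vector<float> magnitude(const SplitComplex& s) {
    std::vector<float> out(s.re.size());
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = std::sqrt(s.re[i] * s.re[i] + s.im[i] * s.im[i]);
    }
    return out;
}
```

Once the data is split, the magnitude loop touches only contiguous memory, so the remaining cost is concentrated in the de-interleaving step.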

@RalphSteinhagen RalphSteinhagen linked an issue Aug 3, 2024 that may be closed by this pull request
23 tasks
@RalphSteinhagen RalphSteinhagen force-pushed the simdConverter branch 2 times, most recently from a9781c0 to 4d03561 Compare August 5, 2024 09:02
Contributor

@drslebedev drslebedev left a comment

Thank you for your contribution!

Overall, it looks good, and the unit tests are well done.

I’ve suggested several changes in the comments regarding the compilation error for clang and conversion warnings.

Please review the comments. Once these issues are resolved, the PR can be merged.

blocks/basic/test/qa_Converter.cpp (outdated, resolved)
Functions implemented:
- ComplexToInterleaved
- InterleavedToComplex
- Abs
- Real
- Imag
- Arg
- RadiansToDegree
- DegreeToRadians
- ToRealImag
- RealImagToComplex
- ToMagPhase
- MagPhaseToComplex

Based on the original vectorisation example by @mattkretz: https://godbolt.org/z/7MT71MbGz
Extended example to assess the asm-related efficiency: https://godbolt.org/z/obexWhjMq

Some functions do not (yet) have a SIMD implementation due to the missing permute API.

Signed-off-by: Ralph J. Steinhagen <[email protected]>
Signed-off-by: rstein <[email protected]>
Signed-off-by: Ralph J. Steinhagen <[email protected]>
@drslebedev
Contributor

There are several issues with the failing tests, including: 1) qa_Message, 2) qa_FilterTool, 3) long compilation times for gcc, and 4) qa_DataSink for emcc | Debug, which failed due to timeouts during the first run but passed upon rerun.

This PR will be merged, but the remaining issues and the problems with the flaky tests need to be addressed and investigated further.

@drslebedev drslebedev merged commit c01c93c into main Aug 6, 2024
8 of 11 checks passed
@drslebedev drslebedev deleted the simdConverter branch August 6, 2024 14:13
Successfully merging this pull request may close these issues.

Category 'Type Converters' (23 blocks)
2 participants