Rewrite ascii check to allow compiler auto vectorization #133

rhpvorderman · 2024-04-08T08:57:05Z

Rewriting it to use size_t will create the most optimal code for all platofrms. Also I did a small optimization that works well for small strings.

Say if the string is length 13. Previously the function would check 8 bytes at once, then check 5 bytes individually. What I do now is simply substract 8 from the length and check an 8 byte vector at the end. This is much more optimal for smaller strings.

Not that we need that, as we check 128kb chunks. But it eliminates a loop from the function, so might as well leave it in.

The 8byte loop is unrolled to do 4 OR operations simultaneously. This is automatically compiled to do two SSE2 OR operations with -O2 on GCC. As a result this code is not slower than the code that it replaces with had vectors inserted manually.

The resulting code is much more portable as a result, and should also compile to use vectors on ARM platforms.

EDIT: See also godbolt compiler explorer

rhpvorderman · 2024-04-08T09:38:58Z

Damn! Clang does not recognize the optimization opportunity. But it does so for the last loop (that is not unrolled). However if I write it like that, GCC does not recognize the optimization opportunity. In any case, this code + GCC optimization results in the most optimal assembly. But it will be not usable for clang users.

So, best to not merge this, I guess?

EDIT: I did a bit of benchmarking. Due to the unrolling, the code is more optimal. Even in such a way that before and after this PR there is no significant difference with CLANG. With GCC however, it is even faster.

rhpvorderman · 2024-04-24T14:01:36Z

@marcelm, friendly reminder ping. I decided to merge this code into sequali because GCC is used on linux and Linux is the only production platform for bioinformatics. (I am not aware of any MacOS and Windows compute clusters out there). Furthermore it should be a lot faster on platforms that do not benefit from auto vectorization due to the loop unrolling. In addition to that, I think the code is nicer when it does not make have use of if __SSE2__ and other compile guards. But it is your call in this case.

Rewrite ascii check to allow compiler auto vectorization

0704145

marcelm merged commit 2aaaba6 into main Apr 26, 2024
22 checks passed

marcelm deleted the asciiagain branch April 26, 2024 13:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite ascii check to allow compiler auto vectorization #133

Rewrite ascii check to allow compiler auto vectorization #133

rhpvorderman commented Apr 8, 2024 •

edited

Loading

rhpvorderman commented Apr 8, 2024 •

edited

Loading

rhpvorderman commented Apr 24, 2024

Rewrite ascii check to allow compiler auto vectorization #133

Rewrite ascii check to allow compiler auto vectorization #133

Conversation

rhpvorderman commented Apr 8, 2024 • edited Loading

rhpvorderman commented Apr 8, 2024 • edited Loading

rhpvorderman commented Apr 24, 2024

rhpvorderman commented Apr 8, 2024 •

edited

Loading

rhpvorderman commented Apr 8, 2024 •

edited

Loading