Rewrite ascii check to allow compiler auto vectorization #133
Rewriting it to use `size_t` should produce optimal code on all platforms. I also added a small optimization that works well for short strings. Say the string has length 13. Previously the function would check 8 bytes at once, then check the remaining 5 bytes individually. What I do now is simply subtract 8 from the length and check one 8-byte word at the end, which overlaps bytes that were already checked. This is much faster for short strings.
Not that we need that, as we check 128kb chunks. But it eliminates a loop from the function, so might as well leave it in.
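To illustrate the overlapping tail check, here is a minimal sketch, not the code from this PR. The function name and the mask macro are made up for the example, and it assumes `size_t` is the word type used for the loads. For a length of 13 on a 64-bit platform it reads bytes 0..7 in the loop and bytes 5..12 in the final load.

```c
#include <stddef.h>
#include <string.h>

/* 0x80 repeated in every byte of a size_t (hypothetical helper macro). */
#define ASCII_MASK (((size_t)-1 / 0xFF) * 0x80)

/* Hypothetical name; returns 1 if every byte in data[0..len) is below 0x80. */
int is_ascii_overlapping_tail(const unsigned char *data, size_t len)
{
    size_t acc = 0;
    size_t word;
    size_t i = 0;

    if (len < sizeof(word)) {
        /* Too short for a full word: fall back to single bytes. */
        for (; i < len; i++) {
            acc |= data[i];
        }
        return (acc & 0x80) == 0;
    }

    /* Whole words from the front; memcpy sidesteps alignment issues. */
    for (; i + sizeof(word) <= len; i += sizeof(word)) {
        memcpy(&word, data + i, sizeof(word));
        acc |= word;
    }

    /* One final word anchored at the end of the buffer. It may overlap
       bytes already checked, which is harmless for an OR reduction. */
    memcpy(&word, data + len - sizeof(word), sizeof(word));
    acc |= word;

    return (acc & ASCII_MASK) == 0;
}
```

Because OR-ing a byte twice does not change the result, the overlap costs nothing in correctness and replaces the per-byte tail loop with a single load.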
The 8-byte loop is unrolled to do 4 OR operations simultaneously. With `-O2`, GCC automatically compiles this into two SSE2 OR operations. As a result, this code is not slower than the code it replaces, which had vectors inserted manually. It is also much more portable, and should compile to use vectors on ARM platforms as well.
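A rough sketch of what such an unrolled loop can look like is below. It is an illustration under assumptions, not the actual code in this PR: the function name and macro are invented, and the leftover bytes are handled one at a time only to keep the example short. The four accumulators are independent, which is what gives the compiler room to pair them into vector ORs. Whether a given compiler actually does so should be checked in the generated assembly.

```c
#include <stddef.h>
#include <string.h>

/* 0x80 repeated in every byte of a size_t (hypothetical helper macro). */
#define ASCII_MASK (((size_t)-1 / 0xFF) * 0x80)

/* Hypothetical name; returns 1 if every byte in data[0..len) is below 0x80. */
int is_ascii_unrolled(const unsigned char *data, size_t len)
{
    /* Four independent accumulators, one per word in the unrolled step. */
    size_t a = 0, b = 0, c = 0, d = 0;
    size_t chunk[4];
    size_t acc;
    size_t i = 0;

    /* Process four words per iteration. */
    for (; i + sizeof(chunk) <= len; i += sizeof(chunk)) {
        memcpy(chunk, data + i, sizeof(chunk));
        a |= chunk[0];
        b |= chunk[1];
        c |= chunk[2];
        d |= chunk[3];
    }
    acc = a | b | c | d;

    /* Remaining bytes, one at a time (simplification for this sketch). */
    for (; i < len; i++) {
        acc |= data[i];
    }

    return (acc & ASCII_MASK) == 0;
}
```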
EDIT: See also godbolt compiler explorer