-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Help the compiler vectorize adjacent_difference
#4958
base: main
Are you sure you want to change the base?
Conversation
adjacent_differentce
adjacent_difference
@@ -469,6 +487,30 @@ _CONSTEXPR20 _OutIt adjacent_difference(const _InIt _First, const _InIt _Last, _ | |||
const auto _ULast = _STD _Get_unwrapped(_Last); | |||
auto _UDest = _STD _Get_unwrapped_n(_Dest, _STD _Idl_distance<_InIt>(_UFirst, _ULast)); | |||
if (_UFirst != _ULast) { | |||
if constexpr (_Iterators_are_contiguous<_InIt, _OutIt> && !_Iterator_is_volatile<_InIt> | |||
&& is_trivially_copyable_v<_Iter_value_t<_InIt>>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is !_Iterator_is_volatile<_InIt>
or is_trivially_copyable_v<_Iter_value_t<_InIt>>
meaningful here, given we're not performing any trivial bitwise copy in this branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure.
The Effects as in [adjacent.difference]/4 specifically requiring making a copy at certain time, which we do later. So the difference may be observable. But it is not observable, if the memory is not volatile and the copying is trivial.
Reported DevCom-10745948 |
Interestingly it vectorizes if we use |
📜 The approach
The following things prevented the original algorithm from vectorization:
✅ Checks for eligibility
The following checks were added:
There's no need in check for integral types or so, since the compiler makes the final decision anyway, and it may be able optimize even something that wouldn't pass a strict check.
Apparently there's no rule that the source and the destination ranges may not overlap.
We should handle aliasing.
Unlike the #4431 precedent, we can't yield to the compiler here. The compiler is able to insert overlaps check that prevents vectorization and go to the scalar fallback in case of checks failure, but:
So we do our own checks.
Then we tell the compiler with
__restrict
that we already checked, and it should not bother. This is done in a separate function, because the__restrict
is not aliased within scope, so saying__restrict
within the original algorithm would apparently be a lie.The extra check by the compiler, if not prevented would slightly add run time and dead code size.
😾 Compiler warnings
We have a great feature called integral promotion, or whatever it is called. Smaller types are converted to integers, and there is a warning about converting them back. Local pragma does suppresses them in benchmark, but not in the test.
I don't know what to do with this.
⏱️ Benchmark results
🥇 Results interpretation