Highway (vector loop)
A C++ library that provides portable SIMD/vector intrinsics. - github / HN
Online demos using Compiler Explorer:
Concept
Strip-mining loops
To vectorize a loop, “strip-mining” transforms it into an outer loop and an inner loop, where the inner loop's iteration count matches the preferred vector width.
There are several ways to do it, differing in how they handle a count that is not a multiple of the vector length N:
Ensure all inputs/outputs are padded. Then the loop is simply:
const size_t N = Lanes(d);  // number of lanes for the given vector type (d = ScalableTag<T>)
for (size_t i = 0; i < count; i += N) {
...
}
Process whole vectors as above, followed by a scalar loop:
size_t i = 0;
for (; i + N <= count; i += N) LoopBody<false>(d, i, 0);
for (; i < count; ++i) LoopBody<false>(CappedTag<T, 1>(), i, 0);
Process whole vectors as above, followed by a single call to a modified LoopBody with masking:
size_t i = 0;
for (; i + N <= count; i += N) {
LoopBody<false>(d, i, 0);
}
if (i < count) {
LoopBody<true>(d, i, count - i);
}
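The scalar-tail and masked-tail variants above can be emulated in plain C++ to check that they agree. This is only a sketch under stated assumptions: kLanes stands in for Lanes(d), and this LoopBody (which doubles each element) is a hypothetical body, not the one from the Highway documentation. The "mask" is emulated by limiting how many lanes are touched.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for Lanes(d); real code would query the vector tag.
constexpr std::size_t kLanes = 4;

// Hypothetical loop body: doubles the elements of one vector starting at i.
// With kMasked = true only the first `remaining` lanes are read and written,
// which emulates a masked load/store.
template <bool kMasked>
void LoopBody(const float* in, float* out, std::size_t i,
              std::size_t remaining) {
  const std::size_t active = kMasked ? std::min(remaining, kLanes) : kLanes;
  for (std::size_t lane = 0; lane < active; ++lane) {
    out[i + lane] = 2.0f * in[i + lane];
  }
}

// Whole vectors, then a scalar loop for the leftover elements.
void DoubleWithScalarTail(const float* in, float* out, std::size_t count) {
  std::size_t i = 0;
  for (; i + kLanes <= count; i += kLanes) LoopBody<false>(in, out, i, 0);
  for (; i < count; ++i) out[i] = 2.0f * in[i];
}

// Whole vectors, then a single masked call for the leftover elements.
void DoubleWithMaskedTail(const float* in, float* out, std::size_t count) {
  std::size_t i = 0;
  for (; i + kLanes <= count; i += kLanes) LoopBody<false>(in, out, i, 0);
  if (i < count) LoopBody<true>(in, out, i, count - i);
}
```

With count = 10 and kLanes = 4, both functions run two whole-vector iterations and then handle the last two elements, either one at a time or in one masked call. Neither touches memory past index 9, so no padding is required.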
API synopsis / quick reference
FAQ
See also
- Entrywise addition of two double arrays using AVX - code comparison of AVX-512 / AVX / SSE2 loops
- How to use if condition in intrinsics
- Controlling the Data Flow (codingame) / some other reference - v8f: AVX vector of 8 floats (32 bits x 8 = 256 bits)
Written on October 24, 2022, Last update on December 2, 2023
c++
lib
avx
loop