Highway (vector loop)

A C++ library that provides portable SIMD/vector intrinsics. - github / HN

Online demos using Compiler Explorer:

Concept

Strip-mining loops

To vectorize a loop, “strip-mining” transforms it into an outer loop and an inner loop, such that the number of iterations of the inner loop matches the preferred vector width. The inner loop is then replaced with vector operations.
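As a rough illustration in plain C++ (no Highway calls; `kLanes`, `ScaleScalar` and `ScaleStripMined` are hypothetical names chosen for this sketch), the transformation could look like the following. The inner loop has a fixed trip count of `kLanes`, and it is this part that a SIMD library or the compiler would turn into a single vector operation. The sketch assumes `count` is a multiple of `kLanes`.

```cpp
#include <cstddef>

// Hypothetical lane count, standing in for the preferred vector width.
constexpr size_t kLanes = 4;

// Original scalar loop: one element per iteration.
void ScaleScalar(float* data, size_t count, float factor) {
  for (size_t i = 0; i < count; ++i) data[i] *= factor;
}

// Strip-mined form. Assumes count is a multiple of kLanes.
void ScaleStripMined(float* data, size_t count, float factor) {
  // Outer loop: advances by one strip (one vector's worth) per iteration.
  for (size_t i = 0; i < count; i += kLanes) {
    // Inner loop: fixed trip count, a candidate for one vector operation.
    for (size_t lane = 0; lane < kLanes; ++lane) {
      data[i + lane] *= factor;
    }
  }
}
```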

There are several ways to do it, differing in how they handle remainder elements:

Ensure all inputs/outputs are padded. Then the loop is simply:

const size_t N = Lanes(d);  // number of lanes for the given vector type (d is a ScalableTag<T>)
for (size_t i = 0; i < count; i += N) {
...
}
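One way to satisfy the padding requirement is to round the allocation size up to a multiple of N. The sketch below is plain C++ rather than Highway code: `RoundUpToLanes` and `MakePadded` are hypothetical helpers, and `n` is passed in explicitly where Highway code would use the value returned by `Lanes(d)`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: smallest multiple of n that is >= count (requires n > 0).
constexpr size_t RoundUpToLanes(size_t count, size_t n) {
  return (count + n - 1) / n * n;
}

// Allocates a zero-filled buffer whose size is a multiple of n, so a loop
// stepping by n can read and write whole vectors up to the padded end.
std::vector<float> MakePadded(size_t count, size_t n) {
  return std::vector<float>(RoundUpToLanes(count, n), 0.0f);
}
```

The padding elements are processed along with the real data, so they should hold values that are harmless for the operation in question (zeros here).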

Process whole vectors as above, followed by a scalar loop:

size_t i = 0;
for (; i + N <= count; i += N) LoopBody<false>(d, i, 0);
for (; i < count; ++i) LoopBody<false>(CappedTag<T, 1>(), i, 0);

Process whole vectors as above, followed by a single call to a modified LoopBody with masking:

size_t i = 0;
for (; i + N <= count; i += N) {
  LoopBody<false>(d, i, 0);
}
if (i < count) {
  LoopBody<true>(d, i, count - i);
}
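`LoopBody` is not defined in the snippets above. A minimal sketch of what such a function might look like follows, written in plain C++ that emulates vector lanes with a scalar inner loop. Everything in it is an assumption made for illustration: the `Tag` struct (standing in for a Highway tag such as `ScalableTag<T>` or `CappedTag<T, 1>`), the extra `data` parameter, and the doubling operation. The `kPartial` path stands in for masked loads and stores.

```cpp
#include <cstddef>

// Hypothetical stand-in for a vector tag: it only carries the lane count.
template <size_t kLanes>
struct Tag {
  static constexpr size_t kN = kLanes;
};

// Doubles the elements starting at data[i]. When kPartial is false, a whole
// vector of D::kN elements is processed and max_n is ignored. When kPartial is
// true, only the first max_n lanes are touched (emulating a masked operation).
template <bool kPartial, class D>
void LoopBody(D /*d*/, float* data, size_t i, size_t max_n) {
  size_t n = D::kN;
  if (kPartial) n = max_n;
  for (size_t lane = 0; lane < n; ++lane) data[i + lane] *= 2.0f;
}

// Whole vectors, followed by a scalar loop that uses a one-lane tag.
void DoubleWithScalarTail(float* data, size_t count) {
  const Tag<4> d{};
  constexpr size_t N = Tag<4>::kN;
  size_t i = 0;
  for (; i + N <= count; i += N) LoopBody<false>(d, data, i, 0);
  for (; i < count; ++i) LoopBody<false>(Tag<1>(), data, i, 0);
}

// Whole vectors, followed by a single partial call for the remainder.
void DoubleWithMaskedTail(float* data, size_t count) {
  const Tag<4> d{};
  constexpr size_t N = Tag<4>::kN;
  size_t i = 0;
  for (; i + N <= count; i += N) LoopBody<false>(d, data, i, 0);
  if (i < count) LoopBody<true>(d, data, i, count - i);
}
```

Both drivers produce the same result. The scalar-tail version reuses the unmodified body at the cost of up to N - 1 extra iterations, while the masked version finishes in one call but requires the body to support a partial vector.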

API synopsis / quick reference

FAQ

See also

Written on October 24, 2022. Last updated on December 2, 2023.
c++ lib avx loop