Left Pack (AVX)
If you have an input array and an output array, but you only want to write those elements that pass a certain condition, what would be the most efficient way to do this in AVX2? - SO
- Compact AVX2 register so selected integers are contiguous according to mask
- What is the most efficient way to pack left based on a mask?
- Optimizing Array Compaction
The first thing to do is find a fast scalar function. Here is a version which does not use a branch.
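As a minimal sketch of what such a branch-free scalar version can look like, the function below stores every element unconditionally and advances the output index only when the element passes the test. The name `left_pack_scalar` and the condition "keep non-negative values" are assumptions chosen for illustration, not taken from the linked answers. Whether the compiler really emits branch-free code for the condition depends on the compiler and flags, so it is worth checking the generated assembly.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: copies the non-negative elements of `src` into `dst`,
// packed to the left, and returns how many elements were kept.
// `dst` must have room for `n` elements, because every iteration stores.
std::size_t left_pack_scalar(const std::int32_t* src, std::int32_t* dst,
                             std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::int32_t v = src[i];
        dst[count] = v;      // unconditional store; overwritten later if v is rejected
        count += (v >= 0);   // comparison yields 0 or 1, so no `if` is needed
    }
    return count;
}
```

Because `count` never exceeds `i`, the store always stays inside the first `n` slots of `dst`. Elements of `dst` at or beyond the returned count may hold leftover values and should be ignored.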
SSE instruction
Written on February 22, 2020. Last updated on November 23, 2022.
bits
pack
avx
c++