Left Pack (AVX)
If you have an input array and an output array, but you only want to write the elements that pass a certain condition, what is the most efficient way to do this in AVX2? - SO
- Compact AVX2 register so selected integers are contiguous according to mask
- What is the most efficient way to pack left based on a mask?
- Optimizing Array Compaction
The first thing to do is find a fast scalar function. Here is a branchless version that copies the non-zero elements of x to the front of y and returns how many it kept.
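inline int compact(int *x, int *y, const int n) {
    int cnt = 0;                      // number of elements kept so far
    for (int i = 0; i < n; i++) {
        int cut = x[i] != 0;          // 1 if x[i] passes the condition, else 0
        y[cnt] = cut * x[i];          // always store; a rejected slot is overwritten by the next kept element
        cnt += cut;                   // advance the output index only when the element is kept
    }
    return cnt;                       // number of elements written to y
}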
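For comparison with the scalar loop, here is a minimal sketch of one way to pack 8 ints at a time with AVX2. It is not taken from this note: it assumes the BMI2 `pdep`/`pext` approach from the Stack Overflow answers linked above, a GCC or Clang compiler on x86-64, and an output buffer `y` with room for `n` elements. The names `compact_scalar`, `left_pack_epi32`, `compact_avx2` and `compact_dispatch` are hypothetical. A runtime check falls back to scalar code when the CPU lacks AVX2 or BMI2.

```c
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

/* Scalar reference: copy the non-zero elements of x to the front of y. */
static int compact_scalar(const int *x, int *y, int n) {
    int cnt = 0;
    for (int i = 0; i < n; i++) {
        y[cnt] = x[i];          /* a rejected slot is overwritten later */
        cnt += (x[i] != 0);
    }
    return cnt;
}

#ifdef HAVE_X86_DISPATCH
/* Hypothetical helper: move the lanes selected by the low 8 bits of mask
   to the front of the vector, preserving their order. */
static __attribute__((target("avx2,bmi2")))
__m256i left_pack_epi32(__m256i v, unsigned mask) {
    /* Spread each mask bit into its own byte, then widen 0x01 to 0xFF. */
    uint64_t byte_mask = _pdep_u64(mask, 0x0101010101010101ULL) * 0xFF;
    /* Pull out the lane indices (0..7) of the selected lanes, packed low. */
    uint64_t idx = _pext_u64(0x0706050403020100ULL, byte_mask);
    /* Widen the byte indices to 32-bit lanes and permute across the vector. */
    __m256i shuf = _mm256_cvtepu8_epi32(_mm_cvtsi64_si128((long long)idx));
    return _mm256_permutevar8x32_epi32(v, shuf);
}

static __attribute__((target("avx2,bmi2")))
int compact_avx2(const int *x, int *y, int n) {
    const __m256i zero = _mm256_setzero_si256();
    int cnt = 0, i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(x + i));
        __m256i is_zero = _mm256_cmpeq_epi32(v, zero);
        /* One bit per lane, set where the element is non-zero. */
        unsigned mask =
            ~(unsigned)_mm256_movemask_ps(_mm256_castsi256_ps(is_zero)) & 0xFFu;
        /* Stores all 8 lanes; since cnt <= i this stays inside y[0..n-1],
           and lanes past the kept count are overwritten by later stores. */
        _mm256_storeu_si256((__m256i *)(y + cnt), left_pack_epi32(v, mask));
        cnt += __builtin_popcount(mask);
    }
    /* Fewer than 8 elements left: finish with the scalar loop. */
    return cnt + compact_scalar(x + i, y + cnt, n - i);
}
#endif

/* Hypothetical entry point: use AVX2 + BMI2 when the CPU has both. */
int compact_dispatch(const int *x, int *y, int n) {
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("bmi2"))
        return compact_avx2(x, y, n);
#endif
    return compact_scalar(x, y, n);
}
```

Calling `compact_dispatch(x, y, n)` returns the same count and the same leading elements of `y` as the scalar loop. Whether it is actually faster depends on the CPU: `pdep` and `pext` are slow on AMD cores before Zen 3, so measure before relying on it.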
SSE instruction
Written on February 22, 2020. Last updated on November 23, 2022.
bits
pack
avx
c++