Left Pack (AVX)

If you have an input array, and an output array, but you only want to write those elements which pass a certain condition, what would be the most efficient way to do this in AVX2? - SO

caption

The first thing to do is find a fast scalar function. Here is a version which does not use a branch.

inline int compact(int *x, int *y, const int n) {
    int cnt = 0;
    for(int i=0; i<n; i++) {
        int cut = x[i]!=0;
        y[cnt] = cut*x[i];
        cnt += cut;
    }
    return cnt;
}

SSE instruction

Written on February 22, 2020, Last update on November 23, 2022
bits pack avx c++