Mask & Bitwise operation

There is no single instruction in AVX2 or earlier. (AVX512 can use masks in bitmap form directly, and has an instruction to expand masks to vectors). - Peter Cordes (SO)

Load / Get

Is AVX intrinsic _mm256_cmp_ps supposed to return NaN when true?

The result is all 1s, for true, which happens to be a NaN. For false it’s all 0s, which happens to be 0.0. Typically you use the result as a bitwise mask, so the float value isn’t really meaningful.

Load integer value

AVX - binary cast through a union and load float value

AVX2

_mm256_set1_epi32 - Initializes 256-bit vector with scalar integer values. No corresponding Intel® AVX instruction.

Mask

_mm256_movemask_ps (mm256 -> int)

int as_int( const v8f& f) { 
	return _mm256_movemask_ps( f.v); 
}

reversing _mm256_movemask_ps

is there an inverse instruction to the movemask instruction in intel avx2?
- How to use bits in a byte to set dwords in ymm register without AVX2? (Inverse of vmovmskps)

AVX Solution

// AVX2 can be significantly more efficient, doing this with integer SIMD
// Especially for the case where the bitmap is in an integer register, not memory
// It's fine if `bitmap` contains high garbage; make sure your C compiler broadcasts from a dword in memory if possible instead of integer load with zero extension. 
// e.g. __m256 _mm256_broadcast_ss(float *a);  or memcpy to unsigned. 
// Store/reload is not a bad strategy vs. movd + 2 shuffles so maybe just do it even if the value might be in a register; it will force some compilers to store/broadcast-load.  But it might not be type-punning safe  even though it's an intrinsic.

// Low bit -> element 0, etc.
__m256 inverse_movemask_ps_avx1(unsigned bitmap)
{
    // if you know DAZ is off: don't OR, just AND/CMPEQ with subnormal bit patterns
    // FTZ is irrelevant, we only use bitwise booleans and CMPPS
    const __m256 exponent = _mm256_set1_ps(1.0f);   // set1_epi32(0x3f800000)
    const __m256 bit_select = _mm256_castsi256_ps(
          _mm256_set_epi32(  // exponent + low significand bits
                0x3f800000 + (1<<7), 0x3f800000 + (1<<6),
                0x3f800000 + (1<<5), 0x3f800000 + (1<<4),
                0x3f800000 + (1<<3), 0x3f800000 + (1<<2),
                0x3f800000 + (1<<1), 0x3f800000 + (1<<0)
          ));

    // bitmap |= 0x3f800000;  // more efficient to do this scalar, but only if the data was in a register to start with
    __m256  bcast = _mm256_castsi256_ps(_mm256_set1_epi32(bitmap));
    __m256  ored  = _mm256_or_ps(bcast, exponent);
    __m256  isolated = _mm256_and_ps(ored, bit_select);
    return _mm256_cmp_ps(isolated, bit_select, _CMP_EQ_OQ);
}

AVX2 Solution ?

How to perform the inverse of _mm256_movemask_epi8 (VPMOVMSKB)?

int mask = _mm256_movemask_epi8(__m256i s1);
// vs
__m256i get_mask2(const uint32_t mask)