There is no single instruction [for the inverse of movemask, i.e. bitmap -> vector mask] in AVX2 or earlier. (AVX512 can use masks in bitmap form directly, and has an instruction to expand masks to vectors.) - Peter Cordes (SO)
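To pin down what that missing instruction would compute, here is a plain-C reference model with no intrinsics, useful as ground truth when testing SIMD versions. Bit i of the bitmap selects all-ones or all-zeros for 32-bit lane i. The name `inverse_movemask_ref` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar reference: expand the low 8 bits of `bitmap`
   into 8 lanes, lane i = 0xFFFFFFFF if bit i is set, else 0.
   Bits above bit 7 are ignored, like the SIMD versions. */
void inverse_movemask_ref(unsigned bitmap, uint32_t lanes[8])
{
    for (int i = 0; i < 8; i++) {
        /* Low bit -> element 0, matching _mm256_movemask_ps. */
        lanes[i] = ((bitmap >> i) & 1u) ? UINT32_MAX : 0u;
    }
}
```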
Load / Get
The result of a SIMD comparison is all 1s for true, which happens to be a NaN when read as a float. For false it’s all 0s, which happens to be 0.0. Typically you use the result as a bitwise mask, so the float value isn’t really meaningful.
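A quick plain-C check of that claim, copying the bit patterns into a `float` with `memcpy` (the portable way to type-pun in C). The helper names are hypothetical.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a float. */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* A "true" compare lane is all ones: exponent all ones, nonzero significand -> NaN. */
int true_lane_is_nan(void) { return isnan(bits_to_float(0xFFFFFFFFu)) != 0; }

/* A "false" compare lane is all zeros: +0.0f. */
int false_lane_is_zero(void) { return bits_to_float(0u) == 0.0f; }
```

Any float arithmetic on a "true" lane would propagate the NaN, which is one more reason to treat the result only as a bitwise mask.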
Load integer value
AVX - binary cast through a union and load float value
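A minimal sketch of that AVX1 route, assuming GCC or Clang on x86-64: the integer bit pattern is reinterpreted through a union (allowed in C, though not in C++) and then broadcast with `_mm256_set1_ps`. The function name and the `target("avx")` attribute (used so the file builds without `-mavx`) are assumptions for illustration; the vector is stored to memory so callers need no AVX types.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical example: broadcast a raw 32-bit pattern to all 8 lanes
   using only AVX1 float instructions. */
__attribute__((target("avx")))
void broadcast_bits_avx1(uint32_t bits, float out[8])
{
    union { uint32_t u; float f; } pun;
    pun.u = bits;                      /* binary cast: same bits, float type */
    __m256 v = _mm256_set1_ps(pun.f);  /* broadcast the float to 8 lanes */
    _mm256_storeu_ps(out, v);
}
```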
AVX2
_mm256_set1_epi32 - Initializes a 256-bit vector by broadcasting a scalar integer value to all eight 32-bit elements. No single corresponding Intel® AVX instruction; with AVX2 the compiler can use vpbroadcastd.
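On AVX2 the union is unnecessary: broadcast the integer with `_mm256_set1_epi32` and reinterpret the vector with `_mm256_castsi256_ps`, which only changes the C type and emits no instruction. Same assumptions as the AVX1 sketch (GCC/Clang, x86-64, hypothetical function name, `target("avx2")` instead of a build flag).

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical example: integer broadcast, then a free bitwise cast to __m256. */
__attribute__((target("avx2")))
void broadcast_bits_avx2(uint32_t bits, float out[8])
{
    __m256i vi = _mm256_set1_epi32((int)bits); /* typically vpbroadcastd */
    __m256 vf = _mm256_castsi256_ps(vi);       /* reinterpret only, no conversion */
    _mm256_storeu_ps(out, vf);
}
```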
Mask
_mm256_movemask_ps (__m256 -> int): bit i of the result is the sign bit of float element i.
// v8f: a wrapper type holding an __m256 in member `v`
int as_int(const v8f &f) {
    return _mm256_movemask_ps(f.v);
}
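A small usage sketch, assuming GCC/Clang on x86-64: since `_mm256_movemask_ps` collects the sign bit of each element, applying it directly to loaded data yields a bitmap of the negative lanes (including -0.0f) without any compare. The function name is hypothetical.

```c
#include <immintrin.h>

/* Hypothetical example: bit i of the result is the sign bit of p[i]. */
__attribute__((target("avx")))
int sign_bitmap_avx(const float p[8])
{
    __m256 v = _mm256_loadu_ps(p);
    return _mm256_movemask_ps(v);   /* low bit <- element 0 */
}
```

On a compare result the same call turns each all-ones/all-zeros lane into a single bit, which is exactly the direction the inverse below undoes.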
AVX Solution
// AVX2 can be significantly more efficient, doing this with integer SIMD
// Especially for the case where the bitmap is in an integer register, not memory
// It's fine if `bitmap` contains high garbage; make sure your C compiler broadcasts from a dword in memory if possible instead of integer load with zero extension.
// e.g. __m256 _mm256_broadcast_ss(const float *a); or memcpy to unsigned.
// Store/reload is not a bad strategy vs. movd + 2 shuffles so maybe just do it even if the value might be in a register; it will force some compilers to store/broadcast-load. But it might not be type-punning safe even though it's an intrinsic.
// Low bit -> element 0, etc.
__m256 inverse_movemask_ps_avx1(unsigned bitmap)
{
    // if you know DAZ is off: don't OR, just AND/CMPEQ with subnormal bit patterns
    // FTZ is irrelevant, we only use bitwise booleans and CMPPS
    const __m256 exponent = _mm256_set1_ps(1.0f);  // set1_epi32(0x3f800000)
    const __m256 bit_select = _mm256_castsi256_ps(
        _mm256_set_epi32(  // exponent + low significand bits
            0x3f800000 + (1 << 7), 0x3f800000 + (1 << 6),
            0x3f800000 + (1 << 5), 0x3f800000 + (1 << 4),
            0x3f800000 + (1 << 3), 0x3f800000 + (1 << 2),
            0x3f800000 + (1 << 1), 0x3f800000 + (1 << 0)
        ));
    // bitmap |= 0x3f800000;  // more efficient to do this scalar, but only if the data was in a register to start with
    __m256 bcast = _mm256_castsi256_ps(_mm256_set1_epi32(bitmap));
    __m256 ored = _mm256_or_ps(bcast, exponent);       // OR in 1.0f's exponent so the compared values are normal floats, not subnormals
    __m256 isolated = _mm256_and_ps(ored, bit_select); // keep exponent + this lane's bit only
    return _mm256_cmp_ps(isolated, bit_select, _CMP_EQ_OQ);
}
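The comments above note that AVX2 can do this more efficiently with integer SIMD. A minimal sketch of that idea, assuming GCC/Clang on x86-64 (all names hypothetical): broadcast the bitmap, AND each lane with its own single-bit selector, and compare with `_mm256_cmpeq_epi32`. No exponent trick is needed because integer compares are unaffected by DAZ.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical AVX2 inverse of _mm256_movemask_ps: lane i becomes all ones
   if bit i of `bitmap` is set. High garbage bits are cleared by the AND. */
__attribute__((target("avx2")))
static inline __m256i inverse_movemask_epi32_avx2(unsigned bitmap)
{
    const __m256i bit_select = _mm256_setr_epi32(1 << 0, 1 << 1, 1 << 2, 1 << 3,
                                                 1 << 4, 1 << 5, 1 << 6, 1 << 7);
    __m256i bcast = _mm256_set1_epi32((int)bitmap);
    __m256i isolated = _mm256_and_si256(bcast, bit_select);
    return _mm256_cmpeq_epi32(isolated, bit_select);
}

/* Hypothetical wrapper: store the lanes so callers need no AVX2 types. */
__attribute__((target("avx2")))
void inverse_movemask_store_avx2(unsigned bitmap, uint32_t lanes[8])
{
    _mm256_storeu_si256((__m256i *)lanes, inverse_movemask_epi32_avx2(bitmap));
}

/* Hypothetical round trip: bitmap -> vector mask -> bitmap (low 8 bits). */
__attribute__((target("avx2")))
int roundtrip_avx2(unsigned bitmap)
{
    return _mm256_movemask_ps(_mm256_castsi256_ps(inverse_movemask_epi32_avx2(bitmap)));
}
```

The result is an `__m256i`; `_mm256_castsi256_ps` turns it into an `__m256` mask for float blends at no cost.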
AVX2 Solution?
int mask = _mm256_movemask_epi8(s1);  // s1 is an __m256i: one bit per byte -> 32-bit mask
// vs
__m256i get_mask2(const uint32_t mask);  // reverse direction: 32-bit mask -> __m256i
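For the byte-granularity reverse of `_mm256_movemask_epi8`, one common AVX2 approach uses `vpshufb`: broadcast the mask, shuffle so output byte i holds mask byte i/8, then isolate bit i%8 with AND and compare. This is a sketch assuming GCC/Clang on x86-64; it is not necessarily what `get_mask2` does, and the names are hypothetical.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical inverse of _mm256_movemask_epi8: byte i of the result is 0xFF
   if bit i of `mask` is set, else 0x00. */
__attribute__((target("avx2")))
static inline __m256i inverse_movemask_epi8_avx2(uint32_t mask)
{
    __m256i v = _mm256_set1_epi32((int)mask);
    /* vpshufb indexes within each 128-bit lane; both lanes hold the broadcast
       mask, so index k picks mask byte k. Bytes 0-7 <- byte 0, 8-15 <- byte 1,
       16-23 <- byte 2, 24-31 <- byte 3. */
    const __m256i byte_select = _mm256_setr_epi64x(
        0x0000000000000000LL, 0x0101010101010101LL,
        0x0202020202020202LL, 0x0303030303030303LL);
    v = _mm256_shuffle_epi8(v, byte_select);
    /* Byte j of every qword keeps only bit j (0x01, 0x02, ..., 0x80). */
    const __m256i bit_select = _mm256_set1_epi64x((long long)0x8040201008040201ULL);
    v = _mm256_and_si256(v, bit_select);
    return _mm256_cmpeq_epi8(v, bit_select);
}

/* Hypothetical round-trip helper: mask -> byte vector -> mask. */
__attribute__((target("avx2")))
uint32_t roundtrip_epi8_avx2(uint32_t mask)
{
    return (uint32_t)_mm256_movemask_epi8(inverse_movemask_epi8_avx2(mask));
}
```

The in-lane behaviour of `vpshufb` is why each 128-bit half only needs indices 0..3: the broadcast puts a full copy of the mask in both halves.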
Bitwise operation
see also