Single Instruction Multiple Data Vectorization (SIMD)

SSE and SSE2 are available in every single x86-family CPU with 64-bit support… here’s a list of tricks to get you around some of the more common, eh, “idiosyncrasies” of SSE and its descendants. - SSE: mind the gap!

Intel Instruction Set

x86/x64 SIMD Instruction List (SSE to AVX512)

What is the difference between AVX2 and AVX-512?

AVX2 is a 256 bit vector instruction set. You have 256 bit registers which can be interpreted several ways (8 floats, 4 doubles, 32 bytes, etc).
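To make the "interpreted several ways" part concrete, here is a minimal sketch in which the same 256-bit register width is used as 8 floats, 4 doubles, or 32 bytes. The function names are hypothetical, and the `target("avx2")` attribute is an assumption that the code is built with GCC or Clang without `-mavx2` on the command line; the caller is expected to run it only on a CPU that supports AVX2.

```cpp
#include <immintrin.h>
#include <cstdint>

// 256 bits seen as 8 x 32-bit floats.
__attribute__((target("avx2")))
void add_8_floats(const float* a, const float* b, float* out) {
    __m256 va = _mm256_loadu_ps(a);   // unaligned load of 8 floats
    __m256 vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(out, _mm256_add_ps(va, vb));
}

// 256 bits seen as 4 x 64-bit doubles.
__attribute__((target("avx2")))
void add_4_doubles(const double* a, const double* b, double* out) {
    __m256d va = _mm256_loadu_pd(a);
    __m256d vb = _mm256_loadu_pd(b);
    _mm256_storeu_pd(out, _mm256_add_pd(va, vb));
}

// 256 bits seen as 32 x 8-bit integers (addition wraps around modulo 256).
__attribute__((target("avx2")))
void add_32_bytes(const std::uint8_t* a, const std::uint8_t* b, std::uint8_t* out) {
    __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a));
    __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b));
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), _mm256_add_epi8(va, vb));
}
```

The three functions differ only in the register type (`__m256`, `__m256d`, `__m256i`) and in the suffix of the intrinsic (`_ps`, `_pd`, `_epi8`), which is what selects the lane layout.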


GCC compiler intrinsics

Always use #include <immintrin.h>

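x86 intrinsics follow the naming convention _mm[width]_[opname]_[suffix], for example _mm256_add_ps (256-bit register, add, packed single-precision floats). The width is omitted for 128-bit SSE intrinsics, as in _mm_add_ps.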

Do not put vector types in a union to access individual lanes: this hurts performance.
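One possible alternative to a union, sketched below, is to keep the data in a register for all of the arithmetic and write it out once with a store intrinsic when the individual lanes are needed. The function name is hypothetical, and the `target("avx")` attribute assumes GCC or Clang and a CPU with AVX support.

```cpp
#include <immintrin.h>

// Dot product of two 8-float arrays.
// The multiply is done in a register; the lanes are read back through a
// single store into a plain float array instead of through a union.
__attribute__((target("avx")))
float dot_8_floats(const float* a, const float* b) {
    __m256 prod = _mm256_mul_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b));

    float lanes[8];
    _mm256_storeu_ps(lanes, prod);    // one explicit store, no type punning

    float total = 0.0f;
    for (float x : lanes) total += x; // scalar reduction over the 8 lanes
    return total;
}
```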

Beware that by default:
__m256 is treated as 8 x float by the compiler and the debugger
__m256i is treated as 4 x 64-bit integers.
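The default view of `__m256i` does not restrict how it is used: the suffix of each intrinsic decides the lane width. As a small sketch (hypothetical function names, GCC/Clang `target` attribute, AVX2-capable CPU and little-endian memory layout assumed), the same 32 bytes are added once as 8 x 32-bit lanes and once as 4 x 64-bit lanes:

```cpp
#include <immintrin.h>
#include <cstdint>

// Adds the inputs as eight independent 32-bit lanes: no carry crosses a lane.
__attribute__((target("avx2")))
void add_as_epi32(const std::uint32_t* a, const std::uint32_t* b, std::uint32_t* out) {
    __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a));
    __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b));
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), _mm256_add_epi32(va, vb));
}

// Adds the same bytes as four 64-bit lanes: a carry out of the low 32 bits
// propagates into the high 32 bits of the same 64-bit lane.
__attribute__((target("avx2")))
void add_as_epi64(const std::uint32_t* a, const std::uint32_t* b, std::uint32_t* out) {
    __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a));
    __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b));
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), _mm256_add_epi64(va, vb));
}
```

With a[0] = 0xFFFFFFFF and b[0] = 1 (all other words zero), the 32-bit version wraps to 0 in word 0 and leaves word 1 at 0, while the 64-bit version carries into word 1.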

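Checking AVX availability with GCC

#include <iostream>

// Print the feature name when the running CPU supports it (GCC/Clang builtin).
#define CHECK(target) if (__builtin_cpu_supports(target)) { std::cerr << target << " supported\n"; }

int main() {
    CHECK("avx");
    CHECK("avx2");
    CHECK("avx512vl");
}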

Casting

The intrinsics _mm256_castps_si256 and _mm256_castsi256_ps exist only to satisfy the compiler's type checking: “This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.”
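A typical use of these casts is to run an integer bitwise operation on float data. The sketch below computes absolute values by clearing the IEEE 754 sign bit; the function name is hypothetical, and the `target("avx2")` attribute assumes GCC or Clang and an AVX2-capable CPU (the integer AND on 256-bit registers is an AVX2 instruction).

```cpp
#include <immintrin.h>

// Absolute value of 8 floats by masking off the sign bit.
__attribute__((target("avx2")))
void abs_8_floats(const float* in, float* out) {
    __m256 v = _mm256_loadu_ps(in);

    // Reinterpret the float bits as integers: no conversion, no instruction.
    __m256i bits = _mm256_castps_si256(v);

    // Keep every bit except the sign bit (bit 31) of each 32-bit lane.
    __m256i mask = _mm256_set1_epi32(0x7FFFFFFF);
    __m256i cleared = _mm256_and_si256(bits, mask);

    // Reinterpret back as floats, again at zero cost.
    _mm256_storeu_ps(out, _mm256_castsi256_ps(cleared));
}
```

Note that the casts reinterpret bits, unlike _mm256_cvtps_epi32, which converts float values to integer values.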

Autovectorization

Codingame

Vectorizing indirect access through avx instructions

Example

Limitation

Written on December 27, 2017, Last update on October 12, 2023
c++ avx shader intel