Single Instruction Multiple Data Vectorization (SIMD/AVX)

"SSE and SSE2 are available in every single x86-family CPU with 64-bit support… here's a list of tricks to get you around some of the more common, eh, 'idiosyncrasies' of SSE and its descendants." — from *SSE: mind the gap!*

# Intel Instruction Set

# x86/x64 SIMD Instruction List (SSE to AVX512)

# What is the difference between AVX, AVX2 and AVX-512?

AVX(1) supports only floating-point operations on its 256-bit registers; AVX2 adds 256-bit integer operations. AVX-512 widens the registers to 512 bits and introduces opmask registers for per-lane predication.


# GCC compiler intrinsics

Always use `#include <immintrin.h>`; it pulls in the intrinsics for the whole SSE/AVX family.

x86 intrinsics follow the naming convention `_mm[width]_[opname]_[suffix]`.

Do not wrap SIMD types in a `union` to access individual lanes; this impacts performance.

Beware that by default:
- `__m256` is treated as 8 floats by code and debugger;
- `__m256i` is treated as 4 × 64-bit integers.

# Checking AVX availability with GCC

```cpp
#include <iostream>

#define CHECK(target) \
    if (__builtin_cpu_supports(target)) { std::cerr << target << " supported\n"; }

int main() {
    CHECK("avx");
    CHECK("avx2");
    CHECK("avx512vl");
}
```

# Casting

The intrinsics `_mm256_castps_si256`/`_mm256_castsi256_ps` exist only to make the compiler happy: "This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency."

# Autovectorization
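As a starting point, a loop shaped like the one below is the kind GCC typically auto-vectorizes on its own; the flag names in the comment are standard GCC options, the function itself is a made-up example:

```cpp
// GCC auto-vectorizes this at -O3 (or -O2 -ftree-vectorize); add -mavx2 to
// use 256-bit registers and -fopt-info-vec to see the vectorizer's report.
// __restrict promises no aliasing, a common blocker for vectorization.
void scale_add(float* __restrict out, const float* __restrict a,
               const float* __restrict b, float k, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = a[i] * k + b[i];
}
```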

# Codingame

# Vectorizing indirect access through avx instructions

# Example

# Limitation

Written on December 27, 2017, Last update on February 16, 2026
avx 16bits c++ shader intel