NN Quantisation

we managed to selectively quantize certain layers to higher bits (like 4bit) & leave most MoE layers (like those used in GPT-4) to 1.5bit. - Run DeepSeek R1 Dynamic 1.58-bit / HN
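To make the idea in the quote concrete, here is a toy sketch of per-layer mixed-precision quantisation in plain Python. It is not the method used for the DeepSeek R1 release: the layer names, the `.expert_` naming rule, the symmetric 4-bit grid and the mean-absolute-value ternary scheme are all assumptions chosen for illustration. The only point is that a selector decides, layer by layer, which tensors keep 4 bits and which drop to three levels (log2(3) ≈ 1.58 bits per weight).

```python
import math

def quantize_symmetric(weights, bits):
    """Round weights onto a signed integer grid in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def quantize_ternary(weights):
    """Map each weight to -1, 0 or +1, scaled by the mean absolute value."""
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0:
        scale = 1.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]

def quantize_model(layers, is_low_bit):
    """Quantise each layer with the scheme picked by the is_low_bit selector."""
    out = {}
    for name, weights in layers.items():
        if is_low_bit(name):
            codes, scale = quantize_ternary(weights)
            bits = math.log2(3)  # three levels per weight
        else:
            codes, scale = quantize_symmetric(weights, 4)
            bits = 4
        out[name] = {"codes": codes, "scale": scale, "bits": bits}
    return out

def average_bits(quantized):
    """Average bits per weight across all layers, ignoring scale overhead."""
    total_bits = sum(q["bits"] * len(q["codes"]) for q in quantized.values())
    total_weights = sum(len(q["codes"]) for q in quantized.values())
    return total_bits / total_weights

# Hypothetical layer names and weights, for illustration only.
layers = {
    "attn.q_proj": [0.12, -0.50, 0.33, 0.07],
    "moe.expert_0.up_proj": [0.40, -0.05, -0.38, 0.02],
}

# Assumed rule: anything that looks like an MoE expert goes to ternary.
quantized = quantize_model(layers, lambda name: ".expert_" in name)

for name, q in quantized.items():
    print(name, q["codes"], round(q["bits"], 2))
print("average bits per weight:", round(average_bits(quantized), 2))
```

In a real MoE model the expert tensors hold the vast majority of the weights, so pushing only those to the low-bit scheme brings the average close to 1.58 bits while the smaller, more sensitive layers keep their 4-bit precision.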

Written on March 17, 2025, Last update on March 17, 2025