BLAS Performance

A common misconception is that BLAS implementations of matrix multiplication are orders of magnitude faster than naive implementations because they are very complex. - BLAS-level CPU Performance in 100 Lines of C / SO

BLAS implementations get their speed from optimisations similar to those in this article; the asymptotically faster algorithms are not useful in practice. - HN
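To make the kind of optimisation being discussed concrete, here is a minimal sketch in C comparing a naive triple loop with a cache-tiled version. It is not the code from the linked article: the names `matmul_naive`, `matmul_tiled` and the `TILE` size of 32 are assumptions chosen for illustration, and the matrices are assumed to be square, row-major arrays of `double`. Real BLAS kernels go further (packing, SIMD micro-kernels, threading), so treat this only as a sketch of the idea.

```c
#include <stddef.h>

/* Naive C = A * B for n x n row-major matrices.
   The inner loop reads B with stride n, which tends to miss cache. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

/* Hypothetical tile size (an assumption); tune it for the target cache. */
#define TILE 32

/* Smaller of two sizes, used to clamp tile ends at the matrix edge. */
static size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Tiled C = A * B. Work is split into TILE x TILE blocks so the data
   touched by the inner loops stays cache-resident, and the innermost
   loop walks B and C contiguously (i-k-j order). */
void matmul_tiled(size_t n, const double *A, const double *B, double *C)
{
    /* C is accumulated into, so it must start at zero. */
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += TILE) {
        for (size_t kk = 0; kk < n; kk += TILE) {
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t i_end = min_size(ii + TILE, n);
                size_t k_end = min_size(kk + TILE, n);
                size_t j_end = min_size(jj + TILE, n);

                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k]; /* reused across the j loop */
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions do the same O(n^3) arithmetic; the tiled one only changes the order of memory accesses. How much that helps depends on the matrix size, the compiler flags and the machine, so it is worth timing both on the target hardware rather than assuming a particular speedup.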


Written on March 1, 2022, Last update on March 1, 2022
matrix math fastware