BLAS Performance
A common misconception is that BLAS implementations of matrix multiplication are orders of magnitude faster than naive implementations because they are very complex. - BLAS-level CPU Performance in 100 Lines of C / SO
The speed-up comes from optimisations similar to those in this article; the asymptotically faster algorithms are not useful in practice. - HN
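To make the point concrete, here is a minimal sketch in C. It is not taken from the linked article: the function names `matmul_naive` and `matmul_blocked` and the tile size `BLOCK` are hypothetical, and loop tiling is assumed here as one representative example of the kind of optimisation meant. Both routines perform the same O(n^3) arithmetic; only the memory access order differs.

```c
#include <stddef.h>

/* Naive triple loop: C = A * B for row-major n x n matrices.
 * The inner loop reads B with stride n, which is unfriendly to the cache. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

/* Assumed tile size; a real implementation would tune this per machine. */
#define BLOCK 64

/* Smaller of two sizes, used to clamp tiles at the matrix edge. */
static size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Tiled variant: same arithmetic, but the work is split into
 * BLOCK x BLOCK tiles so that the data being reused stays in cache.
 * The innermost loop walks B and C contiguously (stride 1). */
void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += BLOCK) {
        size_t i_end = min_size(ii + BLOCK, n);
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t k_end = min_size(kk + BLOCK, n);
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t j_end = min_size(jj + BLOCK, n);
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k]; /* reused across the j loop */
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions take the dimension `n` and pointers to row-major arrays of `n * n` doubles, and they should produce the same result. Any speed difference depends on the matrix size, the compiler flags, and the cache hierarchy, so it is worth timing both on the target machine rather than assuming a particular factor.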
Written on March 1, 2022, Last update on March 1, 2022
matrix
math
fastware