User:Aisha/BLAS LAPACK dev guide

BLAS, LAPACK and friends are among the most important parts of scientific computing toolkits, and having optimized versions installed is a must for fully utilizing the power of your processors.

In this document, we present the promises made by the BLAS/CBLAS/LAPACK/LAPACKE libraries and their 64-bit API counterparts, in the hope that this helps users and developers understand how to work with these libraries.

Overview
Netlib's specifications are very precise in terms of API and usage, but they give developers no guidance on how to package these libraries.

Two of the most important things missing, from a software engineering point of view, are:
 * Shared library naming: There is no consistent nomenclature for linking a BLAS/LAPACK library. In fact, there are no guidelines mandating what the libraries must be called at all. It is up to the OS's package maintainer to make sure that a package which uses these libraries is linked to the BLAS library provided by the OS.
 * Symbol presence: Netlib's guidelines do not require that BLAS be provided by a single library. It is possible to split the library into three chunks, one for each level, and link with all of them at compile time. This model is particularly useful on GPU-based architectures, where Level 1 and 2 functions can be done on the CPU while the larger Level 3 functions should be done on the GPU. Again, it is up to the package maintainer to ensure that all of the libraries get linked.

The recommendations in this article should help all maintainers deal with BLAS/LAPACK dependent packages in a consistent manner.

Shared library usage
If a package wants to link with any of BLAS/CBLAS/LAPACK/etc. then it should depend only on the virtual packages for each of these libraries.

In Gentoo, we have eight virtual packages for BLAS and friends.



The latter packages use 64-bit representations of integers and are therefore API-incompatible with the standard libraries, which expect a 32-bit representation of integers.

Traditionally, it was only important to have the 32-bit implementations present, but new and upcoming software has options to take advantage of the 64-bit computation model and use the new interfaces. For the longest time, it was not possible to have both models installed at the same time. With the new nomenclature there should be little doubt about which API a package is using.
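The integer-width incompatibility can be made concrete with a small sketch. Fortran-style BLAS routines receive their integer arguments by reference, so a library built for 32-bit integers reads only four bytes of whatever the caller stored. The snippet below mimics that misread with Python's struct module on a little-endian layout. No BLAS library is involved, and the helper name and the example values are made up for illustration.

```python
import struct

def read_as_int32(n: int) -> int:
    """Store n as a little-endian 64-bit integer, then read back only the
    first four bytes as a 32-bit integer, as a 32-bit API would."""
    buf = struct.pack("<q", n)               # what a 64-bit caller stores
    (seen,) = struct.unpack("<i", buf[:4])   # what a 32-bit callee reads
    return seen

small = 1000
large = 2**32 + 5   # hypothetical dimension that does not fit in 32 bits

print(read_as_int32(small))  # 1000
print(read_as_int32(large))  # 5
```

Small values happen to survive on little-endian machines, which is why mixing the two APIs can go unnoticed until a problem size crosses the 32-bit limit.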

Library naming and linking
The libraries that need to be linked will always be named according to their virtual counterparts.



If using the corresponding variable, it is guaranteed that this library can be linked at compile time with the appropriate flag. No other flags should be needed unless noted in this article.

Runtime usage
It is guaranteed that, at runtime, the linker will resolve to a BLAS/LAPACK provider by linking to the specified library.

This situation already happens when using Intel's MKL libraries as a substitute for BLAS/LAPACK, hence a fair warning has been issued.

It is guaranteed that the symbols will be resolvable at runtime by the runtime linker and that the API functions will have an implementation during runtime. There will be no unresolvable symbols during runtime.

Do not expect that all functions are provided by the same library provider. It is possible to have BLAS and LAPACK resolved from different providers.
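A package maintainer who wants to check the symbol guarantee above can probe a loaded library for the names a package needs. The following is a minimal sketch using ctypes. The helper missing_symbols is a hypothetical name, and the handle used here is the running process itself (dlopen(NULL), which works on Linux and other POSIX systems) so that the sketch runs without any BLAS installed.

```python
import ctypes

def missing_symbols(lib, names):
    """Return the names that the loaded library object cannot resolve."""
    return [name for name in names if not hasattr(lib, name)]

# The running process and everything already loaded into it. It stands in
# for a real BLAS/LAPACK handle; on an actual system one would load the
# installed library instead, for example ctypes.CDLL("libblas.so.3")
# (the file name is an assumption, not taken from this article).
process = ctypes.CDLL(None)

# "malloc" is always resolvable in a C process; the second name is made up.
print(missing_symbols(process, ["malloc", "no_such_symbol_xyz"]))
```

An empty result for the full list of routines a package calls means every symbol is resolvable, regardless of which provider or how many separate libraries supply them.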

Runtime switching
It is possible, and almost assured, that the library linked at compile time is not the one used at runtime.

Gentoo has a Blas-lapack-switch mechanism that allows changing the provider at runtime.

This can lead to errors in code that selects provider-specific optimizations or code paths at compile time. Maintainers should ensure that this (very rare) situation does not arise for their package.

This situation is most common when packages have a flag for building with MKL. In those cases, the recommendation is to build packages with only one of BLAS or MKL and not both.

Threading model
If using Intel's implementation, its tbb USE flag should be enabled globally so that all libraries supporting it are built with it. Mixing threading backends leads to inconsistent behavior and can blow up resource usage.

It is recommended that all libraries use the same OpenMP runtime, whether that is the GNU, Intel, or Clang one. This is not always possible, because the runtimes are not API-compatible and most packages rely on extensions specific to one of them. In such cases, try to minimize the contact between consumer programs to avoid esoteric error conditions.
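One way to spot mixed OpenMP runtimes is to look at which shared libraries are mapped into a process. The sketch below scans a /proc/<pid>/maps-style listing for the usual library names of the GNU (libgomp), Intel (libiomp5) and LLVM/Clang (libomp) runtimes. The function names, the abbreviated sample lines and the paths in them are hypothetical, and reading /proc/self/maps is Linux-specific.

```python
import re

# Library names of the common OpenMP runtimes:
# GNU (libgomp), Intel (libiomp5), LLVM/Clang (libomp).
OPENMP_RUNTIMES = ("libgomp", "libiomp5", "libomp")

def openmp_runtimes(maps_text: str) -> set:
    """Return the OpenMP runtimes named in a /proc/<pid>/maps-style listing."""
    found = set()
    for name in OPENMP_RUNTIMES:
        # Match ".../libgomp.so" or ".../libgomp.so.1.0.0", not "libgompfoo".
        if re.search(r"/" + re.escape(name) + r"\.so(\.|\s|$)", maps_text):
            found.add(name)
    return found

def current_process_runtimes() -> set:
    """Linux only: inspect the libraries mapped into this process."""
    with open("/proc/self/maps") as fh:
        return openmp_runtimes(fh.read())

# Abbreviated, made-up sample lines standing in for a real maps file.
sample = (
    "7f00-7f01 r-xp 00000000 08:01 1 /usr/lib64/libgomp.so.1.0.0\n"
    "7f02-7f03 r-xp 00000000 08:01 2 /opt/intel/lib/libiomp5.so\n"
)
mixed = openmp_runtimes(sample)
print(sorted(mixed))  # ['libgomp', 'libiomp5']
```

More than one entry in the result means the process has several OpenMP runtimes loaded at once, which is the mixing this section advises against.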