Blas-lapack-switch

= GSoC2019/Gentoo: BLAS/LAPACK Runtime Switch =

Table of Contents


 * 1) BLAS/LAPACK Runtime Switch: User Guide
 * 2) BLAS/LAPACK Runtime Switch: Developer Guide
 * 3) Implementation Details
 * 4) Frequently Asked Questions

= BLAS/LAPACK Runtime Switch: User Guide =

Disabling The Feature
This feature is disabled by default, so users who don’t care about it can simply ignore the eselect-ldso USE flag as if it didn’t exist and install things under the default settings as before. Users who don’t read any documentation at all won’t run into trouble with this default setting.

Enabling The Feature
First install the skeleton of the mechanism:

 # USE=eselect-ldso emerge --ask ">=virtual/blas-3.8" ">=virtual/lapack-3.8"

These virtual packages will pull in the reference BLAS/LAPACK implementation and the customized eselect modules for BLAS and LAPACK, i.e. app-eselect/eselect-blas and app-eselect/eselect-lapack. After the installation finishes, the user should be able to check the status of the BLAS/LAPACK selections:

 # eselect blas list
 Available BLAS/CBLAS (lib64) candidates:
   [1]   reference *
 # eselect lapack list
 Available LAPACK (lib64) candidates:
   [1]   reference *

That means all binaries linked against libblas.so.3 or libcblas.so.3 will use the reference BLAS implementation; those linked against liblapack.so.3 will use the reference LAPACK implementation.
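For scripting, the selected candidate can be extracted from the list output. This is a minimal sketch, assuming the plain (uncoloured) output format shown above, in which the selected entry is followed by an asterisk; selected_candidate is a hypothetical helper name, not part of eselect.

```shell
# selected_candidate: read `eselect blas list`-style output on stdin and
# print the name of the candidate marked with a trailing "*".
# Prints nothing when no candidate is marked.
selected_candidate() {
    awk '$1 ~ /^\[[0-9]+\]$/ && $NF == "*" { print $2 }'
}

# Intended use on a system with the eselect modules installed:
#   eselect blas list | selected_candidate
#   eselect lapack list | selected_candidate
```

The helper only parses text, so it can be tried on saved output without touching the system configuration.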

The reference implementation is very slow, which is unacceptable for some users (e.g. scientific computing users). Gentoo’s main repo provides several optimized BLAS/LAPACK implementations, for example BLIS and OpenBLAS. They are automatically registered in the mechanism as long as the eselect-ldso USE flag is enabled during installation. For example:

 # USE=eselect-ldso emerge --ask ">=sci-libs/blis-0.6.0"
 # USE=eselect-ldso emerge --ask ">=sci-libs/openblas-0.3.5"

Note that without the eselect-ldso flag, these packages won’t be registered in the mechanism and won’t install the extra libraries at all. After installation with the feature enabled, we can switch the BLAS/LAPACK implementation like so:

 # eselect blas set openblas
 # eselect lapack set openblas

Run your program again and see whether it runs faster. No recompilation is required thanks to this mechanism. For more details about eselect blas or eselect lapack usage, please consult the manual page or the help messages.
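To confirm that a switch took effect for a particular program, one option is to look at what the dynamic linker resolves for it. The sketch below assumes the usual `ldd` output format (`name => path (address)`); resolved_path is a hypothetical helper, and the path in the usage comment is only a placeholder.

```shell
# resolved_path SONAME: read `ldd` output on stdin and print the file that
# the dynamic linker resolved for the given SONAME (empty if not listed).
resolved_path() {
    awk -v name="$1" '$1 == name && $2 == "=>" { print $3 }'
}

# Intended use (replace the program path with your own binary):
#   ldd /path/to/your/program | resolved_path libblas.so.3
#   ldd /path/to/your/program | resolved_path liblapack.so.3
```

Running it before and after `eselect blas set ...` should show the resolved file changing while the program itself stays untouched.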

Side Notes
List of BLAS/LAPACK providers:

 * sci-libs/lapack: the reference blas/cblas/lapack/lapacke; only supports serial mode.
 * sci-libs/blis: optimized blas/cblas implementation.
 * sci-libs/openblas: optimized blas/cblas + partially optimized lapack/lapacke.
 * sci-libs/mkl-rt: Intel’s Math Kernel Library; optimized blas/cblas/lapack/lapacke/etc. implementation.

Here are some recommended combinations for your choice:

 * blas=openblas  lapack=openblas  (priority: high)
 * blas=blis      lapack=reference (priority: medium)
 * blas=reference lapack=reference (priority: low)
 * blas=mkl-rt    lapack=mkl-rt    (priority: high but non-free)

Note the following combinations are discouraged:

 * blas=blis      lapack=openblas

In case of package conflicts (blocks) during the transition:

 * Keep >=virtual/{blas,cblas,lapack}-3.8
 * Keep >=sci-libs/lapack-3.8 as it replaces all the sci-libs/{blas,cblas,lapack,lapacke}-reference packages. The *-reference packages should be unmerged.
 * Keep >=app-eselect/eselect-{blas,lapack}-0.2. app-eselect/eselect-cblas-* has been deprecated.

Pitfalls

 * 1) Please don’t use pthread and OpenMP threading at the same time, since it may cause a significant performance drop due to excessive thread creation. This may happen when some libraries linked into an application use OpenMP threading while others use pthread.
 * 2) Please don’t use GNU OpenMP and Intel/LLVM OpenMP at the same time, as the symbol clash between them may lead to silent computation errors. This may happen when MKL uses Intel/LLVM OpenMP while some other libraries linked into the same application use GNU OpenMP.
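The second pitfall can be spotted by checking which OpenMP runtimes a program loads. The following sketch assumes the common runtime library names libgomp (GNU), libiomp5 (Intel) and libomp (LLVM) and the usual `ldd` output format; mixed_openmp is a hypothetical helper name.

```shell
# mixed_openmp: read `ldd` output on stdin; succeed (exit 0) only when both
# the GNU OpenMP runtime and an Intel/LLVM OpenMP runtime are listed.
mixed_openmp() {
    awk '
        $1 ~ /^libgomp\.so/           { gnu = 1 }
        $1 ~ /^(libiomp5|libomp)\.so/ { other = 1 }
        END { exit !(gnu && other) }
    '
}

# Intended use (replace the program path with your own binary):
#   if ldd /path/to/your/program | mixed_openmp; then
#       echo "warning: GNU and Intel/LLVM OpenMP runtimes are both loaded"
#   fi
```

Note that `ldd` only shows libraries linked at load time; runtimes pulled in later through dlopen would not be detected by this check.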

= BLAS/LAPACK Runtime Switch: Developer Guide =

BLAS/LAPACK Providers
It must be pointed out that every BLAS/LAPACK implementation needs to provide extra shared objects with proper SONAMEs. For example, we cannot turn an optimized library into the BLAS/CBLAS provider by simply symlinking it to the standard BLAS and CBLAS library names, because any program linked against BLAS or CBLAS would then record the optimized library’s own SONAME as its dependency (you can verify this by inspecting the dynamic section of the resulting binary), which clearly breaks the runtime switching mechanism. The current solution is to patch upstream build systems and build customized shared objects with proper SONAMEs.
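The SONAME issue can be checked on any ELF file. The sketch below assumes `readelf -d` from binutils as the inspection tool (the tool is an assumption, not named in this guide) and parses its usual output lines; soname_of and needed_of are hypothetical helper names, and the library names in the usage comment are placeholders.

```shell
# soname_of: read `readelf -d` output on stdin, print the SONAME entry.
soname_of() {
    sed -n 's/.*(SONAME).*\[\(.*\)\].*/\1/p'
}

# needed_of: read `readelf -d` output on stdin, print every NEEDED entry,
# i.e. the SONAMEs that the linker recorded as dependencies.
needed_of() {
    sed -n 's/.*(NEEDED).*\[\(.*\)\].*/\1/p'
}

# Intended use:
#   readelf -d /path/to/libblas.so.3 | soname_of      # what the library calls itself
#   readelf -d /path/to/your/program | needed_of      # what the program asks for
```

If a program's NEEDED list names the optimized library's own SONAME instead of the standard BLAS one, switching the BLAS selection cannot affect that program.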

To package a BLAS/LAPACK provider with the runtime switching feature enabled, the maintainer should pay attention to the following points:


 * 1) Patch upstream build systems and provide extra shared objects in a private library directory. Specifically, a new BLAS/CBLAS implementation, say “myblas”, should install 4 files to its private directory: (1) libblas.so.3 (ELF shared object, providing the Fortran BLAS ABI, SONAME=libblas.so.3); (2) libblas.so (symlink pointing to libblas.so.3); (3) libcblas.so.3 (ELF shared object, providing the C BLAS ABI, SONAME=libcblas.so.3); (4) libcblas.so (symlink pointing to libcblas.so.3). Similarly, a new LAPACK implementation, say “mylapack”, should install 2 files to its private directory: (1) liblapack.so.3 (ELF shared object, providing the Fortran LAPACK ABI, SONAME=liblapack.so.3); (2) liblapack.so (symlink pointing to liblapack.so.3).
 * 2) Register an alternative with eselect blas add or eselect lapack add during postinst.
 * 3) Remove the alternative through the same eselect module during postrm.
 * 4) Guard the code associated with all the above points with the eselect-ldso USE flag.

For real examples, please see the ebuild files of the provider packages listed under Implementation Details.
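A maintainer may want a quick sanity check of the installed file layout. The sketch below assumes the file names used in the MKL example of the FAQ (libblas.so.3, libblas.so, libcblas.so.3, libcblas.so); check_blas_provider_dir is a hypothetical helper, and it checks only the layout, not the SONAMEs inside the ELF files.

```shell
# check_blas_provider_dir DIR: verify that DIR holds the four BLAS/CBLAS
# files: NAME.so.3 must exist, and NAME.so must be a symlink to NAME.so.3.
check_blas_provider_dir() {
    dir=$1
    for stem in libblas libcblas; do
        if [ ! -f "$dir/$stem.so.3" ]; then
            echo "missing $stem.so.3" >&2
            return 1
        fi
        if [ ! -L "$dir/$stem.so" ]; then
            echo "$stem.so is missing or not a symlink" >&2
            return 1
        fi
        if [ "$(readlink "$dir/$stem.so")" != "$stem.so.3" ]; then
            echo "$stem.so does not point to $stem.so.3" >&2
            return 1
        fi
    done
    return 0
}

# Intended use: check_blas_provider_dir /path/to/private/library/dir
```

An analogous check for a LAPACK provider would loop over liblapack only.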

BLAS/LAPACK Reverse Dependencies
If a package needs to be linked against the reference (a.k.a. netlib) BLAS and LAPACK, it should depend on the virtual packages, i.e. virtual/blas and virtual/lapack, instead of a specific implementation. In this case the package must assume the standard (reference) API and ABI from the virtual package. Otherwise, please put a specific implementation in the dependency list and avoid linking against the generic BLAS or LAPACK libraries.

= Implementation Details =

The core part of the implementation involves sci-libs/lapack, app-eselect/eselect-blas and app-eselect/eselect-lapack, where eselect-blas controls both the (Fortran) BLAS and CBLAS alternatives at the same time.

sci-libs/lapack is the codebase of the reference (or standard) BLAS, CBLAS, LAPACK, and LAPACKE. BLAS and LAPACK are sets of stable Fortran APIs/ABIs. CBLAS and LAPACKE are thin wrappers around BLAS and LAPACK respectively, providing the C API/ABI. In our BLAS/LAPACK runtime switching mechanism, every candidate must provide every API/ABI that the reference implementation provides. Taking advantage of this API/ABI stability, we can change the backend libraries without recompiling applications, as long as the new one provides a compatible ABI.

Users can easily switch the libraries temporarily by adjusting dynamic linker environment variables. For system-level library switching, two custom eselect modules are provided. They manipulate configuration files for the dynamic linker, hinting it where to find the BLAS/LAPACK libraries.

As a side effect, this solution depends on the corresponding dynamic linker configuration support in the system C standard library. It is recommended to read the code if you need even more details.
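The temporary, per-command switch can be sketched as follows. This assumes LD_LIBRARY_PATH as the environment variable (a standard dynamic linker variable on Linux, but named here as an assumption); with_lib_dir is a hypothetical helper and the directory in the usage comment is a placeholder.

```shell
# with_lib_dir DIR CMD [ARGS...]: run CMD with DIR prepended to
# LD_LIBRARY_PATH, so that the libblas.so.3 / liblapack.so.3 found in DIR
# take precedence for this one invocation only.
with_lib_dir() {
    dir=$1
    shift
    env LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}

# Intended use (placeholder paths):
#   with_lib_dir /path/to/provider/dir /path/to/your/program
```

Because the variable is set only for the child process, the system-wide selection made through eselect stays unchanged.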

Code: app-eselect/eselect-blas, app-eselect/eselect-lapack, sci-libs/lapack, sci-libs/blis, sci-libs/openblas, sci-libs/mkl-rt

= Frequently Asked Questions =

'''Q: I disabled this feature when installing a bunch of packages, but now I regret it and want to enable the runtime switching feature. What should I do?'''

A: Simply reinstall the virtual packages and your favorite BLAS/LAPACK providers with the eselect-ldso flag enabled. The rest of the dependency tree doesn’t need to be rebuilt, as a rebuild is expected to make no difference.

'''Q: Some BLAS/LAPACK implementations support 64-bit array indexing and provide a separate set of functions for it. How does this mechanism deal with such a feature?'''

A: The “BLAS64” or “BLAS-ILP64” ABI is different from the “BLAS32” or “BLAS-LP64” ABI. Mixing them will lead to unpredictable results, hence the “BLAS64” feature is not integrated into the mechanism. Currently this feature is only provided in one package, for Julia’s use. Besides, the generic switching mechanism for BLAS64/LAPACK64 is still experimental in Debian. Once the demand for “BLAS64” is common enough or the experiment in Debian succeeds, we can start to provide it in Gentoo.

'''Q: How do I add a customized implementation into this mechanism?'''

A: Take MKL as an example. We first install MKL into a directory, and symlink libmkl_rt.so to the standard BLAS/CBLAS/LAPACK library names there. Then we register it with eselect blas add and eselect lapack add. Note that building programs while MKL is selected is discouraged; the reason can be found in the Developer Guide.

A real example of adding Intel MKL and setting it as the backend library:

 # pip install mkl --user
 # cd ~/.local/lib/
 # ln -s libmkl_rt.so libblas.so.3
 # ln -s libmkl_rt.so libblas.so
 # ln -s libmkl_rt.so libcblas.so.3
 # ln -s libmkl_rt.so libcblas.so
 # eselect blas add lib64 "$(pwd)" mkl
 # ln -s libmkl_rt.so liblapack.so.3
 # ln -s libmkl_rt.so liblapack.so
 # eselect lapack add lib64 "$(pwd)" mkl
 # eselect blas set mkl
 # eselect lapack set mkl

To remove the MKL candidate or any other customized library, just remove the files corresponding to that candidate, then select another candidate. Note that the sci-libs/mkl-rt package can do all the above steps for you.
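The six symlink steps of the example follow one pattern, so they can be wrapped in a small helper. This is a sketch only: link_as_blas_lapack is a hypothetical name, and registration with eselect blas add / eselect lapack add is deliberately left out so the helper has no system-wide effect.

```shell
# link_as_blas_lapack TARGET DIR: inside DIR, create the six symlinks used
# in the example (lib{blas,cblas,lapack}.so.3 and .so), all pointing to
# TARGET, a file name relative to DIR such as libmkl_rt.so.
link_as_blas_lapack() {
    target=$1
    dir=$2
    if [ ! -e "$dir/$target" ]; then
        echo "$dir/$target does not exist" >&2
        return 1
    fi
    for name in libblas libcblas liblapack; do
        ln -s "$target" "$dir/$name.so.3" || return 1
        ln -s "$target" "$dir/$name.so" || return 1
    done
}

# Intended use, mirroring the example:
#   link_as_blas_lapack libmkl_rt.so ~/.local/lib
```

The helper refuses to overwrite existing links, since `ln -s` fails when the link name already exists.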

= Reference =


 * 1) [gentoo-science] GSoC Proposal: Improvements to the BLAS / LAPACK and their reverse-dependencies https://archives.gentoo.org/gentoo-science/message/4d0186acdce6df538a2740e0f1146ae6
 * 2) [gentoo-dev] RFC: BLAS and LAPACK runtime switching https://archives.gentoo.org/gentoo-dev/message/d917547f7a9e1226fca63632a1e02026
 * 3) [gentoo-dev] [PATCH 0/2] RFC: Introducing ldso switching to BLAS/LAPACK https://archives.gentoo.org/gentoo-dev/message/95beba3dc1c0f684ce1ec82d51988fc8
 * 4) [gentoo-science] On BLAS and LAPACK int64 ABI https://archives.gentoo.org/gentoo-science/message/8e3b9567297de5a1809feb28c62be633
 * 5) Hasan ÇALIŞIR (Gentoo Proxy Maintainer) wrote an “openblas” script for a similar switching purpose. However, that implementation is neither generic nor simple enough. See https://github.com/gentoo/gentoo/pull/11700/files

= Authors, Acknowledgement, Credits =

Author: Mo Zhou [mailto:lumin@debian.org lumin@debian.org]

GSoC Mentor: Benda Xu [mailto:heroxbd@gentoo.org heroxbd@gentoo.org]

This work is supported by Google through Google Summer of Code.

This runtime switching mechanism borrows many brilliant ideas from Debian's BLAS/LAPACK alternatives mechanism. Many thanks to the Debian Science team for the brilliant reference!

Thanks to Zongyu Zhang, who fixed a bug in the numpy ebuild so that numpy can make use of the switching mechanism correctly.

= User Feedback =

Positive ones:


 * https://github.com/gentoo/sci/issues/805#issuecomment-510469206
 * https://github.com/gentoo/sci/issues/805#issuecomment-512097570

Negative ones: None yet.