<languages />
<translate>
<!--T:1-->
[http://www.netlib.org/blas/ BLAS (Basic Linear Algebra Subprograms)] and [http://www.netlib.org/lapack/ LAPACK (Linear Algebra PACKage)] are two of the most commonly used libraries in advanced research computing. They provide the vector and matrix operations found in a great many algorithms. More importantly, they are more than libraries: they define a standard programming interface, that is, a set of function definitions that can be called to accomplish specific computations, for example the dot product of two vectors of double-precision numbers, or the matrix product of two Hermitian matrices of complex numbers.
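
For illustration, here is a minimal C program, a sketch assuming the CBLAS interface (<tt>cblas.h</tt>) that most implementations provide, which computes a dot product through the standard interface:
<syntaxhighlight lang="c">
#include <stdio.h>
#include <cblas.h>   /* C interface to the BLAS standard */

int main(void) {
    double x[3] = {1.0, 2.0, 3.0};
    double y[3] = {4.0, 5.0, 6.0};
    /* cblas_ddot(n, x, incx, y, incy) computes the dot product of two
       double-precision vectors, whichever BLAS implementation is linked in. */
    double d = cblas_ddot(3, x, 1, y, 1);
    printf("x . y = %f\n", d);   /* prints 32.000000 */
    return 0;
}
</syntaxhighlight>
The same source code runs unchanged with any implementation of the interface; only the linking step changes.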


<!--T:2-->
Besides the reference implementation provided by Netlib, there exist a large number of implementations of these two standards. Their performance can vary widely depending on the hardware running them. For example, it is well established that the implementation provided by the [https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html Intel Math Kernel Library (Intel MKL)] performs best in most situations on Intel processors. That implementation is however proprietary, and in some situations it is preferable to use the open-source implementation [https://github.com/xianyi/OpenBLAS OpenBLAS]. In 2018, AMD also released its own implementation, named [https://developer.amd.com/amd-aocl/blas-library/ AMD BLIS], which performs better on AMD processors. You may also have known [https://www.tacc.utexas.edu/research-development/tacc-software/gotoblas2 GotoBLAS] and [https://github.com/math-atlas/math-atlas ATLAS BLAS], but those projects are no longer maintained.


<!--T:3-->
Unfortunately, testing which implementation performs best for a given code on given hardware usually requires recompiling the software. This is a problem when trying to create a portable software environment that works on multiple clusters. It can be solved by using [https://www.mpi-magdeburg.mpg.de/projects/flexiblas FlexiBLAS], an abstraction layer which allows one to swap the implementation of BLAS and LAPACK used at run time, rather than at compile time.


= Which implementation should I use? = <!--T:4-->
For the past few years, we have been recommending Intel MKL as the reference implementation. This recommendation was driven by the fact that we only had Intel processors in our clusters. This changed with the arrival of Narval, which hosts AMD processors. We now recommend using FlexiBLAS when compiling code. Our FlexiBLAS module is configured so that Intel MKL is used by default, except when using AMD processors, in which case AMD BLIS is used, offering optimal performance.


= How do I compile against FlexiBLAS? = <!--T:5-->
Unfortunately, FlexiBLAS is relatively new, and not all build systems will recognize it by default. This can generally be fixed by setting the linking options to use <tt>-lflexiblas</tt> for both BLAS and LAPACK, as in the example below. You will typically find these options in your Makefile, or be able to pass them as parameters to <tt>configure</tt> or <tt>cmake</tt>.
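For instance, assuming the dot-product example above is saved as <tt>dot.c</tt> (a hypothetical file name) and the <tt>flexiblas</tt> module is loaded, a minimal compile-and-link line would be:
{{Command|gcc dot.c -o dot -lflexiblas}}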
{{Note|On [[Narval]], the <tt>flexiblas</tt> module is loaded by default. On other clusters, you may need to load the <tt>flexiblas</tt> module before being able to use it.}}


= How do I change which implementation of BLAS/LAPACK is used at run time? = <!--T:6-->
The big benefit of using FlexiBLAS is the ability to change the backend implementation at run time, by setting the environment variable <tt>FLEXIBLAS</tt>. At the time of this writing, four implementations are available: <tt>netlib</tt>, <tt>blis</tt>, <tt>imkl</tt> and <tt>openblas</tt>, but the full list can be obtained by running the command
{{Command|flexiblas list}}
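For example, to run a program, here the hypothetical <tt>./myprog</tt>, with the OpenBLAS backend for a single run:
{{Command|1=FLEXIBLAS=openblas ./myprog}}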


<!--T:7-->
On [[Narval]], we have set <tt>FLEXIBLAS=blis</tt> to use AMD's implementation by default, while on other clusters, <tt>FLEXIBLAS</tt> is left undefined, which defaults to using Intel MKL.


= Using Intel MKL directly = <!--T:8-->
Although we recommend using FlexiBLAS, it is still possible to use Intel MKL directly. If you are using one of the Intel compilers (e.g. <code>ifort</code>, <code>icc</code>, <code>icpc</code>), the solution is to replace <tt>-lblas</tt> and <tt>-llapack</tt> with <tt>-mkl=sequential</tt> (without internal MKL threading) or <tt>-mkl</tt> (with threading) in your compiler and linker options, in order to ensure that MKL, and thus BLAS/LAPACK, is used. See [https://software.intel.com/en-us/mkl-linux-developer-guide-using-the-mkl-compiler-option here] for more on the significance of <code>sequential</code> and other options.
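For example, to compile a hypothetical Fortran source file <tt>solver.f90</tt> with the Intel compiler and sequential (non-threaded) MKL:
{{Command|1=ifort solver.f90 -o solver -mkl=sequential}}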


<!--T:9-->
If you are using a non-Intel compiler, for example the GNU Compiler Collection, then you will need to explicitly list the necessary MKL libraries during the link phase. Intel provides a tool called the [https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor MKL Link Advisor] to help you find the correct compiler and linker options.
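As an illustration only, with the MKL Link Advisor remaining the authoritative source, a typical sequential LP64 link line for <tt>gfortran</tt> resembles:
{{Command|gfortran solver.f90 -o solver -L$MKLROOT/lib/intel64 -lmkl_gf_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl}}
Here <tt>solver.f90</tt> is again a hypothetical file name, and <tt>$MKLROOT</tt> is set by the MKL module or environment.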


<!--T:10-->
The same [https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor MKL Link Advisor] tool is also useful if you receive "undefined reference" errors while using Intel compilers and <code>-mkl</code>.
</translate>
