Standard software environments

<translate>
<!--T:21-->
For questions about migration to different standard environments, please see [[Migration to the new standard environment]].

== What are standard software environments? == <!--T:1-->
Our software environments are provided through a set of [[Utiliser_des_modules/en|modules]] which allow you to switch between different versions of software packages. These modules are organized in a tree structure with the trunk made up of typical utilities provided by any Linux environment. Branches are compiler versions and sub-branches are versions of MPI or CUDA.  

<!--T:15-->
Standard environments identify combinations of specific compiler and MPI modules that are used most commonly by our team to build other software. These combinations are grouped in modules named <code>StdEnv</code>.
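To see which versions are available on the cluster you are using, you can query the module system directly (a standard Lmod command):
{{Command|module spider StdEnv}}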

<!--T:16-->
As of February 2023, there are four such standard environments, versioned 2023, 2020, 2018.3 and 2016.4, with each new version incorporating major improvements. Only versions 2020 and 2023 are actively supported.
 
<!--T:17-->
This page describes these changes and explains why you should upgrade to a more recent version.
In general, new versions of software packages will get installed with the newest software environment.
 
=== <code>StdEnv/2023</code> === <!--T:22-->
This is the most recent iteration of our software environment. It uses GCC 12.3.0, Intel 2023.1, and Open MPI 4.1.5 as defaults.
 
<!--T:23-->
To activate this environment, use the command
{{Command|module load StdEnv/2023}}
 
==== Performance improvements ==== <!--T:24-->
The minimum CPU instruction set supported by this environment is AVX2, or more generally, <tt>x86-64-v3</tt>. Even the compatibility layer which provides basic Linux commands is compiled with optimisations for this instruction set.
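If you are unsure whether the processor you are running on supports these instructions, you can inspect the CPU flags with standard Linux tools:
{{Command|grep -c avx2 /proc/cpuinfo}}
A non-zero count means the processor advertises AVX2 support.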

==== Changes of default modules ==== <!--T:25-->
GCC becomes the default compiler, instead of Intel. We now build with Intel only those software packages known to perform better with the Intel compilers. CUDA becomes an add-on to Open MPI, rather than the other way around, i.e. CUDA-aware MPI support is loaded at run time if a CUDA module is loaded. This allows many MPI libraries to be shared across the CUDA and non-CUDA branches.
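A minimal sketch of this behaviour (the <code>cuda</code> module is left unversioned here; the default version is resolved per cluster):
{{Command|module load StdEnv/2023 cuda}}
With the CUDA module loaded, the Open MPI library already provided by the environment enables its CUDA-aware support at run time; no separate CUDA-specific MPI module needs to be loaded.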

<!--T:26-->
The following core modules have seen their default version upgraded:
* GCC 9.3 => GCC 12.3
* OpenMPI 4.0.3 => OpenMPI 4.1.5
* Intel compilers 2020 => 2023
* Intel MKL 2020 => Flexiblas 3.3.1 (with MKL 2023 or BLIS 0.9.0)
* CUDA 11 => CUDA 12
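You can confirm which defaults are active in your session by listing the loaded modules after activating the environment (the exact versions shown depend on the cluster):
{{Command|module load StdEnv/2023}}
{{Command|module list}}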

=== <code>StdEnv/2020</code> === <!--T:6-->
This is the previous iteration of our software environment and the release that introduced the most changes so far. It uses GCC 9.3.0, Intel 2020.1, and Open MPI 4.0.3 as defaults.

<!--T:7-->
To activate this environment, use the command
{{Command|module load StdEnv/2020}}

==== Performance improvements ==== <!--T:8-->
Binaries compiled with the Intel compiler now automatically support both the AVX2 and AVX512 instruction sets. In technical terms, we call them ''multi-architecture binaries'', also known as [https://en.wikipedia.org/wiki/Fat_binary fat binaries]. This means that when running on clusters which have multiple generations of processors, such as Cedar and Graham, you don't have to manually load one of the <tt>arch</tt> modules if you use software packages generated by the Intel compiler.
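Under the older environments, the instruction set on such mixed clusters had to be selected by hand through an <tt>arch</tt> module; a hypothetical example (the exact <tt>arch</tt> module names vary by cluster):
{{Command|module load arch/avx512}}
With <code>StdEnv/2020</code>, Intel-compiled packages select the appropriate instruction set automatically.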

<!--T:9-->
Many software packages which were previously installed with either GCC or Intel are now installed at a lower level of the software hierarchy, which makes the same module visible irrespective of which compiler is loaded. For example, this is the case for many bioinformatics software packages as well as the [[R]] modules, which previously required loading the <code>gcc</code> module. This was made possible by introducing CPU architecture-specific optimizations at a level of the software hierarchy below the compiler level.
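For instance, [[R]] can now be loaded without first choosing a compiler; a minimal sketch (the <code>r</code> module is left unversioned, so the default version is resolved by the module system):
{{Command|module load StdEnv/2020 r}}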

<!--T:10-->
We also installed a more recent version of the [https://en.wikipedia.org/wiki/GNU_C_Library GNU C Library], which introduces optimizations in some mathematical functions. This has increased the requirement on the version of the Linux kernel (see below).
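You can check which C library version your session resolves to with the standard glibc tooling:
{{Command|ldd --version}}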

==== Change in the compatibility layer ==== <!--T:11-->
Another enhancement in the 2020 release was a change of tools for our compatibility layer, the layer which sits between the operating system and all other software packages. It is designed to ensure that compilers and scientific applications work whether they run on CentOS, Ubuntu, or Fedora. For the 2016.4 and 2018.3 versions, we used the [https://en.wikipedia.org/wiki/Nix_package_manager Nix package manager], while for the 2020 version, we use [https://wiki.gentoo.org/wiki/Project:Prefix Gentoo Prefix].
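One way to observe the compatibility layer is to check where a basic command resolves from; under the 2020 environment it should point inside the Gentoo prefix rather than to the host's <code>/usr/bin</code>:
{{Command|which ls}}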

==== Change in kernel requirement ==== <!--T:12-->
Versions 2016.4 and 2018.3 required Linux kernel version 2.6.32 or more recent, which allowed CentOS versions starting at CentOS 6. The 2020 version requires Linux kernel 3.10 or better; this means CentOS 6 is no longer supported and CentOS 7 is required instead. Other distributions usually ship much more recent kernels, so you probably don't need to change your distribution if you are using this standard environment on something other than CentOS.
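You can check the kernel version of the system you are logged into with a standard Linux command:
{{Command|uname -r}}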

==== Module extensions ==== <!--T:18-->
With the 2020 environment, we started installing more Python extensions inside their corresponding core modules. For example, we installed <tt>PyQt5</tt> inside the <tt>qt/5.12.8</tt> module so that it supports multiple versions of Python. The module system has also been adjusted so that you can find such extensions. For example, if you run
{{Command|module spider pyqt5}}
it will tell you that you can get this by loading the <tt>qt/5.12.8</tt> module.
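Once the module is loaded, the extension can be imported from a compatible Python; a minimal sketch (the <code>python</code> module is left unversioned here):
{{Command|module load qt/5.12.8 python}}
{{Command|python -c "import PyQt5"}}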
 
=== <code>StdEnv/2018.3</code> === <!--T:4-->
{{Template:Warning
|title=Deprecated
|content=This environment is no longer supported.}}
This is the second version of our software environment. It was released in 2018 with the deployment of [[Béluga/en|Béluga]], and shortly after the deployment of [[Niagara]]. Defaults were upgraded to GCC 7.3.0, Intel 2018.3, and Open MPI 3.1.2. This is the first version to support AVX512 instructions.
 
<!--T:5-->
To activate this environment, use the command
{{Command|module load StdEnv/2018.3}}
 
=== <code>StdEnv/2016.4</code> === <!--T:2-->
{{Template:Warning
|title=Deprecated
|content=This environment is no longer supported.}}
This is the initial version of our software environment, released in 2016 with the deployment of [[Cedar]] and [[Graham]]. It features GCC 5.4.0 and Intel 2016.4 as default compilers, and Open MPI 2.1.1 as its default implementation of MPI. Most of the software compiled with this environment does not support the AVX512 instructions provided by the Skylake processors on [[Béluga/en|Béluga]] and [[Niagara]], as well as on the most recent additions to Cedar and Graham.
 
<!--T:3-->
To activate this environment, use the command
{{Command|module load StdEnv/2016.4}}

== Do I need to reinstall/recompile my code if the <code>StdEnv</code> version changes? == <!--T:14-->
If you compile your own code or install R or Python packages, then yes: you should recompile or reinstall the packages you need with the newer version of the standard environment.
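For Python packages, for example, the usual approach is to rebuild your virtual environment under the new standard environment; a minimal sketch (the path <code>~/env</code> and the <tt>numpy</tt> package are placeholders):
{{Command|module load StdEnv/2023 python}}
{{Command|virtualenv --no-download ~/env}}
{{Command|source ~/env/bin/activate}}
{{Command|pip install --no-index numpy}}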

</translate>