Accessing CVMFS

Revision as of 21:27, 25 July 2019 by Rptaylor (Installation: provide separate instructions for Fedora)
This article is a draft

This is not a complete article. Final details are still being put in place before you can access our software stack on your own computer or cluster.



Introduction

Compute Canada provides repositories of software and data via a file system called CVMFS. On Compute Canada systems, CVMFS is already set up for you, so the repositories are automatically available for your use. For more information on using the Compute Canada software environment, please refer to available software, using modules, Python, R and Installing software in your home directory pages.

The purpose of this page is to describe how you can install and configure CVMFS on your computer or cluster, so that you can access the same repositories (and software environment) on your system that are available on Compute Canada systems.

The software environment described on this page has been presented at Practices and Experience in Advanced Research Computing 2019 (PEARC 2019).

Before you start

Important

If you are planning to use our software environment, we ask that you first contact us at rsnt-software@computecanada.ca to let us know ahead of time.


Terms of support

The CVMFS client is provided by CERN. The Compute Canada CVMFS repositories are provided by Compute Canada without any warranty.

CVMFS requirements

  • Support for Filesystem in Userspace (FUSE), which is needed to mount CVMFS
  • A distribution of Linux that supports RPM or DEB packages (Ubuntu, Debian, CentOS, Fedora, Red Hat, openSUSE, etc.)
  • A caching Squid proxy (for clusters)

Software environment requirements

Minimal requirements

  • Supported operating systems:
    • Linux: kernel 2.6.32 or more recent
    • Windows: Windows Subsystem for Linux version 2, with a distribution of Linux that meets the requirement above
    • macOS: only through a virtual machine
  • CPU: x86 CPU supporting at least one of SSE3, AVX, AVX2 or AVX512 instructions

Optimal requirements

  • Scheduler: Slurm or Torque (for tight integration with OpenMPI applications)
  • Network interconnect: Ethernet, InfiniBand or OmniPath (for parallel applications)
  • GPU: NVIDIA GPU with CUDA drivers (7.5 or more recent) installed (for CUDA-enabled applications). See the caveats about CUDA below.
  • As few Linux packages installed as possible (fewer is better)

Mounting CVMFS on your computer

Pre-installation

It is recommended that the local CVMFS cache (located at /var/lib/cvmfs by default, configurable via the CVMFS_CACHE_BASE setting) be on a dedicated filesystem so that the storage usage of CVMFS is not shared with that of other applications. If you follow this recommendation, provision that filesystem before installing CVMFS. The cache should typically be about 50 GB in size, but more or less may be suitable in different situations. For more details see the client configuration documentation.
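As a rough way to check whether the cache directory sits on its own filesystem, the sketch below compares the device number of the directory with that of its parent. The function name is_dedicated_fs is hypothetical, the stat -c option assumes GNU or BusyBox stat, and a bind mount from the same device would not be detected.

```bash
# Return 0 if the directory is on a different device than its parent,
# 1 if it shares the parent's filesystem, 2 if it cannot be inspected.
# The default path is the cache location named above; pass another path
# if CVMFS_CACHE_BASE points elsewhere.
is_dedicated_fs() {
    dir=${1:-/var/lib/cvmfs}
    [ -d "$dir" ] || return 2
    dev_dir=$(stat -c %d "$dir") || return 2
    dev_parent=$(stat -c %d "$dir/..") || return 2
    [ "$dev_dir" != "$dev_parent" ]
}
```

For example, is_dedicated_fs /var/lib/cvmfs && echo "cache has its own filesystem" prints the message only when the cache directory is a separate mount.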

Installation

Follow the instructions for your operating system to install CVMFS. These instructions have been tested on the following distributions:

  • CentOS 6, CentOS 7
  • Fedora 28, Fedora 29
  • Debian 9
  • Ubuntu 18.04

On CentOS:

  • Install the CERN YUM repository and GPG key:
[name@server ~]$ sudo yum install https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
  • Install the Compute Canada YUM repository and GPG keys:
[name@server ~]$ sudo yum install https://package.computecanada.ca/yum/cc-cvmfs-public/Packages/computecanada-release-latest.noarch.rpm
  • Install the CVMFS client and configuration packages from those YUM repositories:
[name@server ~]$ sudo yum install cvmfs cvmfs-config-default cvmfs-config-computecanada cvmfs-auto-setup

On Fedora:

  • Download and install the CVMFS client RPM for your version of Fedora from https://cernvm.cern.ch/portal/filesystem/downloads.
    • Since a YUM repository for CVMFS is not available for Fedora, you will need to periodically check that web page for updates and install them manually.
  • Install the Compute Canada YUM repository and GPG keys:
[name@server ~]$ sudo yum install https://package.computecanada.ca/yum/cc-cvmfs-public/Packages/computecanada-release-latest.noarch.rpm
  • Install the Compute Canada CVMFS configuration from that YUM repository:
[name@server ~]$ sudo yum install cvmfs-config-computecanada
  • Apply the initial client setup:
[name@server ~]$ sudo cvmfs_config setup

On Debian and Ubuntu:

  • Download and install the package that sets up the CVMFS APT repository:
[name@server ~]$ sudo apt-get install lsb-release
[name@server ~]$ wget https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest_all.deb
[name@server ~]$ sudo dpkg -i cvmfs-release-latest_all.deb
[name@server ~]$ rm -f cvmfs-release-latest_all.deb
[name@server ~]$ sudo apt-get update
  • Install the CVMFS client from the repository you just added, then apply the initial client setup:
[name@server ~]$ sudo apt-get install cvmfs cvmfs-config-default
[name@server ~]$ sudo cvmfs_config setup
  • Download and install the Compute Canada CVMFS configuration package:
[name@server ~]$ wget https://git.computecanada.ca/cc-cvmfs-public/cvmfs-config/raw/1.3.2-repackaging/OtherPackages/cvmfs-config-computecanada-1.3-2.all.deb
[name@server ~]$ sudo dpkg -i cvmfs-config-computecanada-1.3-2.all.deb

On Windows:

  • You first need Windows Subsystem for Linux, version 2. As of this writing (July 2019), it is supported only in a developer version of Windows. The instructions for installing it are here [1].
  • Once it is installed, install the Linux distribution of your choice and follow the instructions above for that distribution.

Configuration

Do not create any CVMFS configuration files of the form *.conf. In order to avoid collisions with upstream configuration sources, all locally-applied configuration must be in .local files. See structure of /etc/cvmfs for more information. In particular, create the file /etc/cvmfs/default.local, with at least the following minimal configuration:

CVMFS_REPOSITORIES="cvmfs-config.computecanada.ca,soft.computecanada.ca"
CVMFS_QUOTA_LIMIT=44500
CVMFS_HTTP_PROXY="http://yourproxy:3128;http://anotherproxy:3128"
  • CVMFS_REPOSITORIES is a comma-separated list of the repositories that you are interested in.
  • CVMFS_QUOTA_LIMIT is the amount of space in MB that CVMFS will use for the local cache; it should be about 15% less than the size of the /var/lib/cvmfs filesystem.
  • The CVMFS_HTTP_PROXY setting should have at least one (preferably two) local proxy servers (such as Squid) first, then optionally regional proxies as backups. A group of proxies can be load-balanced using the syntax "proxy1|proxy2|proxy3". A proxy in the load-balancing group will be chosen randomly; if it fails, each of the other proxies in the load-balancing group will be tried before going to the next proxy outside of the load-balancing group. However, if round-robin DNS is used, this syntax isn't needed, because CVMFS will detect and understand the round-robin name resolution, so just use the name of the DNS alias and the normal semicolon delimiter instead of “|”.
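
As an illustration of the settings above, the following sketch derives CVMFS_QUOTA_LIMIT from the size of the cache filesystem using the 15% margin, and prints a default.local that uses one load-balancing proxy group followed by a backup proxy. The function names and the proxy host names (proxy1, proxy2, backupproxy) are placeholders, not values that exist anywhere.

```bash
# Quota in MB: the filesystem size in MB minus a 15% margin (integer arithmetic).
quota_limit_mb() {
    fs_size_mb=$1
    echo $(( fs_size_mb * 85 / 100 ))
}

# Print a minimal default.local for a cache filesystem of the given size in MB.
# proxy1 and proxy2 form a load-balancing group ("|"); the proxy after ";"
# is only tried once every proxy in the group has failed.
render_default_local() {
    size_mb=$1
    cat <<EOF
CVMFS_REPOSITORIES="cvmfs-config.computecanada.ca,soft.computecanada.ca"
CVMFS_QUOTA_LIMIT=$(quota_limit_mb "$size_mb")
CVMFS_HTTP_PROXY="http://proxy1:3128|http://proxy2:3128;http://backupproxy:3128"
EOF
}
```

Running render_default_local 50000 prints the three settings for a 50000 MB cache filesystem; the output could be reviewed and then saved as /etc/cvmfs/default.local.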

For more information see the client parameters documentation.
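
Once the configuration file is in place, the client's own diagnostic subcommands can be used to check it. The wrapper below is a minimal sketch: the function name verify_cvmfs is hypothetical, and it assumes that cvmfs_config chksetup and cvmfs_config probe report problems through a non-zero exit status.

```bash
# Run the CVMFS client's checks after editing /etc/cvmfs/default.local.
# Intended to be run as root on a machine where the client is installed.
verify_cvmfs() {
    if ! command -v cvmfs_config >/dev/null 2>&1; then
        echo "cvmfs_config not found: is the CVMFS client installed?" >&2
        return 1
    fi
    cvmfs_config chksetup || return 1   # checks the local setup and configuration
    cvmfs_config probe || return 1      # tries to mount each configured repository
}
```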

Mounting our repositories on your own cluster

Enabling our environment in your session

Once you have mounted the CVMFS repository, enabling our environment in your sessions is as simple as running

[name@server ~]$ source /cvmfs/soft.computecanada.ca/config/profile/bash.sh

The above command will not run anything if your user ID is below 1000. This is a safeguard, because you should not rely on our software environment for privileged operations. If you nevertheless want to enable our environment, you can first define the environment variable FORCE_CC_CVMFS=1 with the command

[name@server ~]$ export FORCE_CC_CVMFS=1

or, if you want it to always be active, you can create the file $HOME/.force_cc_cvmfs with

[name@server ~]$ touch $HOME/.force_cc_cvmfs

If, on the contrary, you want to avoid enabling our environment, you can define SKIP_CC_CVMFS=1 or create the file $HOME/.skip_cc_cvmfs to ensure that the environment is never enabled in a given account.
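
The rules described above can be summarised as a small decision function. This is only a sketch of the documented behaviour, not the code of the profile script itself: the function name should_enable_cc_env is hypothetical, and the choice to let the skip settings win over the force settings is an assumption of this sketch.

```bash
# Decide whether the environment would be enabled for a given account.
# $1: numeric user ID, $2: home directory of that account.
# Assumption: SKIP_CC_CVMFS and .skip_cc_cvmfs take precedence over the
# force settings.
should_enable_cc_env() {
    uid=$1
    home=$2
    if [ "${SKIP_CC_CVMFS:-}" = "1" ] || [ -e "$home/.skip_cc_cvmfs" ]; then
        return 1
    fi
    if [ "${FORCE_CC_CVMFS:-}" = "1" ] || [ -e "$home/.force_cc_cvmfs" ]; then
        return 0
    fi
    # Safeguard: accounts with a user ID below 1000 are left alone.
    [ "$uid" -ge 1000 ]
}
```

For example, should_enable_cc_env "$(id -u)" "$HOME" && echo "would be enabled" reports the outcome for the current account.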

Customizing your environment

By default, enabling our environment will automatically detect a number of features of your system, and load default modules. You can control the default behaviour by defining specific environment variables prior to enabling the environment. These are described below.

While our software environment strives to be as independent from the host operating system as possible, there are a number of system paths that are taken into account by our environment to facilitate interaction with tools installed on the host operating system.

Environment variables

CC_CLUSTER

This variable is used to identify a cluster. It is used to send some information to the system logs, as well as define behaviour relative to licensed software. By default, its value is computecanada. You may want to set the value of this variable if you want to have system logs tailored to the name of your system.

RSNT_ARCH

This environment variable is used to identify the set of CPU instructions supported by the system. By default, it will be automatically detected based on /proc/cpuinfo. You can, however, define it before enabling the environment if you want to force a specific set of instructions. The supported sets for our software environment are

  • sse3
  • avx
  • avx2
  • avx512
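
To see which of these values a machine would most plausibly get, the flags in /proc/cpuinfo can be inspected by hand. The sketch below is an approximation and not the detection code used by the environment: the function names are hypothetical, and the mapping assumes that SSE3 is reported as "pni" and that the presence of "avx512f" is enough to select avx512.

```bash
# Map a space-separated list of CPU flags to one of the supported values.
# The most capable instruction set present wins.
guess_rsnt_arch() {
    flags=" $1 "
    case "$flags" in
        *" avx512f "*) echo avx512 ;;
        *" avx2 "*)    echo avx2 ;;
        *" avx "*)     echo avx ;;
        *" pni "*)     echo sse3 ;;
        *)             echo unsupported; return 1 ;;
    esac
}

# Print the flags of the first CPU listed in a cpuinfo file.
cpu_flags() {
    awk -F': ' '/^flags/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}
```

On a Linux host, guess_rsnt_arch "$(cpu_flags)" prints the guessed value, which can then be compared with the RSNT_ARCH value seen after enabling the environment.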

RSNT_INTERCONNECT

This environment variable is used to identify the type of interconnect supported by the system. By default, it will be automatically detected based on the presence of /sys/module/opa_vnic (for Intel OmniPath) or /sys/module/ib_core (for InfiniBand). The fall-back value is ethernet. The supported values are

  • omnipath
  • infiniband
  • ethernet

The value of this variable will trigger different options of transport protocol used in OpenMPI.
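
The detection rule described above can be reproduced with a few directory tests. This is a sketch rather than the environment's own code: the function name guess_interconnect and its optional root-directory argument are hypothetical, and testing for OmniPath before InfiniBand is an assumption of this sketch.

```bash
# Guess the interconnect from the kernel modules visible under /sys/module.
# $1: optional root directory to prepend, useful for trying the logic on a
#     copy of /sys; leave it empty to inspect the running system.
guess_interconnect() {
    root=${1:-}
    if [ -d "$root/sys/module/opa_vnic" ]; then
        echo omnipath
    elif [ -d "$root/sys/module/ib_core" ]; then
        echo infiniband
    else
        echo ethernet
    fi
}
```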

LMOD_SYSTEM_DEFAULT_MODULES

This environment variable defines which modules are loaded by default. If it is left undefined, our environment will define it to load the StdEnv module, which by default loads a version of the Intel compiler and a version of OpenMPI.

MODULERCFILE

This environment variable is used by Lmod to define the default versions of modules and their aliases. You can define your own modulerc file and add it to MODULERCFILE. It will take precedence over what is defined in our environment.
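
Because these variables must be defined before the environment is enabled, one option is to wrap both steps in a single shell function. The sketch below is only an example: the function name enable_cc_env and the values mycluster, avx2 and ethernet are placeholders to adapt to your system.

```bash
# Set the overrides, then source the profile script.
# $1: optional path to the profile script; defaults to the one on CVMFS.
enable_cc_env() {
    profile=${1:-/cvmfs/soft.computecanada.ca/config/profile/bash.sh}
    export CC_CLUSTER="mycluster"        # placeholder name used in system logs
    export RSNT_ARCH="avx2"              # force a CPU instruction set
    export RSNT_INTERCONNECT="ethernet"  # force an interconnect type
    if [ -r "$profile" ]; then
        . "$profile"
    else
        echo "profile script not found: $profile" >&2
        return 1
    fi
}
```

Any variable left out of the function keeps its automatically detected or default value.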

System paths

/opt/software/modulefiles

If this path exists, it will automatically be added to the default MODULEPATH. This allows the use of our software environment while also maintaining locally installed modules.

$HOME/modulefiles

If this path exists, it will automatically be added to the default MODULEPATH. This allows the use of our software environment while also allowing installation of modules inside of a user's account.

/opt/software/slurm/bin, /opt/software/bin, /opt/slurm/bin

These paths are all automatically added to the default PATH. This allows your own executables to be added to the search path.

Caveats

Software packages that are not available

On Compute Canada systems, we support a number of commercial software packages through agreements with the license owners, but these will not be available through the instructions on this page. This includes, for example, the Intel and Portland Group (PGI) compilers. While the modules for the Intel and PGI compilers will be available, you will only have access to the redistributable parts of these packages, usually the shared objects. These are sufficient to run software packages compiled with these compilers, but not to compile new software.

CUDA location

For CUDA-enabled software packages, our software environment relies on having driver libraries installed in the path /usr/lib64/nvidia. On some platforms, recent NVIDIA drivers install their libraries in /usr/lib64 instead. Because it is not possible to add /usr/lib64 to LD_LIBRARY_PATH without also pulling in all of the system libraries, which may be incompatible with our software environment, we recommend that you create symbolic links to the installed NVIDIA libraries in /usr/lib64/nvidia.
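
One possible way to create these links is sketched below. The function name link_nvidia_libs is hypothetical, and the file name patterns libcuda.so* and libnvidia-*.so* are an assumption about which driver libraries are needed; check what your driver package actually installs. On a real system the function must be run as root.

```bash
# Create symbolic links to the NVIDIA driver libraries in a separate directory.
# $1: directory holding the driver libraries (use an absolute path)
# $2: directory in which to create the links
link_nvidia_libs() {
    src=${1:-/usr/lib64}
    dst=${2:-/usr/lib64/nvidia}
    mkdir -p "$dst" || return 1
    for lib in "$src"/libcuda.so* "$src"/libnvidia-*.so*; do
        [ -e "$lib" ] || continue   # skip patterns that matched nothing
        ln -sf "$lib" "$dst/$(basename "$lib")" || return 1
    done
}
```

Only files matching the two patterns are linked, so other system libraries in /usr/lib64 stay out of /usr/lib64/nvidia.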

LD_LIBRARY_PATH

Our software environment is designed to use RUNPATH. Defining LD_LIBRARY_PATH is not recommended and can lead to the environment not working.
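
A simple guard that can be run before enabling the environment is sketched below; the function name check_ld_library_path is hypothetical.

```bash
# Warn and return 1 if LD_LIBRARY_PATH is set to a non-empty value.
check_ld_library_path() {
    if [ -n "${LD_LIBRARY_PATH:-}" ]; then
        echo "warning: LD_LIBRARY_PATH is set to '$LD_LIBRARY_PATH'" >&2
        return 1
    fi
    return 0
}
```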

Missing libraries

Because we do not define LD_LIBRARY_PATH, and because our libraries are not installed in default Linux locations, binary packages such as Anaconda will often not find the libraries that they expect. Please see our documentation on Installing binary packages.

dbus

Some applications require dbus. It must be installed locally, on the host operating system.