Anaconda/en: Difference between revisions

From Alliance Doc
(81 intermediate revisions by 6 users not shown)
Line 1: Line 1:
<languages />
[[Category:Software]]
Anaconda is a Python distribution. We ask our users to '''not install Anaconda on our clusters'''. We recommend that you consider other options, such as a virtual environment or, for the most complicated cases, an [[Apptainer]] container.


== Description ==
==Do not install Anaconda on our clusters==


Anaconda is an open source distribution of [[Python]] and [[R]] which tries to simplify the management and deployment of modules.
We are aware that Anaconda is widely used in several domains, such as data science, AI and bioinformatics. Anaconda is a useful solution for simplifying the management of Python and scientific libraries on a personal computer. However, on a cluster like those supported by the Alliance, these libraries and their dependencies should be managed by our staff, in order to ensure compatibility and optimal performance. The reasons are as follows:


== Installation ==
* Anaconda very often installs software (compilers, scientific libraries etc.) that already exists on our clusters as modules, and does so with a configuration that is not optimal.
* It installs binaries which are not optimized for the processor architecture on our clusters.
* It makes incorrect assumptions about the location of various system libraries.
* Anaconda uses the <tt>$HOME</tt> directory for its installation, where it writes an enormous number of files. A single Anaconda installation can easily absorb almost half of your quota for the number of files in your home directory.
* Installing packages with Anaconda is slower than installing them from Python wheels.
* Anaconda modifies the <tt>$HOME/.bashrc</tt> file, which can easily cause conflicts.


The Python distributions installed on Compute Canada servers are compiled from the source available at the [http://www.python.org python.org site]. Users are however free to install Anaconda in their own directory. The following instructions should simplify this job and thus avoid compatibility errors.
==How to transition from Conda to virtualenv ==


In order to limit the installation time and the amount of storage needed, we encourage the installation of [https://conda.io/miniconda.html Miniconda] instead of Anaconda. Miniconda includes the conda package manager and Python. You can then use the conda command to install the software you need.
A [[Python#Creating_and_using_a_virtual_environment|virtual environment]] provides all the functionality you need to use Python on our clusters. If you use Anaconda on your personal computer, here is how to switch to virtual environments:


=== Home directory (user-based installation) ===
# List the dependencies (requirements) of the application you want to use. To do so, you can:
## Run <code>pip show <package_name></code> from your virtual environment (if the package exists on [https://pypi.org/ PyPI])
## Or, check if there is a <tt>requirements.txt</tt> file in the Git repository.
## Or, check the variable <tt>install_requires</tt> of the file <tt>setup.py</tt>, which lists the requirements.
# Determine which dependencies are Python modules and which are libraries provided by Anaconda. For example, CUDA and cuDNN are libraries that are available on Anaconda Cloud but that you should not install yourself on our clusters, because they are already installed.
# Remove from the list of dependencies everything which is not a Python module (e.g. <tt>cudatoolkit</tt> and <tt>cudnn</tt>).
# Use a [[Python#Creating_and_using_a_virtual_environment|virtual environment]] in which you will install your dependencies.


By default the installation is performed in your home directory to which only you have access. If you want to share your installation with other members of your group, use the instructions in the following section.
Your software should run - if it doesn't, don't hesitate to [[Technical support|contact us]].


To install Miniconda with Python 2, execute the following command:
==Apptainer Use==
{{Command|eb Miniconda2-4.3.27.eb}}


For Miniconda with Python 3, execute the following command:
In some situations, the complexity of the dependencies of a program requires the use of a solution where you can control the entire software environment. In these situations, we recommend the tool [[Apptainer]]; note that a Docker image can be converted into an Apptainer image. The only disadvantage of Apptainer is its consumption of disk space. If your research group plans on using several images, it would be wise to collect all of them together in a single directory of the group's project space to avoid duplication.
{{Command|eb Miniconda3-4.3.27.eb}}


'''Be patient, the installation of Miniconda can take several minutes.'''
== Examples where Anaconda does not work ==
 
;R: A conda recipe forces the installation of R. This installation does not perform nearly as well as the version we provide as a module (which uses Intel MKL). This same R does not work well, and jobs launched with it may die, wasting both computing resources and your time.
=== Project directory (group-based installation) ===
 
To install Miniconda with Python 2 in a project directory, use the following command, replacing <code><project></code> with your project identifier:
{{Command|eb --sticky-bit --set-gid-bit --prefix{{=}}$(readlink ~/projects/<project>) Miniconda2-4.3.27.eb}}
 
For Miniconda with Python 3, use the following command, again replacing <code><project></code> with your project identifier:
{{Command|eb --sticky-bit --set-gid-bit --prefix{{=}}$(readlink ~/projects/<project>) Miniconda3-4.3.27.eb}}
 
'''Be patient, the installation of Miniconda can take several minutes.'''

Installing Miniconda automatically generates a module file, which you can load with the <code>module</code> command. For the <code>module</code> command to find this file, tell it where to look with the following command, replacing <code><project></code> with your project identifier:
{{Command|module use ~/projects/<project>/modules/*/Core}}
 
If you want the module to be available every time you log in, you can add the previous command to the end of your <code>.bashrc</code> file.

=== Other version ===

To install another version of Anaconda, we recommend that you contact the support team by email at [mailto:support@calculcanada.ca support@calculcanada.ca].

== Usage ==

Load the Miniconda 2 module
{{Command|module load miniconda2}}
 
or Miniconda 3
{{Command|module load miniconda3}}
 
=== conda ===
 
You can install Python modules in your Miniconda installation by using the conda command directly. For example, to install the theano module:
{{Command|conda install theano}}

Latest revision as of 20:52, 29 February 2024

Anaconda is a Python distribution. We ask our users to not install Anaconda on our clusters. We recommend that you consider other options, such as a virtual environment or, for the most complicated cases, an Apptainer container.

Do not install Anaconda on our clusters

We are aware that Anaconda is widely used in several domains, such as data science, AI and bioinformatics. Anaconda is a useful solution for simplifying the management of Python and scientific libraries on a personal computer. However, on a cluster like those supported by the Alliance, these libraries and their dependencies should be managed by our staff, in order to ensure compatibility and optimal performance. The reasons are as follows:

  • Anaconda very often installs software (compilers, scientific libraries etc.) that already exists on our clusters as modules, and does so with a configuration that is not optimal.
  • It installs binaries which are not optimized for the processor architecture on our clusters.
  • It makes incorrect assumptions about the location of various system libraries.
  • Anaconda uses the $HOME directory for its installation, where it writes an enormous number of files. A single Anaconda installation can easily absorb almost half of your quota for the number of files in your home directory.
  • Installing packages with Anaconda is slower than installing them from Python wheels.
  • Anaconda modifies the $HOME/.bashrc file, which can easily cause conflicts.

How to transition from Conda to virtualenv

A virtual environment provides all the functionality you need to use Python on our clusters. If you use Anaconda on your personal computer, here is how to switch to virtual environments:

  1. List the dependencies (requirements) of the application you want to use. To do so, you can:
    1. Run pip show <package_name> from your virtual environment (if the package exists on PyPI)
    2. Or, check if there is a requirements.txt file in the Git repository.
    3. Or, check the variable install_requires of the file setup.py, which lists the requirements.
  2. Determine which dependencies are Python modules and which are libraries provided by Anaconda. For example, CUDA and cuDNN are libraries that are available on Anaconda Cloud but that you should not install yourself on our clusters, because they are already installed.
  3. Remove from the list of dependencies everything which is not a Python module (e.g. cudatoolkit and cudnn).
  4. Use a virtual environment in which you will install your dependencies.
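
As an illustration of steps 2 and 3, the following is a minimal sketch of how a dependency list could be filtered. The file names and the sample package list are hypothetical, and the exclusion pattern contains only the two library names mentioned above; a real application may depend on other non-Python libraries that you would need to add yourself.

```shell
#!/bin/bash
# Work in a scratch directory so the sketch does not touch existing files.
workdir="$(mktemp -d)"

# Hypothetical dependency list gathered in step 1 (one name per line).
cat > "$workdir/dependencies.txt" <<'EOF'
numpy
scipy
cudatoolkit
cudnn
pandas
EOF

# Names that are not Python modules. Only the two examples from the
# text are listed here; extend this pattern for your own application.
non_python='cudatoolkit|cudnn'

# Keep every line that does not exactly match one of the excluded names.
grep -v -x -E "$non_python" "$workdir/dependencies.txt" > "$workdir/requirements.txt"

# Show the resulting list of Python modules.
cat "$workdir/requirements.txt"
```

The resulting requirements.txt can then be installed inside an activated virtual environment, for example with pip install -r requirements.txt.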

Your software should run - if it doesn't, don't hesitate to contact us.

Apptainer Use

In some situations, the complexity of the dependencies of a program requires the use of a solution where you can control the entire software environment. In these situations, we recommend the tool Apptainer; note that a Docker image can be converted into an Apptainer image. The only disadvantage of Apptainer is its consumption of disk space. If your research group plans on using several images, it would be wise to collect all of them together in a single directory of the group's project space to avoid duplication.
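
The following sketch shows one way a group could convert a Docker image into an Apptainer image and keep it in a single shared directory. The directory location, the Docker image and the output file name are placeholders chosen for illustration, not values prescribed by this page. The sketch assumes the apptainer build command is available; if it is not, the command is only printed.

```shell
#!/bin/bash
# Directory that holds all of the group's images. On a cluster, set
# IMAGE_DIR to a directory in your group's project space; the temporary
# directory used as a default only keeps this sketch self-contained.
image_dir="${IMAGE_DIR:-$(mktemp -d)}"

# Placeholder Docker image and output file name (assumptions).
docker_image="docker://ubuntu:22.04"
sif_path="$image_dir/ubuntu_22.04.sif"

mkdir -p "$image_dir"

if [ -f "$sif_path" ]; then
    # Another group member has already built this image: reuse it.
    echo "Reusing existing image: $sif_path"
elif command -v apptainer >/dev/null 2>&1; then
    # Convert the Docker image into an Apptainer image file.
    apptainer build "$sif_path" "$docker_image"
else
    echo "apptainer not found; would run: apptainer build $sif_path $docker_image"
fi
```

Checking for an existing file before building is what avoids duplicate copies when several group members need the same image.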

Examples where Anaconda does not work

R
A conda recipe forces the installation of R. This installation does not perform nearly as well as the version we provide as a module (which uses Intel MKL). This same R does not work well, and jobs launched with it may die, wasting both computing resources and your time.