Anaconda/en: Difference between revisions

From Alliance Doc
 
(91 intermediate revisions by 6 users not shown)
Line 1: Line 1:
<languages />
[[Category:Software]]
Anaconda is a Python distribution. We ask our users to '''not install Anaconda on our clusters'''. We recommend that you consider other options, such as a virtual environment or, for the most complicated cases, an [[Apptainer]] container.


== Description ==
==Do not install Anaconda on our clusters==


Anaconda is an open source distribution of [[Python]] and [[R]] which tries to simplify the management and deployment of modules.
We are aware that Anaconda is widely used in several domains, such as data science, AI and bioinformatics. Anaconda is a useful solution for simplifying the management of Python and scientific libraries on a personal computer. However, on a cluster like those supported by the Alliance, these libraries and dependencies should be managed by our staff in order to ensure compatibility and optimal performance. Here is a list of reasons:


== Installation ==
* Anaconda very often installs software (compilers, scientific libraries, etc.) that already exists on our clusters as modules, and it does so with a configuration that is not optimal.
* It installs binaries which are not optimized for the processor architecture on our clusters.
* It makes incorrect assumptions about the location of various system libraries.
* Anaconda uses the <tt>$HOME</tt> directory for its installation, where it writes an enormous number of files. A single Anaconda installation can easily use up almost half of your home directory's file-count quota.
* Installing packages with Anaconda is slower than installing them from Python wheels.
* Anaconda modifies the <tt>$HOME/.bashrc</tt> file, which can easily cause conflicts.


The Python distributions installed on Compute Canada servers are compiled from the sources available at [http://www.python.org python.org]. Users are nevertheless free to install Anaconda in their own directory. The following instructions simplify this task and help avoid compatibility errors.
==How to transition from Conda to virtualenv ==


To limit installation time and the storage required, we favour installing [https://conda.io/miniconda.html Miniconda] over Anaconda. Miniconda provides the conda package manager and Python. You are then free to use the conda command to install the software you need.
A [[Python#Creating_and_using_a_virtual_environment|virtual environment]] offers you all the functionality which you need to use Python on our clusters. Here is how to convert to the use of virtual environments if you use Anaconda on your personal computer:


=== Home directory (one installation per user) ===
# List the dependencies (requirements) of the application you want to use. To do so, you can:
## Run <code>pip show <package_name></code> from your virtual environment (if the package exists on [https://pypi.org/ PyPI])
## Or, check if there is a <tt>requirements.txt</tt> file in the Git repository.
## Or, check the variable <tt>install_requires</tt> of the file <tt>setup.py</tt>, which lists the requirements.
# Find which dependencies are Python modules and which are libraries provided by Anaconda. For example, CUDA and cuDNN are libraries that are available on Anaconda Cloud, but you should not install them yourself on our clusters because they are already installed.
# Remove from the list of dependencies everything which is not a Python module (e.g. <tt>cudatoolkit</tt> and <tt>cudnn</tt>).
# Use a [[Python#Creating_and_using_a_virtual_environment|virtual environment]] in which you will install your dependencies.


By default, the installation is made in your home directory, and only you can access it. If you wish to share your installation with your whole group, use the instructions in the next section instead.
Your software should then run. If it does not, do not hesitate to [[Technical support|contact us]].


To install Miniconda with Python 2, run the following command:
==Apptainer Use==
{{Command|eb Miniconda2-4.3.27.eb}}


To install Miniconda with Python 3, run the following command:
In some situations, the complexity of the dependencies of a program requires the use of a solution where you can control the entire software environment. In these situations, we recommend the tool [[Apptainer]]; note that a Docker image can be converted into an Apptainer image. The only disadvantage of Apptainer is its consumption of disk space. If your research group plans on using several images, it would be wise to collect all of them together in a single directory of the group's project space to avoid duplication.
{{Command|eb Miniconda3-4.3.27.eb}}


'''Be patient: installing Miniconda can take several minutes.'''
== Examples where Anaconda does not work ==
 
;R: A conda recipe forces the installation of R. This installation does not perform nearly as well as the version we provide as a module (which uses Intel MKL). The conda-installed R is also unreliable: jobs launched with it may die, wasting both computing resources and your time.
=== Project directory (one installation per group) ===

To install Miniconda with Python 2, use the following command, replacing the string <code><project></code> with your project identifier:
{{Command|eb --sticky-bit --set-gid-bit --prefix{{=}}$(readlink ~/projects/<project>) Miniconda2-4.3.27.eb}}

To install Miniconda with Python 3, run the following command, replacing the string <code><project></code> with your project identifier:
{{Command|eb --sticky-bit --set-gid-bit --prefix{{=}}$(readlink ~/projects/<project>) Miniconda3-4.3.27.eb}}

'''Be patient: installing Miniconda can take several minutes.'''

Installing Miniconda automatically produces a module file that you can load with the <code>module</code> command. For the module command to find this file, you must tell it where to look with the following command, replacing the string <code><project></code> with your project identifier:
{{Command|module use ~/projects/<project>/modules/*/Core}}

If you want the module to be available every time you log in, you can add the preceding command to the end of your <code>.bashrc</code> file.

=== Other version ===

To install another version of Anaconda, we recommend that you contact the support team by email at [mailto:support@calculcanada.ca support@calculcanada.ca].

== Usage ==

Load the Miniconda 2 module:
{{Command|module load miniconda2}}

or the Miniconda 3 module:
{{Command|module load miniconda3}}

=== conda ===

You can install Python modules into your Miniconda installation by using the conda command directly. For example, to install the theano module:
{{Command|conda install theano}}

Latest revision as of 20:52, 29 February 2024


Anaconda is a Python distribution. We ask our users to not install Anaconda on our clusters. We recommend that you consider other options, such as a virtual environment or, for the most complicated cases, an Apptainer container.

Do not install Anaconda on our clusters

We are aware that Anaconda is widely used in several domains, such as data science, AI and bioinformatics. Anaconda is a useful solution for simplifying the management of Python and scientific libraries on a personal computer. However, on a cluster like those supported by the Alliance, these libraries and dependencies should be managed by our staff in order to ensure compatibility and optimal performance. Here is a list of reasons:

  • Anaconda very often installs software (compilers, scientific libraries, etc.) that already exists on our clusters as modules, and it does so with a configuration that is not optimal.
  • It installs binaries which are not optimized for the processor architecture on our clusters.
  • It makes incorrect assumptions about the location of various system libraries.
  • Anaconda uses the $HOME directory for its installation, where it writes an enormous number of files. A single Anaconda installation can easily use up almost half of your home directory's file-count quota.
  • Installing packages with Anaconda is slower than installing them from Python wheels.
  • Anaconda modifies the $HOME/.bashrc file, which can easily cause conflicts.
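
To get a rough idea of how much of a file-count quota an existing installation uses, the files under a directory can be counted. The following is a minimal sketch, not an official tool: the function name count_files and the directory name anaconda3 are assumptions made for illustration, and the way a quota is actually accounted may differ (for example, directories may also be counted).

```python
import os


def count_files(root):
    """Count the non-directory entries (files and symlinks to files) under root.

    Directories themselves are not counted, so this can underestimate
    usage on systems whose quota also counts directories.
    """
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


if __name__ == "__main__":
    # "anaconda3" is a hypothetical installation directory name.
    target = os.path.join(os.path.expanduser("~"), "anaconda3")
    if os.path.isdir(target):
        print(f"{target}: {count_files(target)} files")
    else:
        print(f"{target} not found")
```

Running the same function on a virtual environment directory gives a comparable figure for the alternative recommended on this page.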

How to transition from Conda to virtualenv

A virtual environment offers you all the functionality which you need to use Python on our clusters. Here is how to convert to the use of virtual environments if you use Anaconda on your personal computer:

  1. List the dependencies (requirements) of the application you want to use. To do so, you can:
    1. Run pip show <package_name> from your virtual environment (if the package exists on PyPI)
    2. Or, check if there is a requirements.txt file in the Git repository.
    3. Or, check the variable install_requires of the file setup.py, which lists the requirements.
  2. Find which dependencies are Python modules and which are libraries provided by Anaconda. For example, CUDA and cuDNN are libraries that are available on Anaconda Cloud, but you should not install them yourself on our clusters because they are already installed.
  3. Remove from the list of dependencies everything which is not a Python module (e.g. cudatoolkit and cudnn).
  4. Use a virtual environment in which you will install your dependencies.
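
Step 3 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the set NON_PYTHON and the helper names package_name and python_only are hypothetical, and the list of packages to exclude must be adapted to your own application.

```python
import re

# Assumed examples of dependencies that are not Python modules;
# extend this set for your own project.
NON_PYTHON = {"cudatoolkit", "cudnn"}


def package_name(spec):
    """Return the lowercase package name from a spec such as 'numpy>=1.20'."""
    match = re.match(r"\s*([A-Za-z0-9_.\-]+)", spec)
    return match.group(1).lower() if match else ""


def python_only(specs, excluded=NON_PYTHON):
    """Keep only the specs whose package name is not in the excluded set."""
    kept = []
    for spec in specs:
        spec = spec.strip()
        # Skip blank lines and comments, as found in requirements files.
        if not spec or spec.startswith("#"):
            continue
        if package_name(spec) in excluded:
            continue
        kept.append(spec)
    return kept


deps = ["numpy>=1.20", "cudatoolkit=11.3", "scipy", "cudnn", "# comment"]
print(python_only(deps))  # ['numpy>=1.20', 'scipy']
```

The remaining entries can then be written to a requirements.txt file and installed in the virtual environment from step 4. Note that conda-style version pins with a single = are not valid for pip and would need to be rewritten with ==.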

Your software should then run. If it does not, do not hesitate to contact us.

Apptainer Use

In some situations, the complexity of the dependencies of a program requires the use of a solution where you can control the entire software environment. In these situations, we recommend the tool Apptainer; note that a Docker image can be converted into an Apptainer image. The only disadvantage of Apptainer is its consumption of disk space. If your research group plans on using several images, it would be wise to collect all of them together in a single directory of the group's project space to avoid duplication.

Examples where Anaconda does not work

R
A conda recipe forces the installation of R. This installation does not perform nearly as well as the version we provide as a module (which uses Intel MKL). The conda-installed R is also unreliable: jobs launched with it may die, wasting both computing resources and your time.