Anaconda

Latest revision as of 20:52, 29 February 2024

Anaconda is a Python distribution. We ask our users not to install Anaconda on our clusters. We recommend that you consider other options, such as a virtual environment or, for the most complicated cases, an Apptainer container.

Do not install Anaconda on our clusters

We are aware that Anaconda is widely used in several domains, such as data science, AI and bioinformatics. Anaconda is a useful solution for simplifying the management of Python and scientific libraries on a personal computer. However, on a cluster like those supported by the Alliance, these libraries and their dependencies should be managed by our staff to ensure compatibility and optimal performance. Here is a list of reasons:

  • Anaconda very often installs software (compilers, scientific libraries, etc.) that already exists on our clusters as modules, and does so with a configuration that is not optimal.
  • It installs binaries that are not optimized for the processor architecture of our clusters.
  • It makes incorrect assumptions about the location of various system libraries.
  • It installs itself in the $HOME directory by default, where it writes an enormous number of files. A single Anaconda installation can use up almost half of your quota on the number of files in your home directory.
  • Installing packages with Anaconda is slower than installing them from Python wheels.
  • It modifies the $HOME/.bashrc file, which can easily cause conflicts.
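To get a feel for the file-count point above, the following is a minimal sketch that counts the entries under a directory. The function name `count_entries` is hypothetical, and counting directories as well as files is an assumption about how the quota is computed; the cluster's own quota tools remain the authoritative source.

```python
import os


def count_entries(root):
    """Count the files and directories found under root.

    Assumption: each file and each directory counts as one entry
    against a file-count quota. Symbolic links to directories are
    not followed.
    """
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


if __name__ == "__main__":
    # Example: count the entries in the current user's home directory.
    home = os.path.expanduser("~")
    print(f"{home}: {count_entries(home)} entries")
```

Running the same function on an existing Anaconda directory on a personal computer gives a rough idea of how many entries such an installation would add.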

How to transition from Conda to virtualenv

A virtual environment provides all the functionality you need to use Python on our clusters. Here is how to switch to virtual environments if you use Anaconda on your personal computer:

  1. List the dependencies (requirements) of the application you want to use. To do so, you can:
    1. Run pip show <package_name> from your virtual environment (if the package exists on PyPI).
    2. Or, check whether there is a requirements.txt file in the Git repository.
    3. Or, check the install_requires variable in the setup.py file, which lists the requirements.
  2. Determine which dependencies are Python packages and which are libraries provided by Anaconda. For example, CUDA and cuDNN are libraries available on Anaconda Cloud, but you should not install them yourself on our clusters because they are already installed.
  3. Remove from the list of dependencies everything that is not a Python package (e.g. cudatoolkit and cudnn).
  4. Create a virtual environment and install the remaining dependencies in it.
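Steps 2 and 3 above can be sketched as a small filter. This is only an illustration: the function name `python_requirements` and the `NON_PYTHON` set are hypothetical, the set contains only the two examples named above, and the input is assumed to already use pip-style requirement strings (conda's single `=` version syntax would need to be converted by hand).

```python
import re

# Hypothetical list of dependency names that are not Python packages.
# Only the two examples from the text are included; extend as needed.
NON_PYTHON = {"cudatoolkit", "cudnn"}


def python_requirements(dependencies, non_python=NON_PYTHON):
    """Return the dependencies whose name is not in non_python.

    Each entry is a requirement string such as "scipy>=1.7"; the name
    is the part before the first version operator or space.
    """
    kept = []
    for dep in dependencies:
        dep = dep.strip()
        name = re.split(r"[=<>!~ ]", dep, maxsplit=1)[0].lower()
        if name and name not in non_python:
            kept.append(dep)
    return kept


if __name__ == "__main__":
    deps = ["numpy", "cudatoolkit==11.3", "cudnn", "scipy>=1.7"]
    for requirement in python_requirements(deps):
        print(requirement)
```

The remaining lines can be saved to a requirements.txt file and installed inside the virtual environment with pip install -r requirements.txt.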

Your software should then run. If it does not, do not hesitate to contact us.

Apptainer Use

In some situations, the dependencies of a program are complex enough to require a solution that lets you control the entire software environment. In these situations, we recommend Apptainer; note that a Docker image can be converted into an Apptainer image. The only disadvantage of Apptainer is its consumption of disk space. If your research group plans to use several images, it would be wise to collect them in a single directory of the group's project space to avoid duplication.

Examples where Anaconda does not work

R
A conda recipe forces the installation of R. This installation does not perform nearly as well as the version we provide as a module (which uses Intel MKL). This same R does not work well, and jobs launched with it may die, wasting both computing resources and your time.