Anaconda/en: Difference between revisions

Latest revision as of 20:52, 29 February 2024

Anaconda is a Python distribution. We ask our users not to install Anaconda on our clusters. We recommend that you consider other options, such as a virtual environment or, for the most complicated cases, an Apptainer container.

Do not install Anaconda on our clusters

We are aware that Anaconda is widely used in several domains, such as data science, AI, and bioinformatics. Anaconda is a useful solution for simplifying the management of Python and scientific libraries on a personal computer. However, on a cluster like those supported by the Alliance, these libraries and their dependencies should be managed by our staff in order to ensure compatibility and optimal performance. The reasons are as follows:

  • Anaconda very often installs software (compilers, scientific libraries, etc.) that already exists on our clusters as modules, and does so in a configuration that is not optimal.
  • It installs binaries which are not optimized for the processor architecture on our clusters.
  • It makes incorrect assumptions about the location of various system libraries.
  • Anaconda uses the $HOME directory for its installation, where it writes an enormous number of files. A single Anaconda installation can easily absorb almost half of your quota for the number of files in your home directory.
  • Installing packages with Anaconda is slower than installing them from Python wheels.
  • Anaconda modifies the $HOME/.bashrc file, which can easily cause conflicts.

How to transition from Conda to virtualenv

A virtual environment offers all the functionality you need to use Python on our clusters. If you use Anaconda on your personal computer, here is how to switch to virtual environments:

  1. List the dependencies (requirements) of the application you want to use. To do so, you can:
    1. Run pip show <package_name> from your virtual environment (if the package exists on PyPI)
    2. Or, check if there is a requirements.txt file in the Git repository.
    3. Or, check the variable install_requires of the file setup.py, which lists the requirements.
  2. Find which dependencies are Python modules and which are libraries provided by Anaconda. For example, CUDA and cuDNN are libraries which are available on Anaconda Cloud but which you should not install yourself on our clusters, because they are already installed.
  3. Remove from the list of dependencies everything which is not a Python module (e.g. cudatoolkit and cudnn).
  4. Use a virtual environment in which you will install your dependencies.

Your software should then run; if it does not, do not hesitate to contact us.
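Steps 2 and 3 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name python_requirements and the contents of NON_PYTHON are hypothetical, and the set holds just the two examples named above, so you would need to extend it for your own project. Version pins are dropped on purpose, leaving version selection to pip.

```python
import re

# Assumption: conda packages that are not Python modules. Only the two
# examples from the text are listed; extend this set for your project.
NON_PYTHON = {"cudatoolkit", "cudnn"}


def python_requirements(conda_dependencies):
    """Return the names of the dependencies that are Python modules."""
    kept = []
    for spec in conda_dependencies:
        # Keep only the package name, e.g. "numpy=1.24" -> "numpy".
        name = re.split(r"[=<>!~ ]", spec.strip(), maxsplit=1)[0].lower()
        if name and name not in NON_PYTHON:
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical dependency list copied from a conda environment.
    deps = ["numpy=1.24", "cudatoolkit=11.8", "scipy", "cudnn=8.9"]
    print("\n".join(python_requirements(deps)))
```

The printed names can be saved to a requirements.txt file and installed with pip inside your virtual environment (step 4).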

Apptainer Use

In some situations, the dependencies of a program are complex enough that you need a solution where you control the entire software environment. In these situations, we recommend Apptainer; note that a Docker image can be converted into an Apptainer image. The only disadvantage of Apptainer is its consumption of disk space. If your research group plans to use several images, it would be wise to collect them all in a single directory of the group's project space to avoid duplication.

Examples where Anaconda does not work

R
A conda recipe forces the installation of R. This installation does not perform nearly as well as the version we provide as a module (which uses Intel MKL). This R also does not work well: jobs launched with it may die, wasting both computing resources and your time.