Anaconda

From Alliance Doc
Latest revision as of 20:52, 29 February 2024

Anaconda is a Python distribution. We ask our users not to install Anaconda on our clusters. Instead, we recommend other options: a virtual environment or, for the most complicated cases, an Apptainer container.

Do not install Anaconda on our clusters

We are aware that Anaconda is widely used in several domains, such as data science, AI and bioinformatics. Anaconda is a useful solution for simplifying the management of Python and scientific libraries on a personal computer. However, on clusters like those supported by the Alliance, these libraries and their dependencies should be managed by our staff to ensure compatibility and optimal performance. Here is a list of reasons:

  • Anaconda very often installs software (compilers, scientific libraries, etc.) that already exists on our clusters as modules, and it does so with a configuration that is not optimal.
  • It installs binaries that are not optimized for the processor architecture of our clusters.
  • It makes incorrect assumptions about the location of various system libraries.
  • Anaconda installs itself in your $HOME directory, where it writes an enormous number of files. A single Anaconda installation can easily use up almost half of the file-count quota of your home directory.
  • Installing packages with Anaconda is slower than installing them from Python wheels.
  • Anaconda modifies your $HOME/.bashrc file, which can easily cause conflicts.
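To gauge how much of that file-count quota a directory tree uses (for instance an Anaconda installation you already have), you can count its entries. The following is a minimal Python sketch; the function name count_entries is our own, and it assumes the quota counts both files and directories. The cluster's own quota reports remain the authoritative figures.

```python
import os


def count_entries(root: str) -> int:
    """Count files and subdirectories below root (root itself is not counted).

    Assumption: the file-count quota counts every file and every directory.
    Symbolic links are counted as single entries and are never followed.
    """
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        # With the default followlinks=False, os.walk lists symlinks to
        # directories in dirnames but does not descend into them.
        total += len(dirnames) + len(filenames)
    return total
```

For example, count_entries(os.path.expanduser("~/anaconda3")) counts an installation in what we assume is its default location; a path that does not exist simply yields 0, because os.walk ignores errors by default.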

How to transition from Conda to virtualenv

A virtual environment offers all the functionality you need to use Python on our clusters. If you use Anaconda on your personal computer, here is how to switch to virtual environments:

  1. List the dependencies (requirements) of the application you want to use. To do so, you can:
    1. Run pip show <package_name> from your virtual environment (if the package exists on PyPI); its Requires: field lists the package's direct dependencies.
    2. Or check whether there is a requirements.txt file in the Git repository.
    3. Or check the install_requires variable in the setup.py file, which lists the requirements.
  2. Determine which dependencies are Python modules and which are libraries provided by Anaconda. For example, CUDA and cuDNN are available on Anaconda Cloud, but you should not install them yourself on our clusters because they are already installed.
  3. Remove from the list every dependency that is not a Python module (e.g. cudatoolkit and cudnn).
  4. Create a virtual environment and install your dependencies in it.

Your software should then run. If it doesn't, don't hesitate to contact us.
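Steps 1 to 3 above can be partly automated. The Python sketch below is one possible approach, not an official tool: it reads the Requires: line printed by pip show, converts conda dependency specs (assumed to be in the name, name=version or name=version=build forms written by conda env export) into pip requirement strings, and drops the names found in NON_PYTHON_PACKAGES, a hypothetical and deliberately incomplete exclusion list. Conda and PyPI names do not always match (PyTorch is pytorch on conda but torch on PyPI), so review the result by hand.

```python
import re
from collections.abc import Iterable

# Hypothetical, incomplete list of conda packages you should not pip-install:
# the interpreter and pip come with the virtual environment, and libraries such
# as CUDA, cuDNN and MKL are already provided on the clusters. Extend as needed.
NON_PYTHON_PACKAGES = {"python", "pip", "cudatoolkit", "cudnn", "mkl", "libgcc-ng", "libstdcxx-ng"}

# A package name followed by an optional version constraint.
_SPEC = re.compile(r"^([A-Za-z0-9_.\-]+)\s*(.*)$")


def requires_from_pip_show(output: str) -> list[str]:
    """Return the names on the 'Requires:' line of `pip show <package_name>` output."""
    for line in output.splitlines():
        if line.startswith("Requires:"):
            value = line.split(":", 1)[1]
            return [name.strip() for name in value.split(",") if name.strip()]
    return []


def conda_spec_to_requirement(spec: str) -> tuple[str, str]:
    """Convert e.g. 'numpy=1.26.4=py311h64a7726_0' to ('numpy', 'numpy==1.26.4')."""
    spec = spec.split("::")[-1].strip()  # drop a channel prefix like 'conda-forge::'
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"unrecognized dependency spec: {spec!r}")
    name, constraint = match.group(1).lower(), match.group(2).strip()
    if constraint.startswith("=") and not constraint.startswith("=="):
        # conda pins with a single '=' and may append a build string; pip wants '=='.
        version = constraint[1:].split("=")[0]
        constraint = f"=={version}" if version else ""
    elif constraint and constraint[0] not in "=<>!~":
        # Other conda match-spec forms (e.g. 'numpy 1.26') are not handled here.
        raise ValueError(f"unsupported constraint in {spec!r}")
    return name, name + constraint


def split_dependencies(specs: Iterable[str]) -> tuple[list[str], list[str]]:
    """Return (pip requirements to keep, names removed as non-Python or already provided)."""
    keep: list[str] = []
    removed: list[str] = []
    for spec in specs:
        name, requirement = conda_spec_to_requirement(spec)
        if name in NON_PYTHON_PACKAGES:
            removed.append(name)
        else:
            keep.append(requirement)
    return keep, removed
```

For example, split_dependencies(["python=3.11", "numpy=1.26.4=py311h64a7726_0", "cudatoolkit=11.8", "scipy"]) keeps ["numpy==1.26.4", "scipy"] and removes ["python", "cudatoolkit"]. Written one per line to a requirements.txt file, the kept list can then be installed inside your virtual environment with pip install -r requirements.txt.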

Apptainer Use

In some situations, a program's dependencies are so complex that you need to control the entire software environment. In these situations, we recommend the tool Apptainer; note that a Docker image can be converted into an Apptainer image. The only disadvantage of Apptainer is the disk space its images consume. If your research group plans to use several images, it would be wise to collect all of them in a single directory of the group's project space to avoid duplication.
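Before consolidating, it helps to know which images are stored more than once. As a minimal sketch, assuming the images are .sif files (the Singularity Image Format used by Apptainer) somewhere under a directory you can read, the following Python code groups byte-for-byte identical files by comparing sizes first and SHA-256 hashes second. The function names sha256_of and find_duplicate_images are our own.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that multi-gigabyte images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_images(root: str, pattern: str = "*.sif") -> list[list[Path]]:
    """Return groups of image files under root whose contents are identical."""
    by_size: dict[int, list[Path]] = defaultdict(list)
    for path in Path(root).rglob(pattern):
        # Symlinks to a shared copy are the layout we want, so they are skipped.
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    groups: list[list[Path]] = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for path in same_size:
            by_hash[sha256_of(path)].append(path)
        groups.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return groups
```

Run it on the directory your group shares and print each returned group; deciding which copy to keep in the shared directory, and removing the others, remains a manual step.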

Examples where Anaconda does not work

R
Some conda recipes force the installation of their own copy of R. That R does not perform nearly as well as the version we provide as a module (which uses Intel MKL). It also does not work well: jobs launched with it may die, wasting both computing resources and your time.