* [https://www.docker.com/ Docker]
** Using Docker on a multi-user cluster creates security risks, so we do not make Docker available on our HPC clusters.
** You can install and use Docker on your own computer to create an Apptainer image, which can then be uploaded to an HPC cluster as outlined in <b>[[#Creating_an_Apptainer_Container_From_a_Dockerfile|this section]]</b> later on this page.
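For example, a rough sketch of this Docker-to-Apptainer workflow, run on your own computer (the image and file names are hypothetical; the linked section describes the workflow in detail):

<pre>
# on your own computer, where Docker (and Apptainer) are installed
docker build -t myimage:latest .                           # build the image from your Dockerfile
docker save myimage:latest -o myimage.tar                  # export it to a tar archive
apptainer build myimage.sif docker-archive://myimage.tar   # convert the archive to a SIF image
# then copy myimage.sif to the cluster, e.g., with scp or rsync
</pre>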
==Other items== <!--T:11-->
===General===
* To use Apptainer you must have a container <b>image</b>, e.g., a <code>.sif</code> file or a previously created "sandbox" directory. If you don't already have an image or a sandbox, see the section on <b>[[#Building_an_Apptainer_Container/Image|building an image]]</b> below.
* While Apptainer is installed and available on our clusters, you will need to install and/or build, inside your container, all of the software you want to use. In many instances, <b>[[Available_software|we already have such software installed on our clusters]]</b>, so there is often no need to create a container with the same software in it.
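For example, a minimal sketch of building an image on a cluster directly from a public Docker Hub image (the output file name <code>myubuntu.sif</code> is arbitrary):

<pre>
module load apptainer
# pull the Docker Hub image and convert it into a local SIF image file
apptainer build myubuntu.sif docker://ubuntu:22.04
</pre>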
===<code>sudo</code>=== <!--T:12-->
<!--T:19-->
Software run inside a container runs in a different environment, using different libraries and tools than what is installed on the host system. It is, therefore, important to run programs within containers by <b>not</b> using any environment settings or software defined outside of the container. Unfortunately, by default, Apptainer adopts the shell environment of the host, which can result in issues when running programs. To avoid such issues, when using <code>apptainer run</code>, <code>apptainer shell</code>, <code>apptainer exec</code>, and/or <code>apptainer instance</code>, use one of these options (preferring the options listed earlier in the table below):
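For example, a minimal sketch using the <code>-C</code> option to contain the environment, assuming an image named <code>myimage.sif</code> (a hypothetical name); note that with <code>-C</code> you must bind mount any host directories you still want to access:

<pre>
# run a command without importing the host's shell environment;
# 'env' simply prints the environment the container actually sees
apptainer exec -C -B /home -B /project -B /scratch myimage.sif env
</pre>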
<!--T:20-->
* The workdir can be removed if there are no live containers using it.
* When using Apptainer in an <code>salloc</code> or <code>sbatch</code> job, or when using [[JupyterHub]] on our clusters, use <code>${SLURM_TMPDIR}</code> for the "workdir" location, e.g., <code>-W ${SLURM_TMPDIR}</code>.
** ASIDE: One should <b>not</b> be running programs (including Apptainer) on a login node. Use an interactive <code>salloc</code> job.
* When using bind mounts, see the [[#Bind_Mounts|section on bind mounts]] below, since not all Alliance clusters require exactly the same bind mounts to access <code>/home</code>, <code>/project</code>, and <code>/scratch</code>.
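Putting these options together, a minimal sketch of running a program inside a job, assuming an image <code>myimage.sif</code> and a program <code>myprogram</code> installed in it (both hypothetical names; adjust the bind mounts for your cluster as noted above):

<pre>
# inside an salloc or sbatch job on a compute node
module load apptainer
apptainer exec -C -W ${SLURM_TMPDIR} \
  -B /home -B /project -B /scratch \
  myimage.sif myprogram
</pre>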
<!--T:61-->
<b>IMPORTANT:</b> In addition to choosing from the above options, if you are making use of a persistent overlay image (as a separate file or contained within the SIF file) and want changes to be written to that image, it is extremely important to pass the <code>-w</code> or <code>--writable</code> option when running your container. If this option is not passed, any changes you make to the image in the <code>apptainer shell</code> session will not be saved!
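For example, a minimal sketch of creating a separate persistent overlay image and opening a writable shell that uses it, following the note above about <code>-w</code> (the file names are arbitrary):

<pre>
# create a 512 MiB persistent overlay image file
apptainer overlay create --size 512 myoverlay.img
# open a shell in the container; with -w, changes are written to the overlay
apptainer shell --overlay myoverlay.img -w myimage.sif
</pre>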
==Running daemons: <code>apptainer instance</code>== <!--T:62-->
<!--T:64-->
<b>NOTE 1:</b> Don't run daemons manually; use <code>apptainer instance</code> and its related commands instead. Apptainer then works properly with other tools, such as the Slurm scheduler, that run on our clusters: when a job is cancelled, killed, crashes, or otherwise finishes, daemons run using <code>apptainer instance</code> will not hang or result in defunct processes. Additionally, using the <code>apptainer instance</code> command lets you control the daemons and programs running in the same container.
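For example, a minimal sketch of starting, inspecting, and stopping an instance, assuming an image <code>myimage.sif</code> (a hypothetical name) whose start script launches the daemon:

<pre>
# start a named background instance of the container
apptainer instance start -C -B /project -W ${SLURM_TMPDIR} myimage.sif myinstance
# list the instances you have running
apptainer instance list
# run a command inside the running instance
apptainer exec instance://myinstance ps aux
# stop the instance and the daemons running in it
apptainer instance stop myinstance
</pre>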
<!--T:65-->
<b>NOTE 2:</b> Daemons started in your job will only run while your job runs; the scheduler kills all processes attached to a job when the granted job time expires. If you need to run daemons continuously for longer than a single job, submit a ticket to discuss this with our staff, as it may require creating a cloud virtual machine.
==Running MPI programs== <!--T:66-->
<!--T:68-->
<b>NOTE:</b> When all MPI processes run on a single shared-memory node, e.g., when the Slurm option <code>--nodes=1</code> is used with an <code>sbatch</code> script, there is no need to use the interconnect hardware and there will be no issues running MPI programs within an Apptainer container. Unless you <b>explicitly</b> set the maximum number of cluster nodes to <code>1</code>, the scheduler can choose to run an MPI program over multiple nodes; if such a program runs from within an Apptainer container that has not been properly set up for multi-node use, it may fail to run.
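For example, a minimal sketch of an <code>sbatch</code> script that keeps all MPI processes on a single node, assuming an image <code>myimage.sif</code> containing an MPI program <code>/opt/my_mpi_program</code> built against the MPI library installed inside the container (all names hypothetical):

<pre>
#!/bin/bash
#SBATCH --nodes=1               # keep all MPI processes on one node
#SBATCH --ntasks-per-node=4
#SBATCH --mem-per-cpu=2000M
#SBATCH --time=0-00:30
module load apptainer
# launch the MPI program with the MPI runtime installed inside the container
apptainer exec -C -W ${SLURM_TMPDIR} -B /project -B /scratch myimage.sif \
  mpirun -np ${SLURM_NTASKS} /opt/my_mpi_program
</pre>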
<!--T:69-->
followed by running <code>docker image rm ID</code> (where ID is the image ID output from the <code>docker images</code> command) in order to free up the disk space associated with those other image layers on the system you are using.
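For example (the image ID shown is hypothetical):

<pre>
docker images                  # list images and their IDs
docker image rm 1a2b3c4d5e6f   # remove an unneeded image by its ID
</pre>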
=Miscellaneous items= <!--T:131-->
==Cleaning Apptainer's cache directory== <!--T:132-->