<languages />
<translate>
<!--T:1-->
When submitting a job to one of the clusters, it's important to choose appropriate values for various parameters in order to ensure that your job doesn't waste resources or create problems for other users and yourself. This will ensure your job starts more quickly and that it is likely to finish correctly, producing the output you need to move your research forward.

<!--T:2-->
For your first jobs on the cluster, it's understandably difficult to estimate how much time or memory may be needed for your job to carry out a particular simulation or analysis. This page should provide you with useful tips.

=Typical job submission problems= <!--T:3-->

<!--T:4-->
* The more resources - time, memory, CPU cores, GPUs - that your job asks for, the more difficult it will be for the scheduler to find these resources and so the longer your job will wait in the queue.
* But if not enough resources are requested, the job can be stopped if it goes beyond its time limit or its memory limit.
** The processors are waiting on read-write operations.

=Best practice tips= <!--T:5-->

<!--T:6-->
The best approach is to begin by submitting a few relatively small test jobs, asking for a fairly standard amount of memory (<tt>#SBATCH --mem-per-cpu=2G</tt>) and time, for example one or two hours.
* Ideally you should already know what the answer will be in these test jobs, allowing you to verify that the software is running correctly on the cluster.
* If your job ends with a message about an "OOM event", this means it ran out of memory (OOM), so try doubling the memory you've requested and see if this is enough.

<!--T:7-->
By means of these test jobs, you should gain some familiarity with how long certain analyses take on the cluster and how much memory is needed, so that for more realistic jobs you'll be able to make an intelligent estimate.
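
As an illustration, a small test job of the kind described above might look like the following sketch. The job name, the one-hour limit and the trivial arithmetic check are placeholders chosen for this example; in practice you would replace the check with a short run of your own software whose answer you already know.

```shell
#!/bin/bash
#SBATCH --job-name=small-test    # placeholder name for this example
#SBATCH --time=01:00:00          # one hour, a typical limit for a test job
#SBATCH --mem-per-cpu=2G         # the standard starting amount of memory

# Stand-in for a short run of your real software: a calculation whose
# answer is known in advance, so that the result can be verified.
expected=42
result=$(( 6 * 7 ))

if [ "$result" -eq "$expected" ]; then
    echo "PASS: got the expected answer ($result)"
else
    echo "FAIL: expected $expected but got $result"
fi
```

If a test like this fails, or ends with an OOM event, you learn it after a short wait rather than after a long production run.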

==Job duration== <!--T:8-->

<!--T:9-->
* For jobs which are not tests, the duration should be '''at least one hour'''.
** If your computation requires less than an hour, you should consider using tools like [[GLOST]], [[META:_A_package_for_job_farming | META]] or [[GNU Parallel]] to group several of your computations into a single Slurm job with a duration of at least an hour. Hundreds or thousands of very short jobs place undue stress on the scheduler.
}}
<translate>
<!--T:10-->
* '''Increase the estimated duration by 5% or 10%''', just in case.
** It's natural to leave a certain amount of room for error in the estimate, but otherwise it's in your interest for your estimate of the job's duration to be as accurate as possible.
** With a checkpoint, the program writes a snapshot of its state to a file on disk and the program can then be restarted from this file, at that precise point in the calculation. In this way, even if there is a power outage or some other interruption of the compute node(s) being used by your job, you won't necessarily lose much work if your program writes a checkpoint file every six or eight hours.
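
The checkpoint-and-restart idea can be sketched in a few lines of shell. This is only a toy illustration under stated assumptions: the function name <tt>run_with_checkpoint</tt>, the file name <tt>state.chk</tt> and the step counter are all hypothetical, and the "state" here is a single integer, whereas real software saves its own, much larger, state and typically checkpoints at time intervals rather than after every step.

```shell
#!/bin/bash
# run_with_checkpoint FILE TOTAL_STEPS BUDGET   (hypothetical helper)
# Runs at most BUDGET steps towards TOTAL_STEPS, resuming from FILE if it
# exists, and prints the number of the last completed step.
run_with_checkpoint() {
    local checkpoint_file="$1"
    local total_steps="$2"
    local budget="$3"
    local step=0
    local done_now=0

    # Resume from the last completed step if a checkpoint is present.
    if [ -f "$checkpoint_file" ]; then
        step=$(cat "$checkpoint_file")
    fi

    while [ "$step" -lt "$total_steps" ] && [ "$done_now" -lt "$budget" ]; do
        # ... one unit of real work would go here ...
        step=$(( step + 1 ))
        done_now=$(( done_now + 1 ))
        # Write to a temporary file and rename it, so that an interruption
        # during the write cannot leave a corrupted checkpoint behind.
        echo "$step" > "$checkpoint_file.tmp"
        mv "$checkpoint_file.tmp" "$checkpoint_file"
    done

    echo "$step"
}

work_dir=$(mktemp -d)
# First run is "interrupted" after 4 steps; prints 4.
run_with_checkpoint "$work_dir/state.chk" 10 4
# Second run resumes from step 4 and finishes; prints 10.
run_with_checkpoint "$work_dir/state.chk" 10 100
```

The second invocation does not repeat the first four steps, which is the property that lets a long calculation be split across several shorter jobs.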

==Memory consumption== <!--T:11-->

<!--T:12-->
* Your <tt>Memory Efficiency</tt> in the output from the <tt>seff</tt> command '''should be at least 80% to 85%''' in most cases.
** Much like with the duration of your job, the goal when requesting the memory is to ensure that the amount is sufficient, with a certain margin of error.
** There are relatively few of these high-memory nodes so your job will wait much longer to run - make sure your job really needs all this extra memory.
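
One way to turn the memory guideline above into a number is to divide the peak memory measured in a test job by the target efficiency. The sketch below assumes you have read the peak, in megabytes, from the <tt>Memory Utilized</tt> field of <tt>seff</tt>; the function name <tt>mem_request_mb</tt> and the default target of 85% are choices made for this example only.

```shell
#!/bin/bash
# mem_request_mb PEAK_MB [TARGET_PERCENT]   (hypothetical helper)
# Prints the memory request, in MB, for which a job peaking at PEAK_MB
# would show roughly TARGET_PERCENT memory efficiency (default 85).
mem_request_mb() {
    local peak_mb="$1"
    local target="${2:-85}"
    # Integer division, rounded up so the request is never too small.
    echo $(( (peak_mb * 100 + target - 1) / target ))
}

# A test job that peaked at 3400 MB:
mem_request_mb 3400        # prints 4000
```

With this result you would request <tt>--mem=4000M</tt>, which leaves a margin of about 15% above the observed peak.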

==Parallelism== <!--T:13-->

<!--T:14-->
* By default your job will get one core on one node and this is the most sensible policy because '''most software is serial''': it can only ever make use of a single core.
** Asking for more cores and/or nodes will not make the serial program run any faster because for it to run in parallel the program's source code needs to be modified, in some cases in a very profound manner requiring a substantial investment of developer time.
** You can also '''contact the development team''' to ask if the software can be run in parallel and if not, to request that such a feature be added in a future version.

<!--T:15-->
* If the program can run in parallel, the next question is '''what number of cores to use'''?
** Many of the programming techniques used to allow a program to run in parallel assume the existence of a ''shared memory environment'', i.e. multiple cores can be used but they must all be located on the same node. In this case, the maximum number of cores available on a single node provides a ceiling for the number of cores you can use.
** To choose the optimal number of CPU cores, you need to '''study the [[Scalability|software's scalability]]'''.

<!--T:16-->
* A further complication with parallel execution concerns '''the use of multiple nodes''' - the software you are running must support ''distributed memory parallelism''.
** Most software able to run over more than one node uses '''the [[MPI]] standard''', so if the documentation doesn't mention MPI or consistently refers to threading and thread-based parallelism, this likely means you will need to restrict yourself to a single node.
** Programs that have been parallelized to run across multiple nodes '''should be started using''' <tt>srun</tt> rather than <tt>mpirun</tt>.

<!--T:17-->
* A goal should also be to '''avoid scattering your parallel processes across more nodes than is necessary''': a more compact distribution will usually help your job's performance.
** Highly fragmented parallel jobs often exhibit poor performance and also make the scheduler's job more complicated. This being the case, you should try to submit jobs where the number of parallel processes is equal to an integer multiple of the number of cores per node, assuming this is compatible with the parallel software your jobs run.
</source>
<translate>
<!--T:18-->
* Ultimately, the goal should be to '''ensure that the CPU efficiency of your jobs is very close to 100%''', as measured by the field <tt>CPU Efficiency</tt> in the output from the <tt>seff</tt> command.
** Any value of CPU efficiency less than 90% is poor and means that your use of whatever software your job executes needs to be improved.
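
To make the CPU efficiency figure concrete, the sketch below computes it the way <tt>seff</tt> reports it: CPU time actually used, divided by the number of cores multiplied by the elapsed wall-clock time. The function name <tt>cpu_efficiency</tt> and the sample figures are invented for this illustration.

```shell
#!/bin/bash
# cpu_efficiency CPU_SECONDS CORES WALL_SECONDS   (hypothetical helper)
# Prints the CPU efficiency as a whole percentage, rounded down.
cpu_efficiency() {
    local cpu_seconds="$1"
    local cores="$2"
    local wall_seconds="$3"
    echo $(( cpu_seconds * 100 / (cores * wall_seconds) ))
}

# A well-behaved parallel job: 4 cores for one hour, 13000 s of CPU time.
cpu_efficiency 13000 4 3600     # prints 90

# A serial program given 4 cores: only one core ever does any work.
cpu_efficiency 3500 4 3600      # prints 24
```

The second case shows why requesting extra cores for serial software is wasteful: three of the four cores sit idle, so the efficiency cannot exceed 25%.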

==Using GPUs== <!--T:19-->

<!--T:20-->
There are relatively few nodes with GPUs, so any job which asks for a GPU will wait significantly longer in most cases.
* Be sure that this GPU you had to wait so much longer to obtain is '''being used as efficiently as possible''' and that it is really contributing to improved performance in your jobs.
* '''Other tools for monitoring the efficiency''' of your GPU-based jobs include <tt>[https://developer.nvidia.com/nvidia-system-management-interface nvidia-smi]</tt>, <tt>nvtop</tt> and, if you're using software based on [[TensorFlow]], the [[TensorFlow#TensorBoard|TensorBoard]] utility.

==Avoid wasting resources== <!--T:21-->

<!--T:22-->
* In general, your jobs should never contain the command <tt>sleep</tt>.
* We strongly recommend against the use of [[Anaconda/en|Conda]] and its variants on the clusters, in favour of solutions like a [[Python#Creating_and_using_a_virtual_environment|Python virtual environment]] or [[Singularity]].
* Read and write operations should be optimized by '''[[Using_node-local_storage|using node-local storage]]'''.
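
The usual pattern for node-local storage is to copy the input to the local disk, work there, and copy the results back at the end. The sketch below assumes the local directory is exposed through the environment variable <tt>SLURM_TMPDIR</tt>, which is not stated on this page, and falls back to a temporary directory when run outside a job. The function name <tt>stage_and_run</tt> is hypothetical and <tt>tr</tt> merely stands in for your real program.

```shell
#!/bin/bash
# Assumption: node-local storage is available through $SLURM_TMPDIR.
# Outside a job the variable is unset, so use a temporary directory instead.
local_dir="${SLURM_TMPDIR:-$(mktemp -d)}"

# stage_and_run INPUT OUTPUT   (hypothetical helper)
stage_and_run() {
    local input="$1"
    local output="$2"

    # 1. Copy the input from shared storage to the node-local disk.
    cp "$input" "$local_dir/input.txt"

    # 2. Do all reading and writing locally (tr stands in for real work).
    tr 'a-z' 'A-Z' < "$local_dir/input.txt" > "$local_dir/output.txt"

    # 3. Copy the result back to shared storage in a single operation.
    cp "$local_dir/output.txt" "$output"
}

demo_dir=$(mktemp -d)
echo "hello" > "$demo_dir/in.txt"
stage_and_run "$demo_dir/in.txt" "$demo_dir/out.txt"
cat "$demo_dir/out.txt"     # prints HELLO
```

Only steps 1 and 3 touch the shared filesystem, so the many small reads and writes of the calculation itself stay on the local disk.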
</translate>