Best practices for job submission: Difference between revisions

Best practices for job submission (view source)

Revision as of 18:21, 16 September 2022

88 bytes added , 2 years ago

no edit summary

Stubbsda


@@ Line 1: / Line 1: @@
+<languages />
+<translate>
 When submitting a job to one of the clusters, choose appropriate values for its parameters so that the job does not waste resources or create problems for you and other users. Doing so helps your job start sooner and makes it more likely to finish correctly, producing the output you need to move your research forward.
@@ Line 30: / Line 32: @@
 * '''Use [[Running_jobs#Completed_jobs|monitoring tools]]''' to see how long completed jobs took.
 ** For example, the <tt>Job Wall-clock time</tt> field in the output of the <tt>seff</tt> command:
+</translate>
 {{Command
 |seff 1234567
@@ Line 44: / Line 47: @@
 Memory Utilized: 14.95 GB (estimated maximum)
 Memory Efficiency: 11.68% of 128.00 GB (8.00 GB/core)
 }}
+<translate>
 * '''Increase the estimated duration by 5% or 10%''', just in case.
 ** It's natural to leave a certain amount of room for error in the estimate, but otherwise it's in your interest for your estimate of the job's duration to be as accurate as possible.
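The padding rule above can be sketched as a small calculation. This is only an illustration: the function name <tt>padded_time_limit</tt> and its parameters are hypothetical, and rounding up to a whole minute is a choice made here, not something the page prescribes. The result uses the <tt>days-hours:minutes:seconds</tt> form that Slurm accepts for <tt>--time</tt>.

```python
def padded_time_limit(measured_seconds: int, margin_percent: int = 10) -> str:
    """Return a D-HH:MM:SS time limit: the measured run time plus a safety margin.

    Hypothetical helper for illustration; integer arithmetic avoids
    floating-point surprises when applying the percentage.
    """
    if measured_seconds <= 0:
        raise ValueError("measured_seconds must be positive")
    if margin_percent < 0:
        raise ValueError("margin_percent must not be negative")

    # Apply the margin, rounding up to a whole second (ceiling division).
    total = -(-measured_seconds * (100 + margin_percent) // 100)
    # Round up to a whole minute (an assumption made for readability).
    total = -(-total // 60) * 60

    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # A job measured at 10 hours, padded by 10%.
    print(padded_time_limit(10 * 3600, 10))
```

For a job whose measured wall-clock time was 10 hours, a 10% margin gives <tt>0-11:00:00</tt>, which could then be used as the value of <tt>--time</tt>.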
@@ Line 79: / Line 83: @@
 ** Highly fragmented parallel jobs often exhibit poor performance and also make the scheduler's job more complicated. This being the case, you should try to submit jobs where the number of parallel processes is equal to an integral multiple of the number of cores per node, assuming this is compatible with the parallel software your jobs run.
 ** So on a cluster with 40 cores/node, you would always submit parallel jobs asking for 40, 80, 120, 160, 200, etc. processes. For example, with the following job script header, all 120 MPI processes would be assigned in the most compact fashion, using three whole nodes.
+</translate>
 <source>
 #SBATCH --nodes=3
 #SBATCH --ntasks-per-node=40
 </source>
+<translate>
 * Ultimately, the goal should be to '''ensure that the CPU efficiency of your jobs is very close to 100%''', as measured by the field <tt>CPU Efficiency</tt> in the output from the <tt>seff</tt> command.
 ** A CPU efficiency below 90% is poor and means that the way your job uses its software needs to be improved.
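To make the 90% threshold concrete, here is a minimal sketch of how such a percentage can be computed. It assumes that CPU efficiency is the CPU time actually used divided by the wall-clock time multiplied by the number of allocated cores, which is how the <tt>seff</tt> figure is commonly understood. The function names and the sample values are hypothetical.

```python
def parse_duration(text: str) -> int:
    """Convert a [D-]HH:MM:SS duration string into seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def cpu_efficiency(cpu_utilized: str, wall_clock: str, cores: int) -> float:
    """Return CPU efficiency in percent.

    Assumption: efficiency = CPU time used / (wall-clock time * cores).
    """
    core_walltime = parse_duration(wall_clock) * cores
    if core_walltime <= 0:
        raise ValueError("wall-clock time and core count must be positive")
    return 100.0 * parse_duration(cpu_utilized) / core_walltime


if __name__ == "__main__":
    # Hypothetical job: 40 cores for 1 hour, 36 core-hours of CPU time used.
    efficiency = cpu_efficiency("1-12:00:00", "01:00:00", 40)
    print(f"CPU efficiency: {efficiency:.2f}%")
    if efficiency < 90.0:
        print("Below 90%: review how the job uses its allocated cores.")
```

In this hypothetical case the job used 36 core-hours out of the 40 it reserved, so it sits exactly at the 90% boundary. A job that reserves many cores but runs mostly serially would fall far below it.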
@@ Line 101: / Line 106: @@
 * We strongly recommend against the use of [[Anaconda/en|Conda]] and its variants on the clusters, in favour of solutions like a [[Python#Creating_and_using_a_virtual_environment|Python virtual environment]] or [[Singularity]].
 * Read and write operations should be optimized by '''[[Using_node-local_storage|using node-local storage]]'''.
+</translate>
