Best practices for job submission: Difference between revisions


Revision as of 19:02, 29 August 2022

350 bytes added, 2 years ago

→‎Job duration

Stubbsda


@@ Line 4: / Line 4: @@
 For jobs which are not tests, the duration should be at least one hour. If your computation requires less than an hour, you should consider using tools like [[GLOST]], [[META:_A_package_for_job_farming | META]] or [[GNU Parallel]] to regroup several of your computations into a single Slurm job with a duration of at least an hour. Hundreds or thousands of very short jobs place undue stress on the scheduler.
-It is equally important that your estimate of the job duration be relatively accurate: asking for five days when the computation in reality finishes after just sixteen hours leads to your job spending much more time waiting to start than it would had you given a more accurate estimate of the duration. It's natural to leave a certain amount of room for error in the estimate and so to increase the duration by five or ten percent "just in case" but otherwise it's in your interest for your estimate of the job's duration to be as accurate as possible. You can see how long completed jobs took to run using the command <source>seff <jobid></source> in the field <i>Job Wall-clock time</i>.
+It is equally important that your estimate of the job duration be reasonably accurate: asking for five days when the computation actually finishes after just sixteen hours leaves your job waiting much longer to start than it would with a more accurate estimate. It is natural to leave some room for error by increasing the duration by five or ten percent "just in case", but otherwise it is in your interest for the estimate to be as accurate as possible. You can see how long completed jobs took to run using the command
+<source>
+[stubbsda@beluga1 ~]$ seff 1234567
+Job ID: 1234567
+Cluster: beluga
+User/Group: jdoe/jdoe
+State: COMPLETED (exit code 0)
+Nodes: 1
+Cores per node: 16
+CPU Utilized: 58-22:54:16
+CPU Efficiency: 96.14% of 61-07:41:20 core-walltime
+Job Wall-clock time: 3-19:58:50
+Memory Utilized: 14.95 GB (estimated maximum)
+Memory Efficiency: 11.68% of 128.00 GB (8.00 GB/core)
+</source>
+The run time is reported in the field <i>Job Wall-clock time</i>.
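The margin described above can be worked out from the <i>Job Wall-clock time</i> value. The following is a minimal sketch, not an official tool: the function names are hypothetical, the ten percent default is taken from the upper end of the range suggested above, and rounding up to a whole minute is an assumption made only for tidiness.

```python
import re


def parse_slurm_time(text):
    """Convert a [D-]HH:MM:SS string, as printed by seff, to seconds."""
    match = re.fullmatch(r"(?:(\d+)-)?(\d+):(\d{2}):(\d{2})", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def format_slurm_time(total_seconds):
    """Format seconds as D-HH:MM:SS, a form accepted by sbatch --time."""
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


def suggest_time_limit(wall_clock, margin_percent=10):
    """Add a safety margin (hypothetical default: 10%) and round up to a minute."""
    scaled = parse_slurm_time(wall_clock) * (100 + margin_percent)
    # Integer ceiling division avoids floating-point rounding surprises:
    # scaled is in hundredths of a second, and 6000 of those make one minute.
    whole_minutes = -(-scaled // 6000)
    return format_slurm_time(whole_minutes * 60)


# Wall-clock time taken from the seff output shown above.
print(suggest_time_limit("3-19:58:50"))  # prints 4-05:11:00
```

For a later run of the same computation, the result could then be requested with <code>#SBATCH --time=4-05:11:00</code> rather than a much larger round number.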
 =Parallelism=