Best practices for job submission: Difference between revisions

For jobs which are not tests, the duration should be at least one hour. If your computation requires less than an hour, consider using tools like [[GLOST]], [[META:_A_package_for_job_farming | META]] or [[GNU Parallel]] to group several of your computations into a single Slurm job with a duration of at least an hour. Hundreds or thousands of very short jobs place undue stress on the scheduler.
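
As a rough illustration of the sizing arithmetic behind this advice, the following Python sketch estimates how many short computations to group into one job so that it runs for at least an hour. The function names, the 300-second task duration, the 16 workers and the file names are all hypothetical, and the sketch assumes every task takes about the same time. It only computes group sizes; actually running the grouped tasks is left to a tool such as GLOST, META or GNU Parallel.

```python
def tasks_per_job(task_seconds, workers=1, min_job_seconds=3600):
    """Number of tasks that keeps `workers` parallel workers busy
    for at least `min_job_seconds`, assuming uniform task duration."""
    # Integer ceiling division: sequential rounds each worker must run.
    rounds = -(-min_job_seconds // task_seconds)
    return rounds * workers


def group_tasks(tasks, group_size):
    """Split a list of tasks into consecutive groups of `group_size`."""
    return [tasks[i:i + group_size] for i in range(0, len(tasks), group_size)]


# Hypothetical workload: 1000 inputs, about 5 minutes each, 16 cores per job.
size = tasks_per_job(task_seconds=300, workers=16)
groups = group_tasks([f"input_{i}.dat" for i in range(1000)], size)
print(size, len(groups), len(groups[-1]))  # prints: 192 6 40
```

Note that the last group is smaller than the others, so the corresponding job would run for less than an hour; in practice it could be merged into the previous group.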


It is equally important that your estimate of the job duration be reasonably accurate: asking for five days when the computation actually finishes after just sixteen hours means your job will spend much more time waiting to start than it would have with a more accurate estimate. It is natural to leave some room for error and increase the duration by five or ten percent "just in case", but otherwise it is in your interest for the estimate to be as accurate as possible. You can see how long a completed job took to run in the <i>Job Wall-clock time</i> field of the output of the <code>seff</code> command, for example:
<source>
[stubbsda@beluga1 ~]$ seff 1234567
Job ID: 1234567
Cluster: beluga
User/Group: jdoe/jdoe
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 16
CPU Utilized: 58-22:54:16
CPU Efficiency: 96.14% of 61-07:41:20 core-walltime
Job Wall-clock time: 3-19:58:50
Memory Utilized: 14.95 GB (estimated maximum)
Memory Efficiency: 11.68% of 128.00 GB (8.00 GB/core)
</source>
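
The "five or ten percent" margin can be worked out from a measured wall-clock time. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be in the <i>[D-]HH:MM:SS</i> form shown by <i>seff</i>, and rounding up to a whole minute is an arbitrary choice. The result is written in the days-hours:minutes:seconds form that Slurm accepts for a time limit.

```python
def parse_slurm_time(text):
    """Convert a duration written as [D-]HH:MM:SS into seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def format_slurm_time(total_seconds):
    """Format a number of seconds as D-HH:MM:SS."""
    days, rest = divmod(total_seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


def suggest_time_limit(wall_clock, margin_percent=10):
    """Add a safety margin to a measured wall-clock time.

    Integer arithmetic is used throughout; the padded value is
    rounded up to the next whole minute (an arbitrary choice).
    """
    seconds = parse_slurm_time(wall_clock)
    padded = -(-seconds * (100 + margin_percent) // 100)  # ceiling division
    padded = -(-padded // 60) * 60                        # round up to a minute
    return format_slurm_time(padded)


# Wall-clock time taken from the seff output above, with a 10% margin.
print(suggest_time_limit("3-19:58:50"))  # prints: 4-05:11:00
```

For the job shown above, a request of a little over four days would therefore cover the measured run time with a ten percent margin.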


=Parallelism=