Best practices for job submission

From Alliance Doc
Revision as of 17:53, 29 August 2022

When submitting a job to one of the clusters, it's important to choose appropriate values for its parameters so that the job doesn't waste resources or create problems for you and other users. Well-chosen values help your job start sooner and make it more likely to finish correctly, producing the output you need to move your research forward.

Job duration

For jobs which are not tests, the duration should be at least one hour. If your computation requires less than an hour, you should consider using tools like GLOST, META or GNU Parallel to regroup several of your computations into a single Slurm job with a duration of at least an hour. Hundreds or thousands of very short jobs place undue stress on the scheduler.
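As a rough illustration of the one-hour guideline, the sketch below estimates how many short tasks would need to be grouped into a single job. The function name `tasks_per_job` and the 90-second task length are assumptions made for this example; the grouping itself would still be done with a tool such as GLOST, META or GNU Parallel.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm or any tool named above):
# how many serial tasks of a given length must be grouped so that
# one job lasts at least min_seconds (default: one hour).
tasks_per_job() {
    local task_seconds=$1
    local min_seconds=${2:-3600}
    # Integer ceiling division: (a + b - 1) / b
    echo $(( (min_seconds + task_seconds - 1) / task_seconds ))
}

# Example: tasks that each take about 90 seconds (an assumed figure)
tasks_per_job 90
```

With these assumed numbers, about forty such tasks fill one hour, so several hundred tasks would become a handful of jobs rather than several hundred.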

It is equally important that your estimate of the job duration be reasonably accurate: if you ask for five days when the computation actually finishes after just sixteen hours, your job will spend much more time waiting to start than it would have with a more accurate estimate. It's natural to leave some room for error and increase the duration by five or ten percent "just in case", but otherwise it's in your interest for the estimate to be as accurate as possible. You can see how long completed jobs took to run using the command

seff <jobid>

in the field Job Wall-clock time.
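Once you know the measured run time, the requested duration can be derived from it. The following is a minimal sketch, assuming you have converted the wall-clock time to seconds yourself; the function name `padded_time_limit` is hypothetical, and the output uses the days-hours:minutes:seconds form accepted by Slurm's `--time` option.

```shell
#!/bin/bash
# Hypothetical helper: pad a measured run time (in seconds) by a safety
# margin (in percent, default 10) and print it as D-HH:MM:SS.
padded_time_limit() {
    local measured=$1
    local margin_percent=${2:-10}
    # Round up so the padded limit is never shorter than intended
    local total=$(( (measured * (100 + margin_percent) + 99) / 100 ))
    printf '%d-%02d:%02d:%02d\n' \
        $(( total / 86400 )) \
        $(( total % 86400 / 3600 )) \
        $(( total % 3600 / 60 )) \
        $(( total % 60 ))
}

# A run that took sixteen hours (57600 s), padded by ten percent
padded_time_limit 57600 10
```

For the sixteen-hour example this prints `0-17:36:00`, a far smaller request than five days.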

Parallelism

By default your job will get one core on one node and this is the most sensible policy because most software is serial: it can only ever make use of a single core. Asking for more cores and/or nodes will not make the program run any faster because for it to run in parallel the program's source code needs to be modified, in some cases in a very profound manner requiring a substantial investment of developer time. How can you determine if the software you're using can run in parallel? The best approach is to look in the software's documentation for a section on parallel execution: if you can't find anything, this is usually a sign that this program is serial. You can also contact the development team to ask if the software can be run in parallel and if not, to request that such a feature be added in a future version.

If the program can run in parallel, the next question is how to specify the number of CPU cores that the program should use. The right syntax depends on the particular program: it might be a command line argument like --nthreads=4, an environment variable you need to set before calling the program (e.g. export OMP_NUM_THREADS=4) or perhaps a line you should add to the program's parameter file. Once you know how to specify the number of CPU cores, the next question is how many cores to use. It may be tempting to reply "as many as possible", but this is often not the wisest approach. Just as too many cooks trying to prepare a single meal in a small kitchen can lead to chaos, adding an excessive number of CPU cores can have the perverse effect of slowing down a program. To choose the optimal number of CPU cores, you need to study the software's scalability.
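The two points above can be sketched in a job script. This is only an illustration under stated assumptions: the helper names `threads_for_job` and `parallel_efficiency`, the program `./my_program` and the timings are hypothetical, and the efficiency formula used is the common definition (serial time divided by cores times parallel time), not something specific to any one program.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00

# Use the core count Slurm allocated to the task; fall back to 1 when
# the variable is unset (for example, outside a Slurm job).
threads_for_job() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Parallel efficiency in percent: 100 * T1 / (n * Tn), where T1 is the
# run time on one core and Tn the run time on n cores (integer seconds).
parallel_efficiency() {
    local t1=$1 n=$2 tn=$3
    echo $(( 100 * t1 / (n * tn) ))
}

export OMP_NUM_THREADS="$(threads_for_job)"
echo "Running with ${OMP_NUM_THREADS} thread(s)"

# Hypothetical program and option name, shown for illustration only:
# ./my_program --nthreads="${OMP_NUM_THREADS}"

# Assumed timings: 1000 s on 1 core, 300 s on 4 cores, 200 s on 16 cores
parallel_efficiency 1000 4 300
parallel_efficiency 1000 16 200
```

Taking the thread count from the allocation keeps the program and the job request in agreement. In the assumed timings, efficiency falls from 83 percent on four cores to 31 percent on sixteen, which is the kind of drop that suggests requesting fewer cores.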

Memory consumption