<!--T:6-->
The best approach is to begin by submitting a few relatively small test jobs, asking for a fairly standard amount of memory (<tt>#SBATCH --mem-per-cpu=2G</tt>) and time, for example one or two hours.
* Ideally, you should already know what the answer will be in these test jobs, allowing you to verify that the software is running correctly on the cluster.
* If the job ends before the computation has finished, double the requested duration and resubmit, repeating until the time is sufficient.
* If your job ends with a message about an "OOM event", this means it ran out of memory (OOM), so try doubling the memory you've requested and see if this is enough; a sample test script is sketched after this list.
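As a concrete starting point, a small test job along these lines might look as follows. This is only a sketch: <tt>my_program</tt> is a placeholder for your own executable, and the resource values are simply the conservative defaults suggested above.
<source>
#!/bin/bash
#SBATCH --time=1:00:00     # one or two hours; double this if the job runs out of time
#SBATCH --mem-per-cpu=2G   # double this if the job ends with an OOM event
#SBATCH --ntasks=1         # a single process for the initial test
./my_program               # placeholder for your own program and its arguments
</source>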
* A goal should also be to '''avoid scattering your parallel processes across more nodes than is necessary''': a more compact distribution will usually help your job's performance.
** Highly fragmented parallel jobs often exhibit poor performance and also complicate the scheduler's task. For this reason, you should try to submit jobs where the number of parallel processes is an integral multiple of the number of cores per node, assuming this is compatible with the parallel software your jobs run.
** So on a cluster with 40 cores/node, you would always submit parallel jobs asking for 40, 80, 120, 160, 240, etc. processes. For example, with the following job script header, all 120 MPI processes would be assigned in the most compact fashion, using three whole nodes.
</translate>
<source>
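#!/bin/bash
# Sketch of a header requesting 120 MPI processes packed onto three whole
# 40-core nodes, as described above; the memory and time values below are
# illustrative assumptions, not recommendations.
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=40
#SBATCH --mem-per-cpu=2G
#SBATCH --time=1:00:00
</source>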
Nodes with GPUs are relatively uncommon, so in most cases a job which asks for a GPU will wait significantly longer.
* Be sure that the GPU you had to wait longer to obtain is '''being used as efficiently as possible''' and that it is really contributing to improved performance in your jobs.
** A considerable amount of software does have a GPU option, for example such widely used packages as [[NAMD]] and [[GROMACS]], but only a small part of these programs' functionality has been modified to make use of GPUs. For this reason, it is wiser to '''first test a small sample calculation both with and without a GPU''' to see what kind of speed-up you obtain; a sketch of such a test job follows this list.
** Because of the high cost of GPU nodes, a job using '''a single GPU''' should run significantly faster than it would on a full CPU node.
** If your job '''only finishes 5% or 10% more quickly with a GPU, it's probably not worth''' the effort of waiting to get a node with a GPU, as the GPU will be idle during much of your job's execution.
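As an illustration, the with/without comparison could be run as a pair of jobs like the sketch below. Here <tt>my_simulation</tt> and its input file are placeholders, and the resource values are assumptions rather than recommendations: submit the same calculation once with and once without the GPU request, then compare the two run times.
<source>
#!/bin/bash
#SBATCH --gres=gpu:1        # remove this line for the CPU-only comparison run
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2G    # assumed value
#SBATCH --time=1:00:00      # assumed value
./my_simulation input.dat   # placeholder for a small sample calculation
</source>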