Best practices for job submission: Difference between revisions

Using GPUs with bullet points
(Examples of wasted resources.)
==Using GPUs==


Much like the case of nodes with a large amount of memory, the nodes with GPUs are relatively uncommon so that any job which asks for a GPU will wait significantly longer in most cases. It's therefore in your interest to be sure that this GPU you had to wait so much longer to obtain is being used as efficiently as possible and that it is really contributing to improved performance in your jobs. A considerable amount of software does have a GPU option, for example such widely used packages as [[NAMD]] and [[GROMACS]], but only a small part of these programs' functionality has been modified to make use of GPUs. For this reason, it is wiser to first test a small sample calculation both with and without a GPU to see what kind of speed-up you obtain from the use of this GPU. If your job only finishes 5% or 10% more quickly with a GPU, it's probably not worth the effort of waiting to get a node with a GPU as it will be idle during much of your job's execution. Other tools for monitoring the efficiency of your GPU-based jobs include [https://developer.nvidia.com/nvidia-system-management-interface nvidia-smi] and, if you're using software based on [[TensorFlow]], the [[TensorFlow#TensorBoard|TensorBoard]] utility.
Nodes with GPUs are relatively uncommon, so any job that asks for a GPU will in most cases wait significantly longer.
* Be sure that the GPU you waited so much longer to obtain is '''being used as efficiently as possible''' and that it is really contributing to improved performance in your jobs.
** A considerable amount of software has a GPU option, for example such widely used packages as [[NAMD]] and [[GROMACS]], but only a small part of these programs' functionality has been modified to make use of GPUs. For this reason, it is wiser to '''first test a small sample calculation both with and without a GPU''' to see what kind of speed-up you obtain from the GPU.
** Because of the high cost of GPU nodes, a job using '''a single GPU''' should run significantly faster than it would on a full CPU node.
** If your job '''only finishes 5% or 10% more quickly with a GPU, it's probably not worth''' the effort of waiting to get a node with a GPU, as the GPU will be idle during much of your job's execution.
* '''Other tools for monitoring the efficiency''' of your GPU-based jobs include <tt>[https://developer.nvidia.com/nvidia-system-management-interface nvidia-smi]</tt>, <tt>nvtop</tt> and, if you're using software based on [[TensorFlow]], the [[TensorFlow#TensorBoard|TensorBoard]] utility.
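
The comparison described above can be sketched in a few lines of Python. This is only an illustration, not an official tool: the function names and the <tt>min_speedup</tt> threshold of 2.0 are assumptions chosen for the example, and the timings are ones you would measure yourself by running the same small test calculation with and without a GPU. The last function assumes you have saved utilization samples from a command such as <tt>nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -l 10</tt>, which prints one percentage per line.

```python
def speedup(cpu_seconds, gpu_seconds):
    """Return how many times faster the GPU run was than the CPU run."""
    if cpu_seconds <= 0 or gpu_seconds <= 0:
        raise ValueError("timings must be positive")
    return cpu_seconds / gpu_seconds


def gpu_is_worthwhile(cpu_seconds, gpu_seconds, min_speedup=2.0):
    """Compare the measured speed-up with a threshold.

    min_speedup is a hypothetical cut-off picked for this sketch;
    choose a value that reflects the longer wait for a GPU node.
    """
    return speedup(cpu_seconds, gpu_seconds) >= min_speedup


def mean_gpu_utilization(samples_text):
    """Average GPU utilization (in percent) from saved nvidia-smi samples.

    Expects one number per line, as produced with the
    csv,noheader,nounits format options; blank lines are ignored.
    """
    values = [float(line) for line in samples_text.splitlines() if line.strip()]
    if not values:
        raise ValueError("no utilization samples found")
    return sum(values) / len(values)


if __name__ == "__main__":
    # A job that finishes only 10% more quickly on the GPU:
    # 100 s on CPU versus 90 s with the GPU.
    print(f"speed-up: {speedup(100.0, 90.0):.2f}x")
    print("worth requesting a GPU:", gpu_is_worthwhile(100.0, 90.0))

    # Hypothetical utilization samples collected during the GPU run.
    print("mean utilization (%):", mean_gpu_utilization("80\n90\n100\n"))
```

A job that finishes 10% more quickly corresponds to a speed-up of only about 1.1, well below any reasonable threshold, and a low mean utilization is a further sign that the GPU sits idle for much of the run.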


==Avoid wasting resources==