NAMD

<!--T:25-->
:* namd-multicore/2.12
:* namd-verbs/2.12  ('''disabled on Cedar''')
:* namd-mpi/2.12  ('''disabled on Graham''')
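
These are environment modules; as a minimal sketch (assuming the usual Lmod setup on our clusters), one of them is loaded on a login node or inside a job script before launching NAMD:

<source lang="bash">
# List the NAMD versions available on the current cluster
module spider namd

# Load the multicore (single-node) build of NAMD 2.12;
# substitute the verbs or MPI module as appropriate for your job type.
module load namd-multicore/2.12
</source>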


<!--T:26-->
<!--T:9-->
:* namd-multicore/2.12
:* namd-verbs-smp/2.12 ('''disabled on Cedar''')


<!--T:27-->


<!--T:17-->
'''NOTES''':
*Verbs versions will not run on Cedar because of its different interconnect; use the MPI version instead.
*Verbs versions will not run on Béluga either because of its incompatible InfiniBand kernel drivers; use the UCX version instead.
</translate>
{{File


<!--T:34-->
The numbers below were obtained for the standard NAMD apoa1 benchmark. The benchmarking was conducted on the Graham cluster, which has CPU nodes with 32 cores and GPU nodes with 32 cores and 2 GPUs. Benchmarking on other clusters will have to take into account the different structure of their nodes.
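
As a rough sketch of how a single-node multicore measurement like this can be reproduced (the apoa1 input files are assumed to have been obtained separately from the NAMD website, and the file name below is illustrative):

<source lang="bash">
# Run the apoa1 benchmark on all 32 cores of one Graham CPU node
# using the multicore build; the wall-clock time is reported in the log.
module load namd-multicore/2.12
namd2 +p32 apoa1.namd > apoa1_32cores.log
</source>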


<!--T:35-->
In the results shown in the first table below, we used NAMD 2.12 from the verbs module. Efficiency is computed as (time with 1 core) / (N * (time with N cores)).
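
For example, a small helper script (hypothetical, shown only to make the arithmetic explicit) that computes this efficiency from two measured times:

<source lang="bash">
#!/bin/bash
# Usage: ./efficiency.sh <time_with_1_core> <N> <time_with_N_cores>
# Prints t1 / (N * tN); a value of 1.00 means perfect scaling.
t1=$1; n=$2; tn=$3
awk -v t1="$t1" -v n="$n" -v tn="$tn" \
    'BEGIN { printf "efficiency = %.2f\n", t1 / (n * tn) }'
</source>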


<!--T:36-->


<!--T:40-->
From this table it is clear that there is no point at all in using more than 1 node for this system, since performance actually becomes worse if we use 2 or more nodes. Using only 1 node, it is best to use 1 GPU/16 cores as that has the greatest efficiency, but it is also acceptable to use 2 GPUs/32 cores if you need to get your results quickly. Since on Graham GPU nodes your priority is charged the same for any job using up to 16 cores and 1 GPU, there is no benefit from requesting only 8 or 4 cores with that GPU in this case.
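
To illustrate that accounting point, a minimal (hypothetical) Slurm header for the 1 GPU/16 cores case could look like the following; the module name and input file are placeholders, and a CUDA-enabled NAMD build is required for GPU runs:

<source lang="bash">
#!/bin/bash
#SBATCH --cpus-per-task=16   # priority is charged the same as for fewer cores with 1 GPU
#SBATCH --gres=gpu:1         # one of the two GPUs on a Graham GPU node
#SBATCH --mem=32G
#SBATCH --time=1:00:00

module load namd-multicore/2.12   # placeholder; use a GPU-capable NAMD module
namd2 +p16 +idlepoll apoa1.namd
</source>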


<!--T:41-->
Finally, you have to ask whether to run with or without GPUs for this simulation. From our numbers we can see that using a full GPU node of Graham (32 cores, 2 GPUs) the job runs faster than it would on 4 non-GPU nodes of Graham. Since a GPU node on Graham costs about twice what a non-GPU node costs, in this case it is more cost effective to run with GPUs. You should run with GPUs if possible; however, given that there are fewer GPU than CPU nodes, you may need to consider submitting non-GPU jobs if your waiting time for GPU jobs is too long.


= References = <!--T:23-->