<!--T:25-->
:* namd-multicore/2.12
:* namd-verbs/2.12 ('''disabled on Cedar''')
:* namd-mpi/2.12 ('''disabled on Graham''')

<!--T:26-->
<!--T:9-->
:* namd-multicore/2.12
:* namd-verbs-smp/2.12 ('''disabled on Cedar''')

<!--T:27-->
<!--T:17-->
'''NOTES''':
*Verbs versions will not run on Cedar because of its different interconnect; use the MPI version instead.
*Verbs versions will not run on Béluga either because of its incompatible InfiniBand kernel drivers; use the UCX version instead.
</translate>
{{File
<!--T:34-->
The numbers below were obtained for the standard NAMD apoa1 benchmark. The benchmarking was conducted on the Graham cluster, which has CPU nodes with 32 cores and GPU nodes with 32 cores and 2 GPUs. Benchmarking on other clusters must take into account the different structure of their nodes.
<!--T:35-->
In the results shown in the first table below, we used NAMD 2.12 from the verbs module. Efficiency is computed as (time with 1 core) / (N × (time with N cores)).
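The efficiency formula above can be sketched in a few lines of Python. The timing values here are placeholders for illustration only, not the measured apoa1 numbers (those are in the table below):

```python
# Hypothetical wall-clock times for the benchmark, keyed by core count.
# These are made-up values; substitute the measured times from the table.
times = {1: 1000.0, 16: 70.0, 32: 40.0}

def efficiency(times, n):
    """Parallel efficiency: (time with 1 core) / (N * (time with N cores))."""
    return times[1] / (n * times[n])

for n in sorted(times):
    print(f"{n} cores: efficiency = {efficiency(times, n):.2f}")
```

An efficiency near 1.0 means near-perfect scaling; values well below 1.0 indicate that adding cores is yielding diminishing returns.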
<!--T:36--> | <!--T:36--> | ||
<!--T:40-->
From this table it is clear that there is no point in using more than one node for this system, since performance actually becomes worse if we use two or more nodes. Using only one node, it is best to use 1 GPU/16 cores, as that has the greatest efficiency, but it is also acceptable to use 2 GPUs/32 cores if you need your results quickly. Since on Graham GPU nodes your priority is charged the same for any job using up to 16 cores and 1 GPU, there is no benefit to running with 8 or 4 cores in this case.
<!--T:41-->
Finally, you have to decide whether to run with or without GPUs for this simulation. From our numbers we can see that on a full GPU node of Graham (32 cores, 2 GPUs) the job runs faster than it would on 4 non-GPU nodes of Graham. Since a GPU node on Graham costs about twice as much as a non-GPU node, in this case it is more cost-effective to run with GPUs. You should run with GPUs if possible; however, given that there are fewer GPU than CPU nodes, you may need to consider submitting non-GPU jobs if your waiting time for GPU jobs is too long.
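The cost comparison above can be made concrete with a small sketch. The runtimes and the 2× charge ratio below are illustrative assumptions, not measured values; the reasoning is simply cost = (nodes) × (per-node charge rate) × (runtime):

```python
# Relative charge rates (assumption from the text: a Graham GPU node is
# charged about twice a non-GPU node).
gpu_node_rate = 2.0
cpu_node_rate = 1.0

# Hypothetical runtimes in hours; the text says the 1-GPU-node run is
# faster than the 4-CPU-node run, so t_gpu < t_cpu here.
t_gpu = 1.0   # runtime on 1 full GPU node (32 cores, 2 GPUs)
t_cpu = 1.5   # runtime on 4 non-GPU nodes

cost_gpu = 1 * gpu_node_rate * t_gpu   # one GPU node for t_gpu hours
cost_cpu = 4 * cpu_node_rate * t_cpu   # four CPU nodes for t_cpu hours
print(f"GPU run cost: {cost_gpu}, CPU run cost: {cost_cpu}")
```

With these assumptions the GPU run is cheaper even though each GPU node-hour costs twice as much, because it uses a quarter as many nodes for less time.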
= References = <!--T:23-->