Using GPUs with Slurm

{| class="wikitable"
|-
! rowspan=2|Cluster !! rowspan=2|# of Nodes !! rowspan=2|Slurm type specifier !! colspan=3|Per node !! rowspan=2|GPU model !! rowspan=2|GPU mem (GiB) !! rowspan=2|Notes
|-
! CPU cores !! CPU memory !! GPUs
|-
| Béluga           || 172 || v100  || 40 || 191000M ||  4 || V100-SXM2 || 16 || All GPUs associated with the same CPU socket, connected via NVLink
|-
| rowspan=3|Cedar  || 114 || p100  || 24 || 128000M ||  4 || P100-PCIE || 12 || Two GPUs per CPU socket
|-
|                      32 || p100l || 24 || 257000M ||  4 || P100-PCIE || 16 || All GPUs associated with the same CPU socket
|-
|                     192 || v100l || 32 || 192000M ||  4 || V100-SXM2 || 32 || Two GPUs per CPU socket; all GPUs connected via NVLink
|-
| rowspan=5|Graham || 160 || p100  || 32 || 127518M ||  2 || P100-PCIE || 12 || One GPU per CPU socket
|-
|                       7 || v100  || 28 || 183105M ||  8 || V100-PCIE || 16 || See [[Graham#Volta_GPU_nodes_on_Graham|Graham: Volta GPU nodes]]
|-
|                       2 || v100l || 28 || 183105M ||  8 || V100-?    || 32 || See [[Graham#Volta_GPU_nodes_on_Graham|Graham: Volta GPU nodes]]
|-
|                      30 || t4    || 44 || 192000M ||  4 || Tesla T4  || 16 || Two GPUs per CPU socket
|-
|                       6 || t4    || 16 || 192000M ||  4 || Tesla T4  || 16 ||
|-
| rowspan=2|Hélios || 15  || k20   || 20 || 110000M ||  8 || K20       ||  5 || Four GPUs per CPU socket
|-
|                       6 || k80   || 24 || 257000M || 16 || K80       || 12 || Eight GPUs per CPU socket
|-
| Mist             || 54  || (none) || 32 || 256GiB ||  4 || V100-SXM2 || 32 || See [https://docs.scinet.utoronto.ca/index.php/Mist#Specifications Mist specifications]
|-
| Arbutus          ||  9  || (none) || 80 || 384GiB ||  4 || V100      || 32 || Cloud resource, <b>not schedulable via Slurm</b>; included here for completeness
|}
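To check which GPU types a cluster's Slurm configuration actually advertises (and so which type specifiers are valid there), you can query the generic-resource (GRES) column of <code>sinfo</code>. A minimal sketch, assuming you are logged in to one of the clusters above:

```shell
# List node groups and the GPU GRES they expose, e.g. "gpu:v100l:4".
# %N = node list, %G = generic resources; the exact strings shown
# depend on the cluster's Slurm configuration.
sinfo -o "%N %G" | sort -u
```

The type name that appears between the colons in the GRES string (for example <code>v100l</code> in <code>gpu:v100l:4</code>) is the Slurm type specifier listed in the table.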


<!--T:40-->
If you do not supply a type specifier, Slurm may send your job to a node equipped with any type of GPU.
For certain workflows this may be undesirable;
for example, molecular dynamics codes require strong double-precision performance, which makes T4 GPUs a poor fit.
In such cases, make sure you include a type specifier in your request.
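As an illustration (the account name is a placeholder, and the core and memory figures are simply fractions of the v100l row in the table above), a job script that pins the request to a V100 GPU on Cedar might look like:

```shell
#!/bin/bash
# Including the type specifier "v100l" ensures the job cannot be
# scheduled onto a T4 or other GPU type.
#SBATCH --gres=gpu:v100l:1      # one V100 (32 GiB) GPU
#SBATCH --cpus-per-task=8       # a quarter of a 32-core v100l node
#SBATCH --mem=46000M            # roughly a quarter of the node's 192000M
#SBATCH --time=03:00:00
#SBATCH --account=def-someuser  # placeholder account name

nvidia-smi                      # record which GPU the job actually received
```

Omitting the <code>v100l</code> between <code>gpu</code> and the count (i.e. <code>--gres=gpu:1</code>) would leave the choice of GPU type to the scheduler.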


=== Mist === <!--T:38-->