Using GPUs with Slurm

{| class="wikitable"
|-
! rowspan=2|Cluster !! rowspan=2|# of Nodes !! rowspan=2|Slurm type specifier !! colspan=3|Per node !! rowspan=2|GPU model !! rowspan=2|GPU mem (GiB) !! rowspan=2|Notes
|-
! CPU cores !! CPU memory !! GPUs
|-
| Béluga           || 172 || v100  || 40 || 191000M ||  4 || V100-SXM2 || 16 || All GPUs associated with the same CPU socket, connected via NVLink
|-
| rowspan=3|Cedar  || 114 || p100  || 24 || 128000M ||  4 || P100-PCIE || 12 || Two GPUs per CPU socket
|-
|                      32 || p100l || 24 || 257000M ||  4 || P100-PCIE || 16 || All GPUs associated with the same CPU socket
|-
|                     192 || v100l || 32 || 192000M ||  4 || V100-SXM2 || 32 || Two GPUs per CPU socket; all GPUs connected via NVLink
|-
| rowspan=5|Graham || 160 || p100  || 32 || 127518M ||  2 || P100-PCIE || 12 || One GPU per CPU socket
|-
|                       7 || v100  || 28 || 183105M ||  8 || V100-PCIE || 16 || See [[Graham#Volta_GPU_nodes_on_Graham|Graham: Volta GPU nodes]]
|-
|                       2 || v100l || 28 || 183105M ||  8 || V100-?    || 32 || See [[Graham#Volta_GPU_nodes_on_Graham|Graham: Volta GPU nodes]]
|-
|                      30 || t4    || 44 || 192000M ||  4 || Tesla T4  || 16 || Two GPUs per CPU socket
|-
|                       6 || t4    || 16 || 192000M ||  4 || Tesla T4  || 16 ||
|-
| rowspan=2|Hélios || 15  || k20   || 20 || 110000M ||  8 || K20       ||  5 || Four GPUs per CPU socket
|-
|                       6 || k80   || 24 || 257000M || 16 || K80       || 12 || Eight GPUs per CPU socket
|-
| Mist             || 54  || (none) || 32 || 256GiB ||  4 || V100-SXM2 || 32 || See [https://docs.scinet.utoronto.ca/index.php/Mist#Specifications Mist specifications]
|-
| Arbutus          ||  9  || (none) || 80 || 384GiB ||  4 || V100      || 32 || Cloud resource, <b>not schedulable via Slurm</b>; included here for completeness
|}
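To check which GPU types a cluster's Slurm configuration actually advertises (and so which type specifiers are valid there), you can query the generic-resource (GRES) column of <code>sinfo</code>. A minimal sketch, assuming you are logged in to one of the clusters above:

```shell
# List node groups and the GPU GRES they expose, e.g. "gpu:v100l:4".
# %N = node list, %G = generic resources; the exact strings shown
# depend on the cluster's Slurm configuration.
sinfo -o "%N %G" | sort -u
```

The type name that appears between the colons in the GRES string (for example <code>v100l</code> in <code>gpu:v100l:4</code>) is the Slurm type specifier listed in the table.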


<!--T:40-->
If you do not supply a type specifier, Slurm may send your job to a node equipped with any type of GPU.
For certain workflows this may be undesirable;
for example, molecular dynamics codes require strong double-precision performance, which makes T4 GPUs a poor fit.
In such cases, make sure you include a type specifier in your request.
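As an illustration (the account name is a placeholder, and the core and memory figures are simply fractions of the v100l row in the table above), a job script that pins the request to a V100 GPU on Cedar might look like:

```shell
#!/bin/bash
# Including the type specifier "v100l" ensures the job cannot be
# scheduled onto a T4 or other GPU type.
#SBATCH --gres=gpu:v100l:1      # one V100 (32 GiB) GPU
#SBATCH --cpus-per-task=8       # a quarter of a 32-core v100l node
#SBATCH --mem=46000M            # roughly a quarter of the node's 192000M
#SBATCH --time=03:00:00
#SBATCH --account=def-someuser  # placeholder account name

nvidia-smi                      # record which GPU the job actually received
```

Omitting the <code>v100l</code> between <code>gpu</code> and the count (i.e. <code>--gres=gpu:1</code>) would leave the choice of GPU type to the scheduler.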


=== Mist === <!--T:38-->