Using GPUs with Slurm

{| class="wikitable"
|-
! rowspan=2|Cluster !! rowspan=2|# of nodes !! rowspan=2|Slurm type specifier !! colspan=3|Per node !! rowspan=2|GPU model !! rowspan=2|GPU memory (GiB) !! rowspan=2|Notes
|-
! CPU cores !! CPU memory !! GPUs
|-
| Béluga           || 172 || v100   || 40 || 191000M ||  4 || V100-SXM2 || 16 || All GPUs associated with the same CPU socket, connected via NVLink
|-
| rowspan=3|Cedar  || 114 || p100   || 24 || 128000M ||  4 || P100-PCIE || 12 || Two GPUs per CPU socket
|-
|                     32  || p100l  || 24 || 257000M ||  4 || P100-PCIE || 16 || All GPUs associated with the same CPU socket
|-
|                     192 || v100l  || 32 || 192000M ||  4 || V100-SXM2 || 32 || Two GPUs per CPU socket; all GPUs connected via NVLink
|-
| rowspan=5|Graham || 160 || p100   || 32 || 127518M ||  2 || P100-PCIE || 12 || One GPU per CPU socket
|-
|                     7   || v100   || 28 || 183105M ||  8 || V100-PCIE || 16 || See [[Graham#Volta_GPU_nodes_on_Graham|Graham: Volta GPU nodes]]
|-
|                     2   || v100l  || 28 || 183105M ||  8 || V100-?    || 32 || See [[Graham#Volta_GPU_nodes_on_Graham|Graham: Volta GPU nodes]]
|-
|                     30  || t4     || 44 || 192000M ||  4 || Tesla T4  || 16 || Two GPUs per CPU socket
|-
|                     6   || t4     || 16 || 192000M ||  4 || Tesla T4  || 16 ||
|-
| rowspan=2|Hélios || 15  || k20    || 20 || 110000M ||  8 || K20       ||  5 || Four GPUs per CPU socket
|-
|                     6   || k80    || 24 || 257000M || 16 || K80       || 12 || Eight GPUs per CPU socket
|-
| Mist             || 54  || (none) || 32 ||  256GiB ||  4 || V100-SXM2 || 32 || See [https://docs.scinet.utoronto.ca/index.php/Mist#Specifications Mist specifications]
|-
| Arbutus          || 9   || (none) || 80 ||  384GiB ||  4 || V100      || 32 || Cloud resource, <b>not schedulable via Slurm</b>; included here for completeness
|}
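A minimal GPU job script illustrates how the resources in the table are requested. This is a sketch: the account name is hypothetical, and the memory and time values are illustrative rather than cluster-specific.

```shell
#!/bin/bash
#SBATCH --gpus-per-node=1        # request one GPU (of any type) on the node
#SBATCH --mem=4000M              # CPU memory; illustrative value
#SBATCH --time=0-01:00           # walltime of 1 hour; illustrative value
#SBATCH --account=def-someuser   # hypothetical account name

echo "Job running on $(hostname)"
# List the GPU(s) allocated to the job, if the NVIDIA tools are available
command -v nvidia-smi >/dev/null && nvidia-smi
```

Submit with <code>sbatch job.sh</code>; Slurm sets <code>CUDA_VISIBLE_DEVICES</code> inside the job so that only the allocated GPU is visible.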




<!--T:40-->
If you do not supply a type specifier, Slurm may send your job to a node equipped with any type of GPU.
For certain workflows this may be undesirable; for example, molecular dynamics codes require high double-precision performance, for which T4 GPUs are not appropriate.
In such cases, make sure you include a type specifier.
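The type specifier is appended to the GPU request on the command line or in the job script, using the values from the table above. A sketch of both forms, assuming a hypothetical job script <code>job.sh</code>:

```shell
# Without a type specifier: any available GPU type may be assigned
sbatch --gpus-per-node=1 job.sh

# With a type specifier: request a V100 GPU explicitly
sbatch --gpus-per-node=v100:1 job.sh

# Equivalent request using the older --gres syntax
sbatch --gres=gpu:v100:1 job.sh
```

Either syntax works with recent Slurm versions; <code>--gres=gpu:type:count</code> predates <code>--gpus-per-node</code> and appears in older documentation.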


=== Mist === <!--T:38-->