Béluga/en: Difference between revisions

Updating to match new version of source page
Line 13: Line 13:
Béluga is a general-purpose cluster designed for a variety of workloads and located at the [http://www.etsmtl.ca/ École de technologie supérieure] in Montreal. The cluster is named in honour of the St. Lawrence River's [https://en.wikipedia.org/wiki/Beluga_whale Beluga whale] population.


<div class="mw-translate-fuzzy">
=Site-specific policies=
By policy, Béluga's compute nodes cannot access the internet. If you need an exception to this rule, contact [[Technical_support|technical support]] and provide the IP address, port number(s) and protocol(s) needed, the duration of the exception, and a contact person.


Crontab is not offered on Béluga.
</div>


Each job on Béluga should run for at least one hour (five minutes for test jobs), and a user cannot have more than 1000 jobs, running and queued combined, at any given moment. The maximum duration for a job on Béluga is 7 days (168 hours).
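
The limits above can be checked before submitting a job. The following is a minimal sketch, not an official tool: the function name <code>check_request</code> and its parameters are hypothetical, and only the numeric limits come from the policy text.

```python
from datetime import timedelta

# Limits quoted in the policy text above.
MIN_WALLTIME = timedelta(hours=1)
MIN_TEST_WALLTIME = timedelta(minutes=5)
MAX_WALLTIME = timedelta(days=7)  # 168 hours
MAX_JOBS_PER_USER = 1000  # running and queued combined


def check_request(walltime, current_jobs, is_test=False):
    """Return a list of policy problems; an empty list means the request fits.

    walltime     -- requested job duration as a timedelta
    current_jobs -- jobs the user already has running or queued
    is_test      -- True for a short test job (five-minute minimum)
    """
    problems = []
    minimum = MIN_TEST_WALLTIME if is_test else MIN_WALLTIME
    if walltime < minimum:
        problems.append(f"walltime {walltime} is below the minimum of {minimum}")
    if walltime > MAX_WALLTIME:
        problems.append(f"walltime {walltime} exceeds the maximum of {MAX_WALLTIME}")
    # Submitting one more job must not push the total past the cap.
    if current_jobs + 1 > MAX_JOBS_PER_USER:
        problems.append(f"user would exceed {MAX_JOBS_PER_USER} jobs")
    return problems


if __name__ == "__main__":
    for problem in check_request(timedelta(minutes=30), current_jobs=12):
        print(problem)
```

An empty return value means the request is within the stated limits; it does not guarantee that the scheduler will accept or start the job.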


<div class="mw-translate-fuzzy">
=Storage=
</div>


{| class="wikitable sortable"
{| class="wikitable sortable"
Line 58: Line 62:
To transfer data via Globus, use the endpoint <code>computecanada#beluga-dtn</code>; for tools like rsync and scp, you can use a login node.


<div class="mw-translate-fuzzy">
=High-performance interconnect=
</div>


A Mellanox InfiniBand EDR (100 Gb/s) network connects all the nodes of the cluster. A central 324-port switch links the cluster's islands with a maximum blocking factor of 5:1. The storage servers are connected through a non-blocking network. The architecture lets multiple parallel jobs of up to 640 cores (or more) run over a non-blocking network. Jobs requiring greater parallelism span several islands and are subject to the 5:1 blocking factor, but the interconnect remains high-performance even for those jobs.
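
The figures above translate into a rough bandwidth estimate. The sketch below is illustrative arithmetic only. It assumes 40-core nodes, assumes that a job of at most 640 cores is placed entirely on the non-blocking part of the network, and reads the 5:1 blocking factor as a worst-case division of link bandwidth; the function names are hypothetical.

```python
LINK_GBPS = 100          # EDR link speed, from the text above
BLOCKING_FACTOR = 5      # maximum blocking factor between islands, from the text
NONBLOCKING_CORES = 640  # job size quoted as served by the non-blocking network
CORES_PER_NODE = 40      # assumption: 40-core compute nodes


def nodes_needed(cores, cores_per_node=CORES_PER_NODE):
    """Number of whole nodes required for a given core count (ceiling division)."""
    return -(-cores // cores_per_node)


def worst_case_gbps(cores):
    """Worst-case per-node bandwidth in Gb/s under the assumptions above."""
    if cores <= NONBLOCKING_CORES:
        return float(LINK_GBPS)
    # Across islands, up to BLOCKING_FACTOR nodes may share one uplink's capacity.
    return LINK_GBPS / BLOCKING_FACTOR


if __name__ == "__main__":
    for cores in (320, 640, 1280):
        print(cores, nodes_needed(cores), worst_case_gbps(cores))
```

Under these assumptions, 640 cores correspond to 16 nodes, and a larger job sees at least 20 Gb/s per node when every node communicates across islands at once.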


<div class="mw-translate-fuzzy">
=Node characteristics=
Turbo mode is activated on all compute nodes of Béluga.
Line 80: Line 87:
| 172 || 40 || 186G or 191000M || 2 x Intel Gold 6148 Skylake @ 2.4 GHz || 1 x NVMe SSD 1.6T || 4 x NVIDIA V100SXM2 (16G memory), connected via NVLink
|}
|}
</div>


* To get a larger <code>$SLURM_TMPDIR</code> space, submit the job with <code>--tmp=xG</code>, where <code>x</code> is a value between 350 and 2490.
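
The <code>--tmp</code> option can be built and range-checked before calling <code>sbatch</code>. This is a minimal sketch: the helper <code>tmp_option</code> and the script name <code>job.sh</code> are hypothetical, and the bounds are the ones given above.

```python
TMP_MIN_GB = 350   # lower bound for x, from the text above
TMP_MAX_GB = 2490  # upper bound for x, from the text above


def tmp_option(size_gb):
    """Return the sbatch option requesting size_gb gigabytes of $SLURM_TMPDIR space."""
    if not TMP_MIN_GB <= size_gb <= TMP_MAX_GB:
        raise ValueError(
            f"--tmp must be between {TMP_MIN_GB}G and {TMP_MAX_GB}G, got {size_gb}G"
        )
    return f"--tmp={size_gb}G"


if __name__ == "__main__":
    # job.sh is a placeholder for your own submission script.
    command = ["sbatch", tmp_option(800), "job.sh"]
    print(" ".join(command))
```

The resulting list can be passed to <code>subprocess.run</code> on a login node, or the same option can be written as an <code>#SBATCH</code> directive inside the job script.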