Rorqual
Availability: June 19, 2025
Login node: rorqual.alliancecan.ca
Data transfer node (rsync, scp, sftp, ...): rorqual.alliancecan.ca
Globus collection: alliancecan#rorqual
JupyterHub: coming soon
Portal: coming soon
Rorqual is a heterogeneous and versatile cluster designed for a wide variety of scientific calculations. The cluster is located at the École de technologie supérieure in Montreal. Its name recalls the rorqual, a marine mammal of which several species can be observed in the St. Lawrence River.
Access
Each researcher must request access in CCDB, via Resources --> Access Services.
- Select Rorqual from the list on the left.
- Select I request access.
- Click on the button to accept each of the following agreements:
- Calcul Québec Consent for the collection and use of personal information
- Rorqual Service Level Agreement
- Calcul Québec Terms of Use
It can take up to one hour for your access to be enabled.
Site-specific policies
Rorqual's compute nodes cannot access the internet. If you need an exception to this rule, contact technical support explaining what you need and why.
The crontab tool is not offered.
Each job should have a duration of at least one hour (at least five minutes for test jobs) and you cannot have more than 1000 jobs, running or queued, at any given moment. The maximum duration is 7 days (168 hours).
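A minimal submission script that respects these limits could look like the sketch below; the account name and the program are placeholders, not Rorqual-specific values.

```bash
#!/bin/bash
#SBATCH --account=def-someprof   # placeholder: use your own allocation
#SBATCH --time=03:00:00          # between 1 hour and 168 hours (7 days)
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G

./my_program                     # placeholder executable
```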
Storage
Filesystem | Characteristics |
---|---|
HOME | Lustre filesystem, 116 TB |
SCRATCH | Lustre filesystem, 6.5 PB |
PROJECT | Lustre filesystem, 62 PB |
For transferring data via Globus, use the endpoint specified at the top of this page; for tools like rsync and scp, please use the login node.
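For example, a recursive copy with rsync through the login node could look like the following sketch; the username and both paths are placeholders.

```bash
# Hedged sketch: push a local results directory to Rorqual over SSH.
# Replace "username" and the destination path with your own.
rsync -avP ./results/ username@rorqual.alliancecan.ca:~/scratch/results/
```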
High-performance interconnect
- InfiniBand interconnect
- HDR 200Gb/s
- Maximum blocking factor 34:6 or 5.667:1
- CPU node islands of up to 31 nodes (192 cores each), fully non-blocking
Node characteristics
nodes | cores | available memory | storage | CPU | GPU |
---|---|---|---|---|---|
670 | 192 | 750G or 768000M | 1 x SATA SSD, 480G (6 Gbit/s) | 2 x AMD EPYC 9654 (Zen 4) @ 2.40 GHz, 384MB L3 cache | - |
8 | 192 | 750G or 768000M | 1 x NVMe SSD, 3.84TB | 2 x AMD EPYC 9654 (Zen 4) @ 2.40 GHz, 384MB L3 cache | - |
8 | 192 | 3013G or 3086250M | 1 x SATA SSD, 480G (6 Gbit/s) | 2 x AMD EPYC 9654 (Zen 4) @ 2.40 GHz, 384MB L3 cache | - |
81 | 64 | 498G or 510000M | 1 x NVMe SSD, 3.84TB | 2 x Intel Xeon Gold 6448Y @ 2.10 GHz, 60MB L3 cache | 4 x NVidia H100 SXM5 (80GB) |
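As a sketch of how this table translates into job requests: asking for more memory than a standard 750G node offers means only the 8 large-memory (3013G) nodes can host the job. The values below are illustrative placeholders.

```bash
#!/bin/bash
# Illustrative request targeting a large-memory CPU node.
#SBATCH --account=def-someprof   # placeholder
#SBATCH --time=24:00:00
#SBATCH --cpus-per-task=192      # a full node
#SBATCH --mem=2000G              # more than 750G, so only the 3013G nodes qualify

./my_memory_hungry_program       # placeholder executable
```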
CPU nodes
The 192 cores and the different memory spaces are not all equidistant, which results in variable data-access latencies (on the order of nanoseconds). In each node, there are:
- 2 sockets, each with 12 system memory channels
- 4 NUMA nodes per socket, each connected to 3 system memory channels
- 3 chiplets per NUMA node, each with its own 32 MiB L3 cache memory
- 8 cores per chiplet, each with its own 1 MiB L2 cache memory and 32+32 KiB L1 cache memory
In other words, we have:
- groups of 8 closely spaced cores sharing a single L3 cache, which is ideal for multithreaded parallel programs (for example, with the --cpus-per-task=8 option)
- NUMA nodes of 3x8 = 24 cores sharing a trio of system memory channels
- a total of 2x4x3x8 = 192 cores per node
To fully benefit from this topology, full nodes must be reserved (e.g., with --ntasks-per-node=24 --cpus-per-task=8) and the placement of processes and threads must be explicitly controlled, as in the sketch below. Depending on the parallel program and the number of cores used, the gains can be marginal or significant.
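Here is a sketch of a full-node hybrid MPI/OpenMP script following this scheme (24 ranks per node, 8 threads per rank, one rank per chiplet); the account name and executable are placeholders, and the thread-binding variables assume an OpenMP-based application.

```bash
#!/bin/bash
#SBATCH --account=def-someprof     # placeholder
#SBATCH --time=12:00:00
#SBATCH --nodes=2                  # full nodes only
#SBATCH --ntasks-per-node=24       # one MPI rank per 8-core chiplet
#SBATCH --cpus-per-task=8          # the 8 cores sharing an L3 cache
#SBATCH --mem=0                    # all the memory of each node

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PLACES=cores            # one thread per physical core
export OMP_PROC_BIND=close         # keep a rank's threads on its own chiplet

# Pin each rank to its own block of 8 cores
srun --cpus-per-task=${SLURM_CPUS_PER_TASK} --cpu-bind=cores ./my_hybrid_app   # placeholder
```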
GPU nodes
The architecture of the GPU nodes is not as hierarchical:
- 2 sockets, each with
- 8 system memory channels
- 60 MiB L3 cache memory
- 32 equidistant cores, each with its own 2 MiB L2 cache memory and 32+48 KiB L1 cache memory
- 2 NVidia H100 accelerators
The four accelerators in each node are interconnected through their SXM5 form factor (NVLink).
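From inside a job on one of these nodes, the GPU-to-GPU and GPU-to-CPU links can be inspected with the standard NVIDIA tool below; this is generic to NVIDIA drivers, not specific to Rorqual.

```bash
# Show the NVLink/PCIe topology matrix of the four H100 accelerators
nvidia-smi topo -m
```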
GPU instances
Approximately half of the GPU nodes are configured with MIG technology, and only 3 GPU instance sizes are available:
- 1g.10gb: 1/8th of the computing power with 10GB GPU memory
- 2g.20gb: 2/8th of the computing power with 20GB GPU memory
- 3g.40gb: 3/8th of the computing power with 40GB GPU memory
To request one and only one GPU instance for your compute job, use the corresponding option:
- 1g.10gb: --gpus=nvidia_h100_80gb_hbm3_1g.10gb:1
- 2g.20gb: --gpus=nvidia_h100_80gb_hbm3_2g.20gb:1
- 3g.40gb: --gpus=nvidia_h100_80gb_hbm3_3g.40gb:1
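A job script requesting a single 1g.10gb instance could then look like the following sketch; the account, CPU and memory values, and the workload are placeholders.

```bash
#!/bin/bash
#SBATCH --account=def-someprof                    # placeholder
#SBATCH --time=02:00:00
#SBATCH --gpus=nvidia_h100_80gb_hbm3_1g.10gb:1    # one 1g.10gb MIG instance
#SBATCH --cpus-per-task=2
#SBATCH --mem=16G

python train.py                                   # placeholder workload
```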