Nibi

Availability: Spring 2025
Login node: to be determined
Globus endpoint: to be determined
Data transfer node (rsync, scp, sftp,...): to be determined
Portal: to be determined

Nibi, the Anishinaabemowin word for water, is a general-purpose cluster and the successor to Graham. It has 134,400 CPU cores and 288 NVIDIA H100 GPUs, was built by Hypertec, and is hosted and operated by SHARCNET at the University of Waterloo. Nibi is expected to come online by July 1, 2025.

Installation and transition

The data centre that will host Nibi is currently under renovation. Because power and cooling capacity are limited, only a small portion of the old cluster Graham will keep operating, which frees power for Nibi's acceptance testing and the transition.

Storage

Parallel storage: 25 petabytes of all-flash (SSD) storage from VAST Data for home, project and scratch.

Interconnect fabric

  • Nokia 200/400G Ethernet
    • 200 Gbit/s network bandwidth for CPU nodes.
    • 400 Gbit/s non-blocking network bandwidth between all NVIDIA GPU nodes.
    • 200 Gbit/s network bandwidth between all AMD GPU nodes.
    • 24x100 Gbit/s connection to the VAST storage nodes.
    • 2:1 blocking at 400 Gbit/s uplinks for all compute nodes.
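
The link figures above imply some simple arithmetic, sketched below in Python. This is a rough illustration only: the function names are hypothetical, the conversion uses decimal units (8 Gbit = 1 GB), and the reading of "2:1 blocking" as a worst-case halving of a node's bandwidth across the uplinks is an assumption about how the oversubscription behaves under full contention, not a measured figure.

```python
# Back-of-the-envelope figures derived from the interconnect list.
# All names here are illustrative; none come from the cluster documentation.

STORAGE_LINKS = 24          # number of links to the VAST storage nodes
STORAGE_LINK_GBIT_S = 100   # bandwidth of each storage link, in Gbit/s
CPU_NODE_GBIT_S = 200       # network bandwidth of a CPU node, in Gbit/s
BLOCKING_RATIO = 2.0        # "2:1 blocking" on the uplinks


def aggregate_gbit_s(links: int, per_link_gbit_s: float) -> float:
    """Total nominal bandwidth of several parallel links, in Gbit/s."""
    return links * per_link_gbit_s


def gbit_to_gbyte_s(gbit_s: float) -> float:
    """Convert Gbit/s to GB/s (decimal units, 8 bits per byte)."""
    return gbit_s / 8


def worst_case_uplink_gbit_s(node_gbit_s: float, ratio: float) -> float:
    """Assumed per-node bandwidth across the uplinks when every node
    sends at once: the nominal rate divided by the blocking ratio."""
    return node_gbit_s / ratio


if __name__ == "__main__":
    storage = aggregate_gbit_s(STORAGE_LINKS, STORAGE_LINK_GBIT_S)
    print(f"Storage links, aggregate: {storage} Gbit/s")                    # 2400 Gbit/s
    print(f"Storage links, aggregate: {gbit_to_gbyte_s(storage)} GB/s")     # 300.0 GB/s
    cpu = worst_case_uplink_gbit_s(CPU_NODE_GBIT_S, BLOCKING_RATIO)
    print(f"CPU node across uplinks, full contention: {cpu} Gbit/s")        # 100.0 Gbit/s
```

These are nominal upper bounds; real throughput depends on protocol overhead, traffic patterns and the storage system itself.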

Node characteristics

nodes | cores | available memory | CPU | GPU
700 | 192 | 768 GB DDR5 | 2 x Intel 6972P @ 2.4 GHz, 384 MB L3 cache | -
10 | 192 | 6 TB DDR5 | 2 x Intel 6972P @ 2.4 GHz, 384 MB L3 cache | -
36 | 192 | 1.5 TB | 1 x Intel 8570 @ 2.1 GHz, 300 MB L3 cache | 8 x NVIDIA H100 SXM (80 GB memory)
6 | 96 | 512 GB | 4 x AMD MI300A | 16? x AMD CDNA 3 (128 GB HBM3 memory - unified memory model)
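
The headline figures quoted in the introduction (134,400 CPU cores and 288 H100 GPUs) can be cross-checked against the node table. The Python sketch below is illustrative only: the dictionary keys (`cpu`, `h100`) and function names are hypothetical labels, not official node or partition names. The AMD MI300A row is left out because its GPU count is marked as uncertain in the table.

```python
# Node inventory transcribed from two rows of the node characteristics table.
# Keys such as "cpu" and "h100" are illustrative labels, not official names.
NODE_TYPES = {
    "cpu":  {"nodes": 700, "cores_per_node": 192, "mem_gb": 768, "gpus_per_node": 0},
    "h100": {"nodes": 36,  "cores_per_node": 192, "mem_gb": 1500, "gpus_per_node": 8},
}


def total_cores(kind: str) -> int:
    """CPU cores summed over all nodes of one type."""
    spec = NODE_TYPES[kind]
    return spec["nodes"] * spec["cores_per_node"]


def total_gpus(kind: str) -> int:
    """GPUs summed over all nodes of one type."""
    spec = NODE_TYPES[kind]
    return spec["nodes"] * spec["gpus_per_node"]


def mem_per_core_gb(kind: str) -> float:
    """Available memory divided evenly across the cores of one node."""
    spec = NODE_TYPES[kind]
    return spec["mem_gb"] / spec["cores_per_node"]


if __name__ == "__main__":
    print("Cores in standard CPU nodes:", total_cores("cpu"))            # 134400
    print("H100 GPUs:", total_gpus("h100"))                               # 288
    print("GB per core on standard CPU nodes:", mem_per_core_gb("cpu"))  # 4.0
```

The 134,400-core figure matches the 700 standard CPU nodes alone, and the 288 GPUs match the 36 H100 nodes. The 4 GB per core value is a simple division of the listed memory and may be higher than what a scheduler actually makes available to jobs.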