User:Ppomorsk/Graham2 draft

From Alliance Doc

Graham 2 draft.



Availability: Spring 2025
Login node: to be determined
Globus endpoint: to be determined
Data transfer node (rsync, scp, sftp,...): to be determined
Portal: to be determined

Nibi, named after …, is the successor to Graham. It is a general-purpose cluster of 134,400 CPU cores and 280 NVIDIA H100 GPUs, built by Hypertec and hosted and operated by SHARCNET at the University of Waterloo. Nibi is expected to come online by July 1, 2025.

Installation and transition

The data centre that will host Nibi is currently under renovation. Because available power and cooling capacity are limited, only a small portion of the old Graham cluster will remain in operation, freeing power for Nibi's acceptance testing and the transition.

Storage

Parallel storage: 25 petabytes of all-flash (SSD) storage from VAST Data, serving the home, project and scratch file systems.

Interconnect fabric

  • Nokia 200/400G Ethernet
    • 2x200 Gbit/s network bandwidth for CPU nodes.
    • 400 Gbit/s network bandwidth for GPU nodes.
    • 3x100 Gbit/s connection to the VAST storage nodes.
    • 2:1 blocking on the 400 Gbit/s uplinks for compute nodes; non-blocking 2x400 Gbit/s for large-memory nodes.
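To make the link speeds above easier to compare with storage or memory throughput, the sketch below converts them to GB/s and estimates the worst-case off-switch bandwidth per node. It is a minimal illustration under stated assumptions, not a description of the actual switch topology. It assumes decimal units (8 Gbit = 1 GB) and reads "2:1 blocking" as a leaf switch having half as much uplink bandwidth as node-facing bandwidth, applied here only to the CPU nodes. The helper names `gbit_to_gbyte` and `worst_case_share` are hypothetical.

```python
# Minimal sketch of link-speed arithmetic for the interconnect figures above.
# Assumptions (not from the source): decimal units (1 GB = 8 Gbit), and
# "2:1 blocking" read as uplink bandwidth equal to half the node-facing
# bandwidth, applied here to the CPU nodes only.

BITS_PER_BYTE = 8


def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert a link speed from Gbit/s to GB/s."""
    return gbit_per_s / BITS_PER_BYTE


def worst_case_share(node_gbit_per_s: float, blocking_ratio: float) -> float:
    """Off-switch bandwidth per node if every node on a leaf switch sends at once."""
    return node_gbit_per_s / blocking_ratio


# (description, per-node link speed in Gbit/s, blocking ratio)
LINKS = [
    ("CPU node, 2x200 Gbit/s, 2:1 blocking", 2 * 200, 2.0),
    ("large-memory node, 2x400 Gbit/s, non-blocking", 2 * 400, 1.0),
]

for name, gbit, ratio in LINKS:
    share = worst_case_share(gbit, ratio)
    print(f"{name}: link {gbit_to_gbyte(gbit):.0f} GB/s, "
          f"worst-case off-switch {share:.0f} Gbit/s ({gbit_to_gbyte(share):.0f} GB/s)")
```

Under these assumptions a CPU node's 400 Gbit/s (50 GB/s) link drops to about 200 Gbit/s (25 GB/s) when every node on its switch communicates off-switch at the same time, while a non-blocking large-memory node keeps its full 800 Gbit/s (100 GB/s).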

Node characteristics

nodes   cores per node   available memory per node   CPU                                        GPU
700     192              768 GB DDR5                 2 x Intel XXX @ 2.6 GHz, 384 MB L3 cache   none
10      192              6 TB DDR5                   2 x Intel XXX @ 2.6 GHz, 384 MB L3 cache   none
36      192              1.5 TB                      1 x Intel YYY @ 2.4 GHz, 384 MB L3 cache   8 x NVIDIA H100 SXM (80 GB memory)
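The per-node figures in the table can be multiplied out to get cluster-wide totals. The sketch below does this arithmetic. The row labels ("CPU", "large memory", "GPU") and the `NodeType`/`totals` names are hypothetical, and memory is converted to decimal terabytes (768 GB = 0.768 TB) as an assumption. Multiplying out the rows shows that the 134,400 CPU cores quoted in the introduction correspond to the 700 CPU nodes alone (700 x 192), and that the GPU row gives 36 x 8 = 288 GPUs.

```python
# Sketch: cluster-wide totals implied by the node table above.
# Row labels are descriptive, not official partition names.
# Memory is expressed in decimal TB (assumption: 768 GB = 0.768 TB).
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    count: int        # number of nodes of this type
    cores: int        # CPU cores per node
    memory_tb: float  # memory per node, TB
    gpus: int         # GPUs per node


NODE_TYPES = [
    NodeType("CPU", 700, 192, 0.768, 0),
    NodeType("large memory", 10, 192, 6.0, 0),
    NodeType("GPU", 36, 192, 1.5, 8),
]


def totals(node_types):
    """Return (nodes, cores, memory in TB, GPUs) summed over all node types."""
    nodes = sum(n.count for n in node_types)
    cores = sum(n.count * n.cores for n in node_types)
    memory = sum(n.count * n.memory_tb for n in node_types)
    gpus = sum(n.count * n.gpus for n in node_types)
    return nodes, cores, memory, gpus


for n in NODE_TYPES:
    print(f"{n.name}: {n.count * n.cores} cores, {n.count * n.gpus} GPUs")

nodes, cores, memory, gpus = totals(NODE_TYPES)
print(f"total: {nodes} nodes, {cores} cores, {memory:.1f} TB memory, {gpus} GPUs")
```

Keeping each row as a small immutable record makes it easy to update the sketch when the placeholder CPU models and final node counts are confirmed.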