Infrastructure renewal completed events

This page lists completed events that were part of the infrastructure renewal activities.

| Start Time | End Time | Status | System | Type | Description |
|---|---|---|---|---|---|
| Jan 13, 2025 | Feb 14, 2025 | Complete | Béluga, Narval | Temporary Reduction | Performance and stability tests on Rorqual will require the shutdown of all Béluga compute nodes and about half of Narval compute nodes from 8 a.m. on January 13 until 12 p.m. (noon) on January 31, 2025 (EST). Login nodes and data access will remain operational. On Narval, approximately 50% of nodes from each category (CPU, GPU, and large memory) will be shut down. During the shutdown, Béluga storage will be mounted on Narval (Béluga's /lustre01, /lustre02, /lustre03, and /lustre04). Béluga and Juno cloud instances are unaffected. Jobs on Béluga scheduled to complete after 8 a.m. on January 13 will remain queued until the cluster resumes. Update (Jan 30, 2025): Narval's compute capacity is at 100% until February 6, then again at 30% for the last Rorqual tests. Béluga and Narval should be back to 100% capacity on February 14. For updated information, please see https://status.alliancecan.ca. |
| Jan 22, 2025 | Jan 22, 2025 (1 day) | Complete | Niagara, Mist | Outage | Niagara and Mist compute nodes will be shut down on January 22, 2025 from 8 a.m. to 5 p.m. EST to support ongoing system improvements and the integration with the new system, Trillium. The login nodes, file systems, and the HPSS system will remain available. The scheduler will hold submitted jobs until the maintenance has finished. |
| Jan 13, 2025 | Jan 21, 2025 (9 days) | Complete | Cedar (100%) | Outage | The Cedar compute cluster will be shut down in preparation for the infrastructure renewal. Jobs submitted to the cluster will queue and may start running if they can complete before the shutdown. Jobs that cannot run will remain in the queue until the cluster is fully operational on January 21. The Cedar /scratch filesystem will be migrated to new storage. Please move any important data immediately to your /project, /nearline, or /home directory. Cedar cloud will remain operational during this period. |
| Nov 25, 2024 | Nov 26, 2024 (1 day) | Complete | Niagara | Outage | A full power shutdown will take place for main panel upgrades ahead of Trillium cluster setup. All Niagara services, including the cluster and scheduler, will pause during this time. The scheduler will hold jobs that cannot finish before the start of the shutdown. Users are encouraged to submit smaller, short-duration jobs to make use of idle nodes before the maintenance begins. |
| Nov 7, 2024 | Nov 8, 2024 (1 day) | Complete | Niagara | Outage | All systems and storage at the SciNet Datacenter (Niagara, Mist, HPSS, Rouge, Teach, JupyterHub, Balam) will be unavailable from 7 a.m. to 5 p.m. ET. This outage is necessary for installing new electrical equipment (UPS) as part of a systems refresh. The scheduler will hold jobs that cannot finish before the shutdown. Users can prioritize short jobs to use otherwise idle nodes prior to maintenance. |
| Nov 7, 2024, 6 a.m. PST | Nov 8, 2024, 6 a.m. PST | Complete | Cedar | Outage | Cedar compute nodes will be unavailable during this period. However, Cedar login nodes, storage, and cloud services will remain operational and unaffected. |
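
Several entries above note that the scheduler holds jobs that cannot finish before a shutdown begins, and that shorter jobs may still run on otherwise idle nodes. The following is a minimal sketch of that check, not the scheduler's actual logic: it assumes a job starts the moment it is submitted (real queues add wait time), and the function name `fits_before_shutdown` and the submission time are hypothetical, introduced only for illustration.

```python
from datetime import datetime, timedelta


def fits_before_shutdown(start_time, walltime, shutdown_start):
    """Return True if a job starting at start_time with the requested
    walltime would end at or before shutdown_start.

    Simplifying assumption: the job starts immediately at start_time.
    """
    return start_time + walltime <= shutdown_start


# Niagara/Mist outage from the table: January 22, 2025, beginning at 8 a.m.
shutdown_start = datetime(2025, 1, 22, 8, 0)

# Hypothetical submission time, chosen only for this example.
submitted = datetime(2025, 1, 21, 20, 0)

short_job = timedelta(hours=6)   # would end at 2 a.m., before the shutdown
long_job = timedelta(hours=24)   # would run past 8 a.m.

print(fits_before_shutdown(submitted, short_job, shutdown_start))
print(fits_before_shutdown(submitted, long_job, shutdown_start))
```

Under these assumptions, a job whose requested walltime ends before the shutdown start is a candidate to run, while a longer request would be held until the system returns.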