Infrastructure renewal



<!--T:2-->
Our Advanced Research Computing infrastructure is undergoing major changes to provide better High Performance Computing (HPC) and Cloud services for Canadian research. This page will be updated regularly to keep you informed of activities related to the transition to the new equipment.<br>
The infrastructure renewal will replace nearly 80% of our current equipment, which is approaching end-of-life. The new equipment will offer faster processing speeds, greater storage capacity, and improved reliability.


<!--T:4-->
=Technical specifications=
Technical specifications for each new system will be provided further down this page in future updates. Generally, the new systems will be similar in architecture to the current ones, but with considerably increased capacity and performance.<br>
For example, we expect to have fewer compute nodes, but each node will have significantly more cores, for an overall increase in the total number of CPU cores.


<!--T:6-->
==System outages==
An intense period of work will take place in winter 2024-2025 and spring 2025. During the installation of and transition to the new systems, outages will be unavoidable because of constraints on space and electrical power.<br>
We recommend that you consider the possibility of outages when you plan research programs, graduate examinations, etc.


<!--T:7-->
==Resource Allocation Competition (RAC)==
The [https://www.alliancecan.ca/en/services/advanced-research-computing/accessing-resources/resource-allocation-competition Resource Allocation Competition] will be impacted by this transition, but the application process remains the same. The application deadline this year is October 30, 2024.<br>
The 2024/25 allocations will remain in effect on each retiring cluster for as long as that cluster remains in service. The 2025/26 allocations will be implemented everywhere once all new clusters are in service.<br>
Because most of the old clusters will be out of service before all of the new ones are available, if you hold both a 2024 and a 2025 RAC award, you will experience a period when neither award is available to you. You will be able to compute with your default allocation (<code>def-xxxxxx</code>) on each new cluster as soon as it goes into service, but the 2025 RAC allocations will only become available once all new clusters are in service.


<!--T:8-->
=General progress updates=
{| class="wikitable"
|-
| Sep 13, 2024 || The RFP processes for all sites except for Rorqual (replacing Béluga) have been completed, and purchase orders have been sent to vendors. The Rorqual storage Request for Proposals is still open and is scheduled to complete on September 18.
All sites are working on infrastructure design (power and cooling) and implementation. We are expecting some outages throughout the fall for cabling and plumbing upgrades.
|-
| Sep 3, 2024 || All sites are completing their Requests for Proposals and are working with the vendors on deliverables and purchase orders.
|}


<!--T:9-->
For current outages, please see the [https://status.computecanada.ca system status page].
=Activities by system=


<!--T:10-->
==Arbutus, cloud==
 
[[Arbutus]]
{| class="wikitable"
|-
| Sep 13, 2024 || The RFP processes for all sites except for Rorqual (the replacement of Béluga) have been completed, and purchase orders to vendors have been sent. The Rorqual storage RFP is still open and is scheduled to complete on Sep 18.
All sites are working on infrastructure (power and cooling) design and implementation. We are expecting some outages over the autumn for cabling and plumbing upgrades, and will update this page when we know more. 
|-
| Sep 3, 2024 || Currently all sites are completing their Requests for Proposals, and have been working with the vendors on deliverables and purchase orders. 
|}


= Technical specifications = <!--T:11-->


<!--T:12-->
The sites cannot yet provide detailed technical specifications of the new systems. Generally, the new systems will be similar in architecture to the old systems but with considerably increased capacity and performance. For instance, we expect to have fewer compute nodes, but each node will have significantly more cores, reflecting the growth in core counts of multi-core CPUs since 2017.


= Resource Allocation Competition and renewals = <!--T:13-->
The Resource Allocation Competition (RAC) and RAC renewals will be affected by this transition, but we are not changing the normal RAC process. Expect to see the usual announcements for the competition in September 2024. We expect to implement the 2025/26 allocations on the new machines when they become available, so there may be some delay in implementing RAC allocations. See the [https://www.alliancecan.ca/en/services/advanced-research-computing/accessing-resources/resource-allocation-competition RAC documentation].


= System-specific updates = <!--T:14-->