Infrastructure renewal/fr



<div lang="en" dir="ltr" class="mw-content-ltr">
=Major upgrade of our Advanced Research Computing infrastructure=
Welcome to the ARC/Cloud renewal transition documentation of the Digital Research Alliance of Canada (the Alliance). This page is the primary source of information for users with questions about the upgrade of our HPC and Cloud infrastructure.
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
Our Advanced Research Computing infrastructure is undergoing major changes so that we can continue to provide better High Performance Computing (HPC) and Cloud services to Canadian researchers. This page will be updated regularly to keep you informed about the transition to the new equipment.<br>
In 2023, the Digital Research Alliance of Canada received formal approval and funding for a complete replacement of its aging national systems. The renewal will replace nearly 80% of our current HPC and Community Cloud equipment, which is approaching end-of-life. The new equipment will offer:
* Increased processing capacity
* Increased storage capacity
* Improved reliability
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
The new infrastructure will better support your computational work by providing a faster, more efficient environment for your research. The systems being replaced are:
*[[Arbutus]], cloud
*[[Béluga]], compute cluster only (not cloud)
*[[Cedar]], compute cluster and cloud
*[[Graham]], compute cluster and cloud
*[[Niagara]], compute cluster
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
=Technical specifications=
Technical specifications for each new system will be added to this page as they become available. In general, the new systems will be similar in architecture to the current ones, but with considerably greater capacity and performance.<br>
For example, we expect to have fewer compute nodes, but because multi-core CPUs have grown considerably since 2017, each node will have many more cores, so the total number of CPU cores will increase.
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
=Impacts=
Each site will develop a transition plan for its new system. We expect to receive more details in the autumn of 2024 and will update this page as they become known.
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
==System outages==
An intense period of work will take place in the winter of 2024-2025 and the spring of 2025. During installation and the transition to the new systems, outages will be unavoidable because of constraints on space and electrical power.
{{Callout
  |title=Important information
  |content=
There will be outages in the winter of 2024-2025 and the spring of 2025. We recommend that researchers consider the possibility of such outages when planning research programs, graduate examinations, etc.
}}
</div>
<div lang="en" dir="ltr" class="mw-content-ltr">
==Resource Allocation Competition (RAC)==
The [https://www.alliancecan.ca/en/services/advanced-research-computing/accessing-resources/resource-allocation-competition Resource Allocation Competition] will be affected by this transition, but the application process remains the same. The application deadline this year is October 30, 2024.<br>
The 2024/25 allocations will remain in effect on each retiring cluster for as long as that cluster remains in service. The 2025/26 allocations will be implemented everywhere once all new clusters are in service.<br>
Because most of the old clusters will be out of service before all of the new ones are available, if you hold both a 2024 and a 2025 RAC award, there will be a period when neither award is available to you. You will be able to compute with your default allocation (<code>def-xxxxxx</code>) on each new cluster as soon as it goes into service, but the 2025 RAC allocations will only become available once all new clusters are in service.
</div>




<div lang="en" dir="ltr" class="mw-content-ltr">
=Activities by system=
For current outages, please see the [https://status.computecanada.ca system status page].
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
{| class="wikitable"
! Date !! Update
|-
| Sep 13, 2024 || The RFP processes for all sites except Rorqual (the replacement for Béluga) have been completed, and purchase orders have been sent to vendors. The Rorqual storage RFP is still open and is scheduled to close on Sep 18.<br>All sites are working on infrastructure (power and cooling) design and implementation. We expect some outages over the autumn for cabling and plumbing upgrades, and will update this page when we know more.
|-
| Sep 3, 2024 || All sites are currently completing their Requests for Proposals and have been working with the vendors on deliverables and purchase orders.
|}
==Arbutus, cloud==
[[Arbutus]]
<i>coming soon</i>
</div>


Line 58: Line 61:


<div lang="en" dir="ltr" class="mw-content-ltr">
==Béluga, compute cluster==
[[Béluga]]
The cluster replacing Béluga is named Rorqual.
<i>coming soon</i>
==Cedar, compute cluster and cloud==
[[Cedar]]
<i>coming soon</i>
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
==Graham, compute cluster and cloud==
[[Graham]]
<i>coming soon</i>
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
==Niagara, compute cluster==
[[Niagara]]
<i>coming soon</i>
</div>










<div class="mw-translate-fuzzy">
<div lang="en" dir="ltr" class="mw-content-ltr">
= Frequently asked questions =
As we work on finalizing the details, here are a few key points to keep in mind.
{{Note|We are committed to providing the most up-to-date information. Please check back regularly, as this section will be updated frequently to reflect any new developments.}}
</div>
<div lang="en" dir="ltr" class="mw-content-ltr">
== Will data be copied to the new systems? ==
Data migration to the new systems is the responsibility of each site. Once the details are finalized, each site will let you know what you need to do and what will be done for you.
</div>
<div lang="en" dir="ltr" class="mw-content-ltr">
== When will outages occur? ==
Each site will have its own schedule for outages as the new equipment is installed and brought into service. As usual, specific outages will be announced on the [https://status.alliancecan.ca status page]. We will also provide more general updates on this wiki page as we learn more, probably in early autumn 2024, and will periodically send emails with updates and outage notices.
</div>
<div lang="en" dir="ltr" class="mw-content-ltr">
== Who should I contact for questions about the transition? ==
Contact [[Technical support]]. Note, however, that our staff may not have much more information than what is published on this page.
</div>
<div lang="en" dir="ltr" class="mw-content-ltr">
== Will my jobs/applications run without change on the new system? ==
Generally yes, but because the new systems use new CPUs and GPUs, some codes may need to be recompiled or reconfigured. More details will be provided during the transition.
</div>
</div>


<div lang="en" dir="ltr" class="mw-content-ltr">
== Will the software from the current systems still be available? ==
Yes, our [[Standard software environments|standard software environment]] will be available on the new systems.
</div>
<div lang="en" dir="ltr" class="mw-content-ltr">
== Will there be staggered outages? ==
We will do our best to limit overlapping outages, but because we are very constrained by delivery schedules and funding deadlines, there will probably be periods when several of our systems are simultaneously offline. Outages will be announced as early as possible.
</div>