Infrastructure renewal
Revision as of 22:39, 16 September 2024
This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.
Welcome to the ARC/Cloud renewal transition documentation for the Digital Research Alliance of Canada (the Alliance). This is the primary source for users with questions about the upgrade of our HPC/Cloud infrastructure. The upgrade will replace nearly 80% of our current HPC and Community Cloud equipment, which is approaching end-of-life.
What's coming in 2025?
In 2023, the Digital Research Alliance of Canada was given formal approval and funding for a complete replacement of aging national systems. The new equipment will offer:
- Increased processing capacity
- Increased storage capacity
- Improved reliability
This new infrastructure will better support your computational tasks, providing a better-performing and more efficient environment for your research.
The systems being replaced are Arbutus, Béluga, Cedar, Graham and Niagara. The new systems will be broadly comparable to the old systems, but with significantly increased capacity.
Outages during the transition
This renewal will be implemented during an intense period in the winter of 2024-2025. Constraints on space and electrical power mean that service outages will be unavoidable during the installation of, and transition to, the new systems. Each site will develop a transition plan for its new system. We expect to hear more details in the autumn and will update this landing page as those details become known.
There will be outages in the winter of 2024-25 and the spring of 2025. We recommend that researchers allow for such outages when planning research programs, graduate examinations, and similar commitments for that period.
Status Updates
For current outages, please see the system status page (https://status.computecanada.ca).
Sep 13, 2024 | All sites except McGill have completed their Request for Proposals (RFP) processes and have sent purchase orders to vendors. The McGill (Rorqual) storage RFP is still open and is scheduled to close on Sep 18. All sites are working on infrastructure (power and cooling) design and implementation. We expect some outages over the autumn for cabling and plumbing upgrades and will update this page when we know more.
Sep 3, 2024 | All sites are currently completing their Requests for Proposals and have been working with vendors on deliverables and purchase orders.
Technical Specifications
The sites cannot yet provide detailed technical specifications of the new systems. Generally, the new systems will be similar in architecture to the old systems but with considerably increased capacity and performance. For instance, we expect to have fewer compute nodes, but each node will have significantly more cores, because the number of cores per CPU has grown since 2017.
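As a rough illustration of what fewer but larger nodes mean for job sizing, the sketch below computes how many whole nodes a fixed core request would occupy. The function name `nodes_needed` and the per-node core counts (32 and 192) are hypothetical placeholders chosen for the arithmetic; they are not specifications of any Alliance system.

```python
def nodes_needed(total_cores: int, cores_per_node: int) -> int:
    """Return the smallest number of whole nodes that provides total_cores."""
    if total_cores <= 0 or cores_per_node <= 0:
        raise ValueError("core counts must be positive")
    # Ceiling division using integers only.
    return -(-total_cores // cores_per_node)


# Hypothetical per-node core counts, for illustration only.
OLD_CORES_PER_NODE = 32
NEW_CORES_PER_NODE = 192

if __name__ == "__main__":
    job_cores = 384
    print(nodes_needed(job_cores, OLD_CORES_PER_NODE))  # 12
    print(nodes_needed(job_cores, NEW_CORES_PER_NODE))  # 2
```

Under these assumed numbers, a job that used to span twelve nodes would fit on two, so job scripts that hard-code a node count may be worth reviewing once the real specifications are published.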
Resource Allocation Competition and renewals
The Resource Allocation Competition (RAC) and RAC renewals will be affected by this transition, but we are not changing the normal RAC process. Expect to see the usual announcements for the competition in September 2024. We expect to implement the 2025/26 allocations on the new machines when they become available, so there may be some delay in RAC implementation. See the RAC documentation at https://www.alliancecan.ca/en/services/advanced-research-computing/accessing-resources/resource-allocation-competition.
System-specific updates
Arbutus
Coming soon
Béluga
Coming soon
Cedar and Cedar Cloud
Coming soon
Graham and Graham Cloud
Coming soon
Niagara
Coming soon
Frequently asked questions
As we work on finalizing the details, here are a few key points to keep in mind.
We are committed to providing the most up-to-date information. Please check back regularly as this section will be updated frequently to reflect any new developments.
Will data be copied to the new systems?
Data migration to the new systems is a site responsibility. Each site will let you know what you need to do and what will be done for you once the details are finalized.
When will outages occur?
Each site will have its own schedule for outages as the new equipment is installed and brought into service. Specific outages will, as usual, be described on the status pages (https://status.alliancecan.ca). We will provide more general updates through this wiki page as we know more, probably in early autumn 2024, and will periodically send emails with updates and outage notices.
Who should I contact for questions about the transition?
Contact our Technical support, but at this stage they may not know much more than what is written here.
Will my jobs/applications run without change on the new system?
Generally yes, but with new CPUs and GPUs some codes may need recompiling or reconfiguring. More details will be provided during the transition.
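One common reason a program needs recompiling after a hardware change is that it was built for instruction-set extensions the new CPU does not advertise, or that it cannot use newer extensions without a rebuild. The following is a minimal sketch, assuming a Linux node that exposes `/proc/cpuinfo`; the helper names `cpu_flags` and `missing_flags` and the example requirement (`avx2`, `fma`) are illustrative assumptions, not part of any Alliance tooling.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels label the line "flags"; ARM kernels use "Features".
        if sep and key.strip() in ("flags", "Features"):
            flags.update(value.split())
    return flags


def missing_flags(cpuinfo_text: str, required: set[str]) -> set[str]:
    """Return the required flags that the CPU does not advertise."""
    return required - cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    # Assumption for illustration: the binary was built for AVX2 and FMA.
    required = {"avx2", "fma"}
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(sorted(missing_flags(cpuinfo.read_text(), required)))
```

An empty result means the node advertises every flag in the assumed requirement; any flag listed would suggest rebuilding for the new hardware.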
Will the software from the old systems still be available?
Yes, our standard software environment will be available on the new systems.
Will there be staggered outages?
We will do our best to limit overlapping outages, but we are tightly constrained by delivery schedules and funding deadlines, so there will probably be periods when many of our systems are out of service at the same time. We’ll communicate all outages as early as possible.