Infrastructure renewal

Revision as of 20:03, 20 September 2024


Major upgrade of our advanced research computing infrastructure

This major upgrade of our advanced research computing infrastructure will improve our high-performance computing and cloud services in support of research in Canada.
The upgrade will replace nearly 80% of our current equipment, which is approaching end of life. The new hardware will offer faster processing, greater storage capacity and improved reliability.

The affected systems are Arbutus, Béluga, Cedar, Graham and Niagara.

Technical specifications

Technical specifications for each new system will be provided further down this page in future updates. Generally, they will be similar in architecture to the current systems, but with considerably increased capacity and performance.
For example, we expect to have fewer compute nodes, but each node will have a significant increase in the number of its cores, for an overall increase in the total number of CPU cores.
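As a rough illustration of the arithmetic above, the following Python sketch compares total core counts before and after such a change. The node and core counts are hypothetical placeholders chosen only for this example; they are not the specifications of any current or future system.

```python
def total_cores(nodes, cores_per_node):
    """Total CPU cores in a cluster made of identical nodes."""
    return nodes * cores_per_node


# Hypothetical figures, used only to illustrate the arithmetic.
old_total = total_cores(nodes=1000, cores_per_node=40)
new_total = total_cores(nodes=700, cores_per_node=192)

# Fewer nodes can still yield more cores overall.
print(old_total, new_total, new_total > old_total)
```

With these assumed numbers, a 30% reduction in node count is more than offset by the larger per-node core count.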

Impacts

System outages

An intense period of work will be conducted in the winter of 2024-2025 and spring of 2025. During the installation and the transition to the new systems, outages will be unavoidable due to constraints on space and electrical power.
We recommend that you consider the possibility of outages when you plan research programs, graduate examinations, etc.

Resource Allocation Competition (RAC)

The Resource Allocation Competition will be impacted by this transition, but the application process remains the same. The application deadline this year is October 30, 2024.
2024/25 allocations will remain in effect on retiring clusters while each cluster remains in service. The 2025/26 allocations will be implemented everywhere once all new clusters are in service.
Because the old clusters will mostly be out of service before all new ones are available, if you hold both a 2024 and a 2025 RAC award you will experience a period when neither award is available to you. You will be able to compute with your default allocation (def-xxxxxx) on each new cluster as soon as it goes into service, but the 2025 RAC allocations will only become available when all new clusters are in service.
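The rules above can be sketched as a small Python function. This is only an illustrative model: the `Cluster` class, the `usable_allocations` function, the cluster labels and the allocation labels are all hypothetical names introduced for this example, and the sketch covers only the cases the text describes.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str          # hypothetical cluster label
    is_new: bool       # True for a replacement system, False for a retiring one
    in_service: bool   # whether the cluster is currently in service


def usable_allocations(cluster, clusters):
    """Return the allocations usable on `cluster` under the rules stated above."""
    if not cluster.in_service:
        return []
    if not cluster.is_new:
        # 2024/25 awards stay in effect on a retiring cluster while it is in service.
        return ["rac-2024/25"]
    # The default allocation works on a new cluster as soon as it is in service.
    usable = ["default"]
    # 2025/26 awards apply only once every new cluster is in service.
    if all(c.in_service for c in clusters if c.is_new):
        usable.append("rac-2025/26")
    return usable


clusters = [
    Cluster("old-A", is_new=False, in_service=False),
    Cluster("new-A", is_new=True, in_service=True),
    Cluster("new-B", is_new=True, in_service=False),
]
for c in clusters:
    print(c.name, usable_allocations(c, clusters))
```

In this assumed scenario the retired cluster offers nothing, and the one new cluster in service offers only the default allocation, which is the gap period the paragraph above describes.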

General activities

2024-10-13 The RFP processes for all sites except for Rorqual (replacing Béluga) have been completed, and purchase orders have been sent to vendors. The Rorqual storage Request for Proposals is still open and is scheduled to complete on September 18.

All sites are working on infrastructure design (power and cooling) and implementation. We are expecting some outages throughout the fall for cabling and plumbing upgrades.

2024-10-03 All sites have completed their Requests for Proposals, and are working with the vendors on deliverables and purchase orders.

Activities by system

Arbutus, cloud

Arbutus (in preparation)

Béluga, compute cluster (no change to the cloud)

Béluga (in preparation). The system that will replace Béluga is named Rorqual.

Cedar, compute cluster and cloud

Cedar (in preparation)

Graham, compute cluster and cloud

Graham (in preparation)

Niagara, compute cluster

Niagara (in preparation)

Frequently asked questions

Will my data be copied to its new system?

Data migration to the new systems is the responsibility of each National Host Site, which will inform you of what you need to do.

When will outages occur?

Each National Host Site will have its own outage schedule as the installation of and transition to new equipment proceeds. As usual, specific outages will be described on our system status web page. We will also post more general updates on this wiki page as we know more, probably in early autumn 2024, and you will periodically receive emails with updates and outage notices.

Who can I contact for questions about the transition?

Contact our technical support. They will do their best to answer your questions.

Will my jobs and applications still be able to run on the new system?

Generally yes, but the new CPUs and GPUs may require recompilation or reconfiguration of some applications. More details will be provided as the transition unfolds.

Will the software from the current systems still be available?

Yes, our standard software environment will be available on the new systems.

Will there be staggered outages?

We will do our best to limit overlapping outages, but because we are very constrained by delivery schedules and funding deadlines, there will probably be periods when several of our systems are simultaneously offline. Outages will be announced as early as possible.