Translations:Points de contrôle/1/en: Difference between revisions

From Alliance Doc
Line 1: Line 1:
The execution time for a program is sometimes too long for the maximum duration of a job permitted by the job schedulers used on Compute Canada clusters. Long-running jobs are also subject to all of the risks of system instability due to power outages, hardware defects and so forth. A program with a short execution time can easily be restarted with little concern but for long-running software it is preferable to use checkpoints to minimize the risk of losing several days' worth of computation. These checkpoints take the form of binary disk files from which the program can be restarted at the point in the computation where the checkpoint file was initially created.
The execution time for a program is sometimes too long for the maximum duration of a job permitted by the job schedulers used on the clusters. Long-running jobs are also subject to all of the risks of system instability due to power outages, hardware defects and so forth. A program with a short execution time can easily be restarted with little concern but for long-running software it is preferable to use checkpoints to minimize the risk of losing several days' worth of computation. These checkpoints take the form of binary disk files from which the program can be restarted at the point in the computation where the checkpoint file was initially created.

Latest revision as of 14:23, 27 September 2024

Message definition (Points de contrôle)
L’exécution d’un programme est parfois trop longue pour la durée permise par les systèmes de soumissions qui sont sur les grappes. L’exécution d’un long programme est également tributaire des aléas des systèmes. Un programme ayant une courte durée d’exécution peut aisément être redémarré. Par contre, lorsque l’exécution du programme devient très longue, il est préférable de faire des points de contrôle pour minimiser les chances de perdre plusieurs semaines de calcul. Ceux-ci permettront par la suite le redémarrage du programme.

A program's execution time is sometimes longer than the maximum job duration permitted by the job schedulers used on the clusters. Long-running jobs are also exposed to system instability caused by power outages, hardware defects and so forth. A program with a short execution time can simply be restarted, but for long-running software it is preferable to use checkpoints to minimize the risk of losing several days' worth of computation. These checkpoints take the form of binary disk files from which the program can be restarted at the point in the computation where the checkpoint file was created.
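
As an illustration of the idea only, here is a minimal sketch of checkpointing in Python. It is not taken from any particular application: the function names (`save_checkpoint`, `load_checkpoint`, `run`), the `stop_after` parameter used to simulate an interrupted job, and the toy computation (a running sum) are all assumptions made for this example. The state is written to a binary file with `pickle`, and the file is replaced atomically so that an interruption during the write does not corrupt the previous checkpoint.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the state to a binary file, replacing any older checkpoint atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory first.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace swaps the file in one step, so a crash never leaves a half-written checkpoint.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint file exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, total_steps, checkpoint_every, stop_after=None):
    """Toy long-running computation: sum the integers 0 .. total_steps - 1.

    stop_after is a hypothetical knob that simulates the job being
    interrupted (for example by a time limit) once that step is reached.
    """
    # Resume from the checkpoint if there is one, otherwise start from scratch.
    state = load_checkpoint(path) or {"step": 0, "total": 0}
    while state["step"] < total_steps:
        if stop_after is not None and state["step"] >= stop_after:
            return state  # simulated interruption; work since the last checkpoint is lost
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

Calling `run("state.ckpt", 10, 3, stop_after=7)` and then `run("state.ckpt", 10, 3)` mimics two successive jobs: the second call restarts from the last checkpoint written by the first rather than from step 0. How often to checkpoint is a trade-off between the cost of writing the file and the amount of computation that would have to be repeated after a failure.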