The execution time of a program sometimes exceeds the maximum job duration permitted by the job schedulers used on Compute Canada clusters. Long-running jobs are also exposed to every risk of system instability: power outages, hardware defects and so forth. A program with a short execution time can easily be restarted with little concern, but for long-running software it is preferable to use checkpoints to minimize the risk of losing several days' worth of computation. These checkpoints take the form of binary disk files from which the program can be restarted at the point in the computation where the checkpoint file was created.
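To make the idea of a restartable binary checkpoint concrete, here is a minimal sketch in Python. It assumes the program's state fits in a picklable dictionary. The file name <code>checkpoint.pkl</code> and the functions <code>save_checkpoint</code>, <code>load_checkpoint</code> and <code>run</code> are hypothetical and only illustrate the pattern. It is not a Compute Canada tool or API. The checkpoint is written to a temporary file and then renamed over the previous one, so a job killed mid-write does not leave a corrupted checkpoint behind.

```python
import os
import pickle

CHECKPOINT_FILE = "checkpoint.pkl"  # hypothetical file name


def save_checkpoint(state, path=CHECKPOINT_FILE):
    """Write state to a temporary file, then atomically rename it over the old checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)   # binary serialization of the program state
        f.flush()
        os.fsync(f.fileno())    # make sure the bytes reach the disk
    os.replace(tmp_path, path)  # atomic rename: never a half-written checkpoint


def load_checkpoint(path=CHECKPOINT_FILE):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(n_steps, checkpoint_every=100, path=CHECKPOINT_FILE):
    """Run n_steps of a toy computation, resuming from a checkpoint if one exists."""
    state = load_checkpoint(path) or {"step": 0, "total": 0}
    while state["step"] < n_steps:
        # Stand-in for one unit of real computation.
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)  # final checkpoint once the work is done
    return state
```

If the job is killed and resubmitted, calling <code>run</code> again with the same path picks up at the last saved step instead of step 0. The checkpoint interval trades the I/O cost of writing checkpoints against the amount of work lost on a failure.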
== Creating and Loading a Checkpoint ==
The creation and loading of a checkpoint may already be handled by the application you are using. In this case, you simply need to read the relevant documentation on how to use this functionality.
However, if you have access to the application's source code and/or you are its author, you can implement the creation and loading of checkpoints yourself. Basically: