Translations:Best practices for job submission/10/en: Difference between revisions
Latest revision as of 19:29, 17 July 2023
- Increase the estimated duration by 5% or 10%, just in case.
  - Leaving some room for error is reasonable, but beyond that small margin it is in your interest to make the estimate of the job's duration as accurate as possible, since a needlessly long time limit can make your job wait longer in the queue.
- Longer jobs, such as those running for more than 48 hours, should consider using checkpoints if the software supports them.
  - With checkpointing, the program periodically writes a snapshot of its state to a file on disk, and it can later be restarted from that file at the exact point in the calculation where the snapshot was taken. If a power outage or some other interruption affects the compute node(s) running your job, you lose only the work done since the last checkpoint: with a checkpoint every six or eight hours, that is at most six or eight hours of computation.
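As a rough illustration of the margin arithmetic, the Python sketch below pads an estimated run time by a percentage and formats the result as a time limit. It assumes a Slurm scheduler, whose `--time` option accepts the `days-hours:minutes:seconds` form; the function name `padded_time_limit` is hypothetical. Integer minutes are used so that rounding up is exact and free of floating-point error.

```python
def padded_time_limit(estimate_minutes: int, margin_percent: int = 10) -> str:
    """Pad an estimated run time and return it as D-HH:MM:SS.

    Hypothetical helper: assumes a Slurm-style time limit format.
    """
    if estimate_minutes <= 0:
        raise ValueError("estimate must be a positive number of minutes")
    if not 0 <= margin_percent <= 100:
        raise ValueError("margin must be between 0 and 100 percent")
    # Integer ceiling division: round the padded time up to a whole minute.
    padded = (estimate_minutes * (100 + margin_percent) + 99) // 100
    days, rem = divmod(padded, 24 * 60)
    hours, minutes = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:00"
```

For example, a 20-hour estimate (1200 minutes) with a 10% margin gives `0-22:00:00`, and a 48-hour estimate with a 5% margin gives `2-02:24:00`.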
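The checkpoint/restart pattern can be sketched in Python as follows. This is a hypothetical, minimal example rather than the mechanism of any particular application: real software has its own checkpoint format and options, so consult its documentation. The file name `checkpoint.json`, the `state` dictionary and the loop body (a running sum standing in for real computation) are all placeholders, and `interval_s` defaults to six hours to match the interval suggested above. Each snapshot is written to a temporary file and then renamed over the previous one, so an interruption during the write does not destroy the last good checkpoint.

```python
import json
import os
import time

CHECKPOINT = "checkpoint.json"  # hypothetical file name


def save_checkpoint(path, state):
    """Write state to a temporary file, then rename it over the old checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data reaches the disk
    os.replace(tmp, path)  # atomic rename on POSIX filesystems


def load_checkpoint(path):
    """Resume from an existing checkpoint, or start from scratch."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def run(n_steps, path=CHECKPOINT, interval_s=6 * 3600):
    state = load_checkpoint(path)
    last_save = time.monotonic()
    for step in range(state["step"], n_steps):
        state["total"] += step  # placeholder for one unit of real work
        state["step"] = step + 1
        # Save a snapshot once the chosen interval has elapsed.
        if time.monotonic() - last_save >= interval_s:
            save_checkpoint(path, state)
            last_save = time.monotonic()
    save_checkpoint(path, state)  # final snapshot at the end of the run
    return state
```

When `run` is called again, it reads the saved file and continues from the recorded step rather than from zero, which is what a resubmitted job would do after an interruption.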