Translations:Best practices for job submission/10/en: Difference between revisions

From Alliance Doc
* <b>Increase the estimated duration by 5% or 10%</b>, just in case.
** It's natural to leave some margin for error in the estimate, but beyond that margin it's in your interest to make the estimate of the job's duration as accurate as possible.
* Longer jobs, such as those running for more than 48 hours, should <b>consider using [[Points_de_contrôle/en|checkpoints]]</b> if the software supports them.
** With a checkpoint, the program writes a snapshot of its state to a disk file, and it can later be restarted from that file at exactly the point in the calculation where the snapshot was taken. This way, if a power outage or some other interruption affects the compute node(s) used by your job, you lose only the work done since the last checkpoint, so if your program writes a checkpoint file every six or eight hours, you won't lose much work.
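Padding an estimate by 5% or 10% is simple arithmetic, but it helps to see it turned into an actual time limit. The sketch below assumes the scheduler is Slurm, whose <code>--time</code> option accepts the <code>days-hours:minutes:seconds</code> format; the function name <code>padded_walltime</code> is hypothetical. It uses integer arithmetic and rounds up to the next whole minute so the limit is never below the padded estimate.

```python
def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def padded_walltime(estimated_seconds: int, margin_percent: int = 10) -> str:
    """Hypothetical helper: pad a run-time estimate by margin_percent and
    format it as a Slurm-style time limit (D-HH:MM:SS), rounded up to a minute.
    """
    if estimated_seconds <= 0:
        raise ValueError("estimated_seconds must be positive")
    if margin_percent < 0:
        raise ValueError("margin_percent must not be negative")
    # Add the safety margin (e.g. 10 -> +10%), rounding up to a whole second.
    padded_seconds = _ceil_div(estimated_seconds * (100 + margin_percent), 100)
    # Round up to a whole minute, then split into days, hours and minutes.
    total_minutes = _ceil_div(padded_seconds, 60)
    days, rem_minutes = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rem_minutes, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:00"


if __name__ == "__main__":
    # A job measured at 20 hours, padded by 10%, gets a 22-hour limit.
    print(padded_walltime(20 * 3600, 10))  # 0-22:00:00
```

With a 5% margin, a job estimated at exactly 3 days gets <code>3-03:36:00</code> (72 h × 1.05 = 75.6 h).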
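If your software has no built-in checkpoint support, the idea described above can be implemented at the application level. The following is a minimal sketch in Python, not the mechanism of any particular package. It assumes the program's whole state fits in a picklable dictionary; the names <code>save_checkpoint</code>, <code>load_checkpoint</code>, <code>run</code> and <code>checkpoint.pkl</code> are hypothetical, and the "calculation" is a toy running sum. By default it saves every six hours of wall-clock time, matching the interval suggested in the text.

```python
import os
import pickle
import tempfile
import time

CHECKPOINT_FILE = "checkpoint.pkl"  # hypothetical file name


def save_checkpoint(state: dict, path: str = CHECKPOINT_FILE) -> None:
    """Write state to a temporary file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk
        # Atomic on POSIX when both paths are on the same filesystem, so a crash
        # mid-write never leaves a half-written checkpoint in place.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path: str = CHECKPOINT_FILE):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(total_steps: int, interval_seconds: float = 6 * 3600,
        path: str = CHECKPOINT_FILE) -> float:
    """Toy calculation that resumes from its last checkpoint if one exists."""
    state = load_checkpoint(path) or {"step": 0, "total": 0.0}
    last_save = time.monotonic()
    for step in range(state["step"], total_steps):
        state["total"] += 0.5 * step  # stand-in for one unit of real work
        state["step"] = step + 1
        if time.monotonic() - last_save >= interval_seconds:
            save_checkpoint(state, path)
            last_save = time.monotonic()
    save_checkpoint(state, path)  # final snapshot when the run finishes
    return state["total"]
```

If the job is killed, resubmitting it simply calls <code>run</code> again: it reloads the last snapshot and continues from the saved step instead of step 0. Writing to a temporary file and renaming it means the previous checkpoint stays intact even if the node fails while a new one is being written.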

Latest revision as of 19:29, 17 July 2023
