Translations:Running jobs/75/en
When a computation needs more time to complete than the time limits on the system allow, the application you are running must support checkpointing. The application should be able to save its state to a file, called a <i>checkpoint file</i>, and then restart and continue the computation from that saved state.
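As an illustration only, the save-and-restart cycle could look like the following minimal Python sketch. The function names (`save_checkpoint`, `load_checkpoint`, `run`), the JSON file format, and the `stop_after` parameter, which merely simulates a job reaching its time limit, are all assumptions made for this example; a real application would use whatever checkpoint mechanism it already provides.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it into place.

    The rename is atomic, so a job killed in the middle of a write
    leaves the previous checkpoint file intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh initial state if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def run(path, n_steps, stop_after=None):
    """Sum the integers 0..n_steps-1, checkpointing after every step.

    stop_after is a hypothetical parameter used here to simulate the
    job being stopped at the time limit after that many steps.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["step"] < n_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return state  # simulated time limit reached
        state["total"] += state["step"]
        state["step"] += 1
        done_this_run += 1
        save_checkpoint(path, state)
    return state
```

Calling `run("ckpt.json", 10, stop_after=3)` and then `run("ckpt.json", 10)` mimics two consecutive jobs: the second call picks up at the step where the first one stopped instead of starting over. In practice, checkpointing after every step is usually too frequent; the interval is a trade-off between the cost of writing the file and the amount of work lost when the job ends.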