=== DMTCP ===
[http://dmtcp.sourceforge.net/ DMTCP] (Distributed Multithreaded CheckPointing) lets you checkpoint applications without recompiling them. To use it, first load the DMTCP module. Start the application with the command <tt>dmtcp_launch</tt>, which lets you specify the time interval between checkpoints. To restart from a checkpoint, run the script <tt>dmtcp_restart_script.sh</tt>. By default, this script and the checkpoint files are written to the directory where the program was started, but you can change this with the option <tt>--ckptdir <checkpoint directory></tt>. Run <tt>dmtcp_launch --help</tt> for more information on all the options. Note that, for the moment, DMTCP cannot be used to checkpoint applications parallelized using MPI.
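The choice between a first launch and a restart can be sketched as a dry run that only prints the command it would execute. This is a minimal illustration, not a site-specific recipe: the program name <tt>./my_app</tt>, the one-hour interval, and the <tt>checkpoints</tt> directory are hypothetical, and it assumes, following the description above, that the restart script is written to the directory given by <tt>--ckptdir</tt>.

```shell
#!/bin/bash
# Dry-run sketch: decide between a fresh launch and a restart.
# All values below are illustrative assumptions, not site defaults.
CKPT_INTERVAL=3600               # seconds between checkpoints (assumed value)
CKPT_DIR="$PWD/checkpoints"      # hypothetical checkpoint directory
APP="./my_app"                   # hypothetical application

if [ -x "$CKPT_DIR/dmtcp_restart_script.sh" ]; then
    # A previous run left a restart script: resume from the last checkpoint.
    CMD="$CKPT_DIR/dmtcp_restart_script.sh"
else
    # First run: start the application under DMTCP control.
    CMD="dmtcp_launch --interval $CKPT_INTERVAL --ckptdir $CKPT_DIR $APP"
fi

# Print the command instead of running it, so the sketch works without DMTCP.
echo "$CMD"
```

In a real job you would load the DMTCP module first and execute the command rather than print it.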
An example of a job script: