No edit summary |
(Number of jobs is per node) |
||
Line 93: | Line 93: | ||
You can also use GNU Parallel to distribute a workload across multiple nodes in a cluster, such as in the context of a job on a Compute Canada server. An example of this use is the following: | You can also use GNU Parallel to distribute a workload across multiple nodes in a cluster, such as in the context of a job on a Compute Canada server. An example of this use is the following: | ||
{{Command | {{Command | ||
|scontrol show hostname | |scontrol show hostname > ./node_list_${SLURM_JOB_ID} | ||
}} | }} | ||
{{Command | {{Command | ||
|parallel --jobs | |parallel --jobs $SLURM_CPUS_PER_TASK --sshloginfile ./node_list_${SLURM_JOB_ID} --env MY_VARIABLE --workdir $PWD ./my_program | ||
}} | }} | ||
In this case, | In this case, we create a file containing the list of nodes allocated to the job, and we pass this file to GNU Parallel so that it knows which nodes to distribute tasks to. The <tt>--env</tt> option copies a named environment variable to all the nodes, while the <tt>--workdir</tt> option ensures that the GNU Parallel tasks start in the same directory as on the main node. | ||
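The two commands above can be combined in a single job script. The following is only a minimal sketch under stated assumptions, not a definitive implementation: the fallback values used outside a Slurm job, the value given to <tt>MY_VARIABLE</tt>, and the choice to print the command rather than run it are all illustrative and do not come from the original page. It also reflects the point that <tt>--jobs</tt> sets the number of simultaneous jobs on each node, not for the whole allocation.

```shell
#!/bin/bash
# Sketch only: outside a Slurm job, the fallback values below are used.
job_id="${SLURM_JOB_ID:-0}"                 # 0 is a placeholder outside Slurm
jobs_per_node="${SLURM_CPUS_PER_TASK:-1}"   # --jobs applies to each node
node_file="./node_list_${job_id}"

# Write one hostname per line. Fall back to the local host when scontrol
# is not available or no node list is defined (assumption for dry runs).
if command -v scontrol >/dev/null 2>&1 && [ -n "${SLURM_JOB_NODELIST:-}" ]; then
    scontrol show hostname > "$node_file"
else
    uname -n > "$node_file"
fi

# --env can only copy variables that are exported in the calling shell.
export MY_VARIABLE="${MY_VARIABLE:-example}"   # hypothetical value

# Assemble the same command line as shown above.
cmd="parallel --jobs $jobs_per_node --sshloginfile $node_file --env MY_VARIABLE --workdir $PWD ./my_program"

# Print the command for review instead of running it.
echo "$cmd"
```

Printing the command first makes it easy to check the node list file and the per-node job count before any task is launched on the remote nodes.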
==Keeping Track of Completed and Failed Commands, and Restart Capabilities== <!--T:11--> | ==Keeping Track of Completed and Failed Commands, and Restart Capabilities== <!--T:11--> |