GNU Parallel: Difference between revisions

Number of jobs is per node
Line 93:
You can also use GNU Parallel to distribute a workload across multiple nodes in a cluster, such as in the context of a job on a Compute Canada server. An example of this use is the following:
{{Command
|scontrol show hostname ${SLURM_JOB_NODELIST} > ./node_list_${SLURM_JOB_ID}
}}
{{Command
|scontrol show hostname > ./node_list_${SLURM_JOB_ID}
}}
{{Command
|parallel --jobs 32 --sshloginfile ./node_list_${SLURM_JOB_ID} --env MY_VARIABLE --workdir $PWD ./my_program
}}
{{Command
|parallel --jobs $SLURM_CPUS_PER_TASK --sshloginfile ./node_list_${SLURM_JOB_ID} --env MY_VARIABLE --workdir $PWD ./my_program
}}
In this case, we suppose that each node has 32 CPU cores. We create a file containing the list of nodes from <tt>$SLURM_JOB_NODELIST</tt> (which is set automatically by the job scheduler), and we use this file to tell GNU Parallel which nodes to use for the distribution of tasks. The <tt>--env</tt> option transfers a named environment variable to all the nodes, while the <tt>--workdir</tt> option ensures that the GNU Parallel tasks start in the same directory as on the main node.
In this case, we create a file containing the list of nodes, and we use this file to tell GNU Parallel which nodes to use for the distribution of tasks. The <tt>--jobs</tt> value is the number of jobs to run on each node, here taken from <tt>$SLURM_CPUS_PER_TASK</tt>. The <tt>--env</tt> option transfers a named environment variable to all the nodes, while the <tt>--workdir</tt> option ensures that the GNU Parallel tasks start in the same directory as on the main node.
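
To check what will be launched before running it inside a job, the command line above can be assembled and printed first. The following is a minimal sketch under stated assumptions, not a definitive implementation: the helper name <tt>build_parallel_cmd</tt>, the fallback of one job per node, and the <tt>nojob</tt> placeholder used outside a Slurm job are hypothetical and only serve the illustration; <tt>./my_program</tt> and <tt>MY_VARIABLE</tt> are taken from the example above.

```shell
#!/bin/bash
# Sketch: assemble the GNU Parallel command line shown above and print it.
# build_parallel_cmd is a hypothetical helper, not part of GNU Parallel or Slurm.
build_parallel_cmd() {
    # --jobs is counted per node in the sshlogin file, so use the CPU count
    # reserved per task; fall back to 1 outside a Slurm job (assumption).
    local jobs="${SLURM_CPUS_PER_TASK:-1}"
    # Node list file written earlier by "scontrol show hostname";
    # "nojob" is a placeholder used when SLURM_JOB_ID is not set (assumption).
    local nodefile="./node_list_${SLURM_JOB_ID:-nojob}"
    printf 'parallel --jobs %s --sshloginfile %s --env MY_VARIABLE --workdir %s ./my_program\n' \
        "$jobs" "$nodefile" "$PWD"
}

# Print the command for inspection instead of executing it.
build_parallel_cmd
```

Printing the command rather than executing it makes it easy to confirm that the per-node job count and the node list file name are what the job is expected to use.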


==Keeping Track of Completed and Failed Commands, and Restart Capabilities== <!--T:11-->