(Updating to match new version of source page)
Line 12: | Line 12: | ||
* Operations will occasionally time out with a message like "Socket timed out on send/recv operation" or "Unable to contact slurm controller (connect failure)". As a temporary workaround, resubmit your jobs or commands; they should go through within a few seconds. ([[User:Nathanw|Nathan Wielenga]] 08:50, 18 July 2017 (MDT))
** Should be resolved after a VHD migration to a new backend for slurmctl. (NW)
* The environment of the shell in which a job was submitted is exported to the job. This can lead to irreproducible results.
** Solution/workaround: Add the option <tt>#SBATCH --export=NONE</tt> to your job script.
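The <tt>--export=NONE</tt> workaround above can be sketched as a minimal job script. This is only an illustration under stated assumptions: the time limit, job name, <tt>OMP_NUM_THREADS</tt> value, the <tt>MY_LOCAL_SETTING</tt> variable, and the <tt>report_var</tt> helper are all hypothetical and not taken from this page. The idea is to set everything the job depends on inside the script itself instead of inheriting it from the submitting shell.

```shell
#!/bin/bash
#SBATCH --export=NONE          # do not copy the submitting shell's environment into the job
#SBATCH --time=00:05:00        # hypothetical time limit
#SBATCH --job-name=clean-env   # hypothetical job name

# Set everything the job depends on explicitly, rather than inheriting it.
export OMP_NUM_THREADS=1       # illustrative value only

# Hypothetical helper: report whether a variable is present in the environment.
report_var() {
    if value=$(printenv "$1"); then
        echo "$1=$value"
    else
        echo "$1 is unset"
    fi
}

report_var OMP_NUM_THREADS
report_var MY_LOCAL_SETTING    # hypothetical variable exported only in the submitting shell
```

When submitted with <tt>sbatch</tt>, the second report would be expected to show the variable as unset even if it was exported in the submitting shell, unless a login profile on the cluster sets it. Comparing the two reports is a quick way to check that the job no longer depends on the interactive environment.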
== Quota and filesystem problems ==
Line 36: | Line 37: | ||
* Compute nodes cannot access the Internet.
** Solution: Request an exception by emailing support@computecanada.ca. Describe what you need to access and why.
* The Intel compiler does not work on compute nodes.
** Solution/workaround: Compile your code on the login node.
= Other issues =