Known issues/en: Difference between revisions

Updating to match new version of source page
Line 12: Line 12:
* Operations will occasionally time out with a message like "Socket timed out on send/recv operation" or "Unable to contact slurm controller (connect failure)". As a temporary workaround, resubmit your jobs or commands; they should go through within a few seconds. ([[User:Nathanw|Nathan Wielenga]] 08:50, 18 July 2017 (MDT))
** Should be resolved after a VHD migration to a new backend for slurmctl. (NW)
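
The resubmission workaround above can be scripted. The following is a minimal sketch, not an official tool: the helper name <tt>retry</tt>, the attempt count and the delay are assumptions chosen for illustration. It reruns a command until it succeeds or the attempts run out.

```shell
#!/bin/bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: rerun COMMAND until it exits with status 0,
# waiting DELAY_SECONDS between attempts, at most ATTEMPTS times.
retry() {
    local attempts=$1 delay=$2
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        if ((i < attempts)); then
            echo "Attempt $i of $attempts failed; retrying in ${delay}s" >&2
            sleep "$delay"
        fi
    done
    return 1
}

# Example with a read-only query, which is safe to repeat:
# retry 5 10 squeue -u "$USER"
```

Be careful when wrapping <tt>sbatch</tt> this way: if a submission was accepted but the reply timed out, a retry could queue the job twice, so check <tt>squeue</tt> before resubmitting.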
 
* The environment of the shell in which a job was submitted is exported to the job. This can lead to irreproducible results.
** Solution/workaround: Add the option <tt>#SBATCH --export=NONE</tt> to your job script.
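
As an illustration of the workaround above, here is a minimal job script sketch. Only the <tt>--export=NONE</tt> directive comes from this page; the variable <tt>MY_SETTING</tt> and its value are hypothetical. The idea is that everything the job depends on is set inside the script rather than inherited from the submitting shell.

```shell
#!/bin/bash
#SBATCH --export=NONE   # do not export the submitting shell's environment to the job

# Set everything the job relies on explicitly, so the result does not
# depend on what happened to be defined in the shell that ran sbatch.
export MY_SETTING="value-set-in-script"   # hypothetical variable, for illustration

echo "MY_SETTING=${MY_SETTING}"
```

Any modules or paths the job needs would likewise be set up inside the script.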


== Quota and filesystem problems ==
Line 36: Line 37:
* Compute nodes cannot access the Internet
** Solution: Request an exception by writing to support@computecanada.ca. Describe what you need to access and why.
* Intel compiler does not work on compute nodes
** Solution/workaround: Compile your code on the login node.


= Other issues =