Known issues: Difference between revisions


Revision as of 15:02, 4 August 2017

92 bytes added (minor edit, no edit summary) by Hahn (cc_staff)

@@ Line 8: / Line 8: @@
 == Scheduler errors == <!--T:6-->
-* The CC slurm configuration preferentially encourages whole-node jobs. Users should if possible request whole-nodes rather than per-core resources. See [[Job_scheduling_policies#Whole_nodes_versus_cores;|Job Scheduling - Whole Node Scheduling]] ([[User:Pjmann|Patrick Mann]] 20:15, 17 July 2017 (UTC))
+* The CC Slurm configuration gives preference to whole-node jobs. Where appropriate, users should request whole nodes rather than per-core resources. See [[Job_scheduling_policies#Whole_nodes_versus_cores;|Job Scheduling - Whole Node Scheduling]] ([[User:Pjmann|Patrick Mann]] 20:15, 17 July 2017 (UTC))
 ** CPU and GPU backfill partitions have been created on both clusters. A job submitted with a time limit under 24 hours is automatically entered into the cluster-wide backfill partition. This partition has low priority, but it lets serial jobs increase overall cluster utilization. ([[User:Nathanw|Nathan Wielenga]])
 * The Slurm epilog does not fully clean up processes left behind by finished jobs, especially jobs that did not exit normally. ([[User:Gbnewby|Greg Newby]], Fri Jul 14 19:32:48 UTC 2017)
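The whole-node and backfill notes above can be illustrated with a job-script header. This is a minimal sketch, not a CC-recommended template: it assumes <tt>--exclusive</tt> is an acceptable way to request a whole node on these clusters (the job scheduling policies page may instead recommend an explicit core count per node), and the 23-hour limit is an arbitrary value chosen only to stay under the 24-hour backfill threshold. <tt>my_program</tt> is a hypothetical executable.

```bash
#!/bin/bash
#SBATCH --nodes=1                       # request one node...
#SBATCH --exclusive                     # ...not shared with other jobs (assumed acceptable for whole-node requests)
#SBATCH --time=23:00:00                 # under 24 hours, so eligible for the backfill partition
#SBATCH --job-name=whole-node-example   # arbitrary job name

# srun ./my_program                     # hypothetical executable; replace with your own
```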
@@ Line 14: / Line 14: @@
 * Slurm commands occasionally time out with a message like "Socket timed out on send/recv operation" or "Unable to contact slurm controller (connect failure)". As a temporary workaround, retry the submission or command; it should go through within a few seconds. ([[User:Nathanw|Nathan Wielenga]] 08:50, 18 July 2017 (MDT))
 ** This should be resolved once the slurmctld VHD has been migrated to a new backend. (NW)
-* The environment of the shell in which a job was submitted is exported to the job. This can lead to irreproducible results.
+* By default, a job inherits the environment of the shell from which it was submitted. This can lead to irreproducible results if that environment is not what you expect.
-** Solution/workaround: Add the option <tt>#SBATCH --export=NONE</tt> to your job script.
+** To run the job in a clean, login-like environment, submit it with <tt>sbatch --export=NONE</tt> or add <tt>#SBATCH --export=NONE</tt> to your job script.
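For the intermittent timeout errors noted above, the suggested workaround is simply to retry, and a small Bash wrapper can automate that. This is a sketch only: <tt>retry_slurm</tt>, <tt>MAX_TRIES</tt> and <tt>RETRY_DELAY</tt> are hypothetical names, not part of Slurm, and the defaults (5 attempts, 5 seconds apart) are arbitrary. Retrying <tt>sbatch</tt> after a timeout could, in principle, submit a job twice if the first request did reach the controller, so check <tt>squeue -u $USER</tt> before relying on it for submissions.

```bash
#!/bin/bash
# retry_slurm: hypothetical helper (not part of Slurm). Runs the given
# command, retrying while it exits with a non-zero status.
#   MAX_TRIES   - maximum number of attempts (default 5, arbitrary)
#   RETRY_DELAY - seconds to wait between attempts (default 5, arbitrary)
# Returns 0 on the first success, otherwise the exit status of the last attempt.
retry_slurm() {
    local max_tries="${MAX_TRIES:-5}"
    local delay="${RETRY_DELAY:-5}"
    local attempt=1
    local status
    while true; do
        status=0
        "$@" || status=$?          # capture the failure without tripping 'set -e'
        if [ "$status" -eq 0 ]; then
            return 0
        fi
        if [ "$attempt" -ge "$max_tries" ]; then
            echo "retry_slurm: giving up after $attempt attempt(s): $*" >&2
            return "$status"
        fi
        echo "retry_slurm: attempt $attempt failed (exit $status); retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Example usage (job.sh stands in for your own job script):
#   retry_slurm sbatch job.sh
#   retry_slurm squeue -u "$USER"
```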
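A job script following the <tt>--export=NONE</tt> advice might look like the sketch below. Because the submitting shell's environment is no longer copied into the job, anything the job needs (modules, variables) has to be set up inside the script itself. As an extra reproducibility aid, the sketch also saves the environment the job actually ran with; that step and the <tt>job-env-*.txt</tt> file name are illustrative additions, not a CC requirement, and the 10-minute time limit is a placeholder.

```bash
#!/bin/bash
#SBATCH --export=NONE        # do not copy the submitting shell's environment
#SBATCH --time=00:10:00      # placeholder time limit

# With --export=NONE, set up everything the job needs here, e.g. one
# "module load <name>/<version>" line per module you use (module names
# are site-specific and deliberately omitted from this sketch).

# Save the job's environment so a later run can be compared against it.
# SLURM_JOB_ID is set by Slurm inside a job; "manual" is used otherwise.
env_file="job-env-${SLURM_JOB_ID:-manual}.txt"
env | sort > "$env_file"
echo "Environment saved to $env_file"
```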
 == Quota and filesystem problems == <!--T:7-->