Known issues

* Utilization accounting is not currently being forwarded to the [https://ccdb.computecanada.ca Compute Canada database].


== Scheduler issues ==
* The CC Slurm configuration encourages whole-node jobs. When appropriate, users should request whole-node rather than per-core resources; a sample submission script is sketched after this list. See [[Job_scheduling_policies#Whole_nodes_versus_cores|Job Scheduling - Whole Node Scheduling]].
* CPU and GPU backfill partitions have been created on the Cedar and Graham clusters. If a job is submitted with a runtime of less than 24 hours, it will automatically be entered into the cluster-wide backfill partition. This partition has low priority, but allows increased utilization of the cluster by serial jobs.
* Slurm epilog does not fully clean up processes from ended jobs, especially if the job did not exit normally.
** This has improved greatly since the addition of the epilog.clean script, but nodes are still occasionally marked down for epilog failure.
* By default, the job receives environment settings from the submitting shell. This can lead to irreproducible results if it is not what you expect. To force the job to run with a clean, login-like environment, submit with <tt>--export=NONE</tt> or add <tt>#SBATCH --export=NONE</tt> to your job script (see the example script below).
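The following is a minimal sketch of a submission script combining the options above; the accounting group <tt>def-someuser</tt>, the 32-core node size, and <tt>my_program</tt> are placeholders, not values taken from any particular cluster.
<pre>
#!/bin/bash
# Sketch of a whole-node, backfill-eligible job with a clean environment.
#SBATCH --account=def-someuser    # placeholder: replace with your own accounting group
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32      # placeholder core count; node sizes vary by cluster
#SBATCH --mem=0                   # request all of the memory on the node
#SBATCH --time=0-23:00            # under 24 hours, so the job is eligible for backfill
#SBATCH --export=NONE             # do not inherit the submitting shell's environment

srun ./my_program
</pre>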


== Quota and filesystem problems ==


= Cedar only =
* Some I/O operations on /project yield the error, "Cannot send after transport endpoint shutdown". We are working with the vendor to resolve this.
* <del>Slurm operations will occasionally time out with a message such as ''Socket timed out on send/recv operation'' or ''Unable to contact slurm controller (connect failure)''. As a temporary workaround, attempt to resubmit your jobs/commands; they should go through in a few seconds. </del>
** Should be resolved after a VHD migration to a new backend for slurmctl.
* Some people are getting the message ''error: Job submit/allocate failed: Invalid account or account/partition combination specified''.
** For users with only one account, this should be resolved in the latest version of the job submission script (Nathan Wielenga 2017-08-18).
** If you still see this, specify <tt>--account=<accounting group></tt> in your job submission, as illustrated below. See [https://docs.computecanada.ca/wiki/Running_jobs#Accounts_and_projects Running jobs: Accounts and projects].
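A brief sketch, with <tt>def-someuser</tt> standing in for one of your own accounting groups and <tt>my_job.sh</tt> as a placeholder script name:
<pre>
# On the command line:
sbatch --account=def-someuser my_job.sh

# Or as a directive inside the job script:
#SBATCH --account=def-someuser
</pre>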


= Graham only =
** Solution: Request an exception by writing to [mailto:support@computecanada.ca CC support], describing what you need to access and why.


* Crontab is not offered on Graham. When attempting to add a new item, an error occurs during saving:
<pre>
[rozmanov@gra-login1 ~]$ crontab -e
no crontab for rozmanov - using an empty one
crontab: installing new crontab
/var/spool/cron/#tmp.gra-login1.XXXXKsp8LU: Read-only file system
crontab: edits left in /tmp/crontab.u0ljzU
</pre>
Since crontab works on Cedar, a common approach across CC systems should be possible. The main issue is how to handle users' crontabs on multiple login nodes; it is not yet clear whether this will be implemented.
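In the meantime, one possible substitute for simple periodic tasks is a job that resubmits itself with a delayed start time. This is only a sketch under stated assumptions; the accounting group, time limit, and <tt>my_periodic_task.sh</tt> are placeholders.
<pre>
#!/bin/bash
# periodic_job.sh -- hypothetical self-resubmitting job used in place of cron.
#SBATCH --account=def-someuser   # placeholder accounting group
#SBATCH --time=00:15:00          # placeholder time limit

# Do the periodic work here (placeholder).
./my_periodic_task.sh

# Resubmit this script by name (not $0, which points at Slurm's spooled copy),
# asking for it to start no earlier than 86400 seconds (24 hours) from now.
sbatch --begin=now+86400 periodic_job.sh
</pre>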


= Other issues =