Known issues

* Utilization accounting is not currently being forwarded to the [https://ccdb.computecanada.ca Compute Canada database].


== Scheduler issues ==
* The CC Slurm configuration encourages whole-node jobs. When appropriate, users should request whole-node rather than per-core resources; a sample submission script is sketched after this list. See [[Job_scheduling_policies#Whole_nodes_versus_cores|Job Scheduling - Whole Node Scheduling]].
* CPU and GPU backfill partitions have been created on the Cedar and Graham clusters. If a job is submitted with a runtime of less than 24 hours, it will automatically be entered into the cluster-wide backfill partition. This partition has low priority, but allows increased utilization of the cluster by serial jobs.
* Slurm epilog does not fully clean up processes from ended jobs, especially if the job did not exit normally.
** This has improved greatly since the addition of the epilog.clean script, but nodes are still occasionally marked down for epilog failure.
* By default, the job receives environment settings from the submitting shell. This can lead to irreproducible results if it is not what you expect. To force the job to run with a clean, login-like environment, submit with <tt>--export=NONE</tt> or add <tt>#SBATCH --export=NONE</tt> to your job script (see the example script below).
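The following is a minimal sketch of a submission script combining the options above; the accounting group <tt>def-someuser</tt>, the 32-core node size, and <tt>my_program</tt> are placeholders, not values taken from any particular cluster.
<pre>
#!/bin/bash
# Sketch of a whole-node, backfill-eligible job with a clean environment.
#SBATCH --account=def-someuser    # placeholder: replace with your own accounting group
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32      # placeholder core count; node sizes vary by cluster
#SBATCH --mem=0                   # request all of the memory on the node
#SBATCH --time=0-23:00            # under 24 hours, so the job is eligible for backfill
#SBATCH --export=NONE             # do not inherit the submitting shell's environment

srun ./my_program
</pre>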


== Quota and filesystem problems ==


= Cedar only =
* Some I/O operations on /project yield the error, "Cannot send after transport endpoint shutdown". We are working with the vendor to resolve this.
* <del>Slurm operations will occasionally time out with a message such as ''Socket timed out on send/recv operation'' or ''Unable to contact slurm controller (connect failure)''. As a temporary workaround, attempt to resubmit your jobs/commands; they should go through in a few seconds. </del>
** Should be resolved after a VHD migration to a new backend for slurmctl.
* Some people are getting the message ''error: Job submit/allocate failed: Invalid account or account/partition combination specified''.
** For users with only one account, this should be resolved in the latest version of the job submission script (Nathan Wielenga 2017-08-18).
** If you still see this, specify <tt>--account=<accounting group></tt> in your job submission, as illustrated below. See [https://docs.computecanada.ca/wiki/Running_jobs#Accounts_and_projects Running jobs: Accounts and projects].
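A brief sketch, with <tt>def-someuser</tt> standing in for one of your own accounting groups and <tt>my_job.sh</tt> as a placeholder script name:
<pre>
# On the command line:
sbatch --account=def-someuser my_job.sh

# Or as a directive inside the job script:
#SBATCH --account=def-someuser
</pre>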


= Graham only =
** Solution: Request an exception by writing to [mailto:support@computecanada.ca CC support], describing what you need to access and why.


* Crontab is not offered on Graham. When attempting to add a new item, an error occurs during saving:
<pre>
[rozmanov@gra-login1 ~]$ crontab -e
no crontab for rozmanov - using an empty one
crontab: installing new crontab
/var/spool/cron/#tmp.gra-login1.XXXXKsp8LU: Read-only file system
crontab: edits left in /tmp/crontab.u0ljzU
</pre>
Since crontab works on Cedar, a common approach across CC systems should be possible. The main issue is how to handle users' crontabs on multiple login nodes; it is not yet clear whether this will be implemented.
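In the meantime, one possible substitute for simple periodic tasks is a job that resubmits itself with a delayed start time. This is only a sketch under stated assumptions; the accounting group, time limit, and <tt>my_periodic_task.sh</tt> are placeholders.
<pre>
#!/bin/bash
# periodic_job.sh -- hypothetical self-resubmitting job used in place of cron.
#SBATCH --account=def-someuser   # placeholder accounting group
#SBATCH --time=00:15:00          # placeholder time limit

# Do the periodic work here (placeholder).
./my_periodic_task.sh

# Resubmit this script by name (not $0, which points at Slurm's spooled copy),
# asking for it to start no earlier than 86400 seconds (24 hours) from now.
sbatch --begin=now+86400 periodic_job.sh
</pre>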


= Other issues =