(Detailed backfill partition.) |
No edit summary |
||
Line 6: | Line 6: | ||
== Shared issues == <!--T:2--> | == Shared issues == <!--T:2--> | ||
# The CC Slurm configuration encourages whole-node jobs. Where possible, users should request whole nodes rather than per-core resources. See [[Job_scheduling_policies#Whole_nodes_versus_cores;|Job Scheduling - Whole Node Scheduling]] ([[User:Pjmann|Patrick Mann]] ([[User talk:Pjmann|talk]]) 20:15, 17 July 2017 (UTC)) | # The CC Slurm configuration encourages whole-node jobs. Where possible, users should request whole nodes rather than per-core resources. See [[Job_scheduling_policies#Whole_nodes_versus_cores;|Job Scheduling - Whole Node Scheduling]] ([[User:Pjmann|Patrick Mann]] ([[User talk:Pjmann|talk]]) 20:15, 17 July 2017 (UTC)) | ||
# | #* CPU and GPU backfill partitions have been created on both clusters. A job submitted with a runtime under 24 hours is automatically placed in the cluster-wide backfill partition. This partition has low priority but allows serial jobs to increase cluster utilization. ([[User:Nathanw|Nathan Wielenga]]) | ||
# Quotas on <code>/project</code> are all 1 TB. The Storage National team is working on a project/RAC based schema. Fortunately Lustre have announced group-based quotas but that will need installation. ([[User:Pjmann|Patrick Mann]] ([[User talk:Pjmann|talk]]) 20:12, 17 July 2017 (UTC)) | # Quotas on <code>/project</code> are all 1 TB. The Storage National team is working on a project/RAC based schema. Fortunately Lustre have announced group-based quotas but that will need installation. ([[User:Pjmann|Patrick Mann]] ([[User talk:Pjmann|talk]]) 20:12, 17 July 2017 (UTC)) | ||
# SLURM epilog does not fully clean up processes from ended jobs, especially if the job did not exit normally. ([[User:Gbnewby|Greg Newby]]) Fri Jul 14 19:32:48 UTC 2017) | # SLURM epilog does not fully clean up processes from ended jobs, especially if the job did not exit normally. ([[User:Gbnewby|Greg Newby]]) Fri Jul 14 19:32:48 UTC 2017) | ||
Line 17: | Line 17: | ||
#* Update July 17: still not working. If you need your nearline RAC2017 quota then please ask [mailto:support@computecanada.ca CC support]. ([[User:Pjmann|Patrick Mann]] ([[User talk:Pjmann|talk]]) 20:45, 17 July 2017 (UTC)) | #* Update July 17: still not working. If you need your nearline RAC2017 quota then please ask [mailto:support@computecanada.ca CC support]. ([[User:Pjmann|Patrick Mann]] ([[User talk:Pjmann|talk]]) 20:45, 17 July 2017 (UTC)) | ||
# Operations will occasionally time out with a message like "Socket timed out on send/recv operation" or "Unable to contact slurm controller (connect failure)". As a temporary workaround, resubmit your jobs or commands; they should go through within a few seconds. ([[User:Nathanw|Nathan Wielenga]]) 08:50, 18 July 2017 (MDT)) | # Operations will occasionally time out with a message like "Socket timed out on send/recv operation" or "Unable to contact slurm controller (connect failure)". As a temporary workaround, resubmit your jobs or commands; they should go through within a few seconds. ([[User:Nathanw|Nathan Wielenga]]) 08:50, 18 July 2017 (MDT)) | ||
#* Should be resolved after a VHD migration to a new backend for slurmctl. (NW) | |||
# Auto-creation of project directories such as /project/$USER was an interim solution. Soon there will be /project/gid, where gid is the project group identifier. This will be symlinked to /project/projects/pname, where pname is the "friendly" project (RAPI) name. User subdirectories for that project can then live in /project/gid/$USER. Note that quotas in /project are project-based, not user-based. ([[User:Gbnewby|Greg Newby]]) Thu Jul 20 00:45:00 UTC 2017) | # Auto-creation of project directories such as /project/$USER was an interim solution. Soon there will be /project/gid, where gid is the project group identifier. This will be symlinked to /project/projects/pname, where pname is the "friendly" project (RAPI) name. User subdirectories for that project can then live in /project/gid/$USER. Note that quotas in /project are project-based, not user-based. ([[User:Gbnewby|Greg Newby]]) Thu Jul 20 00:45:00 UTC 2017) | ||
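
The temporary workaround for the Slurm timeout messages listed above (resubmitting the command) could be automated along the following lines. This is only a sketch, not part of any CC tooling: the function name <code>run_with_retry</code>, the retry count, and the delay are illustrative assumptions, and the two error strings are the ones quoted in the issue. It retries only when one of those messages appears in the command output, so genuine submission errors are returned immediately.

```python
import subprocess
import time

# Error messages quoted in the issue above; treated here as transient.
TRANSIENT_ERRORS = (
    "Socket timed out on send/recv operation",
    "Unable to contact slurm controller",
)


def run_with_retry(cmd, attempts=5, delay=5.0,
                   runner=subprocess.run, sleep=time.sleep):
    """Run cmd (a list of arguments), retrying on known transient errors.

    attempts and delay are illustrative defaults, not recommended values.
    runner and sleep are injectable so the logic can be tested without Slurm.
    Returns the result object of the last attempt.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        output = (result.stdout or "") + (result.stderr or "")
        transient = any(msg in output for msg in TRANSIENT_ERRORS)
        # Stop on success, or on a failure that is not a known timeout.
        if result.returncode == 0 or not transient:
            return result
        if attempt < attempts:
            sleep(delay)
    return result
```

For example, <code>run_with_retry(["sbatch", "job.sh"])</code> would submit a hypothetical script <code>job.sh</code>, waiting a few seconds between attempts if the controller does not respond.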