Known issues
Intro
- Please report issues to support@computecanada.ca
Shared issues
- The CC Slurm configuration preferentially encourages whole-node jobs. Users should, if possible, request whole nodes rather than per-core resources; see Job Scheduling - Whole Node Scheduling and the example script after this list. (Patrick Mann (talk) 20:15, 17 July 2017 (UTC))
- Quotas on /project are all 1 TB. The national storage team is working on a project/RAC-based schema. Fortunately, Lustre has announced group-based quotas, but these will need to be installed. (Patrick Mann (talk) 20:12, 17 July 2017 (UTC))
- The SLURM epilog does not fully clean up processes from ended jobs, especially if the job did not exit normally. (Greg Newby, Fri Jul 14 19:32:48 UTC 2017)
- Email from Graham and Cedar is still undergoing configuration, so email job notifications from Slurm are failing. (Patrick Mann (talk) 17:17, 26 June 2017 (UTC))
- Cedar email is working now (Patrick Mann (talk) 16:11, 6 July 2017 (UTC))
- Graham email is working
- The SLURM 'sinfo' command yields different resource-type detail on Graham and Cedar; an explicit output format (sketched after this list) gives comparable columns on both. (Greg Newby, 16:05, 23 June 2017 (UTC))
- Local scratch on compute nodes has inconsistent naming: Cedar has /local and Graham has /localscratch. A job script can detect which one exists; see the sketch after this list.
- The status page at http://status.computecanada.ca/ is not yet updated automatically, so it does not necessarily show the correct, current status.
- "Nearline" capabilities are not yet available (see https://docs.computecanada.ca/wiki/National_Data_Cyberinfrastructure for a brief description of the intended functionality)
- Update July 17: still not working. If you need your nearline RAC2017 quota, please ask CC support. (Patrick Mann (talk) 20:45, 17 July 2017 (UTC))
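A minimal sketch of a whole-node request, for the whole-node scheduling item above. The core count, memory request, and walltime are placeholders rather than site-confirmed values; check the cluster documentation for the actual node sizes.

 #!/bin/bash
 #SBATCH --nodes=1               # request an entire node rather than individual cores
 #SBATCH --ntasks-per-node=32    # placeholder: set to the core count of the node type you want
 #SBATCH --mem=0                 # ask for all of the memory on the node
 #SBATCH --time=01:00:00         # placeholder walltime
 srun ./my_program               # my_program stands in for your own executable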
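For the sinfo difference noted above, an explicit output format requests the same columns on both clusters. This is a sketch using standard sinfo field specifiers, not an officially recommended command.

 # Print node name, CPUs, memory (MB), and generic resources (e.g. GPUs) per node
 sinfo -N -o "%N %c %m %G"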
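For the local-scratch naming difference noted above, a job script can test which directory exists. This is a sketch assuming one of the two paths is present and writable on every compute node.

 # Use whichever local-scratch directory exists on this node
 if [ -d /localscratch ]; then
     LOCALDIR=/localscratch
 elif [ -d /local ]; then
     LOCALDIR=/local
 else
     echo "No local scratch directory found" >&2
     exit 1
 fi
 WORKDIR="$LOCALDIR/$SLURM_JOB_ID"   # per-job subdirectory; SLURM_JOB_ID is set by Slurm
 mkdir -p "$WORKDIR"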
Cedar only
- Environment variables such as $SCRATCH and $PROJECT are not yet set, although the filesystems are available; a temporary workaround is sketched below. (Greg Newby, 16:10, 21 June 2017 (UTC))
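Until the variables are set system-wide, they can be defined manually in ~/.bashrc or at the top of a job script. The paths below are assumptions based on the usual layout, not confirmed values; adjust them to your own account and project group.

 export SCRATCH=/scratch/$USER    # assumed path; confirm on your cluster
 export PROJECT=/project/$USER    # placeholder; the real path may include a project or group directory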
Graham only
- Graham scheduling is not properly running small jobs; staff are working on the problem. (Patrick Mann (talk) 20:14, 17 July 2017 (UTC))
- Big-memory nodes still need to be added to the scheduler.
- The scheduler has no network topology information yet.