cc_staff
50
edits
No edit summary |
No edit summary |
||
Line 9: | Line 9: | ||
* Quotas on <code>/project</code> are all 1 TB. The Storage National team is working on a project/RAC based schema. Fortunately Lustre have announced group-based quotas but that will need installation. ([[User:Pjmann|Patrick Mann]] 20:12, 17 July 2017 (UTC)) | * Quotas on <code>/project</code> are all 1 TB. The Storage National team is working on a project/RAC based schema. Fortunately Lustre have announced group-based quotas but that will need installation. ([[User:Pjmann|Patrick Mann]] 20:12, 17 July 2017 (UTC)) | ||
* SLURM epilog does not fully clean up processes from ended jobs, especially if the job did not exit normally. ([[User:Gbnewby|Greg Newby]]) Fri Jul 14 19:32:48 UTC 2017) | * SLURM epilog does not fully clean up processes from ended jobs, especially if the job did not exit normally. ([[User:Gbnewby|Greg Newby]]) Fri Jul 14 19:32:48 UTC 2017) | ||
* The status page at http://status.computecanada.ca/ is not updated automatically yet, so does not necessarily show correct, current status. | * The status page at http://status.computecanada.ca/ is not updated automatically yet, so does not necessarily show correct, current status. | ||
* "Nearline" capabilities are not yet available (see https://docs.computecanada.ca/wiki/National_Data_Cyberinfrastructure for a brief description of the intended functionality) | * "Nearline" capabilities are not yet available (see https://docs.computecanada.ca/wiki/National_Data_Cyberinfrastructure for a brief description of the intended functionality) | ||
Line 15: | Line 14: | ||
* Operations will occasionally time out with a message like "Socket timed out on send/recv operation" or "Unable to contact slurm controller (connect failure)". As a temporary workaround, attempt to resubmit your jobs/commands, they should go through in a few seconds. ([[User:Nathanw|Nathan Wielenga]]) 08:50, 18 July 2017 (MDT)) | * Operations will occasionally time out with a message like "Socket timed out on send/recv operation" or "Unable to contact slurm controller (connect failure)". As a temporary workaround, attempt to resubmit your jobs/commands, they should go through in a few seconds. ([[User:Nathanw|Nathan Wielenga]]) 08:50, 18 July 2017 (MDT)) | ||
** Should be resolved after a VHD migration to a new backend for slurmctl. (NW) | ** Should be resolved after a VHD migration to a new backend for slurmctl. (NW) | ||
== Cedar only == <!--T:3--> | == Cedar only == <!--T:3--> |