Known issues: Difference between revisions

From Alliance Doc

Revision as of 20:12, 17 July 2017


Intro

Shared issues

  1. Quotas on /project are all 1 TB. The Storage National team is working on a project/RAC-based schema. Fortunately, Lustre has announced group-based quotas, but that feature will still need to be installed. (Patrick Mann (talk) 20:12, 17 July 2017 (UTC))
  2. The Slurm epilog does not fully clean up processes from ended jobs, especially if the job did not exit normally. (Greg Newby, Fri Jul 14 19:32:48 UTC 2017)
  3. Email from Graham and Cedar is still being configured, so email job notifications from Slurm are failing. (Patrick Mann (talk) 17:17, 26 June 2017 (UTC))
    • Cedar email is working now (Patrick Mann (talk) 16:11, 6 July 2017 (UTC))
    • Graham email is working
  4. The Slurm 'sinfo' command yields different resource-type detail on Graham and Cedar. (Greg Newby, 16:05, 23 June 2017 (UTC))
  5. Local scratch on compute nodes has inconsistent naming. Cedar has /local and Graham has /localscratch.
  6. The status page at http://status.computecanada.ca/ is not yet updated automatically, so it does not necessarily show the correct, current status.
  7. "Nearline" capabilities are not yet available (see https://docs.computecanada.ca/wiki/National_Data_Cyberinfrastructure for a brief description of the intended functionality).
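
Until the local scratch naming in item 5 is made consistent, a job script that runs on both clusters can probe for whichever path exists. The following is a minimal sketch, not an official workaround: the function name `find_local_scratch` is hypothetical, the two candidate paths are the ones named in item 5, and the fallback to the system temporary directory is an assumption.

```python
import os
import tempfile

# Paths named in item 5: Graham uses /localscratch, Cedar uses /local.
CANDIDATES = ("/localscratch", "/local")


def find_local_scratch(candidates=CANDIDATES):
    """Return the first candidate that is an existing, writable directory."""
    for path in candidates:
        if os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    # Assumption: if no candidate is usable, fall back to the
    # system temporary directory rather than failing.
    return tempfile.gettempdir()


if __name__ == "__main__":
    print(find_local_scratch())
```

Probing for the directory rather than testing the hostname keeps the script working if either cluster is later renamed to match the other.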

Cedar only

  • Environment variables such as $SCRATCH and $PROJECT are not yet set, although the filesystems are available. (Greg Newby, 16:10, 21 June 2017 (UTC))
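
While these variables are unset, a script can read them defensively and substitute a path supplied by the user. This is a rough sketch under stated assumptions: `resolve_dir` is a hypothetical helper, and the fallback value is whatever the caller passes in, since this page does not state the directory layout beneath each filesystem.

```python
import os


def resolve_dir(var_name, fallback, environ=os.environ):
    """Return the value of var_name if it is set and non-empty, else fallback."""
    value = environ.get(var_name, "")
    return value if value else fallback


if __name__ == "__main__":
    # "/project" is the filesystem named under Shared issues; the exact
    # per-project subdirectory is not given here and must be filled in.
    print(resolve_dir("PROJECT", "/project"))
```

Once the variables are set on Cedar, the same call returns their values and the fallback is no longer used.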

Graham only

  • Big-memory nodes need to be added to the scheduler.
  • The scheduler has no network topology information.

Other issues