Talk:Running jobs

==What's a job?==
The requirement for <code>--account=project-name-cpu</code> is still under discussion and may change before release.
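For concreteness, a minimal submission script under the proposed scheme might look like the sketch below; <code>def-someuser-cpu</code> is a made-up placeholder, and the exact account-naming form is the very thing still under discussion.
<pre>
#!/bin/bash
#SBATCH --account=def-someuser-cpu  # placeholder; the -cpu suffix is the part under discussion
#SBATCH --time=00:10:00             # short test job
#SBATCH --mem=1000M
echo "Running as job $SLURM_JOB_ID"
</pre>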
:[[User:Rdickson|Ross Dickson]] ([[User talk:Rdickson|talk]]) 13:55, 3 April 2017 (UTC)
==Memory Management==
* According to Kamil, it is advisable for users to express their memory requests in multiples of 1000M rather than in G (i.e. 1000MB ~ 1GB). This leaves some RAM for the OS when a job approaches the memory limit on base nodes: e.g. <code>--ntasks=32 --mem-per-cpu=4000M</code> will fit on a base node with 128G of RAM, while <code>--ntasks=32 --mem-per-cpu=4G</code> requires a large-memory node.
* At least on Graham, one needs to use <code>--mem=xxxG</code> to request a large-memory node; with <code>--mem-per-cpu=yyyG</code> one can only ever get base nodes. A sketch of the distinction follows this list.
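A minimal sketch combining both points, assuming a 128G base node; the application name is a placeholder, and the exact amount of RAM reserved for the OS varies by node:
<pre>
#!/bin/bash
# 32 x 4000M = 128000M (~125 GiB): fits on a 128G base node,
# leaving headroom for the operating system.
#SBATCH --ntasks=32
#SBATCH --mem-per-cpu=4000M

# By contrast, 32 x 4G = 32 x 4096M = 131072M (128 GiB exactly),
# which a base node cannot schedule once the OS share is subtracted,
# so that request would wait for a large-memory node.

# At least on Graham, to target a large-memory node explicitly,
# request total memory per node instead (commented out here):
##SBATCH --mem=256G

srun ./my_application
</pre>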
:[[User:Stuekero|Oliver Stueker]] ([[User talk:Stuekero|talk]]) 19:25, 11 July 2017 (UTC)
==External links==
This [https://computing.llnl.gov/tutorials/slurm/slurm.pdf slide deck] from LLNL is old (2004), covers some architectural aspects probably not of interest to most users, and has some LLNL-local information that will not apply at CC (e.g. FIFO scheduling). Not recommended.
:[[User:Rdickson|Ross Dickson]] ([[User talk:Rdickson|talk]]) 15:36, 23 January 2017 (UTC)
All the SchedMD videos I've looked at so far are just slide talks, mostly directed at administrators rather than users. Job submission commands come up in [https://www.youtube.com/watch?v=MI9jHavOt5o Introduction to SLURM, Part 3].
:[[User:Rdickson|Ross Dickson]] ([[User talk:Rdickson|talk]]) 14:35, 27 January 2017 (UTC)
Doug P also suggests [https://sites.google.com/a/case.edu/hpc-upgraded-cluster/slurm-cluster-commands this page] from CWRU, specifically on moving from Torque to SLURM.
:[[User:Rdickson|Ross Dickson]] ([[User talk:Rdickson|talk]]) 13:07, 7 March 2017 (UTC)
== Discussion about the ordering of example job scripts ==
Would it make sense to put the array job information first in the list of examples? It's likely one of the first techniques a novice HPC user would reach for to break up an embarrassingly parallel computation, where you might just split up the input data. In the bioinformatics and genomics software stack it is uncommon to find examples of MPI applications, and even threaded applications are less common. If this segment of our user base is not that large, then the current order makes the most sense.
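For context, here is the kind of minimal array-job script meant above; the file-naming pattern and the range 1-100 are invented for illustration:
<pre>
#!/bin/bash
#SBATCH --array=1-100        # 100 independent tasks, one per input chunk
#SBATCH --time=01:00:00
#SBATCH --mem=4000M
# Each task picks up its own pre-split piece of the input data.
./my_program input.${SLURM_ARRAY_TASK_ID}.dat > output.${SLURM_ARRAY_TASK_ID}.txt
</pre>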
