Talk:Running jobs

What's a job?

The requirement for --account=project-name-cpu is still under discussion and may change before release.

Ross Dickson (talk) 13:55, 3 April 2017 (UTC)

Memory Management

According to Kamil, users are advised to express memory requests in thousands of MB rather than in GB (i.e. treat 1000M as roughly 1G). This leaves some RAM for the OS when a job approaches the node's memory limit. For example, --ntasks=32 --mem-per-cpu=4000M will fit on a base node with 128G of RAM, while --ntasks=32 --mem-per-cpu=4G requires a large-memory node, because Slurm treats 4G as 4096M and 32 such tasks claim the node's entire 128G.
At least on Graham, one needs to use --mem=xxxG to request a large-memory node; with --mem-per-cpu=yyyG one only ever gets base nodes.
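
The arithmetic behind this advice can be sketched as follows. This is only an illustration, not part of any scheduler tooling: the helper names (to_mib, total_request_mib) and the constant NODE_RAM_MIB are made up for this example, it assumes one CPU per task, and it assumes the node has exactly 128G (131072M) of physical RAM. The amount Slurm is actually configured to hand out on a given cluster may be lower.

```python
# Sketch: compare total memory requested by two equivalent-looking job requests.
# Slurm interprets the M and G suffixes as powers of 1024 (1G = 1024M).

MIB_PER_GIB = 1024


def to_mib(value: str) -> int:
    """Convert a Slurm-style memory string such as '4000M' or '4G' to MiB."""
    number, unit = int(value[:-1]), value[-1].upper()
    if unit == "M":
        return number
    if unit == "G":
        return number * MIB_PER_GIB
    raise ValueError(f"unsupported unit in {value!r}")


def total_request_mib(ntasks: int, mem_per_cpu: str) -> int:
    """Total memory requested, assuming one CPU per task."""
    return ntasks * to_mib(mem_per_cpu)


# Assumption for illustration: a base node with 128G of physical RAM.
NODE_RAM_MIB = 128 * MIB_PER_GIB

for mem in ("4000M", "4G"):
    total = total_request_mib(32, mem)
    headroom = NODE_RAM_MIB - total
    print(f"--ntasks=32 --mem-per-cpu={mem}: requests {total}M, leaves {headroom}M")
```

Under these assumptions the 4000M request leaves about 3G of the node unclaimed, while the 4G request leaves nothing for the OS, which is why it cannot be placed on a base node.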

Oliver Stueker (talk) 19:25, 11 July 2017 (UTC)

External links

This slide deck from LLNL is old (2004), covers some architectural aspects that are probably not of interest to most users, and contains some LLNL-specific information that will not apply at CC (e.g. FIFO scheduling). Not recommended.

Ross Dickson (talk) 15:36, 23 January 2017 (UTC)

All the SchedMD videos I've looked at so far are just slide talks, mostly directed at administrators rather than users. Job submission commands come up in Introduction to SLURM, Part 3.

Ross Dickson (talk) 14:35, 27 January 2017 (UTC)

Doug P also suggests https://sites.google.com/a/case.edu/hpc-upgraded-cluster/slurm-cluster-commands from CWRU, which deals specifically with moving from Torque to SLURM.

Ross Dickson (talk) 13:07, 7 March 2017 (UTC)