Frequently Asked Questions



== Why are my jobs taking so long to start? == <!--T:20-->
You can see why your jobs are in the <tt>PD</tt> (pending) state by running the <tt>squeue -u <username></tt> command on the cluster.
The <tt>(REASON)</tt> column typically has the value <tt>Resources</tt> or <tt>Priority</tt>.
* <tt>Resources</tt>: The cluster is simply very busy; you will have to be patient, or consider whether you can submit a job that asks for fewer resources (e.g. nodes, memory, time).
* <tt>Priority</tt>: Your job is waiting to start because of its lower priority. This happens when you and other members of your research group have consumed more than your fair share of the cluster resources in the recent past, something you can track using the command <tt>sshare</tt> as explained in [[Job scheduling policies]].
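
If you have many pending jobs, it can help to count how many are waiting for each reason. The following is a minimal sketch, not an official tool: the function name <tt>count_pending_reasons</tt> is made up for this illustration, and it assumes that each input line has the form "job ID, space, reason" with no spaces inside the reason.

```shell
# Hypothetical helper: summarize why pending jobs are waiting.
# Reads lines of the form "<jobid> <reason>" on standard input and
# prints one line per reason with the number of jobs, sorted by reason.
count_pending_reasons() {
    awk '{ count[$2]++ } END { for (r in count) print r, count[r] }' | sort
}

# Possible use on a cluster (prints job ID and reason, without a header):
#   squeue -u "$USER" -h -t PENDING -o "%i %r" | count_pending_reasons
```

A large count next to <tt>Priority</tt> suggests looking at your group's recent usage with <tt>sshare</tt>, while <tt>Resources</tt> suggests the cluster is busy or the request is large.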
 
The column <tt>LevelFS</tt> gives you information about your over- or under-consumption of cluster resources: when <tt>LevelFS</tt> is greater than one you are consuming fewer resources than your fair share, while if it is less than one you are consuming more. The more you over-consume resources, the closer the value gets to zero and the more your jobs decrease in priority. There is a memory effect to this calculation, so the scheduler gradually "forgets" about any over- or under-consumption of resources from months past. Finally, note that the value of <tt>LevelFS</tt> is specific to each cluster: your <tt>LevelFS</tt> on one cluster is independent of its value on another.
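
The rule for reading <tt>LevelFS</tt> can be written as a small shell function. This is only an illustrative sketch: <tt>levelfs_status</tt> is a hypothetical name, and it assumes the value is passed as a plain decimal number copied from the <tt>LevelFS</tt> column of the <tt>sshare</tt> output.

```shell
# Hypothetical helper: interpret a LevelFS value.
# Prints "under" when the value is greater than 1 (using less than your
# fair share), "over" when it is less than 1 (using more), and "even"
# when it is exactly 1. Assumes a plain decimal number as the argument.
levelfs_status() {
    awk -v v="$1" 'BEGIN {
        if (v + 0 > 1)      print "under"
        else if (v + 0 < 1) print "over"
        else                print "even"
    }'
}

# Example: a group whose LevelFS is 0.42 has been over-consuming.
levelfs_status 0.42
```

A result of <tt>over</tt> means that pending jobs with the reason <tt>Priority</tt> are likely to wait longer until past usage is gradually forgotten.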
</translate>