NVTOP

From Alliance Doc

NVTOP stands for Neat Videocard TOP, an (h)top-like task monitor for GPUs and accelerators. It can handle multiple GPUs and prints information about them in a way that is familiar to htop users.

A picture is worth a thousand words: NVTOP.png


Monitor GPU usage

NVTOP can monitor a single GPU or several at once, showing each GPU's utilization and memory usage. You can also select a specific device from the menu (F2 -> GPU Select).

NVTOP is useful for verifying that your job is using the GPU as efficiently as possible.

Monitor a batch job

If you have submitted a non-interactive job and would like to see its current GPU usage, follow these steps.

1. From a login node, list your jobs and note the ID of the job to monitor:

[name@server ~]$ sq

2. Attach to the running job:

[name@server ~]$ srun --pty --jobid JOBID nvtop
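The two steps above can be combined in a small shell function. This is only a minimal sketch: the function name `nvtop_job` is hypothetical, and it assumes the standard Slurm `squeue` command is available next to the `sq` wrapper. With no argument it picks the first running job that `squeue` reports for the current user, which may not be the job you want if several are running.

```shell
# Hypothetical helper (not part of NVTOP or Slurm): attach nvtop to a job.
# Usage: nvtop_job [JOBID]
nvtop_job() {
    jobid="$1"
    if [ -z "$jobid" ]; then
        # Assumption: standard squeue is available.
        # -h: no header, -t RUNNING: running jobs only, -o '%i': print job IDs only
        jobid="$(squeue -u "$(id -un)" -h -t RUNNING -o '%i' | head -n 1)"
    fi
    if [ -z "$jobid" ]; then
        echo "No running job found" >&2
        return 1
    fi
    # Same command as in step 2 above
    srun --pty --jobid "$jobid" nvtop
}
```

After pasting the function into your shell on a login node, `nvtop_job 123456` attaches to job 123456, while `nvtop_job` alone attaches to your first running job.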

Monitor an interactive job

1. Start your interactive job with minimal resources.

2. In a second terminal, connect to the login node and find the job ID:

[name@server ~]$ sq

3. Attach to the running job:

[name@server ~]$ srun --pty --jobid JOBID nvtop

You'll be able to see the GPU usage in real time as you run your commands in the first terminal.

Monitor a GPU on a specific node

When running multi-node jobs, it can be useful to verify that one or all of the GPUs are actually being used.

1. From a login node, find the job ID and identify the node names:

[name@server ~]$ sq
[name@server ~]$ srun --jobid JOBID -n1 -c1 scontrol show hostname

2. Attach to the running job on the specific node:

[name@server ~]$ srun --pty --jobid JOBID --nodelist NODENAME nvtop
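
For a job that spans many nodes, it may be convenient to generate the attach command for every node at once. The sketch below is an assumption-laden example, not an official tool: the function name `nvtop_node_cmds` is hypothetical, and it relies on standard Slurm `squeue` and `scontrol` being available on the login node. It only prints the commands, so you can choose which node to inspect.

```shell
# Hypothetical helper: print one nvtop attach command per node of a job.
# Usage: nvtop_node_cmds JOBID
nvtop_node_cmds() {
    jobid="$1"
    # '%N' prints the job's compact node list (for example node[1-2])
    nodelist="$(squeue -j "$jobid" -h -o '%N')"
    if [ -z "$nodelist" ]; then
        echo "No nodes found for job $jobid" >&2
        return 1
    fi
    # 'scontrol show hostnames' expands the compact list, one host per line
    scontrol show hostnames "$nodelist" | while read -r node; do
        echo "srun --pty --jobid $jobid --nodelist $node nvtop"
    done
}
```

Running `nvtop_node_cmds 123456` prints one line per node; copy the line for the node you want to check and run it from the login node.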