NVTOP
NVTOP stands for Neat Videocard TOP, an (h)top-like task monitor for GPUs and accelerators. It can handle multiple GPUs and prints information about them in a way familiar to htop users.
Monitor GPU usage
NVTOP can monitor a single GPU or multiple GPUs. It shows GPU utilization and memory usage. You can also select a specific device from the menu (F2 -> GPU Select).
NVTOP is useful for monitoring and verifying that your job is using the GPU as efficiently as possible.
Monitor batch job
Use this method if you have submitted a non-interactive job and would like to see its current GPU usage.
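One possible way to do this is sketched below. It assumes a Slurm cluster where nvtop is installed on the compute nodes, and it attaches a new step to the running job with srun. The function name monitor_batch_job and the DRY_RUN variable are hypothetical helpers introduced for illustration, and JOBID stands for your own job ID. Depending on the Slurm version and site configuration, an extra option such as --overlap may be needed so that the new step can share resources with the running job.

```shell
# Sketch: attach to a running batch job and open nvtop inside its allocation.
# Assumption: Slurm cluster with nvtop available on the compute nodes.
# monitor_batch_job and DRY_RUN are hypothetical names used for illustration.
monitor_batch_job() {
    local jobid="$1"
    if [ -z "$jobid" ]; then
        echo "usage: monitor_batch_job JOBID" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command that would be executed.
        echo "srun --jobid $jobid --pty nvtop"
    else
        # Start an interactive step inside the existing job and run nvtop in it.
        srun --jobid "$jobid" --pty nvtop
    fi
}
```

Running it with DRY_RUN=1 first shows the command without contacting the scheduler, which is a convenient check before attaching to a real job.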
Monitor interactive job
1. Start your interactive job with minimal resources.
You'll be able to see the usage in real time as you run your commands in the first terminal.
Monitor a GPU on a specific node
When running multi-node jobs, it can be useful to verify that one or all GPUs are actually being used.
1. From a login node, find the job ID and identify the node names:
[name@server ~]$ sq
[name@server ~]$ srun --jobid JOBID -n1 -c1 scontrol show hostname
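Once the node names are known, a possible next step is to open nvtop on one chosen node. The sketch below assumes a Slurm cluster with nvtop installed on the compute nodes. The function names job_nodes and nvtop_commands are hypothetical; the second one only prints one srun command per node so that you can pick the one you need. As with any extra step in a running job, an option such as --overlap may be required depending on the Slurm version.

```shell
# Sketch: list the nodes of a running job and print one nvtop command per node.
# Assumption: Slurm cluster with nvtop available on the compute nodes.
# job_nodes and nvtop_commands are hypothetical names used for illustration.
job_nodes() {
    # Same command as in the step above: prints one hostname per line.
    srun --jobid "$1" -n1 -c1 scontrol show hostname
}

nvtop_commands() {
    local jobid="$1" node
    # Read hostnames from standard input and print the matching srun command.
    while read -r node; do
        if [ -n "$node" ]; then
            echo "srun --jobid $jobid --nodelist $node --pty nvtop"
        fi
    done
}
```

For example, `job_nodes JOBID | nvtop_commands JOBID` prints the candidate commands; copying one of them into the terminal starts nvtop on that node only.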