= Quickstart guide =
On [[Béluga/en|Béluga]] and [[Narval/en|Narval]], the
[https://developer.nvidia.com/dcgm NVIDIA Data Center GPU Manager (DCGM)]
must be disabled before profiling, and this has to be requested when you submit your job:
[name@server ~]$ DISABLE_DCGM=1 salloc --gres=gpu:1 ...
Once your job starts, DCGM should stop running within the following minute.
For convenience, the following loop waits until the monitoring service has stopped
(that is, until <code>grep</code> returns nothing):
[name@server ~]$ while [ ! -z "$(dcgmi -v | grep 'Hostengine build info:')" ]; do sleep 5; done
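The loop above waits indefinitely if DCGM never stops. As a minimal sketch, the same check can be wrapped in a function that gives up after a timeout. The function name <code>wait_for_dcgm_stop</code>, the variable <code>DCGM_PROBE</code> and the default values of 120 and 5 seconds are illustrative assumptions; they are not part of DCGM or of the scheduler.

```shell
# Sketch: wait for DCGM to stop, but give up after a timeout.
# wait_for_dcgm_stop, DCGM_PROBE and the default values are hypothetical
# names chosen for this example.

# Command whose output is inspected; the default is the check used above.
DCGM_PROBE=${DCGM_PROBE:-"dcgmi -v"}

wait_for_dcgm_stop() {
    timeout_s=${1:-120}   # maximum time to wait, in seconds (assumed value)
    interval_s=${2:-5}    # polling interval, in seconds
    waited=0
    # DCGM counts as running while the probe still reports a host engine.
    while $DCGM_PROBE 2>/dev/null | grep -q 'Hostengine build info:'; do
        if [ "$waited" -ge "$timeout_s" ]; then
            echo "DCGM still running after ${timeout_s}s" >&2
            return 1
        fi
        sleep "$interval_s"
        waited=$((waited + interval_s))
    done
    return 0
}
```

Calling <code>wait_for_dcgm_stop</code> inside the job before starting the profiler returns 0 once the check no longer finds a host engine, and returns 1 if the timeout expires first.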
== Environment modules == <!--T:3-->
Before you start profiling with NVPROF, the appropriate [[Utiliser des modules/en|module]] needs to be loaded.