<div lang="en" dir="ltr" class="mw-content-ltr"> | <div lang="en" dir="ltr" class="mw-content-ltr"> | ||
[https://docs.nvidia.com/deploy/mps/index.html According to NVIDIA], | [https://docs.nvidia.com/deploy/mps/index.html According to NVIDIA], | ||
< | ::<i>The MPS runtime architecture is designed to transparently enable co-operative multi-process CUDA applications, typically MPI jobs, to utilize Hyper-Q capabilities on the latest NVIDIA (Kepler and later) GPUs. Hyper-Q allows CUDA kernels to be processed concurrently on the same GPU; this can benefit performance when the GPU compute capacity is underutilized by a single application process.</i> | ||
The MPS runtime architecture is designed to transparently enable co-operative multi-process CUDA applications, typically MPI jobs, to utilize Hyper-Q capabilities on the latest NVIDIA (Kepler and later) GPUs. Hyper-Q allows CUDA kernels to be processed concurrently on the same GPU; this can benefit performance when the GPU compute capacity is underutilized by a single application process. | |||
</ | |||
</div> | </div> | ||
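<div lang="en" dir="ltr" class="mw-content-ltr">
As a rough illustration of what this means in practice (the exact setup on a given cluster may differ from this sketch), MPS is enabled by starting the NVIDIA MPS control daemon before the CUDA processes are launched; the directory paths below are arbitrary examples:
<syntaxhighlight lang="bash">
# Directories for the MPS daemon's named pipes and log files;
# any location writable by the job will do (these paths are just examples).
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log

# Start the MPS control daemon in daemon mode; CUDA applications launched
# afterwards in this environment will share the GPU through MPS.
nvidia-cuda-mps-control -d
</syntaxhighlight>
</div>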
<div lang="en" dir="ltr" class="mw-content-ltr"> | <div lang="en" dir="ltr" class="mw-content-ltr"> | ||
Line 34: | Line 33: | ||
<div lang="en" dir="ltr" class="mw-content-ltr"> | <div lang="en" dir="ltr" class="mw-content-ltr"> | ||
Then you can use the MPS feature if you have more than one CPU thread accessing the GPU. This will happen if you run a hybrid MPI/CUDA application, a hybrid OpenMP/CUDA application, or multiple instances of a serial CUDA application ( | Then you can use the MPS feature if you have more than one CPU thread accessing the GPU. This will happen if you run a hybrid MPI/CUDA application, a hybrid OpenMP/CUDA application, or multiple instances of a serial CUDA application (<i>GPU farming</i>). | ||
</div> | </div> | ||
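<div lang="en" dir="ltr" class="mw-content-ltr">
For the hybrid MPI/CUDA case, a minimal sketch of a job script in which several MPI ranks share one GPU through MPS could look like the following; the executable name <code>my_mpi_cuda_app</code> and the resource values are placeholders, not a prescription for any particular cluster:
<syntaxhighlight lang="bash">
#!/bin/bash
#SBATCH --gres=gpu:1        # one GPU, shared by all ranks on the node
#SBATCH --ntasks=4          # four MPI ranks accessing the same GPU
#SBATCH --cpus-per-task=1
#SBATCH --time=0-01:00

# Enable MPS before launching the MPI processes (paths are examples).
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log
nvidia-cuda-mps-control -d

# All ranks started by srun now share the GPU through the MPS daemon.
srun ./my_mpi_cuda_app
</syntaxhighlight>
</div>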
<div lang="en" dir="ltr" class="mw-content-ltr"> | <div lang="en" dir="ltr" class="mw-content-ltr"> | ||
In the above example, we share a single V100 GPU between 8 instances of | In the above example, we share a single V100 GPU between 8 instances of <code>my_code</code> (which takes a single argument-- the loop index $i). We request 8 CPU cores (#SBATCH -c 8) so there is one CPU core per application instance. The two important elements are <code>&</code> on the code execution line, which sends the code processes to the background, and the <code>wait</code> command at the end of the script, which ensures that the job runs until all background processes end. | ||
</div> | </div> | ||
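<div lang="en" dir="ltr" class="mw-content-ltr">
The job script referred to above is not reproduced in this excerpt; a minimal sketch of such a GPU-farming script, with <code>my_code</code> standing in for the user's own executable, could look like this:
<syntaxhighlight lang="bash">
#!/bin/bash
#SBATCH --gres=gpu:v100:1   # a single V100 GPU, shared by all instances
#SBATCH -c 8                # one CPU core per application instance
#SBATCH --time=0-01:00

# Enable MPS so the 8 instances can run concurrently on the GPU
# (directory paths are examples).
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log
nvidia-cuda-mps-control -d

# Launch 8 instances in the background, each with its loop index as argument.
for ((i=0; i<8; i++)); do
    ./my_code $i &
done

# Keep the job alive until all background instances have finished.
wait
</syntaxhighlight>
</div>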