}}
<translate> | <translate> | ||
=== Model parallelism with multiple GPUs === <!--T:316-->
In cases where a model is too large to fit on a [[PyTorch#PyTorch_with_a_single_GPU|single GPU]], you can split it into multiple parts and load each one onto a separate GPU. In the example below, we revisit the code from previous sections to illustrate how this works: we will split a Convolutional Neural Network into two parts, the convolutional/pooling layers and the densely connected feedforward layers. This job requests 2 GPUs, and each of the two parts of the model is loaded on its own GPU. We will also add code to perform [https://pytorch.org/docs/stable/pipeline.html?highlight=pipeline pipeline parallelism], minimizing as much as possible the time the second GPU sits idle waiting for the outputs of the first.

To do this, we will create a separate <code>nn.Module</code> for each part of the model, build a sequence of modules by wrapping the parts with <code>nn.Sequential</code>, then use <code>torch.distributed.pipeline.sync.Pipe</code> to break each input batch into chunks and feed them in parallel to all parts of the model.
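As a minimal sketch of the splitting idea (not the full <code>Pipe</code>-based job script), the two stages might look like the following. The layer sizes and device names here are illustrative assumptions, with a CPU fallback so the snippet runs even without GPUs; the last lines show by hand the micro-batching that <code>Pipe</code> automates:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative device placement: "cuda:0"/"cuda:1" on a 2-GPU node,
# falling back to CPU when fewer GPUs are available.
dev0 = torch.device("cuda:0" if torch.cuda.device_count() >= 1 else "cpu")
dev1 = torch.device("cuda:1" if torch.cuda.device_count() >= 2 else "cpu")

class ConvPart(nn.Module):
    """First stage: convolutional/pooling layers."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        return torch.flatten(x, 1)

class MLPPart(nn.Module):
    """Second stage: densely connected feedforward layers."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 10)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))

# Each stage lives on its own device.
part0 = ConvPart().to(dev0)
part1 = MLPPart().to(dev1)

# Naive model parallelism: move activations between devices by hand.
batch = torch.randn(8, 3, 32, 32, device=dev0)
out = part1(part0(batch).to(dev1))
print(out.shape)  # torch.Size([8, 10])

# Micro-batching (what Pipe automates): split the batch into chunks so
# stage 1 can start on chunk i+1 while stage 2 works on chunk i.
chunks = [part1(part0(c).to(dev1)) for c in batch.chunk(4)]
out2 = torch.cat(chunks)
```

With only two GPUs and no chunking, the second GPU would be idle for the entire first stage of every batch; chunking lets the two stages overlap, which is exactly what wrapping the <code>nn.Sequential</code> model in <code>Pipe</code> does automatically.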