Model parallelism with multiple GPUs
In cases where a model is too large to fit in the memory of a single GPU, you can split it into parts and load each one onto a separate GPU. In the example below, we revisit the code from previous sections to illustrate how this works: we split a convolutional neural network into two parts, the convolutional/pooling layers and the densely connected feedforward layers. This job requests 2 GPUs, and each part of the model is loaded on its own GPU. We also add code to perform pipeline parallelism, which minimizes the time the second GPU sits idle waiting for the outputs of the first. To do this, we create a separate nn.Module for each part of the model, wrap the parts with nn.Sequential to form a sequence of modules, then use torch.distributed.pipeline.sync.Pipe to break each input batch into chunks and feed them through the pipeline so that both GPUs work concurrently.
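The model split described above can be sketched as follows. This is a minimal illustration, not the full example from the previous sections: the layer sizes and input shape are placeholders chosen for a 28x28 single-channel image, and the sketch falls back to CPU when two GPUs are not available so it can be run anywhere.

```python
import torch
import torch.nn as nn

# Use two GPUs when available; otherwise place both parts on the CPU
# so the sketch still runs (assumption: illustrative fallback only).
if torch.cuda.device_count() >= 2:
    dev0, dev1 = torch.device("cuda:0"), torch.device("cuda:1")
else:
    dev0 = dev1 = torch.device("cpu")

# Part 1: convolutional/pooling layers, loaded on the first device.
part1 = nn.Sequential(
    nn.Conv2d(1, 32, 3), nn.ReLU(),
    nn.Conv2d(32, 64, 3), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
).to(dev0)

# Part 2: densely connected feedforward layers, on the second device.
part2 = nn.Sequential(
    nn.Linear(64 * 12 * 12, 128), nn.ReLU(),
    nn.Linear(128, 10),
).to(dev1)

class SplitCNN(nn.Module):
    """Naive model parallelism: activations are copied between devices,
    so without pipelining each device waits while the other computes."""
    def __init__(self, part1, part2):
        super().__init__()
        self.part1, self.part2 = part1, part2

    def forward(self, x):
        x = self.part1(x.to(dev0))
        # Move the intermediate activations to the second device.
        return self.part2(x.to(dev1))

model = SplitCNN(part1, part2)
out = model(torch.randn(8, 1, 28, 28))  # a batch of 8 MNIST-sized images
print(tuple(out.shape))
```

To add the pipelining step, one would instead wrap the two parts as nn.Sequential(part1, part2), initialize the RPC framework with torch.distributed.rpc.init_rpc, and construct Pipe(model, chunks=N): Pipe splits each batch into N micro-batches so that part2 can start processing the first chunk on its GPU while part1 processes the next one.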