PyTorch
Revision as of 16:14, 2 July 2019
PyTorch (http://pytorch.org/) is a Python package that provides two high-level features:
- Tensor computation (like NumPy) with strong GPU acceleration
- Deep neural networks built on a tape-based autograd system
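Both features can be seen together in a few lines. The following is a minimal sketch, not taken from the PyTorch documentation. It records operations on a tensor and lets autograd compute the gradient. It runs on the CPU; creating `x` with `device="cuda"` would place the same computation on a GPU.

```python
import torch

# A leaf tensor whose operations autograd should record.
# Passing device="cuda" would create it on a GPU instead of the CPU.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Each operation is recorded on the tape as it runs: y = x1^2 + x2^2 + x3^2
y = (x ** 2).sum()

# backward() replays the tape in reverse to get dy/dx = 2x
y.backward()
print(y.item())  # 14.0
print(x.grad)    # tensor([2., 4., 6.])
```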
PyTorch has a distant connection with Torch, but for all practical purposes you can treat them as separate packages.
Installation
Latest available wheels
To see the latest version of PyTorch that we have built:
[name@server ~]$ avail_wheels "torch*"
For more information on listing wheels, see listing available wheels.
Installing the Compute Canada wheel
The preferred option is to install PyTorch using the Python wheel (https://pythonwheels.com/) as follows:
1. Load a Python module, either python/2.7, python/3.5, python/3.6 or python/3.7.
2. Create and activate a virtual environment.
3. Install PyTorch in the virtual environment with pip install.
GPU and CPU
(venv) [name@server ~]$ pip install torch --no-index
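After installing, a short Python check can confirm two things: that the virtual environment is active, and that the wheel imports. This is a sketch that assumes Python 3, because it relies on `sys.base_prefix` and `importlib.util`, which Python 2.7 lacks. The helper names are hypothetical.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when running inside a virtual environment (Python 3 only)."""
    # venv and virtualenv >= 20 change sys.prefix; older virtualenv sets sys.real_prefix
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


def torch_installed():
    # find_spec locates the package without importing it
    return importlib.util.find_spec("torch") is not None


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtualenv())
    if torch_installed():
        import torch

        print("torch", torch.__version__)
        # The same wheel serves CPU and GPU nodes; True only when a usable GPU is visible
        print("CUDA available:", torch.cuda.is_available())
    else:
        print("torch not found in this environment")
```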
Extra
In addition to torch, you can install torchvision, torchtext and torchaudio:
(venv) [name@server ~]$ pip install torch torchvision torchtext torchaudio --no-index
libtorch
libtorch.so is included in the wheel. Once PyTorch is installed in a virtual environment, you can find it at $VIRTUAL_ENV/lib/python3.6/site-packages/torch/lib/libtorch.so (replace python3.6 with the Python version of your environment).
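Rather than hard-coding the `python3.6` part of that path, a script can ask the installed `torch` package where it lives. This is a minimal sketch. It assumes the shared libraries sit in a `lib/` directory next to the package, as the path above shows.

```python
import glob
import os

import torch

# torch/lib sits next to the installed package, whichever pythonX.Y directory that is
lib_dir = os.path.join(os.path.dirname(torch.__file__), "lib")
print(lib_dir)

# List the libtorch* shared libraries shipped in the wheel
for path in sorted(glob.glob(os.path.join(lib_dir, "libtorch*"))):
    print(path)
```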
Job submission
Here is an example job submission script, pytorch-test.sh, that uses the Python wheel and creates a virtual environment inside the job:
```bash
#!/bin/bash
#SBATCH --gres=gpu:1        # Request GPU "generic resources"
#SBATCH --cpus-per-task=6   # Cores proportional to GPUs: 6 on Cedar, 16 on Graham.
#SBATCH --mem=32000M        # Memory proportional to GPUs: 32000M on Cedar, 64000M on Graham.
#SBATCH --time=0-03:00
#SBATCH --output=%N-%j.out

module load python/3.6
virtualenv --no-download "$SLURM_TMPDIR/env"
source "$SLURM_TMPDIR/env/bin/activate"
pip install torch --no-index

python pytorch-test.py
```
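The job above requests 6 cores with `--cpus-per-task`. A training script can use those cores for data loading by reading `SLURM_CPUS_PER_TASK` instead of hard-coding a number. Slurm sets this variable only when `--cpus-per-task` is given. Below is a minimal sketch; the helper name is hypothetical. Its result would typically be passed as the `num_workers` argument of `torch.utils.data.DataLoader`.

```python
import os


def cpus_for_workers(default=1):
    """Return the core count Slurm allocated per task, or `default` outside Slurm."""
    # Set by Slurm only when the job was submitted with --cpus-per-task
    value = os.environ.get("SLURM_CPUS_PER_TASK")
    return int(value) if value else default


if __name__ == "__main__":
    print("DataLoader workers:", cpus_for_workers())
```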
The Python script pytorch-test.py has the form:
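```python
import torch

# Uninitialized 5x3 tensor: its values are whatever was already in memory
x = torch.empty(5, 3)
print(x)
# 5x3 tensor of random numbers drawn uniformly from [0, 1)
y = torch.rand(5, 3)
print(y)
# Run the following only if CUDA is available
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    print(x + y)
```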
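A common variant of the same test picks the device once with `torch.device` and creates the tensors directly on it. The addition then runs on a GPU when the job has one and on the CPU otherwise. This sketch uses zeros instead of uninitialized memory for `x`, so the sum equals `y`. Either version can serve as pytorch-test.py.

```python
import torch

# Pick the GPU when one is visible to the job, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the tensors directly on the chosen device instead of moving them afterwards
x = torch.zeros(5, 3, device=device)
y = torch.rand(5, 3, device=device)
z = x + y
print(z.device, z.shape)
```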
You can then submit a PyTorch job with:
[name@server ~]$ sbatch pytorch-test.sh
Benchmarks
This section gives ResNet-18 benchmark results for various configurations; so far, only results from Graham are available.
All numbers are images per second per GPU, using DistributedDataParallel and NCCL.
These results are provisional, and the measurements vary considerably. Work is under way to get a clearer picture.
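The benchmark numbers use DistributedDataParallel with NCCL. The sketch below shows the general shape of one DDP training step. It is an illustration built on several assumptions, not the benchmark code:

- The model is a toy `nn.Linear` instead of ResNet-18.
- It falls back to the gloo backend on CPU-only machines.
- The helper `train_step` is hypothetical.
- It runs as a single process (`world_size=1`) that initializes the process group through a temporary file.

In a real multi-GPU job, one process per GPU would call it with its own rank (for example derived from Slurm's `SLURM_PROCID` and `SLURM_NTASKS`) and a shared `init_method`.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def train_step(rank, world_size, init_method):
    """One DDP training step on a toy model (hypothetical helper, not the benchmark code)."""
    # NCCL, as in the benchmarks, when GPUs are present; gloo as a CPU fallback
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend, init_method=init_method,
                            rank=rank, world_size=world_size)
    if backend == "nccl":
        device = torch.device("cuda", rank % torch.cuda.device_count())
        torch.cuda.set_device(device)
    else:
        device = torch.device("cpu")

    # Tiny stand-in for ResNet-18 to keep the sketch short
    model = nn.Linear(10, 2).to(device)
    ddp_model = DDP(model, device_ids=[device.index] if device.type == "cuda" else None)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    inputs = torch.randn(32, 10, device=device)
    targets = torch.randn(32, 2, device=device)
    loss = nn.functional.mse_loss(ddp_model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()  # DDP averages gradients across all processes during backward
    optimizer.step()

    dist.destroy_process_group()
    return loss.item()


if __name__ == "__main__":
    # Single-process demo: the process group is initialized through a fresh file.
    # On a cluster, each process would pass its own rank and the total world size.
    init_file = os.path.join(tempfile.mkdtemp(), "ddp_init")
    print("loss:", train_step(rank=0, world_size=1, init_method="file://" + init_file))
```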
Graham (P100 GPUs), images per second per GPU:
| Batch size | 1 node, 1 GPU (baseline) | 1 node, 2 GPUs | 2 nodes, 2 GPUs each | 3 nodes, 2 GPUs each |
|---|---|---|---|---|
| 32 | 542 | 134 | 103 | 82 |
| 64 | 620 | 190 | 149 | 134 |
| 128 | 646 | 241 | 197 | 180 |
| 256 | 587 | 263 | 184 | 368 |
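The table reports throughput per GPU. A configuration's aggregate throughput is therefore the per-GPU figure multiplied by its number of GPUs: 1, 2, 4 and 6 from left to right. The short sketch below performs that arithmetic, with the values copied from the table.

```python
# Per-GPU throughput in images/s, copied from the Graham table above
gpus = [1, 2, 4, 6]  # 1 node x 1 GPU, 1 x 2, 2 x 2, 3 x 2
per_gpu = {
    32: [542, 134, 103, 82],
    64: [620, 190, 149, 134],
    128: [646, 241, 197, 180],
    256: [587, 263, 184, 368],
}

# Aggregate throughput = per-GPU rate x number of GPUs
totals = {b: [r * n for r, n in zip(rates, gpus)] for b, rates in per_gpu.items()}
# Per-GPU rate relative to the single-GPU baseline (1.0 = no loss per GPU)
relative = {b: [round(r / rates[0], 2) for r in rates] for b, rates in per_gpu.items()}

for batch in per_gpu:
    print(batch, totals[batch], relative[batch])
```

For example, at batch size 32 the six-GPU configuration processes 82 x 6 = 492 images per second in total, compared with 542 on a single GPU.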
Troubleshooting
Memory leak
On AVX-512 hardware (Béluga, Skylake or V100 nodes), PyTorch versions older than v1.0.1 built against older libraries (cuDNN < v7.5 or MAGMA < v2.5) may leak a considerable amount of memory. This can lead to out-of-memory exceptions that kill your tasks. Please upgrade to the latest torch version.
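To see whether an environment runs one of these older versions, a script can compare the installed torch version with 1.0.1. It can also print the cuDNN version that torch reports. This is a minimal sketch with hypothetical helper names. It does not check the MAGMA version.

```python
import re


def version_tuple(version):
    """Convert '1.0.1' or '2.1.0+cu118' into a tuple of ints such as (1, 0, 1)."""
    # Keep only the leading dotted numbers; drops suffixes like '+cu118', 'a0', '.post2'
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group(0).split("."))


def needs_upgrade(torch_version):
    """True for torch versions older than 1.0.1, the range the leak warning covers."""
    return version_tuple(torch_version) < (1, 0, 1)


if __name__ == "__main__":
    import torch

    print("torch", torch.__version__, "upgrade needed:", needs_upgrade(torch.__version__))
    # cuDNN version as an integer, or None when cuDNN is not available
    print("cuDNN", torch.backends.cudnn.version())
```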