PyTorch

From Alliance Doc
<languages />
[[Category:Software]]
<translate>
<!--T:14-->
[http://pytorch.org/ PyTorch] is a Python package that provides two high-level features:
* Tensor computation (like NumPy) with strong GPU acceleration
* Deep neural networks built on a tape-based autograd system
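A tape-based autograd system records each operation as it runs (the "tape") and then replays that record in reverse to compute gradients. The toy scalar class below illustrates only that idea. <code>Var</code> and <code>backward</code> are hypothetical names and are not PyTorch internals. In PyTorch itself, you would create tensors with <code>requires_grad=True</code> and call <code>.backward()</code> on the result.

```python
import math


class Var:
    """A scalar that records every operation applied to it on a shared tape (toy example)."""

    def __init__(self, value, tape):
        self.value = value
        self.grad = 0.0
        self.tape = tape

    def _record(self, value, parents):
        # parents: list of (input Var, local derivative d(output)/d(input))
        out = Var(value, self.tape)
        self.tape.append((out, parents))
        return out

    def __add__(self, other):
        return self._record(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return self._record(self.value * other.value,
                            [(self, other.value), (other, self.value)])

    def sin(self):
        return self._record(math.sin(self.value), [(self, math.cos(self.value))])


def backward(out):
    """Replay the tape in reverse, accumulating gradients with the chain rule."""
    out.grad = 1.0
    for node, parents in reversed(out.tape):
        for parent, local in parents:
            parent.grad += local * node.grad


tape = []
x = Var(2.0, tape)
y = Var(3.0, tape)
z = x * y + x.sin()  # z = x*y + sin(x)
backward(z)
print(z.value)  # 6 + sin(2)
print(x.grad)   # dz/dx = y + cos(x) = 3 + cos(2)
print(y.grad)   # dz/dy = x = 2.0
```

Walking the tape in reverse works because every operation is appended after its inputs. As a result, each node's gradient is complete before it is passed on to its parents.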


<!--T:19-->
PyTorch has a distant connection with [[Torch]], but for all practical purposes you can treat them as separate packages.
= Installation = <!--T:1-->
==Latest available wheels== <!--T:20-->
To see the latest version of PyTorch that we have built:
{{Command|avail_wheels "torch*"}}
For more information on listing wheels, see [[Python#Listing_available_wheels | listing available wheels]].
==Installing Compute Canada wheel== <!--T:15-->
<!--T:25-->
The preferred option is to install it using the Python [https://pythonwheels.com/ wheel] as follows:
:1. Load a Python [[Utiliser_des_modules/en#Sub-command_load|module]], either <tt>python/2.7</tt>, <tt>python/3.5</tt>, <tt>python/3.6</tt> or <tt>python/3.7</tt>
:2. Create and activate a [[Python#Creating_and_using_a_virtual_environment|virtual environment]].
:3. Install PyTorch in the virtual environment with <code>pip install</code>.
==== GPU and CPU ==== <!--T:18-->
:{{Command|prompt=(venv) [name@server ~]|pip install torch --no-index}}
====Extra==== <!--T:21-->
In addition to <tt>torch</tt>, you can install <tt>torchvision</tt>, <tt>torchtext</tt> and <tt>torchaudio</tt>:
{{Command|prompt=(venv) [name@server ~]|pip install torch torchvision torchtext torchaudio --no-index}}
====libtorch==== <!--T:24-->
<tt>libtorch.so</tt> is included in the wheel. Once PyTorch is installed in a virtual environment, you can find it at <tt>$VIRTUAL_ENV/lib/python3.6/site-packages/torch/lib/libtorch.so</tt>, where <tt>python3.6</tt> matches the version of the Python module you loaded.
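The <code>python3.6</code> component of that path depends on which Python module is loaded. The sketch below builds the path for any version. <code>libtorch_path</code> is a hypothetical helper, and it assumes the usual <code>lib/pythonX.Y/site-packages</code> layout of a Linux virtual environment.

```python
import os
import sys


def libtorch_path(venv_dir, version=None):
    """Expected location of libtorch.so in a virtual environment (assumed layout).

    venv_dir: root of the virtual environment, i.e. the value of $VIRTUAL_ENV.
    version:  (major, minor) Python version; defaults to the running interpreter.
    """
    major, minor = version if version is not None else sys.version_info[:2]
    return os.path.join(venv_dir, "lib", "python{}.{}".format(major, minor),
                        "site-packages", "torch", "lib", "libtorch.so")


if __name__ == "__main__":
    venv = os.environ.get("VIRTUAL_ENV")
    if venv is None:
        print("No virtual environment is active")
    else:
        print(libtorch_path(venv))
```

If <code>torch</code> is importable, <code>os.path.dirname(torch.__file__)</code> gives the installed package directory directly, and <code>libtorch.so</code> is in its <code>lib</code> subdirectory.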
= Job submission = <!--T:10-->
<!--T:11-->
Here is an example of a job submission script that uses the Python wheel and creates a virtual environment inside the job:
{{File
  |name=pytorch-test.sh
  |lang="bash"
  |contents=
#!/bin/bash
#SBATCH --gres=gpu:1      # Request GPU "generic resources"
#SBATCH --cpus-per-task=6  # Cores proportional to GPUs: 6 on Cedar, 16 on Graham.
#SBATCH --mem=32000M      # Memory proportional to GPUs: 32000 Cedar, 64000 Graham.
#SBATCH --time=0-03:00
#SBATCH --output=%N-%j.out
<!--T:27-->
module load python/3.6
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate
pip install torch --no-index
<!--T:28-->
python pytorch-test.py
}}
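The comments in the script scale cores and memory with the number of GPUs requested. The following is a minimal sketch of that rule. It assumes the per-GPU values given in those comments: 6 cores and 32000M on Cedar, and 16 cores and 64000M on Graham. <code>sbatch_header</code> and <code>PER_GPU</code> are hypothetical names and are not part of any cluster tool.

```python
# Per-GPU resources taken from the comments in pytorch-test.sh (assumed values).
PER_GPU = {
    "cedar": {"cpus": 6, "mem_mb": 32000},
    "graham": {"cpus": 16, "mem_mb": 64000},
}


def sbatch_header(cluster, gpus=1, time="0-03:00"):
    """Return #SBATCH lines with cores and memory proportional to the GPU count."""
    per_gpu = PER_GPU[cluster]
    lines = [
        "#!/bin/bash",
        "#SBATCH --gres=gpu:{}".format(gpus),
        "#SBATCH --cpus-per-task={}".format(per_gpu["cpus"] * gpus),
        "#SBATCH --mem={}M".format(per_gpu["mem_mb"] * gpus),
        "#SBATCH --time={}".format(time),
        "#SBATCH --output=%N-%j.out",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(sbatch_header("graham", gpus=2))
```

The printed lines could replace the header of <code>pytorch-test.sh</code>, and the module, virtual environment and <code>python</code> lines would stay the same. The helper does not check whether a node actually has that many GPUs.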
<!--T:29-->
The Python script <code>pytorch-test.py</code> could look like this:
{{File
  |name=pytorch-test.py
  |lang="python"
  |contents=
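import torch
# Create an uninitialized 5x3 tensor (its values are whatever was in memory)
x = torch.empty(5, 3)
print(x)
# Create a 5x3 tensor of random values drawn uniformly from [0, 1)
y = torch.rand(5, 3)
print(y)
# Run the following only if CUDA is available
if torch.cuda.is_available():
    # Move both tensors to the GPU and add them there
    x = x.cuda()
    y = y.cuda()
    print(x + y)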
}}
<!--T:31-->
You can then submit a PyTorch job with:
{{Command|sbatch pytorch-test.sh}}
</translate>
= Benchmarks = <!--T:32-->
<!--T:33-->
This section gives ResNet-18 benchmark results on Graham for various node and GPU configurations.
<!--T:34-->
All numbers are images per second '''per GPU''', using <code>DistributedDataParallel</code> and NCCL.
<!--T:35-->
'''These results are provisional and there is a lot of variance in their measurement. Work is being done to get a clearer picture.'''
<!--T:36-->
{| class="wikitable"
|+ Graham (P100), images per second per GPU
|-
! Batch Size !! 1 Node, 1 GPU (baseline) !! 1 Node, 2 GPUs !! 2 Nodes, 2 GPUs each !! 3 Nodes, 2 GPUs each
|-
| 32  || 542 || 134 || 103 || 82
|-
| 64  || 620 || 190 || 149 || 134
|-
| 128  || 646 || 241 || 197 || 180
|-
| 256  || 587 || 263 || 184 || 368
|}
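Because the table reports throughput per GPU, the total throughput of a configuration is the per-GPU value multiplied by the number of GPUs (1, 2, 4 and 6 for the four columns). Dividing the per-GPU value by the single-GPU baseline gives a scaling efficiency. The following is a small sketch of that arithmetic, using the numbers from the table as they are. <code>scaling</code> is a hypothetical helper.

```python
# Per-GPU throughput (images/s) copied from the Graham (P100) table above.
GPUS = [1, 2, 4, 6]  # 1 GPU, 1 node x 2 GPUs, 2 nodes x 2 GPUs, 3 nodes x 2 GPUs
THROUGHPUT = {
    32: [542, 134, 103, 82],
    64: [620, 190, 149, 134],
    128: [646, 241, 197, 180],
    256: [587, 263, 184, 368],
}


def scaling(per_gpu, gpus):
    """Return (total images/s, per-GPU efficiency vs. the 1-GPU baseline) per column."""
    baseline = float(per_gpu[0])
    return [(rate * n, rate / baseline) for rate, n in zip(per_gpu, gpus)]


if __name__ == "__main__":
    for batch, rates in THROUGHPUT.items():
        cells = ", ".join(
            "{} GPU(s): {} img/s ({:.0%})".format(n, total, eff)
            for n, (total, eff) in zip(GPUS, scaling(rates, GPUS))
        )
        print("batch {}: {}".format(batch, cells))
```

For example, at batch size 32, two GPUs on one node give 2 × 134 = 268 images per second in total. This is about half the single-GPU baseline of 542, and as noted above, these measurements are provisional.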
<translate>
= Troubleshooting = <!--T:23-->
== Memory leak == <!--T:30-->
On AVX512 hardware (Béluga, Skylake or V100 nodes), older versions of PyTorch (earlier than v1.0.1) using older libraries (cuDNN earlier than v7.5 or MAGMA earlier than v2.5) may leak a considerable amount of memory. This can result in an out-of-memory exception that kills your tasks. Please upgrade to the latest <tt>torch</tt> version.
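To check whether an environment is affected, you can compare the installed version against v1.0.1. The following is a minimal sketch. <code>older_than</code> is a hypothetical helper that compares only the leading numeric release components. For full version-ordering rules, the <code>packaging.version</code> module is a better choice. The sketch does not check the cuDNN or MAGMA versions or the hardware.

```python
import re


def older_than(version, minimum=(1, 0, 1)):
    """Return True if a version string such as "1.0.0" or "1.1.0+cu100" is older than `minimum`.

    Only the leading dotted numeric part is compared (so "1.0rc1" is treated
    as 1.0.0); pre-release and local suffixes are otherwise ignored.
    """
    release = re.match(r"\d+(?:\.\d+)*", version).group(0)
    parts = [int(p) for p in release.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # pad "1.0" to (1, 0, 0)
    return tuple(parts) < tuple(minimum)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        if older_than(torch.__version__):
            print("PyTorch {} is older than 1.0.1; please upgrade".format(torch.__version__))
        else:
            print("PyTorch {} is recent enough".format(torch.__version__))
```

Within a PyTorch session, <code>torch.backends.cudnn.version()</code> returns the cuDNN version as an integer. You can use it to check the cuDNN condition separately.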
</translate>

Revision as of 16:09, 2 July 2019