PyTorch: Difference between revisions

(Remove torch dep not found and add point on avx512 mem leak)
(Use localscratch to follow our best practices and reorder.)
Revision as of 17:55, 28 May 2019


PyTorch is a Python package that provides two high-level features:

  • Tensor computation (like NumPy) with strong GPU acceleration
  • Deep neural networks built on a tape-based autograd system

PyTorch has a distant connection with Torch, but for all practical purposes you can treat them as separate packages.

Installation

Latest available wheels

To see the latest version of PyTorch that we have built:

[name@server ~]$ avail_wheels "torch*"

For more information on listing wheels, see listing available wheels.

Installing the Compute Canada wheel

The preferred option is to install PyTorch from the Python wheel as follows:

1. Load a Python module, either python/2.7, python/3.5, python/3.6 or python/3.7
2. Create and start a virtual environment.
3. Install PyTorch in the virtual environment with pip install.

GPU and CPU

(venv) [name@server ~]$ pip install torch --no-index

Extra

In addition to torch, you can install torchvision, torchtext and torchaudio:

(venv) [name@server ~]$ pip install torch torchvision torchtext torchaudio --no-index

libtorch

libtorch.so is included in the wheel. Once PyTorch is installed in a virtual environment, you can find it at $VIRTUAL_ENV/lib/python3.6/site-packages/torch/lib/libtorch.so (the python3.6 component matches the Python module you loaded).
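
Because the path above depends on the Python version used for the virtual environment, it can be convenient to let Python report where the library is. The following is a minimal sketch, not an official tool: the function names libtorch_path and find_installed_libtorch are hypothetical, and it only assumes the torch/lib/libtorch.so layout described above.

```python
import importlib.util
import os


def libtorch_path(torch_package_dir):
    """Return the expected libtorch.so location inside a torch package directory."""
    return os.path.join(torch_package_dir, "lib", "libtorch.so")


def find_installed_libtorch():
    """Return the path to libtorch.so for the installed torch package, or None.

    The package is located without importing it, so this also works on a
    login node where no GPU is present.
    """
    spec = importlib.util.find_spec("torch")
    if spec is None or spec.origin is None:
        return None  # torch is not installed in the active environment
    candidate = libtorch_path(os.path.dirname(spec.origin))
    return candidate if os.path.exists(candidate) else None


if __name__ == "__main__":
    # Prints the full path, or None if torch (or the library) is not found.
    print(find_installed_libtorch())
```

Run it with the virtual environment activated; a result of None suggests that torch is not installed in that environment or that the wheel uses a different layout.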

Job submission

Here is an example of a job submission script using the Python wheel, with the virtual environment created inside the job on node-local storage ($SLURM_TMPDIR):

File : pytorch-test.sh

#!/bin/bash
#SBATCH --gres=gpu:1       # Request GPU "generic resources"
#SBATCH --cpus-per-task=6  # Cores proportional to GPUs: 6 on Cedar, 16 on Graham.
#SBATCH --mem=32000M       # Memory proportional to GPUs: 32000 Cedar, 64000 Graham.
#SBATCH --time=0-03:00
#SBATCH --output=%N-%j.out

module load python/3.6
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate
pip install torch --no-index

python pytorch-test.py
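
The comments in the script say that cores and memory should be proportional to the number of GPUs. As a rough illustration of that arithmetic, the sketch below scales the per-GPU values quoted in those comments. The helper name sbatch_directives and the PER_GPU table are hypothetical, and it assumes simple linear scaling; check the documentation of your cluster before requesting more than one GPU.

```python
# Per-GPU values copied from the comments in pytorch-test.sh above.
PER_GPU = {
    "cedar": {"cpus": 6, "mem_mb": 32000},
    "graham": {"cpus": 16, "mem_mb": 64000},
}


def sbatch_directives(cluster, gpus=1):
    """Return #SBATCH lines with cores and memory scaled linearly by GPU count."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    per_gpu = PER_GPU[cluster.lower()]  # raises KeyError for an unknown cluster
    return [
        f"#SBATCH --gres=gpu:{gpus}",
        f"#SBATCH --cpus-per-task={per_gpu['cpus'] * gpus}",
        f"#SBATCH --mem={per_gpu['mem_mb'] * gpus}M",
    ]


if __name__ == "__main__":
    # Print the directives for a single-GPU job on Cedar.
    print("\n".join(sbatch_directives("cedar", gpus=1)))
```

For one GPU on Cedar this reproduces the three resource lines used in the script above.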


The Python script pytorch-test.py has the form:

File : pytorch-test.py

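import torch

# Allocate a 5x3 tensor without initializing it; its values are arbitrary.
x = torch.empty(5, 3)
print(x)
# Create a 5x3 tensor of random values drawn uniformly from [0, 1).
y = torch.rand(5, 3)
print(y)
# Run the following only if CUDA is available.
if torch.cuda.is_available():
    # Move both tensors to the GPU and add them there.
    x = x.cuda()
    y = y.cuda()
    print(x + y)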


You can then submit a PyTorch job with:

[name@server ~]$ sbatch pytorch-test.sh

Troubleshooting

Memory leak

On AVX512 hardware (Béluga, Skylake or V100 nodes), older versions of PyTorch (earlier than v1.0.1) that use older libraries (cuDNN < v7.5 or MAGMA < v2.5) may leak a considerable amount of memory, resulting in an out-of-memory exception and the death of your tasks. Please upgrade torch to the latest version.
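
To check whether an environment falls in the affected range, you can compare the installed version against the v1.0.1 threshold mentioned above. This is a minimal sketch under stated assumptions: the helper names version_tuple and torch_may_leak are hypothetical, only the leading numeric part of the version string is compared, and the cuDNN check relies on torch.backends.cudnn.version() returning an integer such as 7500 for cuDNN 7.5.0.

```python
import re


def version_tuple(version):
    """Parse the leading numeric components, e.g. '1.0.1.post2' -> (1, 0, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def torch_may_leak(version):
    """Return True if the version is earlier than 1.0.1, the threshold cited above."""
    return version_tuple(version) < (1, 0, 1)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print("torch", torch.__version__, "affected:", torch_may_leak(torch.__version__))
        cudnn = torch.backends.cudnn.version()  # None when cuDNN is unavailable
        if cudnn is not None:
            # Assumption: 7500 corresponds to cuDNN v7.5.0.
            print("cuDNN", cudnn, "older than v7.5:", cudnn < 7500)
```

If the check reports an affected version, reinstalling with pip install torch --no-index in a fresh virtual environment should pick up the newest wheel available on the cluster.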