PyTorch

From Alliance Doc
Revision as of 18:52, 17 July 2019 by Rdickson (talk | contribs) (Marked this version for translation)
Jump to navigation Jump to search
Other languages:

PyTorch is a Python package that provides two high-level features:

  • Tensor computation (like NumPy) with strong GPU acceleration
  • Deep neural networks built on a tape-based autograd system

PyTorch has a distant connection with Torch, but for all practical purposes you can treat them as separate projects.

PyTorch developers also offer LibTorch, which allows one to implement extensions to PyTorch using C++, and to implement pure C++ machine learning applications. Models written in Python using PyTorch can be converted and used in pure C++ through TorchScript.

Installation

Latest available wheels

To see the latest version of PyTorch that we have built:

Question.png
[name@server ~]$ avail_wheels "torch*"

For more information on listing wheels, see listing available wheels.

Installing Compute Canada wheel

The preferred option is to install it using the Python wheel as follows:

1. Load a Python module, either python/2.7, python/3.5, python/3.6 or python/3.7
2. Create and start a virtual environment.
3. Install PyTorch in the virtual environment with pip install.

GPU and CPU

Question.png
(venv) [name@server ~] pip install torch --no-index

Extra

In addition to torch, you can install torchvision, torchtext and torchaudio:

Question.png
(venv) [name@server ~] pip install torch torchvision torchtext torchaudio --no-index

libtorch

libtorch.so is included in the wheel. Once Pytorch is installed in a virtual environment, you can find it at: $VIRTUAL_ENV/lib/python3.6/site-packages/torch/lib/libtorch.so.

Job submission

Here is an example of a job submission script using the python wheel, with a virtual environment inside a job:

File : pytorch-test.sh

#!/bin/bash
#SBATCH --gres=gpu:1       # Request GPU "generic resources"
#SBATCH --cpus-per-task=6  # Cores proportional to GPUs: 6 on Cedar, 16 on Graham.
#SBATCH --mem=32000M       # Memory proportional to GPUs: 32000 Cedar, 64000 Graham.
#SBATCH --time=0-03:00
#SBATCH --output=%N-%j.out

module load python/3.6
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate
pip install torch --no-index

python pytorch-test.py


The Python script pytorch-test.py has the form

File : pytorch-test.py

import torch
x = torch.Tensor(5, 3)
print(x)
y = torch.rand(5, 3)
print(y)
# let us run the following only if CUDA is available
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    print(x + y)


You can then submit a PyTorch job with:

Question.png
[name@server ~]$ sbatch pytorch-test.sh


Troubleshooting

Memory leak

On AVX512 hardware (Béluga, Skylake or V100 nodes), older versions of Pytorch (less than v1.0.1) using older libraries (cuDNN < v7.5 or MAGMA < v2.5) may considerably leak memory resulting in an out-of-memory exception and death of your tasks. Please upgrade to the latest torch version.