TensorFlow
This is not a complete article: it is a draft, a work in progress intended to be developed into an article, and may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.
Installing TensorFlow
These instructions install TensorFlow into your home directory using Compute Canada's pre-built Python wheels, which are stored in `/cvmfs/soft.computecanada.ca/custom/python/wheelhouse/`. Installing TensorFlow requires installing both the NumPy and TensorFlow wheels. We install the wheels with the `pip` command inside a Python virtual environment; see Python virtual environments for details. The instructions below use Python 3.5.2, but you can also install for another Python 3.5.Y or 2.7.X release by loading the corresponding python module and choosing wheels with `cp35` (Python 3.5) or `cp27` (Python 2.7) in their names.
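If you want to see which wheels are available for your Python version, you can list the wheelhouse contents. The short Python sketch below is only an illustration (the exact filenames on your system may differ); it globs for the Python 3.5 (`cp35`) NumPy and TensorFlow wheels in the `avx2` directory used later in these instructions:
import glob

# Location of Compute Canada's pre-built wheels (the avx2 builds used below)
WHEELHOUSE = "/cvmfs/soft.computecanada.ca/custom/python/wheelhouse/avx2/"

# Print the NumPy and TensorFlow wheels built for Python 3.5 (cp35)
for pattern in ("numpy-*cp35*.whl", "tensorflow-*cp35*.whl"):
    for wheel in sorted(glob.glob(WHEELHOUSE + pattern)):
        print(wheel)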
Load the modules required by TensorFlow:
$ module load gcc java cuda cudnn bazel python/3.5.2
Create a new Python virtual environment in the `python_envs` folder of your home directory:
$ pyvenv $HOME/python_envs/tensorflow
(Note that with Python 2.7 this command is `virtualenv` instead of `pyvenv`; see Python virtual environments.)
Activate your newly created Python virtual environment:
$ source $HOME/python_envs/tensorflow/bin/activate
Install the NumPy and TensorFlow wheels into your newly created virtual environment:
$ pip install --upgrade /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/avx2/numpy-1.12.1-cp35-cp35m-linux_x86_64.whl
$ pip install --upgrade /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/avx2/tensorflow-1.2.1+computecanada-cp35-cp35m-linux_x86_64.whl
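Before submitting a job, you may want a quick sanity check that the wheels were installed correctly. With the modules loaded and the virtual environment still active, running the short script below with `python` should print the installed versions; this is only a suggested check, and it assumes the CUDA libraries loaded above are visible on the node where you run it:
# Sanity check: confirm the wheels installed above can be imported
import numpy as np
import tensorflow as tf

# The versions printed should match the wheel filenames installed above
print("NumPy version:", np.__version__)
print("TensorFlow version:", tf.__version__)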
Submitting a TensorFlow job
Once the above setup is complete, you can submit a TensorFlow job with:
$ sbatch tensorflow-test.sh
The contents of `tensorflow-test.sh`:
#!/bin/bash
#SBATCH --gres=gpu:1 # request GPU "generic resource"
#SBATCH --mem=4000M # memory per node
#SBATCH --time=0-05:00 # time (DD-HH:MM)
#SBATCH --output=%N-%j.out # %N for node name, %j for jobID
module purge
module load gcc java cuda cudnn bazel python/3.5.2
source $HOME/python_envs/tensorflow/bin/activate
python ./tensorflow-test.py
The contents of `tensorflow-test.py`:
import tensorflow as tf

# Define two constant nodes in the default graph
node1 = tf.constant(3.0, dtype=tf.float32)
node2 = tf.constant(4.0) # also tf.float32 implicitly
# Printing the nodes shows the symbolic tensors, not their values
print(node1, node2)
# Running the nodes in a session evaluates them and returns their values
sess = tf.Session()
print(sess.run([node1, node2]))
Once the above job has completed (it should take less than a minute), you should see an output file with a name like `cdr116-122907.out` containing something similar to the following:
2017-07-10 12:35:19.489458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: Tesla P100-PCIE-12GB
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:82:00.0
Total memory: 11.91GiB
Free memory: 11.63GiB
2017-07-10 12:35:19.491097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-07-10 12:35:19.491156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2017-07-10 12:35:19.520737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:82:00.0)
Tensor("Const:0", shape=(), dtype=float32) Tensor("Const_1:0", shape=(), dtype=float32)
[3.0, 4.0]
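The log lines above show that TensorFlow found the GPU. If you also want to confirm which device each operation runs on, one option (a sketch using the TensorFlow 1.x API, not part of the test script above) is to enable device placement logging in the session configuration:
import tensorflow as tf

# The same tiny graph as in tensorflow-test.py, plus an addition op
node1 = tf.constant(3.0, dtype=tf.float32)
node2 = tf.constant(4.0)
node3 = tf.add(node1, node2)

# log_device_placement=True makes TensorFlow report, in the job's output file,
# which device (for example /gpu:0) each operation was assigned to
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(node3))  # expected to print 7.0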