TensorFlow
This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.
Installing Tensorflow
These instructions install Tensorflow into your home directory using Compute Canada's pre-built Python wheels. Custom Python wheels are stored in /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/
. To install Tensorflow's wheel we will use the pip
command and install it into a Python virtual environments. The below instructions install for Python 3.5.2 but you can also install for Python 3.5.Y or 2.7.X by loading a different Python module.
Load modules required by Tensorflow:
[name@server ~]$ module load cuda cudnn python/3.5.2
Create a new python virtual environment:
[name@server ~]$ virtualenv tensorflow
Activate your newly created python virtual environment:
[name@server ~]$ source tensorflow/bin/activate
Install the numpy and Tensorflow wheels into your newly created virtual environment
[name@server ~]$ pip install tensorflow
Submitting a Tensorflow job
Once you have the above setup completed you can submit a Tensorflow job as
$ sbatch tensorflow-test.sh
The contents of `tensorflow-test.sh`:
#!/bin/bash
#SBATCH --gres=gpu:1 # request GPU "generic resource"
#SBATCH --mem=4000M # memory per node
#SBATCH --time=0-05:00 # time (DD-HH:MM)
#SBATCH --output=%N-%j.out # %N for node name, %j for jobID
module load cuda cudnn python/3.5.2
source tensorflow/bin/activate
python ./tensorflow-test.py
The contents of `tensorflow-test.py`
import tensorflow as tf
node1 = tf.constant(3.0, dtype=tf.float32)
node2 = tf.constant(4.0) # also tf.float32 implicitly
print(node1, node2)
sess = tf.Session()
print(sess.run([node1, node2]))
Once the above job has completed (should take less than a minute) you should see an output file called something like `cdr116-122907.out` with contents similar to the following:
2017-07-10 12:35:19.489458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: Tesla P100-PCIE-12GB
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:82:00.0
Total memory: 11.91GiB
Free memory: 11.63GiB
2017-07-10 12:35:19.491097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-07-10 12:35:19.491156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2017-07-10 12:35:19.520737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:82:00.0)
Tensor("Const:0", shape=(), dtype=float32) Tensor("Const_1:0", shape=(), dtype=float32)
[3.0, 4.0]