TensorFlow

Installing TensorFlow

These instructions install TensorFlow into your home directory using Compute Canada's pre-built Python wheels. Custom Python wheels are stored in /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/. To install TensorFlow's wheel we will use the pip command and install it into a Python virtual environment. The instructions below install for Python 3.5.2, but you can also install for another Python 3.5.x or 2.7.x version by loading a different Python module.

Load the modules required by TensorFlow:

[name@server ~]$ module load python/3.5.2

Create a new Python virtual environment:

[name@server ~]$ virtualenv tensorflow

Activate your newly created Python virtual environment:

[name@server ~]$ source tensorflow/bin/activate
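
You may optionally upgrade pip inside the virtual environment before installing; this step is not part of the original instructions, but is a common precaution:

(tensorflow) [name@server $] pip install --upgrade pip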

Install TensorFlow into your newly created virtual environment using the command from one of the two following subsections.

CPU-only

(tensorflow) [name@server $] pip install tensorflow-cpu

GPU

(tensorflow) [name@server $] pip install tensorflow-gpu
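
To verify that the wheel was installed correctly, you can import TensorFlow and print its version; this quick sanity check is an addition to the original instructions. For the GPU wheel, you may first need to load the cuda and cudnn modules (as in the job script below) for the import to succeed:

(tensorflow) [name@server $] python -c 'import tensorflow as tf; print(tf.__version__)'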

Submitting a TensorFlow job with a GPU

Once you have completed the setup above, you can submit a TensorFlow job with

[name@server ~]$ sbatch tensorflow-test.sh
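
While the job is queued or running, you can check its status with the standard Slurm command (not part of the original instructions):

[name@server ~]$ squeue -u $USER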

The job submission script has the contents

File : tensorflow-test.sh

#!/bin/bash
#SBATCH --gres=gpu:1              # request GPU "generic resource"
#SBATCH --cpus-per-task=6         # maximum CPU cores per GPU request: 6 on Cedar, 16 on Graham
#SBATCH --mem=32000M               # memory per node
#SBATCH --time=0-03:00            # time (DD-HH:MM)
#SBATCH --output=%N-%j.out        # %N for node name, %j for jobID

module load cuda cudnn python/3.5.2
source tensorflow/bin/activate
python ./tensorflow-test.py


while the Python script has the form,

File : tensorflow-test.py

import tensorflow as tf

# build a graph containing two constant nodes
node1 = tf.constant(3.0, dtype=tf.float32)
node2 = tf.constant(4.0)  # also tf.float32 implicitly
print(node1, node2)       # prints the symbolic tensors, not their values

# evaluate the nodes in a session
sess = tf.Session()
print(sess.run([node1, node2]))


Once the above job has completed (it should take less than a minute), you should see an output file with a name like cdr116-122907.out, with contents similar to the following example:

File : cdr116-122907.out

2017-07-10 12:35:19.489458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: Tesla P100-PCIE-12GB
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:82:00.0
Total memory: 11.91GiB
Free memory: 11.63GiB
2017-07-10 12:35:19.491097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-07-10 12:35:19.491156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y
2017-07-10 12:35:19.520737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:82:00.0)
Tensor("Const:0", shape=(), dtype=float32) Tensor("Const_1:0", shape=(), dtype=float32)
[3.0, 4.0]
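
If you would like the test to perform an actual computation instead of only fetching the two constants, a minimal variation of the script above, still using the TensorFlow 1.x graph API, could be:

import tensorflow as tf

# build a tiny graph: two constants and their sum
node1 = tf.constant(3.0, dtype=tf.float32)
node2 = tf.constant(4.0)            # also tf.float32 implicitly
total = tf.add(node1, node2)        # an addition op in the graph

# run the graph; TensorFlow places the ops on the GPU if one is available
sess = tf.Session()
print(sess.run(total))              # prints 7.0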


Using Cedar's large GPU nodes

TensorFlow can run on all GPU node types on Cedar and Graham. Cedar's large GPU node type, which is equipped with 4 x P100-PCIE-16GB GPUs with GPUDirect P2P enabled between each pair, is highly recommended for large-scale deep learning and machine learning research.

Large GPU nodes on Cedar accept both whole-node jobs and single-GPU jobs, but single-GPU jobs can run for at most 24 hours. The job submission script for a single-GPU job should have the contents

File : tensorflow-lgpu-single.sh

#!/bin/bash
#SBATCH --nodes=1                 # number of nodes to request
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=6         # the node has 24 CPU cores in total; use up to 6 per GPU
#SBATCH --gres=gpu:lgpu:1         # lgpu is required for using large GPU nodes
#SBATCH --mem=60G                 # total node memory is about 250GB; request up to 60G per GPU
#SBATCH --time=0-03:00            # time (DD-HH:MM)
#SBATCH --output=%N-%j.out        # %N for node name, %j for jobID

module load cuda cudnn python/3.5.2
source tensorflow/bin/activate
python ./tensorflow-test.py


The job submission script for a whole node (4 GPUs) job should have the contents

File : tensorflow-lgpu-whole-node.sh

#!/bin/bash
#SBATCH --nodes=1                 # number of nodes to request
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=24        # use all 24 CPU cores on the node
#SBATCH --gres=gpu:lgpu:4         # lgpu is required for using large GPU nodes
#SBATCH --mem=250G                # memory per node
#SBATCH --time=0-03:00            # time (DD-HH:MM)
#SBATCH --output=%N-%j.out        # %N for node name, %j for jobID

module load cuda cudnn python/3.5.2
source tensorflow/bin/activate
python ./tensorflow-test.py


Packing single-GPU jobs within one SLURM job

Cedar's large GPU nodes are highly recommended for running deep learning models that can be accelerated by multiple GPUs. If you need to run, for example, four single-GPU codes or two 2-GPU codes on one node for longer than 24 hours, GNU Parallel is recommended. A simple example is given below:

cat params.input | parallel -j4 'CUDA_VISIBLE_DEVICES=$(({%} - 1)) python {} &> {#}.out'

The GPU id is computed as the slot id {%} minus 1; {#} is the sequence number of the job, starting from 1.

A params.input file should include one set of input parameters per line, for example:

code1.py
code2.py
code3.py
code4.py
...

With this method, you can run multiple codes in one submission. GNU Parallel will run at most 4 jobs at a time, launching the next job as soon as one finishes. CUDA_VISIBLE_DEVICES restricts each code to a single GPU.
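
Putting the pieces together, a whole-node submission script that drives GNU Parallel might look like the following sketch. The file name and time limit are illustrative, and you may need to load a module providing the parallel command if it is not already in your environment:

File : tensorflow-parallel.sh

#!/bin/bash
#SBATCH --nodes=1                 # request one whole large GPU node
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=24        # use all 24 CPU cores on the node
#SBATCH --gres=gpu:lgpu:4         # all 4 GPUs of a large GPU node
#SBATCH --mem=250G                # memory per node
#SBATCH --time=2-00:00            # whole-node jobs may run longer than 24 hours
#SBATCH --output=%N-%j.out        # %N for node name, %j for jobID

module load cuda cudnn python/3.5.2
source tensorflow/bin/activate
cat params.input | parallel -j4 'CUDA_VISIBLE_DEVICES=$(({%} - 1)) python {} &> {#}.out'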