TensorFlow/fr

From Alliance Doc

Revision as of 20:31, 8 August 2017

Installation

The following instructions install TensorFlow in your home directory using the binary packages (Python wheels) prepared by Compute Canada; these packages are located in /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/.
The TensorFlow package will be installed in a Python virtual environment using the pip command.
These instructions apply to Python 3.5.2; for Python 3.5.Y or 2.7.X, use one of the other Python modules.

Load the modules required by TensorFlow.

[nom@serveur ~]$ module load cuda cudnn python/3.5.2

Create a new Python virtual environment.

[nom@serveur ~]$ virtualenv tensorflow

Activate the new environment.

[nom@serveur ~]$ source tensorflow/bin/activate
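As a quick sanity check (a sketch, not part of the official instructions), you can confirm from Python that the virtual environment is active: `sys.prefix` should point inside the `tensorflow` directory created above.

```python
# Sketch: verify the "tensorflow" virtualenv is active.
# When a virtualenv is active, sys.prefix points inside it.
import sys

# Ends with ".../tensorflow" when the environment above is active.
print(sys.prefix)
```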

Install TensorFlow in this new environment.

[nom@serveur ~]$ pip install tensorflow

Submitting a TensorFlow job

Submit a TensorFlow job as follows:

[nom@serveur ~]$ sbatch tensorflow-test.sh

The job script contains:

File : tensorflow-test.sh

#!/bin/bash
#SBATCH --gres=gpu:1              # request GPU "generic resource"
#SBATCH --cpus-per-task=6         # maximum CPU cores per GPU request: 6 on Cedar, 16 on Graham
#SBATCH --mem=32000M               # memory per node
#SBATCH --time=0-03:00            # time (DD-HH:MM)
#SBATCH --output=%N-%j.out        # %N for node name, %j for jobID

module load cuda cudnn python/3.5.2
source tensorflow/bin/activate
python ./tensorflow-test.py


while the Python script reads:

File : tensorflow-test.py

import tensorflow as tf
node1 = tf.constant(3.0, dtype=tf.float32)
node2 = tf.constant(4.0) # also tf.float32 implicitly
print(node1, node2)
sess = tf.Session()
print(sess.run([node1, node2]))


Once the job has completed, which should take less than a minute, an output file with a name similar to cdr116-122907.out should have been generated. Its contents will look like the following:
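The output filename comes from the pattern given to #SBATCH --output. As a small sketch (the node name and job ID below are taken from the example output, not computed), this is how Slurm's %N and %j placeholders expand:

```python
# Sketch: expansion of Slurm's %N (node name) and %j (job ID)
# placeholders in the --output pattern. Values are illustrative.
def expand_output_pattern(pattern, node, jobid):
    return pattern.replace("%N", node).replace("%j", str(jobid))

print(expand_output_pattern("%N-%j.out", "cdr116", 122907))  # cdr116-122907.out
```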

File : cdr116-122907.out

2017-07-10 12:35:19.489458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: Tesla P100-PCIE-12GB
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:82:00.0
Total memory: 11.91GiB
Free memory: 11.63GiB
2017-07-10 12:35:19.491097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-07-10 12:35:19.491156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y
2017-07-10 12:35:19.520737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:82:00.0)
Tensor("Const:0", shape=(), dtype=float32) Tensor("Const_1:0", shape=(), dtype=float32)
[3.0, 4.0]


Using Cedar's large GPU nodes

TensorFlow can run on all GPU node types on Cedar and Graham. Cedar's large GPU nodes, each equipped with 4 x P100-PCIE-16GB GPUs with GPUDirect P2P enabled between each pair, are highly recommended for large-scale deep learning and machine learning research.

Currently, Cedar's large GPU nodes accept whole-node jobs only. You should either run code that supports multiple GPUs or pack several single-GPU programs into one job. The job submission script should contain the following:

File : tensorflow-test-lgpu.sh

#!/bin/bash
#SBATCH --nodes=1                 # request number of whole nodes
#SBATCH --ntasks-per-node=1    
#SBATCH --cpus-per-task=24        # use a combination of --ntasks-per-node and --cpus-per-task so that total CPU cores equal 24
#SBATCH --gres=gpu:lgpu:4              # lgpu is required for using large GPU nodes
#SBATCH --mem=250G               # memory per node
#SBATCH --time=0-03:00            # time (DD-HH:MM)
#SBATCH --output=%N-%j.out        # %N for node name, %j for jobID

module load cuda cudnn python/3.5.2
source tensorflow/bin/activate
python ./tensorflow-test.py


Packing single-GPU jobs within one SLURM job

Cedar's large GPU nodes are highly recommended for deep learning models that can be accelerated by multiple GPUs. If you need to run 4 single-GPU programs, or 2 programs using 2 GPUs each, on one node, GNU Parallel (https://www.gnu.org/software/parallel/) is recommended. A simple example is given below:

cat params.input | parallel -j4 'CUDA_VISIBLE_DEVICES=$(({%} - 1)) python {}'

The file params.input should list one program per line, for example:

code1.py
code2.py
code3.py
code4.py
...

With this method, you can run multiple programs in one submission. Here GNU Parallel runs at most 4 jobs at a time, launching the next one as soon as a previous job finishes. CUDA_VISIBLE_DEVICES ensures that each program sees only one GPU.
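The $(({%} - 1)) arithmetic maps GNU Parallel's 1-based job-slot number {%} to a 0-based GPU index. A small Python sketch of that mapping (illustrative only; in the real job the shell performs this arithmetic):

```python
# Sketch: with parallel -j4, GNU Parallel runs job slots 1..4;
# subtracting 1 yields the GPU index placed in CUDA_VISIBLE_DEVICES.
def slot_to_gpu(slot):
    """Map a 1-based GNU Parallel job slot to a 0-based GPU index."""
    return slot - 1

for slot in range(1, 5):
    print("slot %d -> CUDA_VISIBLE_DEVICES=%d" % (slot, slot_to_gpu(slot)))
```

Each of the 4 concurrent programs therefore gets a distinct GPU (0 through 3), even though they all run inside the same whole-node job.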