Ray

From Alliance Doc
[[Category:Software]][[Category:AI and Machine Learning]]


[https://docs.ray.io/ Ray] is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a toolkit of libraries that simplify running parallel/distributed workloads, in particular Machine Learning jobs.



Revision as of 22:00, 15 February 2023



= Installation =

== Latest available wheels ==

To see the latest version of Ray that we have built:

[name@server ~]$ avail_wheels "ray"

For more information, see Available wheels.

== Installing our wheel ==

The preferred option is to install Ray using the Python wheel, as follows:

1. Load a Python module: module load python
2. Create and activate a virtual environment.
3. Install Ray in the virtual environment with pip install:
(venv) [name@server ~]$ pip install --no-index ray

= Job submission =

== Single node ==

Below is an example of a job that spawns a single-node Ray cluster with 6 CPUs and 1 GPU.


File : ray-example.sh

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --gpus-per-task=1
#SBATCH --cpus-per-task=6  
#SBATCH --mem=32000M       
#SBATCH --time=0-00:05
#SBATCH --output=%N-%j.out

export HEAD_NODE=$(hostname) # store hostname
export RAY_PORT=34567 # choose a port to start Ray on 

module load python gcc/9.3.0 arrow
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate

pip install ray --no-index

# Launch a single-node Ray cluster as a background process: the only node is the head node!
ray start --head --node-ip-address=$HEAD_NODE --port=$RAY_PORT --num-cpus=$SLURM_CPUS_PER_TASK --num-gpus=1 --block &
sleep 10

python ray-example.py


In this simple example, we connect to the single-node Ray cluster launched in the job submission script, then we check that Ray sees the resources allocated to the job.


File : ray-example.py

import ray
import os

# Connect to Ray cluster
ray.init(address=f"{os.environ['HEAD_NODE']}:{os.environ['RAY_PORT']}", _node_ip_address=os.environ['HEAD_NODE'])

# Check that Ray can see 6 CPUs and 1 GPU
print(ray.available_resources())