Ray

To run this example, you can use one of the job submission templates provided [[#Job_submission | above]], depending on whether you require one or multiple nodes. As you will see in the code that follows, the amount of resources required by your job depends mainly on two factors: the number of samples you wish to draw from the search space and the size of your model in memory. Knowing these two things, you can work out how many trials you will run in total and how many of them can run in parallel using as few resources as possible. For example, how many copies of your model fit inside the memory of a single GPU? That is the number of trials you can run in parallel using just one GPU.
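As a back-of-the-envelope illustration of this reasoning, consider the sketch below. The 16 GB of GPU memory is an assumed figure chosen purely for illustration; check the GPU type available on your cluster.

<syntaxhighlight lang="python">
# Rough estimate: how many trials can share one GPU at the same time?
gpu_memory_gb = 16    # assumed GPU memory; depends on your cluster's GPU type
model_memory_gb = 1   # approximate memory footprint of one copy of the model

# Each concurrent trial holds its own copy of the model in GPU memory.
max_parallel_trials = gpu_memory_gb // model_memory_gb

# Ray Tune accepts fractional GPU requests, so each trial asks for 1/N of the GPU.
gpus_per_trial = 1 / max_parallel_trials

print(max_parallel_trials, gpus_per_trial)  # 16 trials, 0.0625 GPU each
</syntaxhighlight>

In practice you should leave headroom for activations, the CUDA context, and the framework itself; the example below uses a more conservative 10 parallel trials (0.1 GPU each) for a model of about 1 GB.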

In the example, our model takes up about 1 GB in memory. We will run 20 trials in total, 10 in parallel at a time on the same GPU, and we will give one CPU to each trial to be used as a <code>DataLoader</code> worker. So we will pick the single-node job submission template, replace the number of CPUs per task with <code>#SBATCH --cpus-per-task=10</code>, and replace the Python call with <code>python ray-tune-example.py --num_samples=20 --cpus-per-trial=1 --gpus-per-trial=0.1</code>. We will also need to install the packages <code>ray[tune]</code> and <code>torchvision</code> in our virtualenv.
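For orientation, here is a minimal sketch of how <code>ray-tune-example.py</code> could wire these command-line arguments into Ray Tune, assuming the long-standing <code>tune.run</code> API. The trainable and search space are placeholders for illustration, not the model from the full example, which follows in the file below.

<syntaxhighlight lang="python">
import argparse

from ray import tune

parser = argparse.ArgumentParser()
parser.add_argument("--num_samples", type=int, default=20)        # total trials to draw
parser.add_argument("--cpus-per-trial", type=int, default=1)      # CPUs per trial
parser.add_argument("--gpus-per-trial", type=float, default=0.1)  # fraction of a GPU per trial
args = parser.parse_args()


def train_fn(config):
    # Placeholder trainable: a real script would build the model, train it,
    # and report validation metrics each epoch. Returning a dict reports it
    # as the trial's final result.
    return {"loss": config["lr"]}


analysis = tune.run(
    train_fn,
    config={"lr": tune.loguniform(1e-4, 1e-1)},  # hypothetical search space
    num_samples=args.num_samples,                # 20 trials in total
    resources_per_trial={
        "cpu": args.cpus_per_trial,  # one CPU per trial for the DataLoader worker
        "gpu": args.gpus_per_trial,  # 0.1 GPU, so 10 trials share a single GPU
    },
)
print("Best config:", analysis.get_best_config(metric="loss", mode="min"))
</syntaxhighlight>

Note that Ray's fractional GPU requests are bookkeeping only: Ray does not enforce memory isolation between trials, so a request of 0.1 GPU works only if 10 copies of your model actually fit in the GPU's memory.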


{{File