Now that the model and the dataset are both saved locally on the cluster’s network filesystem, the next step is to design a job with sufficient resources to train our LLM efficiently. The main factors that might hinder training performance, or prevent the training script from even running in the first place, are:

# The model is too large to fit entirely inside the memory of a single GPU.
# The training set, while relatively small in size, is made up of a large number of very small examples.

To address these factors, our job will be designed to:

# Employ a strategy to shard the LLM across multiple GPUs.
# Read the dataset from the compute node’s local storage as opposed to the cluster’s parallel filesystem, and store it in the node’s memory afterwards.

To shard the LLM across multiple devices, we will use the <tt>accelerate</tt> library, along with a configuration file describing a [https://huggingface.co/docs/transformers/main/en/fsdp#fsdp-configuration Fully Sharded Data Parallel (FSDP)] strategy. Using <tt>accelerate</tt>, the sharding strategy is applied automatically, without us having to explicitly write the code to do it inside the training script. To read the dataset from the compute node’s local storage, it suffices to copy the dataset over to <tt>$SLURM_TMPDIR</tt>.
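
As a rough illustration (not the exact job script used on this page), the fragment below sketches how these two steps could fit together in a Slurm submission script. The resource requests, the dataset path <tt>~/scratch/my_dataset</tt>, the configuration file <tt>fsdp_config.yaml</tt>, the training script <tt>train.py</tt> and its <tt>--dataset_path</tt> argument are all placeholders, not names defined elsewhere on this page:

<pre>
#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --gpus-per-node=4
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=0-03:00

# Environment setup (module loads, virtual environment activation) omitted here.

# Copy the dataset from the network filesystem to the compute node's local storage.
# "~/scratch/my_dataset" is a placeholder for wherever the dataset was saved earlier.
cp -r ~/scratch/my_dataset $SLURM_TMPDIR/my_dataset

# Launch the training script through accelerate, which applies the sharding
# strategy described in the FSDP configuration file. "fsdp_config.yaml" and
# "train.py" are placeholder names, and "--dataset_path" is an argument the
# training script itself would have to define.
accelerate launch --config_file fsdp_config.yaml train.py --dataset_path $SLURM_TMPDIR/my_dataset
</pre>

The FSDP configuration file itself can be created interactively with the <tt>accelerate config</tt> command, or written by hand following the FSDP documentation linked above.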