Allocations and compute scheduling

It is even possible that you could end a month or even a year having run more work than your allocation would seem to allow, although this is unlikely given the demand on our resources.


=Reference GPU Units (RGUs)= <!--T:45-->

<!--T:46-->
The performance of GPUs has increased dramatically in recent years and continues to do so. Until RAC 2023 we treated all GPUs as equivalent to each other for allocation purposes. This caused fairness problems both in the allocation process and while running jobs, so in the 2024 RAC year we introduced the <i>reference GPU unit</i>, or <b>RGU</b>, to rank all GPU models in production and alleviate these problems. The 2025 RAC year adds further complexity with [[Multi-Instance GPU]] technology.


<!--T:47-->
Because roughly half of our users primarily use single-precision floating-point operations ([https://en.wikipedia.org/wiki/Single-precision_floating-point_format FP32]), the other half use half-precision floating-point operations ([https://en.wikipedia.org/wiki/Half-precision_floating-point_format FP16], dense matrices), and a significant portion of all users are constrained by the amount of memory on the GPU, we chose the following evaluation criteria and corresponding weights to rank the different GPU models:


<!--T:48-->
<!--T:48-->
{| class="wikitable" style="margin: auto;"
{| class="wikitable" style="margin: auto;"
|-
|-
! scope="col"| Evaluation Criteria
! scope="col"| Evaluation Criterion
! scope="col"| Weight <br> (RGU)
! scope="col"| Weight  
|-
|-
! scope="row"| FP32 score
! scope="row"| FP32 score
| 40% * 4 = 1.6
| 40%
|-
|-
! scope="row"| FP16 score
! scope="row"| FP16 score
| 40% * 4 = 1.6
| 40%
|-
|-
! scope="row"| GPU memory score
! scope="row"| GPU memory score
| 20% * 4 = 0.8
| 20%
|}
|}


<!--T:49-->
We currently use the NVidia <b>A100-40gb</b> GPU as the reference model and, for historical reasons, assign it an RGU value of 4.0. We define its FP16 performance, FP32 performance, and memory size each as 1.0, and express the scores of every other model relative to it. Multiplying the percentages in the table above by 4.0 yields the coefficients below; a model's combined score in RGUs is the sum of its three scores, each multiplied by its coefficient:


<!--T:50-->
{| class="wikitable" style="margin: auto; text-align: center;"
|+ RGU scores for whole GPU models
|-
|
! scope="col"| FP32 score
! scope="col"| FP16 score
! scope="col"| Memory score
! scope="col"| Combined score
! colspan="2" scope="col"| Available
! scope="col"| Allocatable
|-
! scope="col"| Coefficient:
! scope="col"| 1.6
! scope="col"| 1.6
! scope="col"| 0.8
! scope="col"| (RGU)
! scope="col"| Now
! scope="col"| 2025
! scope="col"| RAC 2025
|-
! scope="row"| H100-80gb
| 3.44 || 3.17 || 2.0 || 12.2 || No || Yes || Yes
|-
! scope="row"| A100-80gb
| 1.00 || 1.00 || 2.0 || 4.8 || No || ? || No
|-
! scope="row"| A100-40gb
| <b>1.00</b> || <b>1.00</b> || <b>1.0</b> || <b>4.0</b> || Yes || Yes || Yes
|-
! scope="row"| V100-32gb
| 0.81 || 0.40 || 0.8 || 2.6 || Yes || ? || No
|-
! scope="row"| V100-16gb
| 0.81 || 0.40 || 0.4 || 2.2 || Yes || ? || No
|-
! scope="row"| T4-16gb
| 0.42 || 0.21 || 0.4 || 1.3 || Yes || ? || No
|-
! scope="row"| P100-16gb
| 0.48 || 0.03 || 0.4 || 1.1 || Yes || No || No
|-
! scope="row"| P100-12gb
| 0.48 || 0.03 || 0.3 || 1.0 || Yes || No || No
|}

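As a rough illustration of how the combined score is formed, the sketch below multiplies a model's relative FP32, FP16 and memory scores by the coefficients 1.6, 1.6 and 0.8 and sums them. It is not the official calculation: the per-criterion scores are copied from the rounded values in the table, so recomputing from them can differ from the published combined score in the last digit (V100-16gb is one such case), and the names <code>COEFFICIENTS</code>, <code>SCORES</code> and <code>combined_rgu</code> are illustrative only.

```python
# Coefficients from the table: the 40/40/20 % weights scaled by 4.0, so that the
# reference A100-40gb (all scores 1.0) is worth exactly 4.0 RGU.
COEFFICIENTS = {"fp32": 1.6, "fp16": 1.6, "memory": 0.8}

# Relative scores (A100-40gb = 1.0) as listed in the table, already rounded.
SCORES = {
    "H100-80gb": {"fp32": 3.44, "fp16": 3.17, "memory": 2.0},
    "A100-80gb": {"fp32": 1.00, "fp16": 1.00, "memory": 2.0},
    "A100-40gb": {"fp32": 1.00, "fp16": 1.00, "memory": 1.0},
    "V100-32gb": {"fp32": 0.81, "fp16": 0.40, "memory": 0.8},
}


def combined_rgu(scores: dict[str, float]) -> float:
    """Weighted sum of the relative scores, expressed in RGUs."""
    return sum(COEFFICIENTS[name] * value for name, value in scores.items())


for model, scores in SCORES.items():
    print(f"{model}: {combined_rgu(scores):.1f} RGU")
```

For these four models the printed values (12.2, 4.8, 4.0 and 2.6 RGU) match the combined score column.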
With the 2025 [[infrastructure renewal]] it will become possible to schedule a fraction of a GPU using [[multi-instance GPU]] technology.  Different jobs, potentially belonging to different users, can run on the same GPU at the same time.  Following [https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#terminology NVidia's terminology], a fraction of a GPU allocated to a single job is called a "GPU instance", also sometimes called a "MIG instance".
 
The following table lists the GPU models and instances that can be selected in the CCDB form for RAC 2025. RGU values for GPU instances are estimated by multiplying the whole-GPU RGU value by the fraction of the GPU that the instance comprises; that fraction is the larger of the instance's share of the compute slices (out of 7) and its share of the GPU memory.
 
{| class="wikitable" style="margin: auto; text-align: center;"
|+ GPU models and instances available for RAC 2025
|-
! Model or instance !! Fraction of GPU !! RGU
|-
! scope="row"| A100-40gb
| Whole GPU ⇒ 100% || 4.0
|-
! scope="row"| A100-3g.20gb
| max(3g/7g, 20GB/40GB) ⇒ 50% || 2.0
|-
! scope="row"| A100-4g.20gb
| max(4g/7g, 20GB/40GB) ⇒ 57% || 2.3
|-
! scope="row"| H100-80gb
| Whole GPU ⇒ 100% || 12.2
|-
! scope="row"| H100-1g.10gb
| max(1g/7g, 10GB/80GB) ⇒ 14% || 1.7
|-
! scope="row"| H100-2g.20gb
| max(2g/7g, 20GB/80GB) ⇒ 29% || 3.5
|-
! scope="row"| H100-3g.40gb
| max(3g/7g, 40GB/80GB) ⇒ 50% || 6.1
|-
! scope="row"| H100-4g.40gb
| max(4g/7g, 40GB/80GB) ⇒ 57% || 7.0
|}


<!--T:59-->
Note: a GPU instance of profile <b>1g</b> is worth 1/7 of an A100 or H100 GPU. A <b>3g</b> instance receives half of the GPU's memory, which is more than 3/7 of it, so its fraction is based on memory (50%) rather than on compute slices.
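The fraction-based estimate can be reproduced with a short sketch. It assumes, as the note above does, that a whole A100 or H100 has 7 compute slices, takes the instance's fraction as the larger of its compute share and its memory share, and scales the whole-GPU RGU value by that fraction. The whole-GPU values used are the rounded ones from the tables (4.0 and 12.2); <code>WHOLE_GPU</code>, <code>instance_fraction</code> and <code>instance_rgu</code> are hypothetical names for illustration.

```python
from fractions import Fraction

# Whole-GPU values from the tables: (RGU value, compute slices, memory in GB).
# Seven slices per GPU is taken from the 1/7 rule in the note above.
WHOLE_GPU = {
    "A100-40gb": (4.0, 7, 40),
    "H100-80gb": (12.2, 7, 80),
}


def instance_fraction(slices: int, memory_gb: int,
                      total_slices: int, total_memory_gb: int) -> Fraction:
    """Share of the GPU: the larger of the compute share and the memory share."""
    return max(Fraction(slices, total_slices),
               Fraction(memory_gb, total_memory_gb))


def instance_rgu(model: str, slices: int, memory_gb: int) -> float:
    """Estimated RGU value of a GPU instance, e.g. 3 slices and 20 GB on A100-40gb."""
    rgu, total_slices, total_memory_gb = WHOLE_GPU[model]
    share = instance_fraction(slices, memory_gb, total_slices, total_memory_gb)
    return float(share) * rgu


# (whole-GPU model, slices, memory in GB) for each instance profile in the table
PROFILES = [("A100-40gb", 3, 20), ("A100-40gb", 4, 20),
            ("H100-80gb", 1, 10), ("H100-80gb", 2, 20),
            ("H100-80gb", 3, 40), ("H100-80gb", 4, 40)]

for model, g, mem in PROFILES:
    name = f"{model.split('-')[0]}-{g}g.{mem}gb"
    print(f"{name}: {instance_rgu(model, g, mem):.1f} RGU")
```

For the six instance profiles this prints 2.0, 2.3, 1.7, 3.5, 6.1 and 7.0 RGU. Using exact fractions avoids rounding the share before it is multiplied by the whole-GPU value.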
 
<!--T:51-->
As an example, the oldest GPU model in production (P100-12gb) is worth 1.0 RGU. The next few generations of GPUs will be compared to the A100-40gb using the same formula.


==Choosing GPU models for your project== <!--T:52-->
* If your applications (typically AI-related) are doing primarily FP16 operations (including mixed precision operations or using other [https://en.wikipedia.org/wiki/Bfloat16_floating-point_format floating-point formats]), using an A100-40gb will count as using 4x the resources of a P100-12gb, but it can perform ~30x as many calculations in the same amount of time, so you can complete ~7.5x as much computation with the same allocation.
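The ~7.5x figure is the speedup divided by the ratio of RGU values. A minimal sketch of that arithmetic, using the RGU values from the table (4.0 for A100-40gb, 1.0 for P100-12gb) and treating the approximate ~30x FP16 speedup quoted above as an input rather than a measured value; <code>work_per_rgu_ratio</code> is a hypothetical helper name:

```python
def work_per_rgu_ratio(speedup: float, rgu_new: float, rgu_old: float) -> float:
    """How much more computation one RGU buys on the new GPU than on the old one.

    speedup: how many times faster the new GPU runs the workload.
    rgu_new, rgu_old: RGU values of the two GPU models.
    """
    cost_ratio = rgu_new / rgu_old  # e.g. 4.0 / 1.0 = 4x the RGUs charged
    return speedup / cost_ratio


# A100-40gb vs P100-12gb for FP16-heavy work (~30x faster, 4x the RGUs)
print(f"{work_per_rgu_ratio(30.0, 4.0, 1.0):.1f}x")  # 7.5x
```

The same calculation applies to any pair of models once you have measured the speedup of your own application.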


==RAC awards hold RGU values constant== <!--T:55-->


<!--T:56-->
* During the Resource Allocation Competition (RAC), any proposal asking for GPUs must specify the preferred GPU model for the project. The CCDB form then automatically calculates the number of reference GPU units (RGUs) from the requested number of gpu-years per project year.
** For example, if you select the <i>narval-gpu</i> resource and request 13 gpu-years of the model A100-40gb, the corresponding amount of RGUs would be 13 * 4.0 = 52. The RAC committee would then allocate up to 52 RGUs, depending on the proposal score. If your allocation must be moved to a different site, the committee will allocate gpu-years at that site so as to keep the amount of RGUs the same.
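The conversion in the example above is a multiplication by the model's RGU value, and keeping the same RGUs on another model divides by that model's value instead. A minimal sketch, using rounded RGU values from the whole-GPU table; the move to H100-80gb GPUs is a hypothetical destination chosen for illustration, not a statement about where any allocation would be placed, and the function names are illustrative:

```python
# RGU value per GPU, copied from the whole-GPU table (rounded values).
RGU_PER_GPU = {"A100-40gb": 4.0, "H100-80gb": 12.2}


def requested_rgus(gpu_years: float, model: str) -> float:
    """RGUs corresponding to a request of `gpu_years` of one GPU model."""
    return gpu_years * RGU_PER_GPU[model]


def equivalent_gpu_years(rgus: float, model: str) -> float:
    """GPU-years of `model` that hold the same amount of RGUs."""
    return rgus / RGU_PER_GPU[model]


award = requested_rgus(13, "A100-40gb")   # 13 * 4.0 = 52 RGUs
print(award)                              # 52.0
# Hypothetical move of the same award to H100-80gb GPUs
print(round(equivalent_gpu_years(award, "H100-80gb"), 1))  # 4.3
```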
 
<!--T:57-->
* For job scheduling and for usage accounting on CCDB, <i>reference GPU units</i> took effect on April 1st, 2024, with the implementation of RAC 2024.


=Detailed effect of resource usage on priority= <!--T:10-->