Translations:Infrastructure renewal/35/en

User Training Resources

Course Title: Survival guide for the upcoming GPU upgrades
Course Provider: SHARCNET
Instructor: Sergey Mashchenko
Date: Wednesday, November 20, 2024, 12:00 PM to 1:00 PM ET
Description: In the coming months, national systems will be undergoing significant upgrades. In particular, older GPUs (P100, V100) will be replaced with the newest H100 GPUs from NVIDIA. The total GPU computing power of the upgraded systems will grow by a factor of 3.5, but the number of GPUs will decrease significantly (from 3200 to 2100). This will present a significant challenge for users, as the usual practice of using a whole GPU for each process or MPI rank will no longer be feasible in most cases. Fortunately, NVIDIA provides two powerful technologies that can be used to mitigate this situation: MPS (Multi-Process Service) and MIG (Multi-Instance GPU). The presentation will walk the audience through both technologies and discuss the ways they can be used on the clusters. The discussion will include how to determine which approach will work best for specific code, and a live demonstration will be given at the end.
Audience: Prospective users of the upgraded systems. Users intending to use a substantial amount of H100 resources (e.g., more than one GPU at a time, and/or over 24 hours runtime)
Format: 1-hour live presentation, recorded for later access
Registration: No registration required. More info
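
The figures quoted in the course description imply how much more powerful each new GPU is on average, which is why dedicating a whole GPU to each process becomes wasteful. The following is a rough sketch of that arithmetic only. It assumes the 3.5x total growth and the 3200 and 2100 GPU counts can be treated as simple system-wide averages, which ignores the difference between P100 and V100 cards. The function and variable names are illustrative and do not come from the course material.

```python
# Figures quoted in the course description (treated here as system-wide averages).
OLD_GPU_COUNT = 3200        # GPUs before the upgrade
NEW_GPU_COUNT = 2100        # GPUs after the upgrade
TOTAL_POWER_GROWTH = 3.5    # upgraded total computing power / current total


def per_gpu_power_growth(old_count, new_count, total_growth):
    """Average growth in computing power per GPU (hypothetical helper).

    total_new = total_growth * total_old, so dividing each total by its
    GPU count gives: per_gpu_new / per_gpu_old = total_growth * old / new.
    """
    return total_growth * old_count / new_count


growth = per_gpu_power_growth(OLD_GPU_COUNT, NEW_GPU_COUNT, TOTAL_POWER_GROWTH)
# Fraction of one new GPU that matches one old GPU, on average.
equivalent_fraction = 1.0 / growth

print(f"Average per-GPU computing power grows by about {growth:.1f}x")
print(f"One old GPU corresponds to roughly {equivalent_fraction:.2f} of a new GPU")
```

Under these assumptions, a process that kept one older GPU busy would occupy only a fraction of an H100, which is the gap that sharing technologies such as MPS and MIG are meant to close.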