(Marked this version for translation) |
No edit summary |
||
Line 11: | Line 11: | ||
<!--T:4--> | <!--T:4--> | ||
Since it requires an internet connection, wandb has restricted availability on compute nodes, depending on the cluster: | Since it requires an internet connection, wandb has restricted availability on compute nodes, depending on the cluster: | ||
<!--T:5--> | <!--T:5--> | ||
Line 18: | Line 17: | ||
! Cluster !! Availability !! Note | ! Cluster !! Availability !! Note | ||
|- | |- | ||
| Béluga || No ❌ || Wandb requires access to Google Cloud Storage, which is not | | Béluga || No ❌ || Wandb requires access to Google Cloud Storage, which is not accessible from the compute nodes | ||
|- | |- | ||
| Cedar || Yes ✅ || Internet access is enabled | | Cedar || Yes ✅ || Internet access is enabled | ||
Line 28: | Line 27: | ||
<!--T:41--> | <!--T:41--> | ||
While it is possible to upload basic metrics to Weights&Biases during a job on Béluga, the wandb package automatically uploads information about the user's environment to a Google Cloud Storage bucket. It is not currently possible to disable this behaviour. Uploading artifacts to W&B with <tt>wandb.save()</tt> also requires access to Google Cloud Storage, which is not available on Béluga's compute nodes. | While it is possible to upload basic metrics to Weights&Biases during a job on Béluga, the wandb package automatically uploads information about the user's environment to a Google Cloud Storage bucket, resulting in a crash during or at the very end of a training run. It is not currently possible to disable this behaviour. Uploading artifacts to W&B with <tt>wandb.save()</tt> also requires access to Google Cloud Storage, which is not available on Béluga's compute nodes. | ||
<!--T:42--> | <!--T:42--> | ||
Users can still use wandb on Béluga by enabling the [https://docs.wandb.ai/library/cli#wandb-offline <tt>offline</tt>] or [https://docs.wandb.ai/library/init#save-logs-offline <tt>dryrun</tt>] modes. In these two modes, wandb will write all metrics, logs and artifacts to the local disk and will not attempt to sync anything to the Weights&Biases service on the internet. After their jobs finish running, users can sync their wandb content to the online service by running the command [https://docs.wandb.ai/ref/cli#wandb-sync <tt>wandb sync</tt>] on the login node. | Users can still use wandb on Béluga by enabling the [https://docs.wandb.ai/library/cli#wandb-offline <tt>offline</tt>] or [https://docs.wandb.ai/library/init#save-logs-offline <tt>dryrun</tt>] modes. In these two modes, wandb will write all metrics, logs and artifacts to the local disk and will not attempt to sync anything to the Weights&Biases service on the internet. After their jobs finish running, users can sync their wandb content to the online service by running the command [https://docs.wandb.ai/ref/cli#wandb-sync <tt>wandb sync</tt>] on the login node. | ||
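The offline workflow described above could be sketched in Python as follows. This is a minimal illustration, not an official recipe: the helper names <tt>configure_offline_wandb</tt> and <tt>sync_commands</tt> are hypothetical, and the sketch assumes that wandb honours the <tt>WANDB_MODE</tt> and <tt>WANDB_DIR</tt> environment variables and stores offline runs in directories named <tt>wandb/offline-run-*</tt> under the chosen log directory.

```python
import os
from pathlib import Path


def configure_offline_wandb(log_dir):
    """Hypothetical helper: make wandb write to local disk instead of syncing.

    Call this before wandb.init() in the training script.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: wandb reads these environment variables at init time.
    os.environ["WANDB_MODE"] = "offline"
    os.environ["WANDB_DIR"] = str(log_dir)
    return log_dir


def sync_commands(log_dir):
    """Hypothetical helper: list the `wandb sync` commands to run on a login node.

    Assumption: offline runs live in <log_dir>/wandb/offline-run-*.
    """
    runs = sorted(Path(log_dir).glob("wandb/offline-run-*"))
    return [["wandb", "sync", str(run)] for run in runs]
```

Inside a job, one would call <tt>configure_offline_wandb()</tt> before <tt>wandb.init()</tt>; once the job has finished, the commands returned by <tt>sync_commands()</tt> can be run on the login node, where internet access is available.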
Note that [[Comet.ml]] is a product very similar to Weights & Biases, and works on Béluga. | |||
=== Example === <!--T:6--> | === Example === <!--T:6--> |