Weights & Biases (wandb)
Weights & Biases (wandb) is a meta machine learning platform for building models aimed at real-world applications. The platform lets you track, compare, describe and reproduce machine learning experiments.

== Using wandb on our clusters ==

=== Availability ===

Since wandb requires an internet connection, its availability on compute nodes depends on the cluster.


{| class="wikitable"
|-
! Cluster !! Available !! Note
|-
| Béluga || yes ✅ || before using wandb, load the httpproxy module with <tt>module load httpproxy</tt> (see the sketch below)
|-
| Cedar || yes ✅ || internet access enabled
|-
| Graham || no ❌ || internet access disabled on compute nodes
|}
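For example, on Béluga the proxy module must be loaded before any command that contacts the wandb.ai servers. A minimal sketch, assuming your API key is stored in the <tt>$API_KEY</tt> environment variable:

<pre>
# On Béluga: route outgoing HTTP(S) traffic through the cluster proxy
module load httpproxy

# Authenticate with the Weights & Biases service
wandb login $API_KEY
</pre>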

=== Béluga ===

While it is possible to upload basic metrics to Weights&Biases during a job on Béluga, the wandb package automatically uploads information about the user's environment to a Google Cloud Storage bucket. It is not currently possible to disable this behaviour. Uploading artifacts to W&B with <tt>wandb.save()</tt> also requires access to Google Cloud Storage, which is not available on Béluga's compute nodes.

Users can still use wandb on Béluga by enabling the [https://docs.wandb.ai/library/cli#wandb-offline <tt>offline</tt>] or [https://docs.wandb.ai/library/init#save-logs-offline <tt>dryrun</tt>] modes. In these two modes, wandb will write all metrics, logs and artifacts to the local disk and will not attempt to sync anything to the Weights&Biases service on the internet. After their jobs finish running, users can sync their wandb content to the online service by running the command [https://docs.wandb.ai/ref/cli#wandb-sync <tt>wandb sync</tt>] on the login node, as sketched below.
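A minimal sketch of this workflow, assuming the job is submitted from a directory reachable from the login node (wandb normally writes offline runs under <tt>wandb/offline-run-*</tt> in the working directory):

<pre>
# Inside the job script, on a Béluga compute node: record everything locally
wandb offline
python wandb-test.py

# After the job, on a login node with internet access: upload the recorded runs
wandb sync wandb/offline-run-*
</pre>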

=== Example ===

The following example shows how to use wandb for experiment tracking on Béluga. To reproduce this on Cedar, it is not necessary to load the <tt>httpproxy</tt> module.


File: wandb-test.sh

#!/bin/bash
#SBATCH --cpus-per-task=1 
#SBATCH --mem=2G       
#SBATCH --time=0-03:00
#SBATCH --output=%N-%j.out


module load python/3.6 httpproxy
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate
pip install torchvision wandb --no-index

### Save your wandb API key in your .bash_profile or replace $API_KEY with your actual API key. If running on Cedar, uncomment the 'wandb login' line below and comment out 'wandb offline'. ###

#wandb login $API_KEY 

wandb offline

python wandb-test.py


The script <tt>wandb-test.py</tt> uses the <tt>watch()</tt> method to log metrics. See the full wandb documentation for details.


File: wandb-test.py

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.backends.cudnn as cudnn

import torchvision
import torchvision.transforms as transforms
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader

import argparse

import wandb


parser = argparse.ArgumentParser(description='cifar10 classification models, wandb test')
parser.add_argument('--lr', type=float, default=0.1, help='learning rate')
parser.add_argument('--batch_size', type=int, default=768, help='batch size')
parser.add_argument('--max_epochs', type=int, default=4, help='number of training epochs')
parser.add_argument('--num_workers', type=int, default=0, help='number of DataLoader workers')

def main():
    
    args = parser.parse_args()

    print("Starting Wandb...")

    # Start a wandb run; with 'wandb offline' set in the job script, everything
    # is written to local disk instead of being sent over the network
    wandb.init(project="wandb-pytorch-test", config=args)

    class Net(nn.Module):

        def __init__(self):
            super(Net, self).__init__()

            self.conv1 = nn.Conv2d(3, 6, 5)
            self.pool = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(6, 16, 5)
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = x.view(-1, 16 * 5 * 5)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return x

    net = Net()

    transform_train = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    # CIFAR10 must already be present under ./data (download=False), since compute nodes may not have internet access
    dataset_train = CIFAR10(root='./data', train=True, download=False, transform=transform_train)

    train_loader = DataLoader(dataset_train, batch_size=args.batch_size, num_workers=args.num_workers)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=args.lr)

    # Ask wandb to track the model and log its gradients during training
    wandb.watch(net)

    for epoch in range(args.max_epochs):

        train(epoch, net, criterion, optimizer, train_loader)


def train(epoch, net, criterion, optimizer, train_loader):

    for batch_idx, (inputs, targets) in enumerate(train_loader):

        # Standard training step: forward pass, loss, backward pass, parameter update
        outputs = net(inputs)
        loss = criterion(outputs, targets)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


if __name__ == '__main__':
    main()