Translations:Tutoriel Apprentissage machine/55/en

  1. Modify your job submission script (or your program) so that your job can be interrupted and resumed. On restart, your program should be able to load the most recent checkpoint file. (See the example script below.)
  2. Check how many epochs (or iterations) can be completed in one 24-hour unit.
  3. Calculate how many of these 24-hour units you will need, rounding up to the next whole number: n_blocs = n_epochs_total / n_epochs_per_24h
  4. Use the argument --array 1-<n_blocs>%1 to ask for a chain of n_blocs jobs. The %1 suffix limits the array to one running job at a time, so the jobs run one after another.
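
The arithmetic in steps 3 and 4 can be sketched as follows. This is a minimal illustration rather than part of the documented workflow: the function names `n_blocs_needed` and `array_argument` are hypothetical, and the epoch counts in the usage lines are made-up values. The division is rounded up so that a partial 24-hour unit still gets its own job.

```python
def n_blocs_needed(n_epochs_total: int, n_epochs_per_24h: int) -> int:
    """Return the number of 24-hour jobs needed, rounded up (hypothetical helper)."""
    if n_epochs_total <= 0 or n_epochs_per_24h <= 0:
        raise ValueError("epoch counts must be positive integers")
    # Integer ceiling division: avoids floating-point rounding surprises.
    return -(-n_epochs_total // n_epochs_per_24h)


def array_argument(n_blocs: int) -> str:
    """Build the --array argument for a chain of n_blocs jobs (hypothetical helper)."""
    # "%1" asks the scheduler to run at most one job of the array at a time.
    return f"--array 1-{n_blocs}%1"


# Example with assumed numbers: 100 epochs in total, 30 epochs per 24 hours.
n_blocs = n_blocs_needed(100, 30)
print(n_blocs)                  # 4
print(array_argument(n_blocs))  # --array 1-4%1
```

With these assumed numbers, three jobs would cover only 90 epochs, so a fourth job is needed to finish the remaining 10.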