Translations:Tutoriel Apprentissage machine/55/en
- Modify your job submission script (or your program) so that your job can be interrupted and resumed. Your program should be able to access the most recent checkpoint file. (See the example script below.)
- Determine how many epochs (or iterations) can be completed in one 24-hour unit.
- Calculate how many of these 24-hour units you will need, rounding up to the next whole number: n_units = n_epochs_total / n_epochs_per_24h
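- Use the argument --array=1-<n_units>%1 to request a chain of n_units jobs. The %1 suffix allows only one job of the array to run at a time, so each job starts after the previous one has ended.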
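
The arithmetic and the checkpoint lookup in the steps above can be sketched in Python. This is only an illustration under stated assumptions: the function names, the checkpoint file pattern `checkpoint_*.pt` and the example numbers are hypothetical and do not come from the tutorial; adapt them to how your program actually saves its checkpoints.

```python
from pathlib import Path
from typing import Optional


def n_units_needed(n_epochs_total: int, n_epochs_per_24h: int) -> int:
    """Number of 24-hour jobs needed, rounded up so the final epochs are not dropped."""
    if n_epochs_per_24h <= 0:
        raise ValueError("n_epochs_per_24h must be positive")
    # Integer ceiling division: (a + b - 1) // b
    return (n_epochs_total + n_epochs_per_24h - 1) // n_epochs_per_24h


def array_argument(n_units: int) -> str:
    """Build the --array argument; %1 limits the array to one running job at a time."""
    return f"--array=1-{n_units}%1"


def latest_checkpoint(checkpoint_dir: str,
                      pattern: str = "checkpoint_*.pt") -> Optional[Path]:
    """Return the most recently modified checkpoint file, or None if there is none.

    The file pattern is an assumption made for this sketch.
    """
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return None  # first job of the chain: nothing to resume from
    candidates = list(directory.glob(pattern))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # Hypothetical numbers: 100 epochs in total, 30 epochs fit in 24 hours.
    units = n_units_needed(100, 30)
    print(units, array_argument(units))
```

With these hypothetical numbers, four jobs are needed because three jobs would only cover 90 of the 100 epochs. Each job in the chain would call something like `latest_checkpoint` at start-up and resume from the returned file, or start from scratch when it returns `None`.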