AI and Machine Learning: Difference between revisions

AI and Machine Learning (view source)

Revision as of 17:10, 8 October 2019

44 bytes removed, 5 years ago

Rephrase, add link to tutorial, fix typo

Lemc2220 (cc_staff, 353 edits)

@@ Line 61: / Line 61: @@
 <!--T:13-->
 * Filesystem [[Storage and file management#Filesystem_quotas_and_policies|quotas]] on Compute Canada clusters limit the number of filesystem objects;
-* Your software could become be significantly slowed down from streaming lots of small files from <tt>/project</tt> (or <tt>/scratch</tt>) to a compute node.
+* Your software could be significantly slowed down from streaming lots of small files from <tt>/project</tt> (or <tt>/scratch</tt>) to a compute node.
 <!--T:14-->
@@ Line 70: / Line 70: @@
 <!--T:16-->
-If your computations are long, you should use checkpointing. For example, if your training time is 3 days, you could split it in 3 chunks of 24 hours. This would prevent you from losing all the work in case of an outage, and would give you an edge in terms of priority (more nodes are available for short jobs). Most machine learning libraries natively support checkpointing. Please see our suggestions about [[Running jobs#Resubmitting_jobs_for_long_running_computations|resubmitting jobs for long running computations]]. If your program does not natively support this, we provide a [[Points de contrôle/en|general checkpointing solution]].
+If your computations are long, you should use checkpointing. For example, if your training time is 3 days, you should split it in 3 chunks of 24 hours. This will prevent you from losing all the work in case of an outage, and give you an edge in terms of priority (more nodes are available for short jobs). Most machine learning libraries natively support checkpointing; the typical case is covered in our [[Tutoriel_Apprentissage_machine/en#Checkpointing_a_long-running_job|tutorial]]. If your program does not natively support this, we provide a [[Points de contrôle/en|general checkpointing solution]].
 == Running many similar jobs == <!--T:17-->
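The checkpointing advice in the revised paragraph (splitting a long training run into shorter jobs that each resume where the last one stopped) can be illustrated with a minimal sketch. It is not taken from the linked tutorial and uses only the Python standard library. The names `save_checkpoint`, `load_checkpoint`, `run_job`, the file name `checkpoint.json`, and the placeholder "training step" are all hypothetical; a real job would save model and optimizer state through its machine learning library's own checkpoint functions.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it over the target.

    The rename is atomic, so a job killed mid-write leaves the previous
    checkpoint intact instead of a truncated file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0, "weight": 0.0}


def run_job(path, total_epochs, epochs_per_job):
    """Run one job's share of the training, resuming from the checkpoint."""
    state = load_checkpoint(path)
    stop = min(state["epoch"] + epochs_per_job, total_epochs)
    while state["epoch"] < stop:
        state["weight"] += 0.5  # placeholder for one real training epoch
        state["epoch"] += 1
        save_checkpoint(path, state)  # checkpoint after every epoch
    return state


if __name__ == "__main__":
    # Hypothetical split: 9 epochs in total, 3 epochs per submitted job.
    final = run_job("checkpoint.json", total_epochs=9, epochs_per_job=3)
    print(f"stopped at epoch {final['epoch']}")
```

Each submission of this script advances the run by at most `epochs_per_job` epochs and then exits, so the same job script can be resubmitted until `total_epochs` is reached. An outage costs at most one epoch of work under these assumptions.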
