GATK: Difference between revisions

23 bytes added, 1 year ago
no edit summary
Line 78: Line 78:
Note that all options passed to <code>--java-options</code> have to be within quotation marks.
Note that all options passed to <code>--java-options</code> have to be within quotation marks.


=== Considerations in our systems === <!--T:50-->
=== Considerations regarding our systems === <!--T:50-->
   
   
To use GATK on our systems, we recommend that you use the <code>--tmp-dir</code> option and set it to <code>${SLURM_TMPDIR}</code> when running in an sbatch job, so that the temporary files are redirected to the local storage.
To use GATK on our systems, we recommend that you use the <code>--tmp-dir</code> option and set it to <code>${SLURM_TMPDIR}</code> when running in an sbatch job, so that the temporary files are redirected to the local storage.
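As a minimal sketch of this recommendation, the job script below passes <code>${SLURM_TMPDIR}</code> to <code>--tmp-dir</code>. The tool (<code>HaplotypeCaller</code>), the file names, the resource requests and the <code>/tmp</code> fallback are illustrative assumptions, not part of the original instructions.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --mem=8G
#SBATCH --time=03:00:00

# SLURM_TMPDIR is set inside a job. The /tmp fallback is only an
# assumption so that the sketch can also be tried outside Slurm.
tmp_dir="${SLURM_TMPDIR:-/tmp}"

# Build the command as positional parameters so that quoting is preserved.
# File names are hypothetical; replace them with your own.
set -- gatk --java-options "-Xmx4g" HaplotypeCaller \
    -R reference.fasta -I input.bam -O output.vcf.gz \
    --tmp-dir "$tmp_dir"

# Show the command, then run it only if gatk is available in the PATH.
echo "$@"
if command -v gatk >/dev/null 2>&1; then
    "$@"
fi
```

The Java heap size (<code>-Xmx4g</code>) is kept below the memory requested from Slurm so that the JVM has some headroom.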
Line 86: Line 86:


===Earlier versions than GATK 4 === <!--T:14-->
===Earlier versions than GATK 4 === <!--T:14-->
Earlier versions of GATK do not have the '''gatk''' command. Instead, one has to call the jar file:
Earlier versions of GATK do not have the <code>gatk</code> command. Instead, one has to call the jar file:


<!--T:15-->
<!--T:15-->
Line 106: Line 106:


===Multicore usage === <!--T:19-->
===Multicore usage === <!--T:19-->
Most GATK (>=4) tools are not multicore by default. This means that you should request only one core when calling this kind of tool. Some tools use threads in some of the computations (e.g. <code>Mutect2</code> has the <code>--native-pair-hmm-threads</code> option), and you can therefore request more CPUs (most of them with up to 4 threads) for these computations. GATK4, however, does provide '''some''' [https://gatk.broadinstitute.org/hc/en-us/articles/360035890591-Spark SPARK commands]:
Most GATK (>=4) tools are not multicore by default. This means that you should request only one core when calling this kind of tool. Some tools use threads in some of the computations (e.g. <code>Mutect2</code> has the <code>--native-pair-hmm-threads</code> option), and you can therefore request more CPUs (most of them with up to 4 threads) for these computations. GATK4, however, does provide <b>some</b> [https://gatk.broadinstitute.org/hc/en-us/articles/360035890591-Spark SPARK commands]:
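For a threaded tool such as <code>Mutect2</code>, one possible job script matches <code>--native-pair-hmm-threads</code> to the number of CPUs requested from Slurm. This is only a sketch: the file names, the resource requests and the fallback of one thread are assumptions made for illustration.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=12:00:00

# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is given.
# The fallback of 1 is an assumption for trying the sketch outside a job.
threads="${SLURM_CPUS_PER_TASK:-1}"

# Build the command as positional parameters so that quoting is preserved.
# File names are hypothetical; replace them with your own.
set -- gatk --java-options "-Xmx8g" Mutect2 \
    -R reference.fasta -I tumor.bam -O unfiltered.vcf.gz \
    --native-pair-hmm-threads "$threads"

# Show the command, then run it only if gatk is available in the PATH.
echo "$@"
if command -v gatk >/dev/null 2>&1; then
    "$@"
fi
```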


<!--T:46-->
<!--T:46-->
Line 117: Line 117:
<!--T:48-->
<!--T:48-->
- Some GATK tools exist in distinct Spark-capable and non-Spark-capable versions.
- Some GATK tools exist in distinct Spark-capable and non-Spark-capable versions.
The "sparkified" versions have the suffix "Spark" at the end of their names. Many of these are still experimental; down the road we plan to consolidate them so that there will be only one version per tool.
The "sparkified" versions have the suffix <i>Spark</i> at the end of their names. Many of these are still experimental; down the road we plan to consolidate them so that there will be only one version per tool.


<!--T:49-->
<!--T:49-->
Line 125: Line 125:


<!--T:22-->
<!--T:22-->
For the commands that do use Spark, you can request multiple CPUs. '''NOTE:''' Please provide the exact number of CPUs to the Spark command. For example, if you requested 10 CPUs, use <code>--spark-master local[10]</code> instead of <code>--spark-master local[*]</code>. If you want to use multiple nodes to scale the Spark cluster, you must first [[Apache_Spark|deploy a SPARK cluster]] and then set the appropriate variables in the GATK command.
For the commands that do use Spark, you can request multiple CPUs. <b>NOTE:</b> Please provide the exact number of CPUs to the Spark command. For example, if you requested 10 CPUs, use <code>--spark-master local[10]</code> instead of <code>--spark-master local[*]</code>. If you want to use multiple nodes to scale the Spark cluster, you must first [[Apache_Spark|deploy a SPARK cluster]] and then set the appropriate variables in the GATK command.
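One way to keep the Spark core count in step with the Slurm request is to build the <code>local[N]</code> value from <code>SLURM_CPUS_PER_TASK</code>. The sketch below assumes a single-node job; the tool (<code>MarkDuplicatesSpark</code>), the file names, the resource requests and the fallback of one CPU are illustrative assumptions.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=10
#SBATCH --mem=32G
#SBATCH --time=06:00:00

# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is given.
# The fallback of 1 is an assumption for trying the sketch outside a job.
ncpus="${SLURM_CPUS_PER_TASK:-1}"

# Quote local[N] so that the shell does not treat the brackets as a glob.
# File names are hypothetical; replace them with your own.
set -- gatk --java-options "-Xmx24g" MarkDuplicatesSpark \
    -I input.bam -O marked_duplicates.bam \
    --spark-master "local[${ncpus}]"

# Show the command, then run it only if gatk is available in the PATH.
echo "$@"
if command -v gatk >/dev/null 2>&1; then
    "$@"
fi
```

Deriving the value from the environment avoids a mismatch when the <code>--cpus-per-task</code> request is changed later.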


==Running GATK via Apptainer== <!--T:36-->
==Running GATK via Apptainer== <!--T:36-->
Line 165: Line 165:
==Frequently asked questions == <!--T:23-->
==Frequently asked questions == <!--T:23-->
===How do I add a read group (RG) tag in my bam file? ===
===How do I add a read group (RG) tag in my bam file? ===
Assuming that you want to add a read group called '''tag''' to the file called '''input.bam''', you can use the GATK/PICARD command [https://gatk.broadinstitute.org/hc/en-us/articles/360037226472-AddOrReplaceReadGroups-Picard- AddOrReplaceReadGroups]:
Assuming that you want to add a read group called <i>tag</i> to the file called <i>input.bam</i>, you can use the GATK/PICARD command [https://gatk.broadinstitute.org/hc/en-us/articles/360037226472-AddOrReplaceReadGroups-Picard- AddOrReplaceReadGroups]:
<pre>
<pre>
gatk  AddOrReplaceReadGroups \
gatk  AddOrReplaceReadGroups \