Revision as of 18:54, 4 September 2020

This article is a draft

This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.

Parabricks is a software suite for secondary analysis of next-generation sequencing (NGS) DNA data. Its main benefit is that it is designed to deliver results at high speed and low cost: Parabricks can analyze a whole human genome (30x WGS data) in about 45 minutes, compared to about 30 hours with CPU-based tools. The results match those of commonly used software, so the accuracy of the output is straightforward to verify.

Under the hood, it achieves this performance through tight integration with GPUs, which perform data-parallel computation much more effectively than traditional CPU-based solutions. Parabricks was built from the ground up by GPU computing and deep learning experts who wanted to develop the fastest and most efficient possible implementation of the common genomics algorithms used in secondary analysis.

You can learn more at www.nvidia.com/parabricks

Usage in Compute Canada Clusters

This software was provided free of charge by NVIDIA to help with research on COVID-19 until Sunday, 17 May 2020. Since this free period has expired, you must have your own license arrangement with NVIDIA in order to use Parabricks on Compute Canada equipment.

Finding and loading Parabricks

You can search for Parabricks like any other module with module spider:

[name@server ~]$ module spider parabricks

Likewise, you can load it as an Lmod module:

[name@server ~]$ module load parabricks/2.5.0

Example of usage

Before you start using Parabricks, make sure you have read the Parabricks documentation (https://www.nvidia.com/en-us/docs/parabricks/), including the sections on standalone tools and pipelines. Also make sure you know how to request GPUs on Compute Canada clusters (https://docs.computecanada.ca/wiki/Using_GPUs_with_Slurm). Once you understand the above, you can submit a job like:

#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=0
#SBATCH --time=5:00:00

module load parabricks/2.5.0

# Absolute paths to the input data and to the output directory
DATA_DIR=/path/to/data
OUT_DIR=/path/to/output

# Run the germline pipeline; temporary files go to the job's local scratch
pbrun germline \
      --ref "${DATA_DIR}/Homo_sapiens_assembly38.fa" \
      --in-fq "${DATA_DIR}/some_1.fastq" "${DATA_DIR}/some_2.fastq" \
      --knownSites "${DATA_DIR}/dbsnp_146.hg38.vcf.gz" \
      --tmp-dir "${SLURM_TMPDIR}/" \
      --out-bam "${OUT_DIR}/output.bam" \
      --out-variants "${OUT_DIR}/output.vcf" \
      --out-recal-file "${OUT_DIR}/report.txt"

Use absolute, fully resolved paths for all files (for example, as printed by the command `realpath .`).
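As a minimal sketch of the advice above, the job script could resolve its directories with realpath and check that the inputs exist before calling pbrun. The helper name require_file is hypothetical and not part of Parabricks; the paths are the same placeholders used in the example job.

```shell
#!/bin/bash
# Sketch only: require_file is a hypothetical helper, not a Parabricks command.

require_file() {
    # Return non-zero, with a message on stderr, if the file is missing or unreadable.
    if [ ! -r "$1" ]; then
        echo "Missing or unreadable input: $1" >&2
        return 1
    fi
}

# Possible use in the job script (placeholder paths, assumed to exist):
# DATA_DIR=$(realpath /path/to/data)
# OUT_DIR=$(realpath /path/to/output)
# require_file "${DATA_DIR}/Homo_sapiens_assembly38.fa" || exit 1
# require_file "${DATA_DIR}/some_1.fastq" || exit 1
# require_file "${DATA_DIR}/some_2.fastq" || exit 1
```

Failing early like this costs a few seconds and avoids spending a GPU allocation on a job that cannot find its inputs.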

Common issues

Almost immediate fail

If your first test fails right away, a module may be missing or an environment variable may conflict with Parabricks. To solve this, try:

[name@server ~]$ module --force purge

[name@server ~]$ module load StdEnv/2016.4 nixpkgs/16.09 parabricks/2.5.0
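One way to tell a missing module apart from other early failures is to check that the pbrun command is on the PATH before the pipeline starts. The following is a sketch; require_command is a hypothetical helper name, and it relies only on the shell built-in `command -v`.

```shell
#!/bin/bash
# Sketch only: require_command is a hypothetical helper, not part of Lmod or Parabricks.

require_command() {
    # Return non-zero, with a hint on stderr, if the command cannot be found on the PATH.
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "Command not found: $1 (is the module loaded?)" >&2
        return 1
    fi
}

# Possible use in the job script, right after the module load line:
# require_command pbrun || exit 1
```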

Later fail

Parabricks often does not give a clear traceback when it fails. This usually means that you did not request enough memory. If you are already reserving a full node through --nodes=1, we suggest that you also use all the memory in the node with --mem=0. Otherwise, make sure that your pipeline has enough memory to process your data.
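Because the failure message may be uninformative, it can help to record the exit status of the pipeline and then compare requested and peak memory with Slurm's accounting tool. The wrapper below is a sketch; run_step is a hypothetical name, and the sacct command is only printed as a suggestion, not executed.

```shell
#!/bin/bash
# Sketch only: run_step is a hypothetical wrapper, not part of Parabricks or Slurm.

run_step() {
    # Run the given command and keep its exit status.
    "$@"
    step_rc=$?
    if [ "$step_rc" -ne 0 ]; then
        echo "Command failed with exit status ${step_rc}: $*" >&2
        echo "To compare requested and peak memory, try:" >&2
        echo "  sacct -j ${SLURM_JOB_ID:-<jobid>} --format=JobID,State,ReqMem,MaxRSS,Elapsed" >&2
    fi
    return "$step_rc"
}

# Possible use in the job script (arguments as in the example job):
# run_step pbrun germline --ref ... || exit 1
```

If MaxRSS is close to ReqMem, or the job state is OUT_OF_MEMORY, requesting more memory is the first thing to try.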

Hybrid usage

Parabricks uses both CPUs and GPUs. During our tests, Parabricks used at least 10 CPUs, so we recommend requesting at least that many with --cpus-per-task=10.
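A job script can guard against an undersized CPU request by inspecting SLURM_CPUS_PER_TASK, which Slurm sets when --cpus-per-task is given. This is a sketch under that assumption; check_cpus is a hypothetical helper, and the threshold of 10 comes from the tests described above.

```shell
#!/bin/bash
# Sketch only: check_cpus is a hypothetical helper, not part of Parabricks or Slurm.

check_cpus() {
    # $1: minimum number of CPUs the job should have.
    min_cpus=$1
    # Assume a single CPU when --cpus-per-task was not specified.
    cpus=${SLURM_CPUS_PER_TASK:-1}
    if [ "$cpus" -lt "$min_cpus" ]; then
        echo "Only ${cpus} CPU(s) allocated; at least ${min_cpus} recommended" >&2
        return 1
    fi
}

# Possible use in the job script, before calling pbrun:
# check_cpus 10 || exit 1
```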

References

Parabricks Home
