Parabricks: Difference between revisions

Remove advertising language in intro
(mention license requirement)
(Remove advertising language in intro)
Line 1: Line 1:
{{Draft}}
{{Draft}}
Parabricks is a software suite for performing secondary analysis of next generation sequencing (NGS) DNA data. A major benefit of Parabricks is that it is designed to deliver results at blazing fast speeds and low cost. Parabricks can analyze whole human genomes in about 45 minutes, compared to about 30 hours for 30x WGS data. The best part is the output results exactly match the commonly used software. So, it's fairly simple to verify the accuracy of the output.


Under the hood, it achieves this performance through tight integration with GPUs, which excel at performing data parallel computation much more effectively than traditional CPU-based solutions. Parabricks was built from the ground up by GPU computing and Deep Learning experts who wanted to develop the fastest and most efficient possible implementation of common genomics algorithms used in secondary analysis.
Parabricks is a software suite for performing secondary analysis of next generation sequencing (NGS) DNA data. Parabricks is extremely fast: it can analyze 30x [https://en.wikipedia.org/wiki/Whole-genome_shotgun WGS] data for a whole human genome in about 45 minutes, compared to about 30 hours with CPU-based solutions. It achieves this performance through tight integration with GPUs.


You can learn more at [http://www.nvidia.com/parabricks www.nvidia.com/parabricks]
You can learn more at [http://www.nvidia.com/parabricks www.nvidia.com/parabricks]
Line 12: Line 11:
to use Parabricks on Compute Canada equipment.
to use Parabricks on Compute Canada equipment.


==Finding and loading Parabricks ==
== Finding and loading Parabricks ==


You can search for Parabricks as a regular module with <code>module spider</code>:
You can search for Parabricks as a regular module with <code>module spider</code>:
Line 26: Line 25:




==Example of usage ==
== Example of use ==


Before you embark on using Parabricks, make sure you have gone through the [https://www.nvidia.com/en-us/docs/parabricks/ Parabricks documentation], including their standalone tools and pipelines. Also make sure you know [https://docs.computecanada.ca/wiki/Using_GPUs_with_Slurm how to request graphic cards in Compute Canada clusters]. Once you understand the above, you can submit a job like:
Before you use Parabricks, make sure you have gone through the [https://www.nvidia.com/en-us/docs/parabricks/ Parabricks documentation], including their standalone tools and pipelines. Also make sure you know [https://docs.computecanada.ca/wiki/Using_GPUs_with_Slurm how to request GPUs in Compute Canada clusters]. Once you understand the above, you can submit a job like:


<pre>
<pre>
Line 54: Line 53:


{{Note
{{Note
|Make the path to the files absolute real paths (i.e. with the command <code>realpath .</code>)!!
|Use absolute real paths for your files (for example, as printed by the command <code>realpath .</code>)
}}
}}
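
As a minimal sketch of the note above, the working directory can be resolved once at the top of the job script and reused for every file argument. The file names <code>reference.fasta</code> and <code>sample_1.fastq.gz</code> are hypothetical placeholders, not names taken from this page.

```shell
#!/bin/bash
# Resolve the current directory to an absolute path with symlinks expanded.
WORKDIR=$(realpath .)

# Hypothetical file names, used only for illustration.
REFERENCE="${WORKDIR}/reference.fasta"
FASTQ_1="${WORKDIR}/sample_1.fastq.gz"

# Print the paths that would be passed to the pipeline.
echo "${REFERENCE}"
echo "${FASTQ_1}"
```

Building every path from one resolved variable keeps the script independent of the directory it is submitted from.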


==Common issues ==
== Common issues ==


===Almost immediate fail ===
=== Almost immediate failure ===


If your first test fails right away, there might be a missing module or an environment variable clash. To solve this, try:
If your first test fails right away, there might be a missing module or an environment variable clash. To solve this, try:
Line 70: Line 69:
}}
}}


===Later fail ===
=== Later failure ===


Oftentimes Parabricks may not give you a clear traceback of the failure. This usually means that you did not request enough memory. If you are reserving a full node already through <code>--nodes=1</code>, we suggest you also use all the memory in the node with <code>--mem=0</code>. Otherwise, make sure that your pipeline has enough memory to process your data.
Often Parabricks may not give you a clear traceback of the failure. This usually means that you did not request enough memory. If you are reserving a full node already through <code>--nodes=1</code>, we suggest you also use all the memory in the node with <code>--mem=0</code>. Otherwise, make sure that your pipeline has enough memory to process your data.
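
The memory suggestion above could look like the following job script header. This is only a sketch: the two <code>#SBATCH</code> options are the ones named in the text, and the rest of the script (module loading, the Parabricks command itself) is left out.

```shell
#!/bin/bash
# Reserve one node, as described in the text above.
#SBATCH --nodes=1
# Request all the memory available on that node.
#SBATCH --mem=0

# SLURM_JOB_NODELIST is set by Slurm inside a job; the fallback text only
# appears when the script is run outside the scheduler.
NODELIST="${SLURM_JOB_NODELIST:-not running under Slurm}"
echo "Allocated node(s): ${NODELIST}"
```

Printing the allocation at the start of the job makes it easier to match a later failure to the node it ran on.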


==Hybrid usage ==
== Hybrid usage ==


Parabricks uses both CPUs and GPUs. During our tests, Parabricks used at least 10 CPUs, so we recommend requesting at least that many with <code>--cpus-per-task=10</code>.
Parabricks uses both CPUs and GPUs. During our tests, Parabricks used at least 10 CPUs, so we recommend requesting at least that many with <code>--cpus-per-task=10</code>.
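
A hedged sketch of a hybrid request follows. The <code>--cpus-per-task=10</code> value comes from the recommendation above; the GPU count in <code>--gres=gpu:1</code> is an assumption and should be adapted to the cluster and node type you use.

```shell
#!/bin/bash
# CPU cores for the CPU-side stages, as recommended above.
#SBATCH --cpus-per-task=10
# Assumption: one GPU; adjust to the node type you are requesting.
#SBATCH --gres=gpu:1

# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is given;
# the fallback of 10 only applies when run outside a job.
NUM_CPUS="${SLURM_CPUS_PER_TASK:-10}"
echo "CPU cores available to this task: ${NUM_CPUS}"
```

Reading the core count from the environment avoids hard-coding a number that can drift away from the <code>#SBATCH</code> request.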