Parabricks
Revision as of 19:05, 4 September 2020
Parabricks is a software suite for secondary analysis of next-generation sequencing (NGS) DNA data. Parabricks is extremely fast: it can analyze 30x whole-genome sequencing (WGS) data for a human genome in about 45 minutes, compared to about 30 hours with conventional CPU-based tools. It achieves this performance through tight integration with GPUs.
You can learn more at www.nvidia.com/parabricks
Usage in Compute Canada Clusters
This software was provided free of charge by NVIDIA to help with research on COVID-19 until Sunday, 17 May 2020. Since this free period has expired, you must have your own license arrangement with NVIDIA in order to use Parabricks on Compute Canada equipment.
Finding and loading Parabricks
You can search for Parabricks like any other module using module spider:
[name@server ~]$ module spider parabricks
Likewise, you can load it as an Lmod module:
[name@server ~]$ module load parabricks/2.5.0
Example of use
Before you use Parabricks, make sure you have read the Parabricks documentation (https://www.nvidia.com/en-us/docs/parabricks/), including the sections on its standalone tools and pipelines. Also make sure you know how to request GPUs on Compute Canada clusters (https://docs.computecanada.ca/wiki/Using_GPUs_with_Slurm). Once you are familiar with both, you can submit a job such as:
#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=0
#SBATCH --time=5:00:00

module load parabricks/2.5.0

DATA_DIR=/path/to/data
OUT_DIR=/path/to/output

pbrun germline \
    --ref ${DATA_DIR}/Homo_sapiens_assembly38.fa \
    --in-fq ${DATA_DIR}/some_1.fastq ${DATA_DIR}/some_2.fastq \
    --knownSites ${DATA_DIR}/dbsnp_146.hg38.vcf.gz \
    --tmp-dir ${SLURM_TMPDIR}/ \
    --out-bam ${OUT_DIR}/output.bam \
    --out-variants ${OUT_DIR}/output.vcf \
    --out-recal-file ${OUT_DIR}/report.txt
Note: Make the paths to the files absolute real paths (e.g. with the command realpath .).
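One way to apply this advice is to resolve each directory at the top of the job script and stop early if it does not exist. The following is a minimal sketch, assuming GNU coreutils realpath is available on the node; the helper name abs_dir is hypothetical and not part of Parabricks or the cluster software.

```bash
#!/bin/bash
# Hypothetical helper (not part of Parabricks): print the absolute,
# symlink-free form of a directory, or fail if it does not exist.
abs_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "error: directory not found: $dir" >&2
        return 1
    fi
    realpath "$dir"
}

# Example use in a job script (the paths are placeholders):
#   DATA_DIR=$(abs_dir /path/to/data)   || exit 1
#   OUT_DIR=$(abs_dir /path/to/output)  || exit 1
```

Failing before pbrun starts avoids spending queue time and GPU allocation on a job that would fail because of a mistyped or relative path.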
Common issues
Almost immediate failure
If your first test fails right away, a module might be missing or an environment variable might conflict with Parabricks. To resolve this, try:
[name@server ~]$ module --force purge
[name@server ~]$ module load StdEnv/2016.4 nixpkgs/16.09 parabricks/2.5.0
Later failure
Parabricks often does not give a clear traceback when it fails partway through a run. This usually means that you did not request enough memory. If you are already reserving a full node with --nodes=1, we suggest that you also request all of the node's memory with --mem=0. Otherwise, make sure that your job requests enough memory for your pipeline to process your data.
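To check whether memory was the likely cause, you can inspect Slurm's accounting records for the failed job. The sketch below assumes sacct is available on the login node. The helper names is_oom_state and job_memory_report are hypothetical. Note that depending on the cluster configuration, a job killed for exceeding its memory may be reported as OUT_OF_MEMORY or simply as FAILED, so treat the result as a hint rather than a diagnosis.

```bash
#!/bin/bash
# Hypothetical helper: succeed if a Slurm job state string indicates that
# the job was terminated for exceeding its memory request.
is_oom_state() {
    case "$1" in
        OUT_OF_MEMORY*) return 0 ;;
        *)              return 1 ;;
    esac
}

# Hypothetical helper: print the state and memory accounting of a finished
# job. Requires Slurm's sacct, so it only works on a cluster.
# State%20 widens the column so long state names are not truncated.
job_memory_report() {
    sacct -j "$1" --format=JobID,State%20,ExitCode,ReqMem,MaxRSS,Elapsed
}

# Example use on a login node (the job ID is a placeholder):
#   job_memory_report 1234567
#   state=$(sacct -j 1234567 -X -n -o State%20 | tr -d ' ')
#   is_oom_state "$state" && echo "Job ran out of memory; request more with --mem"
```

Comparing MaxRSS with ReqMem in the report also shows how close a successful run came to its memory limit.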
Hybrid usage
Parabricks uses both CPUs and GPUs. During our tests, Parabricks used at least 10 CPUs, so we recommend requesting at least that many with --cpus-per-task=10.
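A job script can verify its CPU allocation before starting a long run. This is a minimal sketch, assuming the job was submitted with --cpus-per-task so that Slurm sets SLURM_CPUS_PER_TASK; the helper name has_enough_cpus is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if the job was allocated at least the
# given number of CPUs per task (default 10, the minimum observed in the
# tests mentioned in this section).
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is requested;
# if it is unset, the allocation is treated as 0 and the check fails.
has_enough_cpus() {
    local min="${1:-10}"
    local have="${SLURM_CPUS_PER_TASK:-0}"
    [ "$have" -ge "$min" ]
}

# Example use near the top of a job script:
#   has_enough_cpus 10 || echo "warning: fewer than 10 CPUs allocated" >&2
```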