Nextflow
Revision as of 21:10, 27 March 2023
Nextflow is software for running reproducible scientific workflows. The term Nextflow describes both the domain-specific language (DSL) in which the pipelines are written and the software that interprets those workflows.
Usage
On our systems, Nextflow is provided as a module you can load with module load nextflow.
While you can build your own workflow with Nextflow, you can also rely on the published nf-core pipelines. We will describe here a simple configuration that will let you run nf-core pipelines on our systems. This should also help you configure Nextflow properly for homemade pipelines.
We will use nf-core/smrnaseq as an example nf-core pipeline.
Installation
The following procedure is to be run on a login node.
We first install a pip package that will help with the setup. Note that the nf-core tools can be slow to install.
module purge # make sure that previously loaded modules do not pollute the installation
module load python/3.8
python -m venv nf-core-env
source nf-core-env/bin/activate
python -m pip install nf_core
We set the name of the pipeline that we will test, and then load Nextflow and Apptainer (the new name of the Singularity container utility). Nextflow integrates well with Apptainer/Singularity.
export NFCORE_PL=smrnaseq
export PL_VERSION=1.1.0
module load nextflow/22.04.3
module load apptainer/1.1.3
An important step is to download all the Singularity images that will be used to run the pipeline at the same time as we download the workflow itself. Otherwise, Nextflow will try to download the images from the compute nodes, just before the steps are executed. That would not work on most Alliance clusters, since the compute nodes have no internet connection.
We create a folder where Singularity images will be stored and set the environment variable NXF_SINGULARITY_CACHEDIR to it. Workflow images tend to be big, so do not store them in your $HOME space because it has a small quota; prefer a spot in the /project space instead.
mkdir /project/<def-group>/NXF_SINGULARITY_CACHEDIR
export NXF_SINGULARITY_CACHEDIR=/project/<def-group>/NXF_SINGULARITY_CACHEDIR
You can add the export line to your .bashrc as a convenience. You should also share that folder with other members of your group who are planning to use Nextflow with Singularity.
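Before downloading anything, a quick sanity check can catch a misspelled variable name or a cache that ended up in $HOME. The following is only a minimal sketch: the function name check_cache_dir is hypothetical (it is not part of Nextflow or nf-core), and it only verifies that the given path is an existing, writable directory outside $HOME.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nextflow or nf-core): report whether the
# Singularity cache directory passed as $1 looks usable.
check_cache_dir() {
    local dir="$1"
    if [ -z "$dir" ]; then
        echo "unset"; return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "missing"; return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not-writable"; return 1
    fi
    # The text advises against $HOME because of its small quota.
    if [ -n "${HOME:-}" ]; then
        case "$dir" in
            "$HOME"/*) echo "in-home"; return 1 ;;
        esac
    fi
    echo "ok"
}

# Report on the directory currently exported (prints "unset" if there is none).
check_cache_dir "${NXF_SINGULARITY_CACHEDIR:-}" || true
```

Any answer other than ok suggests revisiting the mkdir and export lines above before running the download.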
The following command will download the smrnaseq pipeline to your /scratch directory and put all the Apptainer/Singularity containers in the cache directory.
cd ~/scratch
nf-core download --singularity-cache-only --container singularity --compress none -r ${PL_VERSION} -p 6 ${NFCORE_PL}
This command downloads 18 containers for a total of about 4 GB. It also creates an nf-core-${NFCORE_PL}-${PL_VERSION} folder with the workflow and config subfolders. The config subfolder includes the institutional configuration, while the workflow itself is in the workflow subfolder.
This is what a typical nf-core pipeline looks like:
$ ls nf-core-${NFCORE_PL}-${PL_VERSION}/workflow
assets bin CHANGELOG.md CODE_OF_CONDUCT.md conf Dockerfile docs environment.yml lib LICENSE main.nf nextflow.config nextflow_schema.json README.md
Once we are ready to launch the pipeline, Nextflow will look at the nextflow.config file, and also at the ~/.nextflow/config file if it exists, to control how to run the workflow. The nf-core pipelines all have a default config, a test config, and container configs (singularity, podman, etc.). We will also need a custom config for the cluster (Narval, Béluga, Cedar or Graham) you are running on. Nextflow pipelines could also run on Niagara if they were designed with that specific cluster in mind, but we generally discourage you from running nf-core or any other generic Nextflow pipeline there.
A config for our clusters
You can use the following config after changing the default values for nf-core processes and entering the correct information for the Béluga and Narval clusters. The cluster-specific settings are stored in profile blocks that are selected at runtime.
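process {
    executor = 'slurm'
    clusterOptions = '--account=<my-account>'
    maxRetries = 1
    // retry only on memory-related exit codes; otherwise finish running tasks and stop
    errorStrategy = { task.exitStatus in [125,139] ? 'retry' : 'finish' }
    // 4 GB on the first attempt, 8 GB on the retry, capped by max_memory
    memory = { check_max( 4.GB * task.attempt, 'memory' ) }
    cpus = 1
    time = '3h'
}
executor {
    // queue throttling options belong to the executor scope, not to process
    pollInterval = '60 sec'
    submitRateLimit = '60/1min'
    queueSize = 100
}
profiles {
    beluga {
        // the nf-core check_max() helper reads these limits from params
        params {
            max_memory = '186G'
            max_cpus = 40
            max_time = '168h'
        }
    }
    narval {
        params {
            max_memory = '249G'
            max_cpus = 64
            max_time = '168h'
        }
    }
}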
Replace <my-account> with your own account, which looks like def-pname.
This configuration ensures that there are never more than 100 jobs in the Slurm queue and that at most 60 jobs are submitted per minute. It indicates that Béluga machines have 40 cores and 186G of RAM, with a maximum walltime of one week (168 hours).
This config is linked to the system you are running on, but it also depends on the pipeline itself. For example, one CPU per process is the default here, but individual steps in the pipeline can request more. This can get quite complicated: labels in the workflow/conf/base.config file are used to identify a step with a specific configuration, which is not covered on this page.
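The configuration also implements a default restart behavior: steps that fail with exit code 125 (out of memory) or 139 (OOM-killed because the process used more memory than the cgroup allowed) are automatically retried with more memory. Because the memory request is 4 GB multiplied by the attempt number and maxRetries is 1, such a step is retried once with 8 GB.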
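To make the retry arithmetic concrete, the sketch below mimics in plain shell what the memory line of the config computes: 4 GB multiplied by the attempt number, capped at the cluster maximum. The function name scaled_memory_gb is hypothetical, and the cap is only a simplified stand-in for the check_max helper, whose actual implementation lives in the pipeline and is not shown on this page.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the retry arithmetic (not the pipeline's check_max):
# request base_gb * attempt, but never more than max_gb.
scaled_memory_gb() {
    local attempt="$1" max_gb="$2" base_gb="${3:-4}"
    local request=$(( base_gb * attempt ))
    if [ "$request" -gt "$max_gb" ]; then
        request="$max_gb"
    fi
    echo "$request"
}

# First attempt and the single retry allowed by maxRetries = 1,
# using the 186 GB limit given for Béluga.
for attempt in 1 2; do
    echo "attempt ${attempt}: $(scaled_memory_gb "$attempt" 186) GB"
done
```

With these values the cap never applies; it would only matter for a larger base request or a higher maxRetries.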
Running the pipeline
We will use two profiles provided by nf-core, test and singularity, together with the profile we have just created for Béluga. Note that Nextflow is mainly written in Java, which tends to use a lot of virtual memory. On the Narval cluster that won't be a problem, but on a Béluga login node you will need to change the virtual memory limit to run most workflows; you can set it to 40G with the command ulimit -v 40000000. We also use a terminal multiplexer, so that the Nextflow pipeline keeps running if we are disconnected and we can reconnect to the controller process. Running Nextflow on login nodes is easy on Béluga and Narval, but not on Graham and Cedar, where the login node virtual memory limit cannot be changed; on those clusters, we recommend launching Nextflow from a compute node, where virtual memory is never limited.
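Before launching, it can help to see what the current shell's virtual memory limit actually is. The sketch below is illustrative only: the function name vmem_status is hypothetical, and it assumes that the value printed by ulimit -v is either the word unlimited or a number of kilobytes, which is how bash reports it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a virtual memory limit, as printed by
# `ulimit -v` (kB or the word "unlimited"), with a wanted value in kB.
vmem_status() {
    local current="$1" wanted_kb="$2"
    if [ "$current" != "unlimited" ] && [ "$current" -lt "$wanted_kb" ]; then
        echo "too-low"
    else
        echo "ok"
    fi
}

wanted_kb=40000000        # the 40G value used in the text
current="$(ulimit -v)"    # soft limit of the current shell
if [ "$(vmem_status "$current" "$wanted_kb")" = "too-low" ]; then
    echo "limit is ${current} kB: run 'ulimit -v ${wanted_kb}' before starting Nextflow"
else
    echo "limit is ${current}: nothing to change"
fi
```

The check only reads the limit; changing it is left to the ulimit command given above. The pipeline is then launched with: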
nextflow run nf-core-${NFCORE_PL}-${PL_VERSION}/workflow -profile test,singularity,beluga --outdir ${NFCORE_PL}_OUTPUT
You have now started the Nextflow sub-scheduler on the login node. This process sends jobs to Slurm when they are ready to be processed.
You can follow the progression of the pipeline right there. You can also open a new session on the cluster, or detach from the tmux session, to look at the jobs in the Slurm queue with squeue -u $USER.
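When a pipeline submits many jobs, a per-state count is often easier to read than the full squeue listing. The following is a minimal sketch: summarize_states is a hypothetical helper name, and the script only calls squeue when the command is found on the machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one job state per line on stdin (for example
# PENDING or RUNNING) and print "<STATE> <count>" lines sorted by state.
summarize_states() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Only query Slurm when squeue is available on this machine.
if command -v squeue >/dev/null 2>&1; then
    # -h drops the header line, -o "%T" prints only the job state.
    squeue -u "$USER" -h -o "%T" | summarize_states
fi
```

Run from a second session while the pipeline is active, this prints one line per job state currently in the queue.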