Nextflow


Description

Nextflow[1] is software for running reproducible scientific workflows. The term "Nextflow" refers both to the domain-specific language (DSL) in which the pipelines are written and to the software used to interpret those workflows.


Usage

Nextflow is provided as a module on Compute Canada systems and can be loaded with module load nextflow.
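
For example, to see which versions are installed and load one (this assumes the Lmod module system used on our clusters; the versions listed will vary):

module spider nextflow  # list the available Nextflow versions
module load nextflow    # load the default version
nextflow -version       # confirm which version is active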

While you can build your own workflow with Nextflow, you can also rely on the published nf-core pipelines (https://nf-co.re/). We describe here a simple configuration that will let you run nf-core pipelines on our systems. This should also help you configure Nextflow properly for homemade pipelines.

We will use nf-core/smrnaseq as an example nf-core pipeline.


Setup

The following procedure is to be run on a login node.

We first install a pip package that will help us with the setup: the nf-core tools, which is a rather hefty package, sorry about that.

module purge # make sure that previously loaded packages do not interfere with the installation
module load python/3.8
python -m venv nf-core-env
source nf-core-env/bin/activate
python -m pip install nf_core
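
You can quickly check that the installation worked (while the virtual environment is still active):

nf-core --version  # print the installed nf-core tools version
nf-core --help     # list the available nf-core subcommands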

We set the name of the pipeline that we will test, then load nextflow and apptainer (the new name of the singularity container utility). Nextflow integrates well with apptainer/singularity.

export NFCORE_PL=smrnaseq
export PL_VERSION=1.1.0
module load nextflow/22.04.3
module load apptainer/1.0


An important step is to download all the singularity images that will be used to run the pipeline at the same time as we download the workflow itself. If we do not do that, Nextflow will try to download the images from the compute nodes, just before the steps are executed. This would not work on most Alliance clusters, since there is no internet connection on the compute nodes.

We create a folder where the singularity images will be stored and set the environment variable NXF_SINGULARITY_CACHEDIR to point to it. Workflow images tend to be big, so do not store them in your $HOME, which has a small quota; prefer a spot in your project space.

mkdir /project/<def-group>/NFX_SINGULARITY_CACHEDIR
export NXF_SINGULARITY_CACHEDIR=/project/<def-group>/NFX_SINGULARITY_CACHEDIR

You can add the export line to your .bashrc as a convenience. You should also share that folder with the other members of your group who are planning to use Nextflow with singularity.
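
For example (a minimal sketch; <def-group> is the placeholder used above, and your group's permission policy may differ):

echo "export NXF_SINGULARITY_CACHEDIR=/project/<def-group>/NFX_SINGULARITY_CACHEDIR" >> ~/.bashrc  # make the variable persistent
chmod g+rwx /project/<def-group>/NFX_SINGULARITY_CACHEDIR  # let other group members read and write the image cache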

The following command will download the smrnaseq pipeline to your scratch directory and also put all the apptainer/singularity containers in the cache directory:

cd ~/scratch
nf-core download --singularity-cache-only --container singularity  --compress none -r ${PL_VERSION}  -p 6  ${NFCORE_PL}

This command will download 18 containers, for a total of about 4 GB. It also creates an nf-core-${NFCORE_PL}-${PL_VERSION} folder with a workflow and a config subfolder. The config subfolder includes the institutional configs (https://github.com/nf-core/configs), while the workflow itself is in the workflow subfolder.

Here is what a typical nf-core pipeline looks like:

$ ls nf-core-${NFCORE_PL}-${PL_VERSION}/workflow
assets  bin  CHANGELOG.md  CODE_OF_CONDUCT.md  conf  Dockerfile  docs  environment.yml  lib  LICENSE  main.nf  nextflow.config  nextflow_schema.json  README.md

Once we are ready to launch the pipeline, Nextflow will look at the nextflow.config file and also at the ~/.nextflow/config file (if it exists) to determine how to run the workflow. The nf-core pipelines all have a default config, a test config, and container configs (singularity, podman, etc.). On top of that, we will need a custom config for the cluster you are running on (Narval, Béluga, Cedar, or Graham).

A config for our cluster

Here is a reasonable config that changes the default values for process and adds some tweaks from the wiki for the Béluga and Narval clusters, placed in profile blocks that we will be able to load at run time. A config file for our cluster:

File : ~/.nextflow/config

process {
  executor = 'slurm'
  pollInterval = '60 sec'
  clusterOptions = '--account=<rac-account>'
  submitRateLimit = '60/1min'
  queueSize = 100
  errorStrategy = { task.exitStatus in [125,139] ? 'retry' : 'finish' }
  maxRetries = 1
  memory = { check_max( 4.GB * task.attempt, 'memory' ) }
  cpus = 1
  time = '3h'
}

profiles {
  beluga {
    max_memory = '186G'
    max_cpu = 40
    max_time = '168h'
  }
  narval {
    max_memory = '249G'
    max_cpu = 64
    max_time = '168h'
  }
}
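
The config goes in ~/.nextflow/config as shown above; that directory may not exist yet, so create it first (nano is just an example editor):

mkdir -p ~/.nextflow
nano ~/.nextflow/config  # paste the configuration above and save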


You will need to replace <rac-account> with your own SLURM account name, which typically looks like def-pi_uid.
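
If you are not sure which accounts you can submit jobs with, one way to list them (assuming the standard SLURM tools are on your path) is:

sacctmgr show associations user=$USER format=Account%20  # list the SLURM accounts attached to your user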

This configuration also makes sure that there are no more than 100 jobs in the SLURM queue and that only 60 jobs are submitted per minute. It states that Béluga machines have 40 cores and 186 GB of RAM, with a maximum walltime of one week (168 hours).

That config is tied to the system you are running on, but it is also related to the pipeline itself. For example, here cpus = 1 is the default value, but steps in the pipeline can request more than that. This can get quite complicated, and labels are used to give specific steps their own configuration; this is done in the workflow/conf/base.config file. But this goes beyond this tutorial.

What we do here is implement a default restart behavior that automatically adds memory to failed steps that return exit code 125 (out of memory) or 139 (killed by the OOM killer because the process used more memory than its cgroup allowed).

Running the Pipeline

We will use two of the provided profiles, test and singularity, plus the one we have just created for beluga. Note that Nextflow is mainly written in Java, which tends to use a lot of virtual memory. On Narval that won't be a problem, but on Béluga you will need to raise the virtual-memory limit to run most workflows; you can set it to 40 GB with the command ulimit -v 40000000. We also start a tmux session, so that if we get disconnected the Nextflow pipeline will keep running and we will be able to reconnect to it.

tmux
nextflow run nf-core-${NFCORE_PL}-${PL_VERSION}/workflow -profile test,singularity,beluga  --outdir ${NFCORE_PL}_OUTPUT

You have now started the Nextflow sub-scheduler on the login node. This process sends jobs to SLURM when they are ready to be processed.

You can see the progression of the pipeline right there. You can also open a new session on the cluster, or detach from the tmux session, and have a look at the jobs in the SLURM queue with squeue -u $USER.
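
For example (the tmux key bindings shown are the defaults):

# detach from the tmux session with Ctrl-b then d; reattach to it later with:
tmux attach
# list your jobs currently in the SLURM queue
squeue -u $USER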

References