Cellranger
This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.
Cellranger Description
Cellranger is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more. Note: This page does not cover all features of Cellranger.
Please refer to https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger for the complete list of all subtools.
Availability and module loading
To list all available versions of cellranger on the Compute Canada software stack:
[name@server ~]$ module spider cellranger
You can load the version of your choice using module load. For example, to load cellranger 5.0.1, use the command:
[name@server ~]$ module load mugqic/cellranger/5.0.1
After loading the module, run cellranger with no arguments to confirm that it is available. It prints the version and a usage summary:
[name@server ~]$ module load mugqic/cellranger/5.0.1
[name@server ~]$ cellranger
cellranger cellranger-5.0.1
Process 10x Genomics Gene Expression, Feature Barcode, and Immune Profiling data

USAGE:
    cellranger <SUBCOMMAND>
General usage
Demultiplexing
Cellranger mkfastq demultiplexes raw base call (BCL) files generated by Illumina sequencers into FASTQ files. It is a wrapper around Illumina's bcl2fastq, with additional features that are specific to 10x Genomics libraries and a simplified sample sheet format.
A simple CSV sample sheet is recommended for most sequencing experiments. The simple CSV format has only three columns (Lane, Sample, Index):
Lane,Sample,Index
1,test_sample,SI-TT-D9
You can run the mkfastq pipeline as follows:
[name@server ~]$ cellranger mkfastq --id=$ID --run=/path/to/bcl --csv=test_sample.csv
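The sample sheet described above can also be generated from a script, which helps when several lanes or samples are involved. The following is a minimal sketch: the function name write_sample_sheet is hypothetical, and the row values are the example values from this page, not real sample data.

```shell
#!/bin/bash
set -euo pipefail

# write_sample_sheet OUT ROW...
# Writes the three-column header, then one "lane,sample,index" row per argument.
write_sample_sheet() {
    local out=$1
    shift
    printf 'Lane,Sample,Index\n' > "$out"
    local row
    for row in "$@"; do
        # Reject rows that do not have exactly three comma-separated fields.
        if [[ $(awk -F, '{print NF}' <<< "$row") -ne 3 ]]; then
            echo "bad row: $row" >&2
            return 1
        fi
        printf '%s\n' "$row" >> "$out"
    done
}

# Example values taken from the sample sheet shown on this page.
write_sample_sheet test_sample.csv "1,test_sample,SI-TT-D9"
cat test_sample.csv
```

The resulting test_sample.csv can then be passed to cellranger mkfastq with --csv as shown above.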
Counting
Cellranger count takes FASTQ files from cellranger mkfastq and performs alignment, filtering, barcode counting, and UMI counting. It uses the Chromium cellular barcodes to generate feature-barcode matrices, determine clusters, and perform gene expression analysis. The count pipeline can take input from multiple sequencing runs on the same GEM well. Cellranger count also processes Feature Barcode data alongside Gene Expression reads.
[name@server ~]$ cellranger count --id=$ID --fastqs=$FASTQS --transcriptome=refdata-gex-GRCh38-2020-A --include-introns
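When a GEM well was sequenced in more than one run, the FASTQ directories can be passed together. The sketch below assumes that --fastqs accepts a comma-separated list of paths, as described in the 10x Genomics documentation; the directory paths, the ID my_sample, and the helper join_by_comma are hypothetical. The command is only printed, not executed.

```shell
#!/bin/bash
set -euo pipefail

# Join all arguments with commas, e.g. "a b c" becomes "a,b,c".
join_by_comma() {
    local IFS=,
    echo "$*"
}

# Hypothetical FASTQ directories from two sequencing runs of one GEM well.
FASTQS=$(join_by_comma /path/to/run1/fastqs /path/to/run2/fastqs)
ID=my_sample   # hypothetical run ID

# Print the command for inspection; remove "echo" to actually run it.
echo cellranger count --id="$ID" --fastqs="$FASTQS" \
    --transcriptome=refdata-gex-GRCh38-2020-A --include-introns
```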
Aggregating
Cellranger aggr aggregates outputs from multiple runs of cellranger count, normalizing those runs to the same sequencing depth and then recomputing the feature-barcode matrices and analysis on the combined data. The aggr pipeline can be used to combine data from multiple samples into an experiment-wide feature-barcode matrix and analysis.
To aggregate the datasets, you need to create a CSV containing the following columns:
sample_id,molecule_h5
Sample1,/opt/runs/outs/per_sample_outs/Sample1/count/sample_molecule_info.h5
Sample2,/opt/runs/outs/per_sample_outs/Sample2/count/sample_molecule_info.h5
You can run the aggr pipeline as follows:
[name@server ~]$ cellranger aggr --id=$ID --csv=aggr.csv
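For more than a few samples, the aggregation CSV can be built with a short loop instead of by hand. This is a minimal sketch, assuming every sample follows the same path layout as the example CSV on this page; the function name write_aggr_csv is hypothetical.

```shell
#!/bin/bash
set -euo pipefail

# write_aggr_csv OUT RUNS_DIR SAMPLE...
# Writes one row per sample, using the path layout from the example above.
write_aggr_csv() {
    local out=$1 runs_dir=$2
    shift 2
    printf 'sample_id,molecule_h5\n' > "$out"
    local s
    for s in "$@"; do
        printf '%s,%s/outs/per_sample_outs/%s/count/sample_molecule_info.h5\n' \
            "$s" "$runs_dir" "$s" >> "$out"
    done
}

# Example values taken from the CSV shown on this page.
write_aggr_csv aggr.csv /opt/runs Sample1 Sample2
cat aggr.csv
```

If the molecule info files live elsewhere (for example, directly under the outs directory of a cellranger count run), the printf pattern needs to be adjusted accordingly.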
Cellranger multi
cellranger multi is used to analyze Cell Multiplexing data. It takes FASTQ files from cellranger mkfastq and performs alignment, filtering, barcode counting, and UMI counting. It uses the Chromium cellular barcodes to generate feature-barcode matrices, determine clusters, and perform gene expression analysis. The cellranger multi pipeline also supports the analysis of Feature Barcode data.
Running cellranger multi requires a config CSV, described below:
[gene-expression]
reference,/path/to/transcriptome
expect-cells,<expected number of recovered cells>
include-introns,true

[libraries]
fastq_id,fastqs,feature_types
gex1,/path/to/fastqs,Gene Expression
mux1,/path/to/fastqs,Multiplexing Capture

[samples]
sample_id,cmo_ids
sample1,CMO301|CMO302
sample2,CMO303|CMO304
You can run the multi pipeline as follows:
[name@server ~]$ cellranger multi --id=sample345 --csv=config.csv
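The config CSV has three sections, and only a few values usually change between experiments. The sketch below writes the file from a template using a here-document. The function name write_multi_config, the output file name config.csv, and the value 10000 for expect-cells are assumptions for illustration; the section names, columns, and CMO assignments are copied from the example above.

```shell
#!/bin/bash
set -euo pipefail

# write_multi_config OUT REFERENCE FASTQ_DIR EXPECT_CELLS
write_multi_config() {
    local out=$1 reference=$2 fastq_dir=$3 expect_cells=$4
    # expect-cells must be a positive integer.
    if ! [[ $expect_cells =~ ^[1-9][0-9]*$ ]]; then
        echo "expect-cells must be a positive integer, got: $expect_cells" >&2
        return 1
    fi
    cat > "$out" <<EOF
[gene-expression]
reference,${reference}
expect-cells,${expect_cells}
include-introns,true

[libraries]
fastq_id,fastqs,feature_types
gex1,${fastq_dir},Gene Expression
mux1,${fastq_dir},Multiplexing Capture

[samples]
sample_id,cmo_ids
sample1,CMO301|CMO302
sample2,CMO303|CMO304
EOF
}

# 10000 is a placeholder; use the number of cells expected for your experiment.
write_multi_config config.csv /path/to/transcriptome /path/to/fastqs 10000
cat config.csv
```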
Processing multiple files with multithreading and/or GNU parallel
Cell Ranger can run using multiple nodes on the cluster. This method provides high performance, but is difficult to troubleshoot. 10x Genomics does not officially support Slurm or Torque/PBS. While it’s possible to run Cell Ranger with those job schedulers in cluster mode, it is unsupported and may require trial and error.
Instead of submitting one job to the cluster, Cell Ranger creates hundreds and potentially thousands of small stage jobs. Each of these stage jobs needs to be queued, launched, and tracked by the pipeline framework. The necessary coordination between Cell Ranger and the cluster makes this approach harder to set up and troubleshoot, since every cluster configuration is different.
To learn more, please go to the cluster mode page https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/choosing-how-to-run#cluster
Running Cellranger within CCF
#!/bin/bash
#SBATCH --account=cc-debug
#SBATCH -N 1
#SBATCH --ntasks-per-node=8
#SBATCH --mem=50g
#SBATCH --time=24:00:00

module load mugqic/cellranger/5.0.1

# Positional arguments: FASTQ directory, run ID, working directory
FASTQS=$1
ID=$2
WORK_DIR=$3

cd "$WORK_DIR"
cellranger count --id="$ID" --fastqs="$FASTQS" --transcriptome=refdata-gex-GRCh38-2020-A --include-introns
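When Cell Ranger runs inside a single Slurm job, it may help to keep it within the job's allocation. Cell Ranger provides the --localcores and --localmem (in GB) options for this purpose. The sketch below derives both values from environment variables that Slurm sets inside a job. The helper name resource_flags, the fallback defaults, and the roughly 10% memory headroom are assumptions, not recommendations from 10x Genomics. The command is only printed, not executed.

```shell
#!/bin/bash
set -euo pipefail

# Build --localcores/--localmem from the Slurm allocation.
resource_flags() {
    # Cores: prefer SLURM_CPUS_PER_TASK, then SLURM_NTASKS_PER_NODE, else 1.
    local cores=${SLURM_CPUS_PER_TASK:-${SLURM_NTASKS_PER_NODE:-1}}
    # SLURM_MEM_PER_NODE is in MB; fall back to 8192 MB (assumed default).
    local mem_mb=${SLURM_MEM_PER_NODE:-8192}
    # Convert to GB, keeping about 10% headroom (an assumption).
    local mem_gb=$(( mem_mb * 9 / 10 / 1024 ))
    if (( mem_gb < 1 )); then
        mem_gb=1
    fi
    echo "--localcores=${cores} --localmem=${mem_gb}"
}

# Print the command for inspection; remove "echo" to actually run it.
# ID and FASTQS fall back to hypothetical placeholders when unset.
echo cellranger count --id="${ID:-my_sample}" --fastqs="${FASTQS:-/path/to/fastqs}" \
    --transcriptome=refdata-gex-GRCh38-2020-A --include-introns $(resource_flags)
```

With the job script above (8 tasks per node, 50 GB of memory), this sketch would pass 8 cores and 45 GB to Cell Ranger.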