Cellranger

From Alliance Doc
Latest revision as of 18:55, 19 August 2024


This article is a draft

This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.



Cellranger Description

Cell Ranger is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, and perform clustering and other secondary analysis.

Please refer to the official documentation (https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis) for the complete list of subtools.

Download and installation

Cell Ranger is licensed software; users must register and download it from https://www.10xgenomics.com/support/software/cell-ranger/downloads/eula?closeUrl=%2Fsu


Download and unpack the cellranger-x.y.z.tar.gz tar file

      # [ download file from downloads page ] 
      tar -xzvf cellranger-x.y.z.tar.gz

If you downloaded Cell Ranger as a .tar.xz archive instead, use the matching file name and tar flags to unpack it:

      tar -xvf cellranger-x.y.z.tar.xz

This unpacks Cell Ranger, its dependencies, and the cellranger script into a new directory called cellranger-x.y.z.
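
The two unpacking commands above differ only in the tar flags. As a minimal sketch, the choice can be made from the file extension; the helper name tar_flags_for is hypothetical and not part of Cell Ranger.

```shell
# Hypothetical helper: print the tar flags that match an archive's extension.
tar_flags_for() {
    case "$1" in
        *.tar.gz|*.tgz) printf '%s\n' "-xzvf" ;;   # gzip-compressed archive
        *.tar.xz)       printf '%s\n' "-xJvf" ;;   # xz-compressed archive
        *) echo "unsupported archive: $1" >&2; return 1 ;;
    esac
}

# Example usage (assumes the archive is in the current directory):
# tar "$(tar_flags_for cellranger-x.y.z.tar.xz)" cellranger-x.y.z.tar.xz
```

Failing on an unknown extension avoids running tar with flags that do not match the download.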

Download and unpack any of the reference data files in a convenient location

      # [ download file from downloads page ]
      # Example human reference transcriptome
      tar -xzvf refdata-gex-GRCh38-2020-A.tar.gz

This creates a new directory called refdata-gex-GRCh38-2020-A that contains a single reference (in this case, GRCh38). Each reference contains a set of pre-generated indices and other data required by Cell Ranger.

Prepend the Cell Ranger directory to your $PATH. This will allow you to invoke the cellranger command.

      export PATH=/opt/cellranger-x.y.z:$PATH

You may wish to add this command to your .bashrc for convenience.
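
If the export line is placed in .bashrc, it runs in every new shell, and nested shells can end up with the same directory on $PATH several times. The following sketch, assuming the /opt/cellranger-x.y.z location used above, prepends the directory only when it is missing; prepend_path is a hypothetical helper name.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already there.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already on PATH, nothing to do
        *) PATH="$1:$PATH" ;;       # not present, put it first
    esac
    export PATH
}

prepend_path /opt/cellranger-x.y.z
```

After this runs, which cellranger should report the copy under /opt/cellranger-x.y.z, provided Cell Ranger was unpacked there.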

Run cellranger

   [name@server ~]$ cellranger
   Process 10x Genomics Gene Expression, Feature Barcode, and Immune Profiling data
   USAGE: cellranger <SUBCOMMAND>

General usage

Demultiplexing

The cellranger mkfastq pipeline demultiplexes raw base call (BCL) files generated by Illumina sequencers into FASTQ files. It is a wrapper around Illumina's bcl2fastq, with additional features that are specific to 10x Genomics libraries and a simplified sample sheet format.

A simple CSV sample sheet is recommended for most sequencing experiments. It has only three columns (Lane, Sample, Index):

   Lane,Sample,Index
   1,test_sample,SI-TT-D9

If you have multiple library types (e.g., Gene Expression, Feature Barcode, and Cell Multiplexing) that all use the same type of indexing (e.g., dual indexing), the samples can be demultiplexed together and the CSV can be formatted as follows:

   Lane,Sample,Index
   1,GEX_sample,SI-TT-D9
   1,FB_sample,SI-NT-A1
   1,CMO_sample,SI-NN-A1
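
A malformed sample sheet is easier to catch before the pipeline starts. The sketch below checks only what is described above: the header line and the three-column layout. It does not validate index names, and check_samplesheet is a hypothetical helper, not a Cell Ranger command.

```shell
# Hypothetical helper: verify the header and the column count of a simple
# mkfastq sample sheet. Prints each problem and returns non-zero on failure.
check_samplesheet() {
    awk -F',' '
        NR == 1 && $0 != "Lane,Sample,Index" { print "bad header: " $0; bad = 1 }
        NF != 3 { print "line " NR ": expected 3 fields, got " NF; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Example usage:
# check_samplesheet test_sample.csv && echo "sample sheet looks well-formed"
```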

You can run the mkfastq pipeline as follows:

   [name@server ~]$ cellranger mkfastq --id=$ID \
                            --run=/path/to/bcl \
                            --csv=test_sample.csv

Counting

The cellranger count pipeline takes FASTQ files from cellranger mkfastq and performs alignment, filtering, barcode counting, and UMI counting. It uses the Chromium cellular barcodes to generate feature-barcode matrices, determine clusters, and perform gene expression analysis. It can take input from multiple sequencing runs on the same GEM well, and it also processes Feature Barcode data alongside Gene Expression reads.


   [name@server ~]$ cellranger count --id=$ID \
                    --transcriptome=refdata-gex-GRCh38-2020-A \
                    --fastqs=$FASTQS \
                    --sample=mysample \
                    --create-bam=true \
                    --localcores=8 \
                    --localmem=64

The required input files for running Cell Ranger vary depending on the chosen pipeline. To select the appropriate pipeline for your needs, please refer to the Choosing a pipeline page (https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/running-pipelines/cr-choosing-a-pipeline).

Aggregating

The cellranger aggr pipeline aggregates outputs from multiple runs of cellranger count, normalizing those runs to the same sequencing depth and then recomputing the feature-barcode matrices and analysis on the combined data. It can be used to combine data from multiple samples into an experiment-wide feature-barcode matrix and analysis.

To aggregate the datasets, you need to create a CSV containing the following columns. The example paths below follow the per-sample output layout of cellranger multi; for a cellranger count run, the file is outs/molecule_info.h5 in the run directory.

   sample_id,molecule_h5
   Sample1,/opt/runs/outs/per_sample_outs/Sample1/count/sample_molecule_info.h5
   Sample2,/opt/runs/outs/per_sample_outs/Sample2/count/sample_molecule_info.h5
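
With more than a few samples, the aggregation CSV is tedious to write by hand. As a minimal sketch, assuming every sample follows the same per_sample_outs layout as the example above, the file can be generated from a base directory and a list of sample names; make_aggr_csv is a hypothetical helper.

```shell
# Hypothetical helper: print an aggr CSV for samples that share one base
# directory. Assumes the <base>/<sample>/count/sample_molecule_info.h5 layout.
make_aggr_csv() {
    local base=$1
    shift
    printf 'sample_id,molecule_h5\n'
    for s in "$@"; do
        printf '%s,%s/%s/count/sample_molecule_info.h5\n' "$s" "$base" "$s"
    done
}

# Example usage:
# make_aggr_csv /opt/runs/outs/per_sample_outs Sample1 Sample2 > aggr.csv
```

The helper only builds paths; it does not check that the .h5 files exist.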

You can run the aggr pipeline as follows:

   [name@server ~]$ cellranger aggr --id=$ID --csv=aggr.csv

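Running Cellranger on the Alliance clusters

The following Slurm job script runs cellranger count on a single node with 8 cores and 64 GB of memory. It takes three positional arguments: the FASTQ directory, the run ID, and the working directory.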

   #!/bin/bash
   #SBATCH --account=def-someprof
   #SBATCH -N 1
   #SBATCH --ntasks-per-node=8
   #SBATCH --mem=64g
   #SBATCH --time=24:00:00

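   # Positional arguments: FASTQ directory, run ID, working directory
   FASTQS=$1
   ID=$2
   WORK_DIR=$3

   # Stop if the working directory cannot be entered
   cd "$WORK_DIR" || exit 1
   cellranger count --id="$ID" \
                    --fastqs="$FASTQS" \
                    --transcriptome=refdata-gex-GRCh38-2020-A \
                    --create-bam=true \
                    --localcores=8 \
                    --localmem=64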
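
The job script reads the FASTQ directory, the run ID, and the working directory from its arguments, in that order. The sketch below shows one way to submit it, assuming the script was saved as cellranger_count.sh (a hypothetical file name). The SBATCH variable is also an assumption of this sketch: it allows a dry run with echo in place of sbatch.

```shell
# Hypothetical wrapper: check the inputs, then submit the job script.
# Set SBATCH=echo to print the command instead of submitting it.
submit_count() {
    local fastqs=$1 id=$2 workdir=$3
    # Fail early rather than after the job has waited in the queue.
    [ -d "$fastqs" ] || { echo "FASTQ directory not found: $fastqs" >&2; return 1; }
    # The job script changes into this directory, so it must exist.
    mkdir -p "$workdir"
    ${SBATCH:-sbatch} cellranger_count.sh "$fastqs" "$id" "$workdir"
}

# Example usage (paths are placeholders):
# submit_count /path/to/fastqs sample1 /path/to/workdir
```

The relative --transcriptome path in the job script is resolved from the working directory, so the reference data must be available there.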

References

Official documentation: https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis