University of California, Santa Cruz Genomics Institute

Guide: Running the Single Cell RNA-seq Pipeline using Toil

This guide walks the user through running this pipeline from start to finish. If you have any questions, please contact John Vivian (jtvivian@gmail.com). If you find any errors or have corrections, please feel free to make a pull request. Feedback of any kind is appreciated.

Dependencies
Installation
Inputs
Usage
Methods

Overview

RNA-seq fastqs generated from 10x Chromium single-cell experiments are quantified to produce a gene-by-cell matrix. Additional QC plots are also generated.

This pipeline produces a tarball (tar.gz) file for a given sample that contains n subdirectories:

The output tarball is named after the UUID for the sample (e.g. UUID.tar.gz).
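To see what a finished run produced, the output tarball can be inspected without unpacking it. The following is a minimal sketch using only the Python standard library; the function name `top_level_dirs` is hypothetical, and because the archive layout is not spelled out here, the sketch simply reports whichever top-level directories it finds.

```python
import tarfile


def top_level_dirs(tarball_path):
    """Return the sorted names of top-level directories in a tar.gz archive.

    Illustrative helper, not part of the pipeline. It makes no assumption
    about which directories the pipeline writes.
    """
    names = set()
    with tarfile.open(tarball_path, "r:gz") as tar:
        for member in tar.getmembers():
            # Drop empty and "." components so "./a/x" and "a/x" look alike.
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            # A name counts as a directory if it is one, or if it has children.
            if len(parts) > 1 or (parts and member.isdir()):
                names.add(parts[0])
    return sorted(names)
```

Calling `top_level_dirs("UUID.tar.gz")` on a real output file (with the actual UUID substituted) lists the directories to expect after extraction.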

Dependencies

This pipeline has been tested on Ubuntu 14.04, but should also run on other Unix-based systems. apt-get and pip often require sudo privileges, so if the commands below fail, try prepending sudo. If you do not have sudo privileges, you will need to build these tools from source, or bug a sysadmin about how to get them (they don't mind).

General Dependencies

1. Python 2.7
2. Curl         apt-get install curl
3. Docker       http://docs.docker.com/engine/installation/

Python Dependencies

1. Toil         pip install toil
2. S3AM         pip install --pre s3am (optional, needed for uploading output to S3)

System Dependencies

Installation

Inputs

The CGL RNA-seq pipeline requires a Kallisto index file in order to run. This file is hosted on Synapse and can be downloaded after creating a free account, which takes about a minute.

1. Register for a Synapse account.
2. Either download the index from the website GUI or use the Python API, as in the remaining steps.
3. pip install synapseclient
4. python
   - import synapseclient
   - syn = synapseclient.Synapse()
   - syn.login('foo@bar.com', 'password')
   - Get the Kallisto index reference:
     - syn.get('syn5889216', downloadLocation='.')

All samples and inputs must be submitted as URLs. The following schemes are supported: http://, file://, s3://, ftp://.
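A quick check before launching a run can catch inputs given as bare paths instead of URLs. This is a minimal sketch, not the pipeline's own validation: `SUPPORTED_SCHEMES` and `has_supported_scheme` are hypothetical names, and the tuple only mirrors the four schemes listed above, so extend it if your installation accepts others.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7

# Assumption: exactly the schemes named in this guide.
SUPPORTED_SCHEMES = ("http", "file", "s3", "ftp")


def has_supported_scheme(url, schemes=SUPPORTED_SCHEMES):
    """Return True if the URL's scheme is one of the given schemes."""
    return urlparse(url).scheme in schemes
```

For example, `has_supported_scheme("/full/path/to/sample_2.tar")` is false because a bare path has no scheme; the same file written as `file:///full/path/to/sample_2.tar` passes.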

Samples consisting of tarballs with fastq files inside must follow the file name convention of ending in R1/R2 or _1/_2, followed by .fastq.gz, .fastq, .fq.gz or .fq.
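The naming convention can be checked with a short regular expression before building a sample tarball. This sketch is one reading of the rule above, not code taken from the pipeline; `FASTQ_PATTERN` and `read_number` are hypothetical names.

```python
import re

# Matches names ending in R1/R2 or _1/_2, then .fastq or .fq, optionally .gz.
FASTQ_PATTERN = re.compile(r"(R|_)([12])\.(fastq|fq)(\.gz)?$")


def read_number(filename):
    """Return 1 or 2 for a conforming fastq file name, otherwise None."""
    match = FASTQ_PATTERN.search(filename)
    return int(match.group(2)) if match else None
```

A file such as `sample_R1.fastq.gz` yields 1 and `sample_2.fq` yields 2, while `sample.fastq` yields None because it carries no read designation.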

General Usage

Type toil-rnaseq-sc to get a basic help menu and instructions.

Type toil-rnaseq-sc generate to create an editable manifest and config in the current working directory.
Parameterize the pipeline by editing the config.
Fill in the manifest with information pertaining to your samples.
Type toil-rnaseq-sc run [jobStore] to execute the pipeline.

Example Commands

Run sample(s) locally using the manifest

toil-rnaseq-sc generate
Fill in config and manifest
toil-rnaseq-sc run ./example-jobstore

Toil options can be appended to toil-rnaseq-sc run, for example: toil-rnaseq-sc run ./example-jobstore --retryCount=1 --workDir=/data

For a complete list of Toil options, type toil-rnaseq-sc run -h

Run a variety of samples locally

toil-rnaseq-sc generate-config
Fill in config
toil-rnaseq-sc run ./example-jobstore --retryCount=1 --workDir=/data --samples \
    s3://example-bucket/sample_1.tar file:///full/path/to/sample_2.tar https://sample-depot.com/sample_3.tar
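When many samples are involved, it can be easier to assemble the command from a script than to type it by hand. The sketch below only builds the argument list, using the flags shown in the example commands; `build_run_command` is a hypothetical helper, not something the package provides.

```python
def build_run_command(job_store, samples=(), toil_options=()):
    """Build the argument list for a `toil-rnaseq-sc run` invocation.

    Illustrative helper: it uses only the command and flags shown in this
    guide and does not execute anything.
    """
    cmd = ["toil-rnaseq-sc", "run", job_store]
    cmd.extend(toil_options)
    if samples:
        # Sample URLs follow --samples, as in the example command.
        cmd.append("--samples")
        cmd.extend(samples)
    return cmd


# To launch the run, pass the list to subprocess, for example:
#   import subprocess
#   subprocess.check_call(build_run_command("./example-jobstore"))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with sample URLs.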

Example Config

kallisto-index: s3://cgl-pipeline-inputs/rnaseq_cgl/kallisto_hg38.idx
output-dir: /data/my-toil-run
ssec: 
ci-test:

Distributed Run

To run on a distributed AWS cluster, see CGCloud for instance provisioning. Then, to use the AWS job store and the Mesos batch system, run toil-rnaseq-sc run aws:us-west-2:example-jobstore-bucket --batchSystem=mesos --mesosMaster mesos-master:5050
