Pipeline Documentation

MikeWLloyd edited this page Aug 5, 2024 · 30 revisions

Each pipeline is built on a standardized framework inspired by nf-core, an open-source community that develops Nextflow (NF) pipelines. This framework lets a user call any pipeline in this repository from a single script, using a restricted vocabulary of parameters.

Currently Supported Workflows:

NGS-Ops Pipelines:
  1. Amplicon Sequencing - xGen Sample Identification: amplicon
  2. Amplicon Sequencing - General PCR / Target Panel
  3. ATAC Sequencing: atac
  4. ChIP Sequencing: chipseq
  5. RNA sequencing: rnaseq
  6. RNA Fusion: rna_fusion
  7. RRBS: rrbs
  8. Whole Exome Sequencing: wes
  9. PDX Whole Exome Sequencing: pdx_wes
  10. Whole Genome Sequencing: wgs
  11. Paired Tumor Analysis: pta
  12. MMRSVD Germline SV: germline_sv
Genetic Diversity Analysis Suite:
  1. EMASE: emase
  2. GBRS: gbrs
  3. Generate Pseudoreference: generate_pseudoreference
  4. Prepare EMASE Inputs: prepare_emase
  5. Prepare DO GBRS Inputs: prep_do_gbrs_inputs
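
The names after each colon above are the values passed to --workflow. As a minimal sketch, a wrapper script could check a name against this list before submitting a job. The function name `is_supported_workflow` is hypothetical and not part of the repository. The list is copied from this page and leaves out the General PCR / Target Panel amplicon workflow, because its name is not given here.

```shell
#!/bin/bash
# Hypothetical helper (not part of cs-nf-pipelines): checks a name against
# the workflow names listed on this page.
SUPPORTED_WORKFLOWS="amplicon atac chipseq rnaseq rna_fusion rrbs wes pdx_wes wgs pta germline_sv emase gbrs generate_pseudoreference prepare_emase prep_do_gbrs_inputs"

is_supported_workflow() {
    # $1 = candidate workflow name; returns 0 if listed, 1 otherwise
    local candidate="$1"
    local name
    for name in $SUPPORTED_WORKFLOWS; do
        if [ "$name" = "$candidate" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, `is_supported_workflow rnaseq && echo "ok"` prints `ok`, while `is_supported_workflow rna` returns a non-zero status.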

Quick Start for JAX Users

  1. Secure shell login to Sumner2:
    ssh login.sumner2.jax.org

  2. Connect to a compute node:
    srun -p compute -q batch -N 1 -n 1 --mem 2G -t 01:00:00 --pty bash
    Note: This session is only for obtaining the pipelines and launching Slurm pipeline jobs. It is not for running pipelines, so the requested time reservation is only 1 hour.

  3. Clone Repo (or Pull Updated Repo):
    git clone https://github.com/TheJacksonLaboratory/cs-nf-pipelines.git && cd cs-nf-pipelines

  4. Run the test data through the rnaseq pipeline:

An example run script is provided in this repo:
sbatch run.sh

Where run.sh contains:

#!/bin/bash

#SBATCH --job-name=CS_nextflow_example
#SBATCH -p compute
#SBATCH -q batch
#SBATCH -t 72:00:00
#SBATCH --mem=5G
#SBATCH --ntasks=1

cd $SLURM_SUBMIT_DIR

# LOAD NEXTFLOW
module use --append /projects/omics_share/meta/modules
module load nextflow/23.10.1 

# RUN TEST PIPELINE
nextflow main.nf \
-profile sumner2 \
--workflow rnaseq \
--gen_org mouse \
--sample_folder 'test/rna/mouse' \
--pubdir "/flashscratch/${USER}/outputDir" \
-w "/flashscratch/${USER}/outputDir/work"

Notes:

  1. By default, the example run.sh script runs the rnaseq workflow on a provided set of 10,000 simulated paired-end mouse RNA reads. To test a different pipeline, change the --workflow and --sample_folder parameters to match that pipeline.

    • Supported options for --workflow are listed at the top of this page.

    The inputs and parameters for these pipelines are outlined below. For analyses with real data, we recommend that you do not run pipelines from within the cloned git directory. Instead, place the run script in an appropriate working directory and modify the Nextflow command to contain the complete path to main.nf (e.g., ~/nextflow /home/USERNAME/main.nf <... Additional Options ...>).

  2. By default, the example run.sh script uses -profile sumner2, which sets the pipeline to use HPC options specific to Sumner2. If you are using non-JAX systems, a properly formatted profile for your particular HPC environment is required.

  3. Example run scripts with information on how to quickly start each pipeline and data type are provided in the run_scripts directory within the main repository.

  4. Example datasets of 10,000 paired-end reads for mouse and human, covering all data types (e.g., RNA, WES, WGS), are provided in the test directory within the main repository. These small datasets can be used to validate that the pipelines are functional.

  5. For all pipelines, passing the --help flag (e.g., ~/nextflow <PATH>/<TO>/main.nf --workflow rnaseq --help) prints the help documentation for that pipeline and quits.
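
The recommendation in note 1 (run from a separate working directory with the full path to main.nf) and the --help flag in note 5 can be combined in a small launcher. The sketch below only assembles and prints the command so it can be inspected before use. `PIPELINE_DIR` and `build_run_cmd` are hypothetical names, and the default clone location is an assumption.

```shell
#!/bin/bash
# Hypothetical launcher sketch: prints the Nextflow command instead of running it.
# PIPELINE_DIR is an assumed location of the cloned repository.
PIPELINE_DIR="${PIPELINE_DIR:-$HOME/cs-nf-pipelines}"

build_run_cmd() {
    # $1 = workflow name; all remaining arguments are appended unchanged
    local workflow="$1"
    local arg
    shift
    printf 'nextflow %s/main.nf -profile sumner2 --workflow %s' "$PIPELINE_DIR" "$workflow"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}
```

Calling `build_run_cmd rnaseq --help` prints the help invocation for the rnaseq workflow. Replace `--help` with the pipeline's own options (for example --gen_org and --sample_folder) to build a real run command.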

Quick Start for JAX Users - Alternate Method

The pipelines can also be accessed using the nextflow pull command.

  1. Secure shell login to Sumner2:
    ssh login.sumner2.jax.org

  2. Connect to a compute node:
    srun -p compute -q batch -N 1 -n 1 --mem 2G -t 01:00:00 --pty bash
    Note: This session is only for obtaining the pipelines and launching Slurm pipeline jobs. It is not for running pipelines, so the requested time reservation is only 1 hour.

  3. Load the Nextflow module:
    module use --append /projects/omics_share/meta/modules
    module load nextflow/23.10.1

Alternative method for local Nextflow install: See Nextflow Installation

  4. Run the pipeline as follows:
nextflow run TheJacksonLaboratory/cs-nf-pipelines -r v0.6.5 --workflow rnaseq --gen_org mouse --sample_folder "/home/${USER}/.nextflow/assets/TheJacksonLaboratory/cs-nf-pipelines/test/rna/mouse" --pubdir "/flashscratch/${USER}/outputDir" -w "/flashscratch/${USER}/outputDir/work"
  1. Pipelines are pulled by default to /home/<USERNAME>/.nextflow/assets/TheJacksonLaboratory/cs-nf-pipelines. To run the test data, make sure the --sample_folder path reflects your user name.

  2. The above command runs release v0.6.5. To run any other release, change the -r tag to that release number. Project release numbers are listed on GitHub.

  3. Additional information on using nextflow run with remote repositories is provided by the Nextflow documentation.
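
The notes above can be sketched as two small helpers: one that builds the path to the pulled test data, and one that assembles the `nextflow run` command for a chosen release. The function names are hypothetical. The sketch assumes the assets directory sits under `$HOME/.nextflow/assets` and that other data types follow the same test/<type>/<organism> layout as test/rna/mouse, which this page does not confirm.

```shell
#!/bin/bash
# Hypothetical helpers: they only print paths and commands, nothing is executed.
# ASSETS_DIR assumes the default pull location described above.
ASSETS_DIR="${ASSETS_DIR:-$HOME/.nextflow/assets/TheJacksonLaboratory/cs-nf-pipelines}"

test_data_dir() {
    # $1 = data type (e.g. rna), $2 = organism (e.g. mouse)
    printf '%s/test/%s/%s\n' "$ASSETS_DIR" "$1" "$2"
}

build_remote_cmd() {
    # $1 = release tag, $2 = workflow, $3 = organism, $4 = sample folder
    printf 'nextflow run TheJacksonLaboratory/cs-nf-pipelines -r %s --workflow %s --gen_org %s --sample_folder %s\n' "$1" "$2" "$3" "$4"
}
```

For example, `build_remote_cmd v0.6.5 rnaseq mouse "$(test_data_dir rna mouse)"` prints the test command for release v0.6.5. Add --pubdir and -w as in the command above before running it.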

Users External to JAX

For users external to JAX, reference data used by the workflows is available in a Google Cloud bucket, and transfers of data can be made upon request. The --reference_cache parameter can be used to point workflows at your local reference cache location. Support for HPC environments other than SLURM is also available by request.
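
Because a wrong cache path would otherwise only surface once a job is running, a launcher might check the directory before building the --reference_cache argument. This is a minimal sketch. The function name `reference_cache_arg` is hypothetical, and the check only confirms that the directory exists, not that it holds the expected reference files.

```shell
#!/bin/bash
# Hypothetical pre-flight check for a local reference cache directory.
reference_cache_arg() {
    # $1 = path to the local reference cache
    local cache_dir="$1"
    if [ ! -d "$cache_dir" ]; then
        echo "reference cache not found: $cache_dir" >&2
        return 1
    fi
    # Print the argument pair to append to the nextflow command
    printf '%s %s\n' "--reference_cache" "$cache_dir"
}
```

The printed pair can be appended to the nextflow command, for example with `$(reference_cache_arg /path/to/cache)`, where the path is whatever location holds your transferred reference data.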

Global Pipeline Configs (nextflow.config)

These parameters are used by all pipelines:

  • -profile

    • Default: sumner2
    • Comment: This is the main parameter that will set runtime options relating to the HPC environment being used.
  • --workflow

    • Default: Not_Specified
    • Comment: This specifies the workflow to be run. See the wiki sidebar for available workflows.
  • --reference_cache

    • Default: /projects/omics_share
    • Comment: This is the location of all reference files specified in workflow configs.
  • --help

    • Default: NA
    • Comment: When this flag is used, the pipeline prints help information and quits.
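
The defaults above can be mirrored in a run script with shell parameter defaults, so that only the workflow has to be supplied. This is a sketch under assumptions: the names `PROFILE`, `REFERENCE_CACHE`, and `global_args` are hypothetical, and it treats the `Not_Specified` placeholder as an error on the assumption that a run is not useful without a real workflow name.

```shell
#!/bin/bash
# Hypothetical defaults mirroring the documented global parameters.
PROFILE="${PROFILE:-sumner2}"
REFERENCE_CACHE="${REFERENCE_CACHE:-/projects/omics_share}"

global_args() {
    # $1 = workflow name; the empty string and the placeholder are rejected
    if [ -z "$1" ] || [ "$1" = "Not_Specified" ]; then
        echo "a workflow name is required" >&2
        return 1
    fi
    printf '%s %s %s %s %s %s\n' "-profile" "$PROFILE" "--workflow" "$1" "--reference_cache" "$REFERENCE_CACHE"
}
```

Setting `PROFILE` or `REFERENCE_CACHE` in the environment before the script runs overrides the defaults without editing the script.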