Pipeline Documentation
Each pipeline is built on a standardized framework inspired by nf-core, an open-source community that develops Nextflow (NF) pipelines. This framework allows a user to call any pipeline in this repository from a single script, using a restricted vocabulary of parameters.
Currently Supported Workflows:
NGS-Ops Pipelines:
- Amplicon Sequencing - xGen Sample Identification:
amplicon
- Amplicon Sequencing - General PCR / Target Panel
- ATAC Sequencing:
atac
- ChIP Sequencing:
chipseq
- RNA sequencing:
rnaseq
- RNA Fusion:
rna_fusion
- RRBS:
rrbs
- Whole Exome Sequencing:
wes
- PDX Whole Exome Sequencing:
pdx_wes
- Whole Genome Sequencing:
wgs
- Paired Tumor Analysis:
pta
- MMRSVD Germline SV:
germline_sv
Genetic Diversity Analysis Suite:
-
Secure shell login to Sumner2:
ssh login.sumner2.jax.org
-
Connect to a compute node:
srun -p compute -q batch -N 1 -n 1 --mem 2G -t 08:00:00 --pty bash
Note: This session is only for obtaining the pipelines and launching Slurm pipeline jobs, not for running the pipelines themselves, which is why the request above is limited to one task, 2 GB of memory, and 8 hours.
-
Clone Repo (or Pull Updated Repo):
git clone https://github.com/TheJacksonLaboratory/cs-nf-pipelines.git && cd cs-nf-pipelines
-
Run the test data through the RNA-seq pipeline:
An example run script is provided in this repo:
sbatch run.sh
where run.sh contains:
#!/bin/bash
#SBATCH --job-name=CS_nextflow_example
#SBATCH -p compute
#SBATCH -q batch
#SBATCH -t 72:00:00
#SBATCH --mem=5G
#SBATCH --ntasks=1
cd $SLURM_SUBMIT_DIR
# LOAD NEXTFLOW
module use --append /projects/omics_share/meta/modules
module load nextflow/23.10.1
# RUN TEST PIPELINE
nextflow main.nf \
-profile sumner2 \
--workflow rnaseq \
--gen_org mouse \
--sample_folder 'test/rna/mouse' \
--pubdir "/flashscratch/${USER}/outputDir" \
-w "/flashscratch/${USER}/outputDir/work"
Notes:
-
By default, the example run.sh script runs the rnaseq workflow on a provided set of 10,000 simulated paired-end mouse RNA reads. To run a different pipeline, change the --workflow and --sample_folder parameters to those of the pipeline you want to test.
- Supported options for --workflow are listed at the top of this page.
The inputs and parameters for these pipelines are outlined below. For analyses with real data, we recommend that you do not run pipelines from within the cloned git directory. Instead, place the run script in an appropriate working directory and modify the Nextflow command to contain the complete path to main.nf (e.g., ~/nextflow /home/USERNAME/main.nf <... Additional Options ...>).
-
By default, the example run.sh script uses -profile sumner2, which sets the pipeline to use HPC options specific to sumner2. On non-JAX systems, a properly formatted profile for your particular HPC environment is required.
-
Example run scripts with information on how to quickly start each pipeline and data type are provided in the run_scripts directory within the main repository.
-
Example datasets of 10,000 paired-end reads for mouse and human are provided for all data types (i.e., RNA, WES, WGS) in the test directory within the main repository. These small datasets can be used to validate that the pipelines are functional.
-
For all pipelines, passing the --help flag (e.g., ~/nextflow <PATH>/<TO>/main.nf --workflow rnaseq --help) prints the help documentation for that pipeline and quits.
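The advice in the notes above (keep the run script outside the cloned repository, give the complete path to main.nf, and switch pipelines through --workflow and --sample_folder) could be sketched as a small helper that assembles the command. This is only an illustration: PIPELINE_DIR and build_run_cmd are hypothetical names, the default paths are placeholders, and paths are assumed to contain no spaces. The helper prints the command rather than executing it.

```shell
#!/bin/bash
# Sketch only: assembles the Nextflow command for a run launched from a
# working directory outside the cloned repository. Nothing is executed.

# Hypothetical location of the cloned repository (placeholder path).
PIPELINE_DIR="${PIPELINE_DIR:-/home/USERNAME/cs-nf-pipelines}"

# Arguments: workflow, organism, sample folder, output directory.
build_run_cmd() {
    local workflow="$1" gen_org="$2" sample_folder="$3" outdir="$4"
    printf 'nextflow %s/main.nf -profile sumner2 --workflow %s --gen_org %s --sample_folder %s --pubdir %s -w %s/work\n' \
        "$PIPELINE_DIR" "$workflow" "$gen_org" "$sample_folder" "$outdir" "$outdir"
}

# Print the command for the RNA test data; change the first and third
# arguments to test a different pipeline.
build_run_cmd rnaseq mouse "$PIPELINE_DIR/test/rna/mouse" "/flashscratch/${USER:-USERNAME}/outputDir"
```

The printed line can be pasted into a copy of run.sh, after the module load lines, and submitted with sbatch.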
The pipelines can also be accessed using the nextflow pull command.
-
Secure shell login to Sumner2:
ssh login.sumner2.jax.org
-
Connect to a compute node:
srun -p compute -q batch -N 1 -n 1 --mem 2G -t 01:00:00 --pty bash
Note: This session is only for obtaining the pipelines and launching Slurm pipeline jobs, not for running the pipelines themselves, so the requested time reservation is only 1 hour.
-
Load Nextflow module
module use --append /projects/omics_share/meta/modules
module load nextflow/23.10.1
Alternative method for local Nextflow install: See Nextflow Installation
- The pipeline can be run as follows:
nextflow run TheJacksonLaboratory/cs-nf-pipelines -r v0.6.5 \
--workflow rnaseq \
--gen_org mouse \
--sample_folder "/home/${USER}/.nextflow/assets/TheJacksonLaboratory/cs-nf-pipelines/test/rna/mouse" \
--pubdir "/flashscratch/${USER}/outputDir" \
-w "/flashscratch/${USER}/outputDir/work"
-
By default, pipelines are pulled to /home/<USERNAME>/.nextflow/assets/TheJacksonLaboratory/cs-nf-pipelines. To run the test data, change <USERNAME> in this path to your user name.
-
The above command runs release v0.6.5. For any other release, change this tag to the desired release number. Project release numbers are found on GitHub.
-
Additional information on using nextflow run with remote repositories is provided by the Nextflow documentation.
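As a rough illustration of the points above, the following sketch assembles a nextflow run command pinned to a chosen release and points --sample_folder at the test data inside the pulled copy. The names build_release_cmd and ASSETS_DIR are hypothetical. The assets path assumes Nextflow's default location under $HOME/.nextflow (or under NXF_HOME, if that variable is set), and paths are assumed to contain no spaces. The command is printed, not executed.

```shell
#!/bin/bash
# Sketch only: prints a `nextflow run` command pinned to a release tag.

REPO="TheJacksonLaboratory/cs-nf-pipelines"

# Where Nextflow stores pulled projects by default (assumption: NXF_ASSETS
# has not been overridden).
ASSETS_DIR="${NXF_HOME:-$HOME/.nextflow}/assets/$REPO"

# Arguments: release tag, output directory.
build_release_cmd() {
    local release="$1" outdir="$2"
    printf 'nextflow run %s -r %s --workflow rnaseq --gen_org mouse --sample_folder %s --pubdir %s -w %s\n' \
        "$REPO" "$release" "$ASSETS_DIR/test/rna/mouse" "$outdir" "$outdir/work"
}

# Example: the release used in the command above.
build_release_cmd v0.6.5 "/flashscratch/${USER:-USERNAME}/outputDir"
```

Deriving the assets path from $HOME avoids editing the user name by hand, under the assumption that the home directory is where Nextflow pulled the project.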
For users external to JAX, the reference data used by the workflows is available in a Google Cloud bucket, and data transfers can be made upon request. The --reference_cache parameter can be used to point workflows at your local reference cache location. Support for additional HPC environments beyond Slurm is also available by request.
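For external users, one way to use --reference_cache is to confirm that the local cache directory exists before adding the option to the command. The sketch below assumes a hypothetical REF_CACHE variable and a placeholder path. The function name reference_cache_arg is also illustrative, and the internal layout expected inside the cache is not checked.

```shell
#!/bin/bash
# Sketch only: emits a --reference_cache option if the directory exists.

# Hypothetical local copy of the reference data (placeholder path).
REF_CACHE="${REF_CACHE:-/path/to/local/reference_cache}"

reference_cache_arg() {
    local cache_dir="$1"
    if [ -d "$cache_dir" ]; then
        printf -- '--reference_cache %s\n' "$cache_dir"
    else
        echo "reference cache not found: $cache_dir" >&2
        return 1
    fi
}

# Print the start of a command line only when the cache is present.
if arg="$(reference_cache_arg "$REF_CACHE")"; then
    echo "nextflow main.nf --workflow rnaseq --gen_org mouse $arg <... Additional Options ...>"
fi
```

A profile suited to the local HPC environment would still need to be supplied separately through -profile.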
These parameters are used by all pipelines:
-
-profile
- Default: sumner2
- Comment: This is the main parameter that will set runtime options relating to the HPC environment being used.
-
--workflow
- Default: Not_Specified
- Comment: This specifies the workflow to be run. See the wiki sidebar for available workflows.
-
--reference_cache
- Default: /projects/omics_share
- Comment: This is the location of all reference files specified in workflow configs.
-
--help
- Default: NA
- Comment: When this flag is used, pipeline will print help information and quit.
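Because --workflow defaults to Not_Specified, it can help to check the workflow name locally before submitting a job. The sketch below is a hypothetical pre-flight check, not part of the pipelines. The list holds only the workflow names visible on this page and may be incomplete, and the function names are illustrative.

```shell
#!/bin/bash
# Sketch only: local check of a workflow name before launching a job.

# Names copied from the "Currently Supported Workflows" list on this page.
SUPPORTED_WORKFLOWS="amplicon atac chipseq rnaseq rna_fusion rrbs wes pdx_wes wgs pta germline_sv"

# Returns 0 if the given name is in the list, 1 otherwise.
is_supported_workflow() {
    local candidate="$1" name
    for name in $SUPPORTED_WORKFLOWS; do
        [ "$name" = "$candidate" ] && return 0
    done
    return 1
}

# Prints a one-line verdict for the given name.
check_workflow() {
    if is_supported_workflow "$1"; then
        echo "ok: $1"
    else
        echo "unknown workflow: $1"
    fi
}

check_workflow rnaseq   # prints "ok: rnaseq"
check_workflow rna      # prints "unknown workflow: rna"
```

For the authoritative options of a given workflow, the --help flag described above remains the reference.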