Update README information (#58)
Completed all filetype-specific documentation
jonperdomo authored Sep 26, 2024
1 parent b000dfb commit 26a4609
Showing 1 changed file with 149 additions and 55 deletions.
204 changes: 149 additions & 55 deletions README.md
@@ -114,38 +114,61 @@ This section describes parameters common to all filetypes:

# WGS BAM

This section describes general usage for BAM files from whole-genome sequencing
(WGS) with alignments to a linear reference genome such as GRCh38:
This section describes how to generate QC reports for BAM files from whole-genome sequencing
(WGS) with alignments to a linear reference genome such as GRCh38 (data shown is HG002 sequenced with ONT Kit V14
PromethION R10.4.1 from https://labs.epi2me.io/askenazi-kit14-2022-12/).

![image](https://github.com/user-attachments/assets/166f6d04-26ca-4469-be2c-ce466597a68a)

![image](https://github.com/user-attachments/assets/7d83e55c-85a2-48a8-b9a7-92b671de758f)

![image](https://github.com/user-attachments/assets/d303f5a9-8e1b-425e-b0b0-f46f263a3f9f)

![image](https://github.com/user-attachments/assets/f74f985a-3c3d-4b00-bf98-d59b128d8722)

## General usage
```
longreadsum bam -i $INPUT_FILE -o $OUTPUT_DIRECTORY
```
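
When several BAM files need reports, the command above can be wrapped in a small loop. The following is a minimal sketch, not part of LongReadSum itself: the function name `print_bam_qc_commands`, the `qc_reports` output root, and the example file names are hypothetical, and the function only prints the commands (a dry run) so they can be reviewed before anything is executed. It uses only the `-i` and `-o` options shown above.

```shell
# Print (dry run) one `longreadsum bam` command per BAM file.
# Usage: print_bam_qc_commands OUT_ROOT FILE.bam [FILE.bam ...]
# Each report is directed to OUT_ROOT/<BAM name without .bam> (an assumed layout).
print_bam_qc_commands() {
    local out_root="$1"
    shift
    local bam sample
    for bam in "$@"; do
        sample="$(basename "$bam" .bam)"
        printf 'longreadsum bam -i %s -o %s\n' "$bam" "$out_root/$sample"
    done
}

# Example: preview the commands for two hypothetical BAM files.
print_bam_qc_commands qc_reports data/HG002_run1.bam data/HG002_run2.bam
```

Giving each sample its own output directory keeps one report from overwriting another. Paths containing spaces would need additional quoting before the printed commands are run.
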
Download an example HTML report [here]() (data is HG002 sequenced with ONT Kit V14
Promethion R10.4.1 from https://labs.epi2me.io/askenazi-kit14-2022-12/)

# BAM with base modifications

This section describes parameters for BAM files with base modification tags (MM,
ML).
This section describes how to generate QC reports for BAM files with MM and ML base modification tags (data shown is HG002 sequenced with ONT
MinION R9.4.1 from https://labs.epi2me.io/gm24385-5mc/).

![image](https://github.com/user-attachments/assets/5d97a949-842a-4f41-bfc5-81e9f30c57bc)


## Parameters
| Parameter | Description | Default |
| --- | --- | --- |
| --mod | Run base modification analysis on the BAM file | False |
| --modprob | Base modification filtering threshold. Bases with a modification probability above this value are considered modified; those below it are considered unmodified. | 0.8 |
| --ref | The reference genome FASTA file to use for identifying CpG sites (optional) | |

General usage:
## General usage
```
longreadsum bam -i $INPUT_FILE -o $OUTPUT_DIRECTORY --ref $REF_GENOME --modprob 0.8
```
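
To illustrate what the `--modprob` threshold means for the values stored in the ML tag, here is a minimal sketch. It relies on the SAM tags specification, where each ML value is an integer N from 0 to 255 representing the probability interval N/256 to (N+1)/256. The function name `classify_ml_values`, the use of the interval midpoint, and the handling of values exactly at the threshold are assumptions made for this example; LongReadSum's internal implementation may differ.

```shell
# Classify ML tag values (integers 0-255, one per line on stdin)
# against a probability threshold (default 0.8).
classify_ml_values() {
    local threshold="${1:-0.8}"
    awk -v t="$threshold" '{
        p = ($1 + 0.5) / 256                        # midpoint of [N/256, (N+1)/256)
        label = (p > t) ? "modified" : "unmodified" # tie handling is an assumption
        printf "%d\t%.4f\t%s\n", $1, p, label
    }'
}

# Example: three hypothetical ML values, classified at the default threshold.
printf '%s\n' 250 128 10 | classify_ml_values 0.8
```

With the threshold at 0.8, only the first value (probability of about 0.98) is labeled as modified in this sketch.
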

Download an example HTML report [here]() (data is HG002 sequenced with ONT
MinION R9.4.1 from https://labs.epi2me.io/gm24385-5mc/)

# RRMS BAM

This section describes parameters for ONT RRMS BAM files and associated CSVs.
This section describes how to generate QC reports for ONT RRMS BAM files and associated CSVs (data shown is HG002 RRMS using ONT
R9.4.1).

### Accepted reads:
![image](https://github.com/user-attachments/assets/c0e69e53-0a1e-432d-ad4c-9edfac764514)

![image](https://github.com/user-attachments/assets/105a47ff-7bd8-436e-9d3d-1b112b94fb5e)


### Rejected reads:
![image](https://github.com/user-attachments/assets/7c213975-ec6b-4476-81c9-8853c664b653)

![image](https://github.com/user-attachments/assets/604ca74a-516b-48a7-8b02-931e27255bd8)


## Parameters
| Parameter | Description | Default |
| --- | --- | --- |
| -c, --csv | CSV file containing read IDs to extract from the BAM file* | |
@@ -161,28 +184,52 @@ batch_time,read_number,channel,num_samples,read_id,sequence_length,decision
1675186897.7544408,80,68,4025,fab0c19d-8085-454c-bfb7-c375bbe237a1,462,unblock
1675186897.7544408,93,127,4028,5285e0ba-86c0-4b5d-ba27-5783acad6105,438,unblock
1675186897.7544408,103,156,4023,65d8befa-eec0-4496-bf2b-aa1a84e6dc5e,362,stop_receiving
...
```

General usage:
## General usage
```
longreadsum rrms -i $INPUT_FILE -o $OUTPUT_DIRECTORY -c $RRMS_CSV
```
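
Before generating the report, it can be useful to check how many reads in the RRMS CSV fall under each decision. The sketch below is an illustration only: the function name `count_rrms_decisions` is hypothetical, and it assumes a comma-separated file whose header contains a `decision` column, as in the sample rows above. In ONT adaptive sampling output, `unblock` generally marks rejected reads and `stop_receiving` marks accepted reads.

```shell
# Count reads per value of the `decision` column in an RRMS CSV read from stdin.
count_rrms_decisions() {
    awk -F, '
        { sub(/\r$/, "") }                          # tolerate Windows line endings
        NR == 1 {                                   # locate the decision column by name
            for (i = 1; i <= NF; i++) if ($i == "decision") col = i
            next
        }
        col && $col != "" { n[$col]++ }             # skip rows without a decision field
        END { for (d in n) printf "%s\t%d\n", d, n[d] }
    ' | sort
}

# Example using the sample rows shown in this section.
count_rrms_decisions <<'EOF'
batch_time,read_number,channel,num_samples,read_id,sequence_length,decision
1675186897.7544408,80,68,4025,fab0c19d-8085-454c-bfb7-c375bbe237a1,462,unblock
1675186897.7544408,93,127,4028,5285e0ba-86c0-4b5d-ba27-5783acad6105,438,unblock
1675186897.7544408,103,156,4023,65d8befa-eec0-4496-bf2b-aa1a84e6dc5e,362,stop_receiving
EOF
```

Looking up the column by its header name rather than by position keeps the sketch working if the column order differs between runs.
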

Download an example HTML report [here]() (data is HG002 RRMS using ONT
R9.4.1)

# RNA-Seq BAM

This section describes parameters for generating TIN (transcript integrity
number) scores from RNA-Seq BAM files.
This section describes how to generate QC reports for TIN (transcript integrity
number) scores from RNA-Seq BAM files (data shown is Adult GTEx v9 long-read RNA-seq data sequenced with ONT
cDNA-PCR protocol from https://www.gtexportal.org/home/downloads/adult-gtex/long_read_data).

## Outputs
A TSV file with scores for each transcript:

```
geneID chrom tx_start tx_end TIN
ENST00000456328.2 chr1 11868 14409 2.69449577083296
ENST00000450305.2 chr1 12009 13670 0.00000000000000
ENST00000488147.2 chr1 14695 24886 94.06518975035769
ENST00000619216.1 chr1 17368 17436 0.00000000000000
ENST00000473358.1 chr1 29553 31097 0.00000000000000
...
```

A TSV file with TIN score summary statistics:

```
Bam_file TIN(mean) TIN(median) TIN(stddev)
/mnt/isilon/wang_lab/perdomoj/data/GTEX/GTEX-14BMU-0526-SM-5CA2F_rep.FAK93376.bam 67.06832655372376 74.24996965188242 26.03788585287367
```

A summary table in the HTML report:

![image](https://github.com/user-attachments/assets/400bcd68-05fc-4f08-8b70-b981cd9dc994)
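
The per-transcript TSV can be inspected with standard command-line tools. The following is a minimal sketch under stated assumptions: the function name `summarize_tin` is hypothetical, the file is assumed to be tab-separated with one header row, and the TIN score is assumed to be the fifth column, as in the example above. It is not how LongReadSum computes its own summary statistics.

```shell
# Summarize a per-transcript TIN table (tab-separated, header row first) from stdin.
summarize_tin() {
    awk -F'\t' '
        NR == 1 { next }                            # skip the header row
        NF >= 5 { n++; sum += $5; if ($5 > 0) nonzero++ }
        END {
            if (n == 0) { print "no transcripts found"; exit 1 }
            printf "transcripts=%d nonzero=%d mean_TIN=%.2f\n", n, nonzero, sum / n
        }
    '
}

# Example: the header plus the first three transcripts from the table above.
printf '%s\t%s\t%s\t%s\t%s\n' \
    geneID chrom tx_start tx_end TIN \
    ENST00000456328.2 chr1 11868 14409 2.69449577083296 \
    ENST00000450305.2 chr1 12009 13670 0.00000000000000 \
    ENST00000488147.2 chr1 14695 24886 94.06518975035769 \
  | summarize_tin
```

Counting transcripts with a nonzero score separately is a design choice for this sketch: many transcripts in the example have a TIN of 0, which pulls a plain mean down considerably.
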

## Parameters
| Parameter | Description | Default |
| --- | --- | --- |
| --genebed | Gene BED12 file required for calculating TIN scores | |
| --sample-size | Sample size for TIN calculation | 100 |
| --min-coverage | Minimum coverage for TIN calculation | 10 |

General usage:
## General usage
```
longreadsum bam -i $INPUT_FILE -o $OUTPUT_DIRECTORY --genebed $BED_FILE --min-coverage <COVERAGE> --sample-size <SIZE>
```
@@ -192,21 +239,33 @@ cDNA-PCR protocol from https://www.gtexportal.org/home/downloads/adult-gtex/long

# PacBio unaligned BAM

This section describes general usage for PacBio BAM files without alignments:
This section describes how to generate QC reports for PacBio BAM files without alignments (data shown is HG002 sequenced with PacBio
Revio HiFi long reads obtained from https://www.pacb.com/connect/datasets/#WGS-datasets).

![image](https://github.com/user-attachments/assets/76374274-3671-49d2-984f-0208e9d8e3e7)

![image](https://github.com/user-attachments/assets/15112738-b6cd-4d1d-b0c4-e0bb31464374)

![image](https://github.com/user-attachments/assets/e3935f58-eb53-4f9d-b4b5-7287fcdc3252)

![image](https://github.com/user-attachments/assets/8b17c9e2-8932-45b3-a673-b5b35ae994e6)

## General usage
```
longreadsum bam -i $INPUT_FILE -o $OUTPUT_DIRECTORY
```
Download an example HTML report [here]() (data is HG002 sequenced with PacBio
Revio HiFi long reads obtained from
https://www.pacb.com/connect/datasets/#WGS-datasets)

# ONT POD5

This section describes parameters for generating a signal and basecalling QC
report from ONT POD5 (signal) and their corresponding BAM files (basecalls).
This section describes how to generate QC reports for ONT POD5 (signal) files and their corresponding basecalled BAM files (data shown is HG002 using ONT
R10.4.1 and LSK114 downloaded from the tutorial https://github.com/epi2me-labs/wf-basecalling).

![image](https://github.com/user-attachments/assets/62c3c810-5c1a-4124-816b-74245af8b57c)

> :bulb: **NOTE**: The interactive signal-base correspondence plots in the HTML report use a

## Parameters
> [!NOTE]
> The interactive signal-base correspondence plots in the HTML report use a
lot of memory (RAM) which can make your web browser slow. Thus by default, we
randomly sample only a few reads, and the user can specify a list of read IDs as
well (e.g. from a specific region of interest).
@@ -217,22 +276,28 @@ well (e.g. from a specific region of interest).
| -r, --read_ids | A comma-separated list of read IDs to extract from the file | |
| -R, --read-count | Set the number of reads to randomly sample from the file | 3 |

General usage:
```
longreadsum pod5 -i <INPUT_FILE> -o $OUTPUT_DIRECTORY --basecalls $INPUT_BAM [--read-count <COUNT> | --read-ids <IDS>]
## General usage
```
# Individual file:
longreadsum pod5 -i $INPUT_FILE -o $OUTPUT_DIRECTORY --basecalls $INPUT_BAM [--read-count <COUNT> | --read-ids <IDS>]
Download an example HTML report [here]() (data is HG002 using ONT
R10.4.1 and LSK114 downloaded from the tutorial https://github.com/epi2me-labs/wf-basecalling)
# Directory:
longreadsum pod5 -P "$INPUT_DIRECTORY/*.pod5" -o $OUTPUT_DIRECTORY --basecalls $INPUT_BAM [--read-count <COUNT> | --read-ids <IDS>]
```
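
Read IDs are passed as a single comma-separated list, so a file with one read ID per line has to be joined first. Here is a minimal sketch of one way to do that: the function name `join_read_ids` and the read IDs `read_id_a` and `read_id_b` are hypothetical placeholders, and the final command is only printed, not run. It uses the short option `-r` listed in the parameter table.

```shell
# Join read IDs (one per line on stdin) into a comma-separated list.
join_read_ids() {
    # Drop blank lines, then join the remaining lines with commas.
    grep -v '^[[:space:]]*$' | paste -sd, -
}

# Example: build the list from two placeholder IDs and preview the command.
ids="$(printf '%s\n' read_id_a '' read_id_b | join_read_ids)"
echo "longreadsum pod5 -i \$INPUT_FILE -o \$OUTPUT_DIRECTORY --basecalls \$INPUT_BAM -r $ids"
```

In practice the IDs would come from a file, for example `join_read_ids < read_ids.txt`, where `read_ids.txt` is a hypothetical file name.
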

# ONT FAST5

## Signal QC

This section describes parameters for generating a signal and basecalling QC
report from ONT FAST5 files with signal and basecall information.
This section describes how to generate signal and basecalling QC
reports from ONT FAST5 files with signal and basecall information (data shown is HG002 sequenced with ONT MinION R9.4.1 from https://labs.epi2me.io/gm24385-5mc/).

![image](https://github.com/user-attachments/assets/b2a56529-55e3-4678-918a-e5787881643e)


> :bulb: **NOTE**: The interactive signal-base correspondence plots in the HTML report use a
## Parameters
> [!NOTE]
> The interactive signal-base correspondence plots in the HTML report use a
lot of memory (RAM) which can make your web browser slow. Thus by default, we
randomly sample only a few reads, and the user can specify a list of read IDs as
well (e.g. from a specific region of interest).
@@ -242,60 +307,89 @@ well (e.g. from a specific region of interest).
| -r, --read_ids | A comma-separated list of read IDs to extract from the file | |
| -R, --read-count | Set the number of reads to randomly sample from the file | 3 |

General usage:
## General usage
```
# Individual file:
longreadsum f5s -i $INPUT_FILE -o $OUTPUT_DIRECTORY [--read-count <COUNT> | --read-ids <IDS>]
```
Download an example HTML report [here]() (data is HG002 sequenced with ONT Kit
V12 Promethion R10.4.1 from https://labs.epi2me.io/gm24385_q20_2021.10/)
# Directory:
longreadsum f5s -P "$INPUT_DIRECTORY/*.fast5" -o $OUTPUT_DIRECTORY [--read-count <COUNT> | --read-ids <IDS>]
```

## Sequence QC

This section describes how to generate QC reports for sequence data from ONT FAST5 files:
This section describes how to generate QC reports for sequence data from ONT FAST5 files (data shown is HG002 sequenced with ONT MinION R9.4.1 from https://labs.epi2me.io/gm24385-5mc/).

![image](https://github.com/user-attachments/assets/97876343-cd34-4bfe-9612-7f6b14a2be0d)

![image](https://github.com/user-attachments/assets/be8415a5-63ee-403b-931a-66d79a3b28a5)

![image](https://github.com/user-attachments/assets/2d822263-d9a4-470f-aa6d-5bccf570edac)

![image](https://github.com/user-attachments/assets/0fd1ee15-1e2f-492a-b072-bfaf768a448a)

![image](https://github.com/user-attachments/assets/aa5b37c8-7c83-418b-b5ea-6e53d557fb93)


## General usage
```
longreadsum f5 -i $INPUT_FILE -o $OUTPUT_DIRECTORY
```

Download an example HTML report [here]() (data is HG002 sequenced with ONT Kit
V12 Promethion R10.4.1 from https://labs.epi2me.io/gm24385_q20_2021.10/)

# Basecall summary

This section describes how to generate QC reports for basecall summary files
(sequencing_summary.txt).
This section describes how to generate QC reports for ONT basecall summary (sequencing_summary.txt) files (data shown is HG002 sequenced with ONT
PromethION R10.4 from https://labs.epi2me.io/gm24385_q20_2021.10/, filename `gm24385_q20_2021.10/analysis/20210805_1713_5C_PAH79257_0e41e938/guppy_5.0.15_sup/sequencing_summary.txt`)

![image](https://github.com/user-attachments/assets/ad094b0a-7878-4937-840c-ad0d7c09335b)

![image](https://github.com/user-attachments/assets/5e2417e8-74b8-4f39-8c3d-d6481749711d)

![image](https://github.com/user-attachments/assets/f25841bf-8129-41bc-a90f-0196ca14159f)


## General usage
```
longreadsum seqtxt -i $INPUT_FILE -o $OUTPUT_DIRECTORY
```

Download an example HTML report [here]() (data is HG002 sequenced with ONT
PromethION R10.4 from https://labs.epi2me.io/gm24385_q20_2021.10/, filename `gm24385_q20_2021.10/analysis/20210805_1713_5C_PAH79257_0e41e938/guppy_5.0.15_sup/sequencing_summary.txt`)

# FASTQ

This section describes how to generate QC reports for FASTQ files.
This section describes how to generate QC reports for FASTQ files (data shown is HG002 ONT 2D from GIAB
[FTP index](https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data_indexes/AshkenazimTrio/sequence.index.AJtrio_HG002_Cornell_Oxford_Nanopore_fasta_fastq_10132015.HG002)).

![image](https://github.com/user-attachments/assets/09d80928-bae7-4c64-a21c-8ef21fe9ab60)

![image](https://github.com/user-attachments/assets/d5ee0aa5-9127-447f-b96b-26f3fad7a963)

![image](https://github.com/user-attachments/assets/cea3c23c-44a3-4313-9d31-4c8559073b22)

![image](https://github.com/user-attachments/assets/acb199f6-4529-43ce-9212-f938128b0706)

![image](https://github.com/user-attachments/assets/47e395fc-b33d-45d0-b3ac-c658a84f62cb)


## General usage
```
longreadsum fq -i $INPUT_FILE -o $OUTPUT_DIRECTORY
```

Download an example HTML report [here]() (data is HG002 ONT 2D from GIAB
[FTP index](https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data_indexes/AshkenazimTrio/sequence.index.AJtrio_HG002_Cornell_Oxford_Nanopore_fasta_fastq_10132015.HG002))

# FASTA

This section describes how to generate QC reports for FASTA files.
This section describes how to generate QC reports for FASTA files (data shown is HG002 ONT 2D from GIAB
[FTP index](https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data_indexes/AshkenazimTrio/sequence.index.AJtrio_HG002_Cornell_Oxford_Nanopore_fasta_fastq_10132015.HG002)).

![image](https://github.com/user-attachments/assets/d4862e6d-435e-4317-b331-4af0428a6419)

![image](https://github.com/user-attachments/assets/af3b736b-beb6-44e4-a0d8-df736c288389)

![image](https://github.com/user-attachments/assets/cd4e1f59-0c34-41a6-91ea-08381bdc906a)

## General usage
```
longreadsum fa -i $INPUT_FILE -o $OUTPUT_DIRECTORY
```
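
Because each filetype has its own subcommand (`bam`, `pod5`, `f5`, `f5s`, `seqtxt`, `fq`, `fa`), a small helper can pick the subcommand from the input file name. This is a sketch only: the function name `subcommand_for`, the extension-to-subcommand mapping, the `qc_reports` directory, and the example file names are assumptions for illustration, and the commands are printed rather than run. Subcommands that need extra arguments (for example `pod5` with `--basecalls`, or `rrms` with `-c`) would need those added by hand.

```shell
# Map an input file name to a longreadsum subcommand.
# The extension mapping below is an assumption made for this sketch.
subcommand_for() {
    case "$1" in
        *.bam)                    echo bam ;;
        *.pod5)                   echo pod5 ;;
        *.fast5)                  echo f5 ;;     # use f5s instead for signal QC
        *.fastq|*.fq)             echo fq ;;
        *.fasta|*.fa)             echo fa ;;
        *sequencing_summary.txt)  echo seqtxt ;;
        *)                        return 1 ;;    # unknown filetype
    esac
}

# Example: preview commands for three hypothetical input files.
for f in sample.bam reads.fastq assembly.fa; do
    if sub="$(subcommand_for "$f")"; then
        echo "longreadsum $sub -i $f -o qc_reports/${f%.*}"
    fi
done
```
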

Download an example HTML report [here]() (data is HG002 ONT 2D from GIAB
[FTP index](https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data_indexes/AshkenazimTrio/sequence.index.AJtrio_HG002_Cornell_Oxford_Nanopore_fasta_fastq_10132015.HG002))


# Revision history
For release history, please visit [here](https://github.com/WGLab/LongReadSum/releases).
