chore: fix maths notations
mbhall88 committed Nov 22, 2024
1 parent cf9de21 commit 7058e4d
Showing 2 changed files with 7 additions and 5 deletions.
2 changes: 2 additions & 0 deletions .gitattributes
@@ -1 +1,3 @@
paper/** linguist-vendored
+install/** linguist-vendored
+justfile linguist-vendored
10 changes: 5 additions & 5 deletions README.md
@@ -91,7 +91,7 @@ conda install -c bioconda lrge
![Crates.io Total Downloads](https://img.shields.io/crates/d/lrge)

```sh
-cargo install --locked lrge
+cargo install lrge
```

### Container
@@ -232,7 +232,7 @@ If you don't want the estimate to be rounded to the nearest integer 🤓
$ lrge --float-my-boat reads.fq
```

-In [the paper][doi], we suggest using the 15th and 65th percentiles of the estimates to get a 92% confidence interval.
+In [the paper][doi], we suggest using the 15th and 65th percentiles of the estimates to get a ~92% confidence interval.
However, you can change these

```
@@ -344,10 +344,10 @@ against $T$ and a genome size ($\textbf{GS}$) estimate is generated for that rea
the number of overlaps of $q_i$ with reads in $T$ ($O_{T,q_i}$), according to the formula:

```math
-\textbf{GS}_{T,q_i} \approx \frac{\size{T} \cdot \size{q_i} + \overline{t \in T} - 2 \cdot \textbf{OT}}{O_{T,q_i}}
+\textbf{GS}_{T,q_i} \approx \frac{\vert T \vert \cdot \vert q_i \vert + \overline{t \in T} - 2 \cdot \textbf{OT}}{O_{T,q_i}}
```

-where $\size{T}$ is the total size of the target set, $\size{q_i}$ is the length of read $q_i$, $\overline{t \in T}$ is
+where $\vert T \vert$ is the total size of the target set, $\vert q_i \vert$ is the length of read $q_i$, $\overline{t \in T}$ is
the average length of reads in $T$, and $\textbf{OT}$ is the overlap threshold (minimum chain score in minimap2, which
defaults to 100 for overlaps). See the paper for more formal/rigorous definitions.
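
As a rough illustration of the per-read formula in this hunk, here is a minimal Python sketch. It is not lrge's implementation: the function and parameter names are hypothetical, $\vert T \vert$ is assumed to mean the number of reads in the target set, and the numerator is grouped exactly as typeset in the README line (product first, then the sum). The paper remains the authority on the precise grouping and definitions.

```python
def genome_size_estimate(n_target_reads, query_len, mean_target_len,
                         n_overlaps, overlap_threshold=100):
    """Per-read genome size estimate, transcribed literally from the README formula.

    All names are illustrative (not lrge's API):
      n_target_reads    -- |T|, assumed here to be the number of reads in T
      query_len         -- |q_i|, length of the query read
      mean_target_len   -- average length of the reads in T
      n_overlaps        -- O_{T,q_i}, overlaps found between q_i and reads in T
      overlap_threshold -- OT, 100 per the README's note on the minimap2 default
    """
    if n_overlaps <= 0:
        # The formula divides by the overlap count, so it is undefined here.
        # Raising is a choice made for this sketch only.
        raise ValueError("estimate is undefined for a read with no overlaps")
    numerator = (n_target_reads * query_len
                 + mean_target_len
                 - 2 * overlap_threshold)
    return numerator / n_overlaps


# 1,000 target reads averaging 8 kbp, one 10 kbp query read with 50 overlaps.
estimate = genome_size_estimate(1000, 10_000, 8_000, 50)
```

With these made-up inputs the numerator is 10,000,000 + 8,000 - 200 = 10,007,800, so the estimate is 200,156 bp.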

@@ -361,7 +361,7 @@ the estimate all that much.
### All-vs-all strategy

The all-vs-all strategy involves overlapping some random subset (`-n`) of reads in the input against each other. The
-genome size estimate for each read is calculated as above, but we subtract one from $\size{T}$ to account for the fact
+genome size estimate for each read is calculated as above, but we subtract one from $\vert T \vert$ to account for the fact
that the read is not being overlapped against itself. We also do not factor the length of the read whose size is being
estimated into the average read length calculation.
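
The two all-vs-all adjustments described in this hunk can be sketched in Python as follows. This is an illustration under assumptions, not lrge's code: the function name and inputs are hypothetical, $\vert T \vert$ is taken to be a read count, the numerator is grouped as typeset in the README formula, and returning `None` for reads with no overlaps is a choice made only for this sketch.

```python
def all_vs_all_estimates(read_lengths, overlap_counts, overlap_threshold=100):
    """Per-read estimates when a sample of reads is overlapped against itself.

    read_lengths[i]   -- length of read i in the sampled subset
    overlap_counts[i] -- number of overlaps between read i and the other reads
    Names and the None-for-zero-overlaps convention are illustrative only.
    """
    n = len(read_lengths)
    if n < 2 or len(overlap_counts) != n:
        raise ValueError("need at least two reads and one overlap count per read")

    total_len = sum(read_lengths)
    estimates = []
    for length, n_overlaps in zip(read_lengths, overlap_counts):
        if n_overlaps <= 0:
            estimates.append(None)  # undefined: division by zero overlaps
            continue
        n_targets = n - 1  # subtract one: a read is not overlapped with itself
        # Average length of the *other* reads; read i itself is excluded.
        mean_other_len = (total_len - length) / n_targets
        numerator = n_targets * length + mean_other_len - 2 * overlap_threshold
        estimates.append(numerator / n_overlaps)
    return estimates


# Three toy reads; the third has no overlaps with the other two.
result = all_vs_all_estimates([1000, 2000, 3000], [1, 2, 0])
```

For the first toy read, the other two reads average 2,500 bp, giving (2 × 1,000 + 2,500 - 200) / 1 = 4,300.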
