Add option to deactivate the BLAST default DUST filter for low-complexity sequences
gregdenay committed Dec 18, 2023
1 parent bad6d4f commit 30f9d87
Showing 11 changed files with 48 additions and 2 deletions.
1 change: 1 addition & 0 deletions .tests/config/config.yaml
@@ -26,6 +26,7 @@ taxdb: data/miniblast
taxid_filter: 7742
blocklist: extinct
seq_blocklist: None
blast_filter_low_complexity: True
blast_evalue: 1e-10
blast_identity: 97
blast_qcov: 100
1 change: 1 addition & 0 deletions .tests/config/config_foodme_paramspace.yaml
@@ -28,6 +28,7 @@ taxdb: ../../../data/miniblast
taxid_filter: 7742
blocklist: extinct
seq_blocklist: None
blast_filter_low_complexity: True
blast_evalue: 1e-10
blast_identity: 97
blast_qcov: 100
1 change: 1 addition & 0 deletions .tests/config/config_otu.yaml
@@ -26,6 +26,7 @@ taxdb: data/miniblast
taxid_filter: 7742
blocklist: extinct
seq_blocklist: None
blast_filter_low_complexity: True
blast_evalue: 1e-10
blast_identity: 97
blast_qcov: 100
21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,24 @@
### 1.7.0

#### Breaking changes

Older configuration files are no longer compatible. Adding the following line to them
should yield the same results as before:

```{yaml}
blast_filter_low_complexity: True
```

#### New features

It is now possible to deactivate the default low-complexity filter of the BLAST search.
This can be advantageous if you expect your barcode to contain low-complexity sequences, which could
otherwise prevent any match at all.
To deactivate the filter, set `blast_filter_low_complexity` to `False`; this passes
`-dust no -soft_masking false` to blastn.

The default behaviour (`True`) keeps the default 'DUST' filter of the blast tool:
`-dust 20 64 1 -soft_masking true`.

### 1.6.6

#### Fixes
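The breaking change above only requires one extra key in older configuration files. A minimal sketch of that migration, assuming flat YAML files whose top-level keys start at column 0, like the config files touched by this commit. The script name `migrate_config.py` and the function `add_low_complexity_key` are hypothetical and not part of the repository.

```python
import sys
from pathlib import Path

# Key added in 1.7.0; True reproduces the pre-1.7.0 behaviour (DUST filter on)
NEW_KEY = "blast_filter_low_complexity"


def add_low_complexity_key(config_text: str, value: bool = True) -> str:
    """Return config_text with NEW_KEY appended, unless it is already set.

    Assumes a flat YAML file whose top-level keys start at column 0.
    """
    for line in config_text.splitlines():
        if line.startswith(f"{NEW_KEY}:"):
            return config_text  # already migrated: leave the text unchanged
    # Make sure the new key starts on its own line
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return f"{config_text}{NEW_KEY}: {value}\n"


if __name__ == "__main__":
    # Hypothetical usage: python migrate_config.py old_config.yaml [...]
    for arg in sys.argv[1:]:
        path = Path(arg)
        path.write_text(add_low_complexity_key(path.read_text()))
```

Appending at the end is enough because key order does not matter in a YAML mapping; the shipped configs place the key after `seq_blocklist` only for readability.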
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-1.6.6
+1.7.0
2 changes: 2 additions & 0 deletions config/config.yaml
@@ -73,6 +73,8 @@ blocklist: extinct
# Exclude specific sequence accessions from the results.
# None or path to a user provided list of accessions
seq_blocklist: None
# Mask low-complexity regions in BLAST queries (DUST filter): True or False
blast_filter_low_complexity: True
# E-value threshold for blast results
blast_evalue: 1e-10
# Minimal identity between the hit and query for blast results (in percent)
2 changes: 2 additions & 0 deletions config/config_16Smeat.yaml
@@ -90,6 +90,8 @@ blocklist: extinct
# Sequence Accessions blocklist, prevents specific accessions to show up in the results.
# None or path to a user provided list of accessions
seq_blocklist: None
# Mask low-complexity regions in BLAST queries (DUST filter): True or False
blast_filter_low_complexity: True
# E-value threshold for blast results
blast_evalue: 1e-20
# Minimal identity between the hit and query for blast results (in percent)
1 change: 1 addition & 0 deletions docs/userguide/configuration.md
@@ -84,6 +84,7 @@ This will create a file called `samples.tsv` in the `raw_data` folder.
| `taxid_filter` | Taxonomic identifier | Node under which to perform the BLAST search. <br>Equivalent to pruning the taxonomy above <br>this node. Use the Root Node number to keep <br>the entire taxonomy |
| `blocklist` | `extinct` or custom path | Path to a list of taxonomic identifier to exclude <br>from the BLAST search |
| `seq_blocklist` | `None` or custom path | Path to a list of sequence accessions (e.g. `NC_0016400`) <br>to exclude from the results |
| `blast_filter_low_complexity` | `True` or `False` | Whether to mask low-complexity regions (DUST filter) <br>in the BLAST search. On by default; deactivate it <br>if you expect low-complexity barcode sequences |
| `blast_evalue` | Number (scientific) | Minimal E-value threshold for the BLAST search |
| `blast_identity` | Number [0, 100] | Minimal identity (in percent) between the query and <br>hit sequence for the BLAST search |
| `blast_qcov` | Number [0, 100] | Percent of the query to be covered by the hit <br>sequence for the BLAST search |
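The constraints in this table can be checked before starting a run. A minimal sketch, assuming the configuration has already been loaded into a dict (for example by Snakemake); `check_blast_options` is hypothetical and not part of the workflow. It flags a quoted `"False"`: the workflow tests the value for truthiness, and a non-empty string is truthy, so the filter would silently stay on.

```python
from typing import Any, Mapping


def check_blast_options(config: Mapping[str, Any]) -> list[str]:
    """Return human-readable problems found in the BLAST options of a config."""
    problems = []
    key = "blast_filter_low_complexity"
    if key not in config:
        problems.append(f"{key} is missing (required since 1.7.0)")
    elif not isinstance(config[key], bool):
        # e.g. a quoted "False" is a non-empty string, hence truthy
        problems.append(f"{key} must be True or False, got {config[key]!r}")
    for name in ("blast_identity", "blast_qcov"):
        value = config.get(name)
        # bool is a subclass of int, so exclude it explicitly
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not (is_number and 0 <= value <= 100):
            problems.append(f"{name} must be a number in [0, 100], got {value!r}")
    return problems
```

`blast_evalue` is deliberately left out: it is only interpolated into the shell command, and depending on the YAML loader a value such as `1e-10` may be read as a string rather than a float.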
3 changes: 2 additions & 1 deletion workflow/rules/blast.smk
@@ -127,6 +127,7 @@ rule blast_otus:
        e_value=config["blast_evalue"],
        perc_identity=config["blast_identity"],
        qcov=config["blast_qcov"],
        low_complexity=get_low_complexity_filter_params,
    threads: config["threads_sample"]
    message:
        "[{wildcards.sample}][assignement] BLASTing clusters against local database"
@@ -148,7 +149,7 @@ rule blast_otus:
blastn -db {params.blast_DB} \
-query {input.query} \
-out {output.report} \
-    -task 'megablast' \
+    -task 'megablast' {params.low_complexity} \
-evalue {params.e_value} \
-perc_identity {params.perc_identity} \
-qcov_hsp_perc {params.qcov} $masking \
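To see where `{params.low_complexity}` lands in the `blastn` call, here is a sketch that rebuilds the visible options as a Python argument list. `build_blastn_args` is hypothetical. It reproduces only the options shown in this hunk and leaves out `$masking` and anything outside it.

```python
import shlex


def build_blastn_args(db: str, query: str, out: str, evalue: float,
                      perc_identity: float, qcov: float,
                      filter_low_complexity: bool = True) -> list[str]:
    """Rebuild the blastn options visible in the rule above as an argument list."""
    # Same strings as get_low_complexity_filter_params in common.smk
    low_complexity = "" if filter_low_complexity else "-dust no -soft_masking false"
    return [
        "blastn", "-db", db, "-query", query, "-out", out,
        "-task", "megablast",
        *shlex.split(low_complexity),  # expands to nothing when the filter stays on
        "-evalue", str(evalue),
        "-perc_identity", str(perc_identity),
        "-qcov_hsp_perc", str(qcov),
    ]
```

`shlex.join(build_blastn_args(...))` gives the equivalent shell string. The workflow formats a string through Snakemake's `shell` directive instead, which is why the parameter function returns an empty string rather than an empty list.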
8 changes: 8 additions & 0 deletions workflow/rules/common.smk
@@ -51,3 +51,11 @@ def get_acc_blocklist(wildcards):
        return f"{wildcards.sample}/taxonomy/{wildcards.sample}_blast_report.tsv"
    else:
        return f"{wildcards.sample}/taxonomy/{wildcards.sample}_blast_report_prefiltered.tsv"


def get_low_complexity_filter_params(wildcards):
    # Filter is on by default in blastn
    if config["blast_filter_low_complexity"]:
        return ""
    else:
        return "-dust no -soft_masking false"
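`get_low_complexity_filter_params` reads Snakemake's global `config`, which makes it awkward to test on its own. Below is a minimal sketch of the same logic with the config passed in explicitly, plus a helper that writes out the masking blastn ends up applying. Both function names are hypothetical. The flag strings come from the changelog and from the function above.

```python
from typing import Any, Mapping

# Flag strings taken from the changelog and from get_low_complexity_filter_params
BLASTN_DEFAULT_MASKING = "-dust 20 64 1 -soft_masking true"
NO_MASKING = "-dust no -soft_masking false"


def low_complexity_flags(config: Mapping[str, Any]) -> str:
    """Same logic as get_low_complexity_filter_params, with config passed in."""
    if config["blast_filter_low_complexity"]:
        # Nothing extra: blastn keeps its own DUST defaults
        return ""
    return NO_MASKING


def effective_masking(config: Mapping[str, Any]) -> str:
    """Spell out the masking options blastn ends up applying."""
    return low_complexity_flags(config) or BLASTN_DEFAULT_MASKING
```

Testing then only needs a plain dict such as `{"blast_filter_low_complexity": False}` instead of a running Snakemake workflow.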
8 changes: 8 additions & 0 deletions workflow/rules/common_benchmark.smk
@@ -62,3 +62,11 @@ def get_acc_blocklist(wildcards):
        return f"{wildcards.sample}/taxonomy/{wildcards.sample}_blast_report.tsv"
    else:
        return f"{wildcards.sample}/taxonomy/{wildcards.sample}_blast_report_prefiltered.tsv"


def get_low_complexity_filter_params(wildcards):
    # Filter is on by default in blastn
    if config["blast_filter_low_complexity"]:
        return ""
    else:
        return "-dust no -soft_masking false"
