Fixing fix
Fixes some issues with BLAST report generation when using newer BLAST databases. The source of the problem is still unclear, but it appears to be linked to new releases of the BLAST database and concerns only specific taxids or sequences.
Older configuration files are no longer compatible. Updating them by adding the following line should yield the same results as before:
blast_filter_low_complexity: True
It is now possible to deactivate the default low-complexity filter of the BLAST search.
This can be advantageous if you expect your barcode to contain low-complexity sequences, which could otherwise prevent getting any match at all.
This behaviour can be activated or deactivated by changing the `blast_filter_low_complexity` parameter from `True` to `False`.
The default behaviour (`True`) uses the default 'DUST' filter of the BLAST tool: `-dust 20 64 1 -soft_masking true`.
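As a rough illustration of how such a switch could translate into `blastn` arguments, here is a minimal sketch. It assumes that `True` keeps the DUST filter active with the settings quoted above, and that the disabled case uses `-dust no -soft_masking false`. The function name is hypothetical and the workflow's own rule may be written differently.

```python
from __future__ import annotations


def low_complexity_args(blast_filter_low_complexity: bool) -> list[str]:
    """Return blastn arguments for the low-complexity filter (illustrative only)."""
    if blast_filter_low_complexity:
        # Filter active: the DUST settings quoted in the changelog
        return ["-dust", "20 64 1", "-soft_masking", "true"]
    # Filter disabled: assumed flags, `-dust no` switches DUST off in blastn
    return ["-dust", "no", "-soft_masking", "false"]


if __name__ == "__main__":
    print(low_complexity_args(True))
    print(low_complexity_args(False))
```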
- Corrects parsing of the `trim_primers_3end` parameter (#64)
- Added a primer disambiguation step that converts primer sequences in the IUPAC ambiguous nucleotide format to their explicit forms (#63)
- The BLAST rule now correctly uses the `threads_sample` parameter instead of `threads`. This results in better resource management for the BLAST rule.
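The primer disambiguation step mentioned above can be pictured with a short sketch. It expands each IUPAC ambiguity code into the bases it stands for and enumerates every explicit primer. This is an illustration of the general technique, not the workflow's actual script, and the function name `disambiguate` is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one represents
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def disambiguate(primer):
    """Return all explicit sequences encoded by an ambiguous primer."""
    # One string of candidate bases per position of the primer
    choices = [IUPAC[base] for base in primer.upper()]
    # Cartesian product over positions yields every explicit sequence
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    print(disambiguate("ACR"))  # two explicit forms: ACA and ACG
```

Note that the number of explicit sequences grows multiplicatively with each ambiguous position, so heavily degenerate primers produce long lists.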
Benchmarking paper for 16S Metabarcoding of meat products is online:
Denay, G.; Preckel, L.; Petersen, H.; Pietsch, K.; Wöhlke, A.; Brünen-Nieweler, C. Benchmarking and Validation of a Bioinformatics Workflow for Meat Species Identification Using 16S rDNA Metabarcoding. Foods 2023, 12, 968. https://doi.org/10.3390/foods12050968
- It is now possible to filter specific sequences from the database using a list of accessions provided in a text file via the `seq_blocklist` parameter (#60).
- Taxa names are now displayed in the benchmarking report
- Fixed many typos and errors in the documentation
- Improved report aesthetics
- Improved the fetch_nt_blast.sh script to make it more robust and easier to resume interrupted processes
- Fixed confusion matrix calculation when the expected taxid rank is above the target rank
- Benchmarking: confusion table now reports prediction rank as well
- Fixed a major problem in confusion matrix determination for the benchmark module. Prior to this fix, false negatives were not correctly reported.
- Small fix to report aggregation rules for a rarely occurring failure
- Fixed composition summary when input is empty
- Moved the benchmark module from a rule to a workflow. This allows the benchmark arguments to be ignored when running routine analyses, as originally intended. Benchmarking is now called with the `-s path/to/FooDMe/workflow/benchmark` argument. This has no impact on the basic analysis (no `-s` argument) or the paramspace analysis (`-s path/to/FooDMe/workflow/paramspace`).
- Fixed a few errors in the `config.yaml` comments
- Modified documentation of the `benchmark` module
- Fixed handling of the last common node calculation in the confusion matrix when the last common ancestor is the root node.
- Modified dependencies in environment definition files to resolve some conda solver issues
This update is not backwards compatible. A configuration file update is necessary.
- Added a new Snakefile for parameter space exploration. It acts as a wrapper around the foodme benchmark workflow for parameter grid search using snakemake's `Paramspace` utility.
- Flattened the parameter structure in the configuration. This is more compatible with the `--config` CLI argument and was required for the implementation of the parameter space exploration workflow. This requires users to update their configurations.
- Added documentation for the `paramspace` workflow.
- Added missing parameters to meat config file
- Fix dtype parsing in confusion matrix calculation
- Fix package version reporting
- Fix Error calculation in benchmarking module
- Fix multiple plotting in benchmarking report
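The grid search performed by the parameter space exploration workflow can be pictured with a small standalone sketch. It does not use snakemake's `Paramspace` itself. It only enumerates parameter combinations with `itertools.product` and formats each one as the `key=value` overrides that the `--config` argument accepts, which is where a flat parameter structure helps. The parameter names and values are illustrative assumptions, and both function names are hypothetical.

```python
from itertools import product


def parameter_grid(space):
    """Return one dict per combination of the listed parameter values."""
    keys = list(space)
    return [
        dict(zip(keys, values))
        for values in product(*(space[key] for key in keys))
    ]


def to_config_args(combination):
    """Format a flat parameter dict as `--config key=value ...` arguments."""
    return ["--config"] + [f"{key}={value}" for key, value in combination.items()]


if __name__ == "__main__":
    # Illustrative parameter space (assumed names and values)
    grid = parameter_grid({
        "min_consensus": [0.51, 0.75, 1.0],
        "blast_filter_low_complexity": [True, False],
    })
    for combination in grid:
        print(" ".join(to_config_args(combination)))
```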
This update is not backwards compatible. A configuration file update is necessary.
- Benchmark module is live with possibility to compare results to an expected sample composition. The benchmark module will output the comparison results and several useful metrics in an HTML report. It can be used directly for validation or parameter space exploration.
- Added the benchmarking module which can be called with `snakemake benchmark`
- Added required parameters in the config file
- The python wrapper is now deprecated. See the documentation on how to use configuration files.
- Added a dependency to Scikit-learn
- Updated R packages dependency in the `rmarkdown` environment
- Added documentation for the benchmark module
- Added missing `pandas` dependency in `taxidTools` environment
- Moved log directive to top of python scripts to catch import errors
- Replaced all `bc` calls by `printf` statements in `vsearch.smk` (#52)
- Improved logging for OTU workflow
- Added test suite for OTU workflow
- Improved Conda installation guide by quoting the Bioconda guide and adding the new snakemake requirement to set strict channel priority
- Fixed header in `consensus-table.tsv`
- Fixed a bash syntax misuse in the calculation of VSearch statistics
- Moved the documentation to the homepage at https://cvua-rrw.github.io/FooDMe/
- Fixed missing report-wise reports
- The parameter `taxid_filter` now only accepts integers, default config values have been changed (#42).
- Now correctly reports composition as both percentage of total usable reads and assigned reads (#41)
- Added a configuration file for 16S birds and mammals experiments
- The usage of the python wrapper is not recommended. Prefer the use of a yaml configuration file.
- Pending deprecation warning added to the python wrapper
- Expanded documentation on the use of the config file
- Improved error handling and logging for the DADA2 steps. Will now correctly output number of reads and denoising/merging results for failing samples.
- Now unpacks trimmed read files in a sample-wise fashion prior to Dada2 denoising instead of unpacking all samples at once. This should reduce memory use during the analysis.
- Preventively fixed a pandas CopyWarning (#31) and FutureWarning
- Updated dependencies to newer versions. NCBI's upcoming new identifier definitions should be supported (#33).
- Check compatibility with snakemake v7 (#34)
- Dependency taxidTools now handled through conda environment and therefore not needed in the base environment anymore (#36).
- Reorganised logging (#27)
- Fully linted and formatted (#28)
- Fixed a variable reference breaking Vsearch pipeline
- Fixed time display upon pipeline completion on success or error
- Fixed wrapper
- Moderate performance improvements due to saving taxonomy as a filtered JSON file. Expect the workflow to be about 1 min faster per sample.
- Fixed Github version parsing for lightweight tags.
- `tests` was renamed `.tests`
- Linted and reorganized the workflow to be closer to snakemake standards
- Added JSON-Schema validation for the config and sample sheet files
- Added the possibility to export a Snakemake report containing QC summaries and results as well as the workflow runtime and DAG using the `--report` argument (snakemake CLI only)
- Migrated to TaxidTools version 2. The taxidTools package must now be installed via conda or pip before starting the pipeline (See README.md).
- Modified default parameters of the config file and python launcher with more sensible values
- Expand disambiguation info with the frequency of each species (#17)
- Add minimum consensus filter as an alternative to last common ancestor. Use it with the parameter `--min_consensus`. The value must be in the interval (0.5;1], 1 being a last common ancestor behavior and 0.51 a simple majority vote.
- Added blocklist of taxids to mask (#13). Default behaviour is to mask extinct taxids. Users can skip this step or provide their own blocklist with the `--blocklist` parameter.
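To make the effect of the `--min_consensus` threshold concrete, here is a minimal sketch of a minimum consensus filter. It assumes each BLAST hit is given as a lineage of equal length, ordered from the root to the most specific taxon, and it returns the most specific node supported by at least the requested share of hits. This illustrates the idea only. It is not the taxidTools implementation, and the function name is hypothetical.

```python
from collections import Counter


def consensus_node(lineages, min_consensus):
    """Return the most specific node shared by at least `min_consensus` of the lineages."""
    if not 0.5 < min_consensus <= 1:
        raise ValueError("min_consensus must be in the interval (0.5;1]")
    result = None
    # zip(*lineages) walks the ranks from the root downwards
    for level in zip(*lineages):
        name, count = Counter(level).most_common(1)[0]
        if count / len(lineages) < min_consensus:
            break  # agreement is too low at this rank, keep the previous node
        result = name
    return result


if __name__ == "__main__":
    hits = [
        ["root", "Mammalia", "Bos", "Bos taurus"],
        ["root", "Mammalia", "Bos", "Bos taurus"],
        ["root", "Mammalia", "Bos", "Bos indicus"],
    ]
    print(consensus_node(hits, 1.0))   # last common ancestor behaviour
    print(consensus_node(hits, 0.51))  # simple majority vote
```

Because the threshold is above 0.5, at most one node per rank can reach it, so the walk from the root downwards follows a single branch.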
- Clusters that do not find a matching reference in BLAST are no longer counted towards the composition total. Additionally, the number of assigned reads is now visible in the summary report (#12)
- Fixed the calculation of the "No primer found" field under the trimming statistics (#19)
- Upgraded Dada2 dependency to version 1.20
- Upgraded dependencies to last (conda) version
- Test now runs with just `snakemake --cores 1 --use-conda`
- Added CI in github actions
- Reworked environments definition files, environments should build correctly.
- Added a very basic test script. This is meant to test the installation - not provide unit testing
- Added an example of expected output
- Fixed Snakemake version
- Fixed summary report
- Workflow will no longer crash on blank samples
- Now reports the proportion of reads discarded during primer trimming
- Now requires user to provide a fasta file with primer sequences
- Taxonomic reports now include a 'disambiguation' field summarizing the different BLAST hits for each cluster
- Primers will now be trimmed from the reads before quality trimming. It is possible to trim primers on both ends
- Performance fix for the display of large tables in the html report
- Updated BLAST+ and Fastp to the latest version
- Report now includes links to BLAST reports
- BLAST report now includes number of mismatches, gaps and alignment length
- Added the --skip_adapter_trimming option to disable adapter trimming in fastp (only recommended for artificial datasets)
- taxidTools is now a submodule
- Cloning the repository should now be done with '--recurse-submodules'
- taxidtools updated to version 2
- Adapted scripts to the new version of taxidtools
- Changed BLAST database masking to not be silent about taxids missing from the Taxdump definition files
- Added the option to filter the BLAST search by taxid
- Fixed a performance issue for BLAST filtering
- Better error handling for LCA determination
- Snakemake logging has been moved to the logs folder
- Added primer trimming option (experimental)
- Added subspecies to taxonomy levels
- Added a helper script to fetch the BLAST nt database
- Fixed Krona broken link in report
- Fixed BLAST filtering for floating point values of bitscores
- Fixed crash upon absence of BLAST hits
- Fixed BLAST database version reporting
- initial release