Cenote-Taker 2 Version 2.1.2
If you haven't already installed Cenote-Taker 2, please follow the installation instructions in the README. If you have already installed it, please run:
conda activate cenote-taker2_env
conda install -c bioconda biopython bedtools
cd Cenote-Taker2
git pull
Then update the HMM database.
Thank you.
This release improves a number of things regarding the annotation and outputs of Cenote-Taker 2. Here is a fairly comprehensive list:
- BLASTN can be used to determine if your sequence belongs to an extant virus species based on 95% Average Nucleotide Identity (ANI) and 85% Alignment Fraction (AF), per community standards. This module requires the GenBank nt database, the GenBank virus nucleotide database, or some subset thereof. If a sequence has at least 95% ANI and 85% AF to a virus, the taxonomy/organism name will be changed to match the GenBank entry. This module uses anicalc.py from CheckV; see the license and copyright notice in the anicalc directory.
- ORFs that overlap tRNAs are now removed to comply with GenBank guidelines. ORFs that are cut off by the end of a contig are now properly formatted per GenBank guidelines.
- "Messy" gene names are largely improved to comply with GenBank guidelines.
- Organism/Taxonomy and BLASTN info are now included in the summary .tsv file.
- Cenote-Taker 2 uses more refined gene content searches to identify putative conjugative transposons. Also, genes that Cenote-Taker 2 flags as conjugative machinery are output as a .gtf file in the sequin_and_genome_maps directory.
- Cenote-Taker 2 will now take a CRISPR spacer hit table as an optional input, and will put CRISPR spacer hit info in the note of the genome output files. The format required is a tab-separated table:
CONTIG_NAME HOST_NAME NUMBER_OF_HITS
e.g.
my_contig_1 bacteroides 9
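If you want to sanity-check a CRISPR spacer hit table before passing it in, something like the following Python sketch may help. It is not part of Cenote-Taker 2: the function name `read_spacer_hits` is hypothetical, and it assumes the file has no header row, exactly three tab-separated fields per line, and an integer hit count.

```python
import csv
import io


def read_spacer_hits(handle):
    """Parse rows of CONTIG_NAME, HOST_NAME, NUMBER_OF_HITS from a tab-separated table.

    Assumptions (not confirmed by Cenote-Taker 2): no header row, three fields
    per row, and NUMBER_OF_HITS is an integer.
    """
    hits = []
    for row_number, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        # Skip blank lines rather than treating them as errors.
        if not row or all(not field.strip() for field in row):
            continue
        if len(row) != 3:
            raise ValueError(
                f"row {row_number}: expected 3 tab-separated fields, found {len(row)}"
            )
        contig, host, count = (field.strip() for field in row)
        hits.append((contig, host, int(count)))
    return hits


# The example row from the release notes, with real tab characters.
example_table = "my_contig_1\tbacteroides\t9\n"
hits = read_spacer_hits(io.StringIO(example_table))
print(hits)
```

A row separated by spaces instead of tabs raises a `ValueError` here, which is the most common formatting mistake this kind of check is meant to catch.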
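For reference, the species-level rule used by the BLASTN module (at least 95% ANI and at least 85% AF) can be written as a simple threshold test. This is only an illustrative Python sketch, not the code Cenote-Taker 2 or anicalc.py runs: the function name `matches_known_species` is hypothetical, and it assumes both values are given as percentages.

```python
def matches_known_species(ani, af, min_ani=95.0, min_af=85.0):
    """Return True if a hit meets both thresholds from the release notes.

    ani: Average Nucleotide Identity, in percent (assumed scale 0-100).
    af:  Alignment Fraction, in percent (assumed scale 0-100).
    """
    return ani >= min_ani and af >= min_af


# A hit at 96.2% ANI over 90% of the sequence would be assigned to the known species.
print(matches_known_species(96.2, 90.0))
# A near-identical but short alignment would not.
print(matches_known_species(99.0, 40.0))
```

Both conditions must hold, so a high-identity hit that covers only a small fraction of the contig does not change the taxonomy/organism name.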
Best,
Mike