v2.4.0
Overhauled the dist_analysis script, which now runs automatically as part of hist, gcp and comp. The statistics it generates are written to stdout and to a file called .dist_analysis.json for easy parsing in downstream tasks.
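As a rough illustration of the "easy parsing in downstream tasks" mentioned above, the sketch below loads the JSON report and flattens it into dotted keys so its contents can be inspected. The layout of the report is not described in these notes, so the code makes no assumption about key names; `load_dist_analysis` and `flatten` are hypothetical helpers, not part of KAT.

```python
import json
import sys


def load_dist_analysis(path):
    """Load a dist_analysis JSON report and return it as a Python object."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def flatten(report, prefix=""):
    """Flatten nested dicts/lists into a single dict with dotted keys.

    No particular schema is assumed: whatever keys the report contains
    are carried through unchanged.
    """
    if isinstance(report, dict):
        items = report.items()
    elif isinstance(report, list):
        items = enumerate(report)
    else:
        return {prefix: report}

    flat = {}
    for key, value in items:
        name = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, name))
    return flat


if __name__ == "__main__":
    # The path is supplied by the caller, e.g. the .dist_analysis.json
    # file written alongside the other outputs.
    for name, value in sorted(flatten(load_dist_analysis(sys.argv[1])).items()):
        print(f"{name}\t{value}")
```

Printing the flattened keys once is a quick way to discover which statistics are available before wiring the file into a pipeline.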
Added a new tool called 'cold' for contig length and duplication analysis. It creates a scatter plot in which each contig is characterised by its length, k-mer duplication rate (in the assembly), k-mer coverage (according to the reads) and GC content.
KAT now supports gzipped FASTQ and FASTA files natively (for k-mer counting only).
KAT can automatically trim the start of reads to a defined number of bases. This is useful for processing 10X data to avoid the barcodes on R1.
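The two input-handling changes above (reading gzipped files and trimming the start of reads) can be pictured with the following Python sketch. It is not KAT's C++ implementation: the function names are hypothetical, gzip detection by magic bytes is an assumption about one reasonable approach, and the FASTQ reader assumes strict four-line records.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, decompressing transparently if it is gzipped.

    Detection uses the two-byte gzip magic number rather than the file
    extension (an assumption of this sketch).
    """
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\r\n")
        yield header, seq, qual


def trim_start(seq, qual, n):
    """Drop the first n bases and their matching quality values."""
    return seq[n:], qual[n:]


def trimmed_reads(path, n):
    """Yield reads from path with the first n bases removed."""
    with open_maybe_gzip(path) as handle:
        for header, seq, qual in read_fastq(handle):
            yield (header, *trim_start(seq, qual, n))
```

Calling `trimmed_reads(path, n)` on an R1 file with `n` set to the barcode length yields reads whose leading barcode bases never reach the k-mer counter; the right value of `n` depends on the library and is not specified here.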
Simplified and improved build system:
- A subset of Boost is kept in the KAT source tree and can be built prior to KAT with a single command: './build_boost.sh'. This version of Boost is statically linked into the KAT binaries and is NOT installed to the system.
- Tweaked compiler options and ensured KAT builds with GCC 5-7.
- KAT Python scripts are now wrapped up in a proper Python package called kat, which is installed to the defined location alongside the KAT C++ programs.
- We now statically link to most dependencies where practical.
- Tidied up user interfaces. Removed the "canonical" options; we assume the user wants canonical k-mer counting by default.
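For readers unfamiliar with the term, canonical k-mer counting treats a k-mer and its reverse complement as the same k-mer, so counts do not depend on which strand was sequenced. The sketch below uses the common convention of keeping the lexicographically smaller of the two forms; that KAT uses this exact ordering internally is an assumption, and the function names are hypothetical.

```python
from collections import Counter

# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of an upper-case DNA string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = reverse_complement(kmer)
    return kmer if kmer <= rc else rc


def count_canonical_kmers(seq, k):
    """Count canonical k-mers in seq, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts
```

With this convention, `TTT` and `AAA` are counted under the single key `AAA`, which is why a separate "canonical" switch is unnecessary when it is always the desired behaviour.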