nickp60/EzClermont

License: MIT

EzClermont: The E. coli Clermont PCR phylotyping tool

Description

This is a tool for using the Clermont 2013 PCR typing method for in silico analysis of E. coli whole genomes or assembled contigs.

Changelog

  • bump to version 0.7 in Nov 2021; add option for logfile instead of stderr messages for workflow compatibility
  • bump to version 0.4 in May 2018; improved handling of partial matches
  • made a webapp on April 19th, 2018 after several requests to make the tool more user-friendly.
  • updated on August 2, 2017 to add reactions that differentiate A/C, D/E/cryptic, and to add more robust tests.
  • released Dec. 2016

Usage

EzClermont can either read in a file or read from stdin.

Try:

ezclermont tests/refs/CP004009.1.fasta

or

cat tests/refs/CP004009.1.fasta | ezclermont - -e "APEC_O78"

Full usage:

usage: ezclermont [-m MIN_LENGTH] [-e EXPERIMENT_NAME] [-n]
                  [--logfile LOGFILE] [-h] [--version]
                  contigs

run a 'PCR' to get Clermont 2013 phylotypes; version 0.7.0

positional arguments:
  contigs               FASTA formatted genome or set of contigs. If reading
                        from stdin, use '-'

optional arguments:
  -m MIN_LENGTH, --min_length MIN_LENGTH
                        minimum contig length to consider.default: 500
  -e EXPERIMENT_NAME, --experiment_name EXPERIMENT_NAME
                        name of experiment; defaults to file name without
                        extension. If reading from stdin, uses the first
                        contig's ID
  -n, --no_partial      If scanning contigs, breaks between contigs could
                        potentially contain your sequence of interest. if
                        --no_partial, these plausible partial matches will NOT
                        be reported; default behaviour is to consider partial
                        hits if the assembly has more than 4 sequnces(ie, no
                        partial matches for complete genomes, allowing for 1
                        chromasome and several plasmids)
  --logfile LOGFILE     send log messages to logfile instead stderr
  -h, --help            Displays this help message
  --version             show program's version number and exit

It prints the presence or absence of each PCR product to stderr, and the experiment name and resulting phylotype to stdout. Product length is checked: fragments within 20 bp of the expected size are accepted. When partial matches are enabled (the default for assemblies with more than 4 sequences; disable with --no_partial), a single primer hit counts as a hit if the contig starts or ends within the expected product length of that primer.
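The size check and the partial-hit rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not EzClermont's actual code: the function names `size_ok` and `is_partial_hit`, their parameters, the 0-based coordinates, and the inclusive reading of "within 20 bp" are all hypothetical choices made for the example.

```python
TOLERANCE_BP = 20  # assumed inclusive: |observed - expected| <= 20


def size_ok(observed_len, expected_len, tolerance=TOLERANCE_BP):
    """Return True if the in silico product length is close enough to expected."""
    return abs(observed_len - expected_len) <= tolerance


def is_partial_hit(primer_start, primer_len, contig_len, expected_len, forward=True):
    """Return True if a lone primer hit sits so close to a contig edge that
    the rest of the product could lie beyond the contig break.

    primer_start is the 0-based start of the primer match on the contig
    (a hypothetical convention for this sketch).
    """
    if forward:
        # Product would extend from the primer toward the contig end.
        room = contig_len - primer_start
    else:
        # Product would extend from the primer back toward the contig start.
        room = primer_start + primer_len
    return room < expected_len
```

For example, with a hypothetical 300 bp product, a forward primer matching 100 bp before the end of a contig would be reported as a plausible partial hit, while the same primer 900 bp before the end would not.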

A minimal output table (filename.fasta, ClermontType) can be generated by appending stdout to a results file in a bash loop:

for i in strain1 strain2 strain3
do
  # quote the path so file names containing spaces are handled correctly
  ezclermont "${i}" >> results.txt
done

or, using GNU parallel, and saving a log file:

ls ./folder/with/assemblies/*.fa | parallel "ezclermont {} 1>> results.txt  2>> results.log"
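Once results.txt has been collected, the phylotypes can be tallied with a few lines of Python. This is a sketch that assumes each stdout line holds the experiment name followed by the phylotype, separated by whitespace, with the phylotype as the last field; the function name `tally_phylotypes` is hypothetical, and the parsing should be adjusted if your output differs.

```python
from collections import Counter


def tally_phylotypes(lines):
    """Count phylotypes from lines of the assumed form '<name> <phylotype>'."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            # Skip blank or malformed lines rather than guessing.
            continue
        # Assumption: the phylotype is the last whitespace-separated field.
        counts[fields[-1]] += 1
    return counts
```

It can be applied to the results file with, for example, `tally_phylotypes(open("results.txt"))`.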

Run the webapp

docker run -p 5000:5000 nickp60/ezclermont

Have fun!

Installation

From PyPI

conda create -n ezclermont_env ezclermont
conda activate ezclermont_env

Development

conda create -n ez biopython
conda activate ez
git clone https://github.com/nickp60/ezclermont && cd ezclermont
pip install .

Testing

The tests can be run with either unittest or nosetests.

Requirements

commandline tool

Biopython

webapp

Flask, Biopython

Acknowledgements

Thanks to Dave Gamache for Skeleton, the webapp CSS theme.

Name note

The name of this repo (and of the PyPI package) was changed on April 21 from ClermontPCR to EzClermont.