- Introduction (Wiki)
- Installation
- PanMAN Construction
- panmanUtils functionalities
- Contribute
- Citing PanMAN
Here we provide an overview of PanMAN, panmanUtils, and its installation methods and usage. For more information please see our Wiki.
PanMAN (Pangenome Mutation-Annotated Network) is a novel data representation for pangenomes that provides massive leaps in both representative power and storage efficiency. Specifically, PanMANs are composed of mutation-annotated trees, called PanMATs, which, in addition to substitutions, also annotate inferred indels (Fig. 1b) and even structural mutations (Fig. 1a) on the different branches. Multiple PanMATs are connected by edges to form a network, the PanMAN (Fig. 1c). PanMAN's representative power is compared against existing pangenomic formats in Fig. 1d. PanMANs are the most compressible pangenomic format for the different microbial datasets (SARS-CoV-2, RSV, HIV, Mycobacterium tuberculosis, E. coli, and Klebsiella pneumoniae), providing 2.9- to 559-fold compression over standard pangenomic formats.
panmanUtils includes multiple algorithms to construct PanMANs and to support various functionalities to modify and extract useful information from PanMANs (Fig. 2).
Step 0: Dependencies
Git
Step 1: Clone the repository
git clone https://github.com/TurakhiaLab/panman.git
cd panman
Step 2: Run the installation script
chmod +x install/installationUbuntu.sh
./install/installationUbuntu.sh
Step 3: Run panmanUtils
cd build
./panmanUtils --help
To use panmanUtils in a docker container, create a container from the docker image by following these steps:
Step 0: Dependencies
Docker
Step 1: Pull the PanMAN docker image from DockerHub
docker pull swalia14/panman:latest
Step 2: Build and run the docker container
docker run -it swalia14/panman:latest
Step 3: Run panmanUtils
# Inside the docker container
cd /home/panman/build
./panmanUtils --help
A docker container with preinstalled panmanUtils can also be built from the DockerFile by following these steps:
Step 0: Dependencies
Docker
Git
Step 1: Clone the repository
git clone https://github.com/TurakhiaLab/panman.git
cd panman
Step 2: Build a docker image
cd docker
docker build -t panman .
Step 3: Build and run the docker container
docker run -it panman
Step 4: Run panmanUtils
# Inside the docker container
cd /home/panman/build
./panmanUtils --help
Once the package is installed, PanMANs can be constructed with panmanUtils from a PanGraph, GFA, or MSA file together with a tree topology (Newick format). Here we provide examples for constructing PanMANs from PanGraph (JSON) and from a custom dataset. Alternatively, users can follow the instructions in the wiki for other methods.
Step 1: Check that the sars_20.json and sars_20.nwk files exist in the test directory.
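Step 1 can be scripted. The following is a minimal sketch, not part of panmanUtils: `check_test_inputs` is a hypothetical helper name, and the usage below assumes that `PANMAN_HOME` points at the root of the cloned repository.

```shell
# Hypothetical helper (not shipped with panmanUtils).
# Checks that both example inputs exist in the directory given as $1.
check_test_inputs() {
    dir="$1"
    missing=0
    for f in sars_20.json sars_20.nwk; do
        if [ -f "$dir/$f" ]; then
            echo "found: $dir/$f"
        else
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    # Return 0 only when both files are present.
    return "$missing"
}
```

It could be called as `check_test_inputs "$PANMAN_HOME/test"` before running the construction command.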
Step 2: Run panmanUtils with the following command to build a panman from PanGraph:
cd $PANMAN_HOME/build
./panmanUtils -P $PANMAN_HOME/test/sars_20.json -N $PANMAN_HOME/test/sars_20.nwk -O sars_20
The above command runs panmanUtils and builds sars_20.panman in the $PANMAN_HOME/build/panman directory.
We provide a Snakemake workflow to construct PanMANs from raw sequences (FASTA format) or from fragment assemblies.
!!!Note The Snakemake workflow uses tools such as PanGraph, PGGB, MAFFT, and MashTree to build the input PanGraph, GFA, MSA, and tree topology files, respectively. It is designed to be used in the docker container built from either the provided docker image or the DockerFile (see the installation instructions above).
Step 1: Run the following command to construct a panman from raw sequences.
cd $PANMAN_HOME/workflows
conda activate snakemake
snakemake --use-conda --cores 8 --config RUNTYPE="pangraph/gfa/msa" FASTA="[user_input]" SEQ_COUNT="Number of sequences" ASSEM="NONE" REF="NONE" TARGET="NONE"
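SEQ_COUNT takes the number of sequences in the input FASTA. One way to obtain it, assuming a standard FASTA file in which every record starts with a `>` header line, is to count those lines. `count_fasta_sequences` is a hypothetical helper and not part of the workflow.

```shell
# Hypothetical helper (not part of panmanUtils or the Snakemake workflow).
# Assumes standard FASTA: each record begins with a line starting with '>'.
count_fasta_sequences() {
    if [ ! -r "$1" ]; then
        echo "cannot read: $1" >&2
        return 1
    fi
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so '|| true' keeps the helper usable under 'set -e'.
    grep -c '^>' "$1" || true
}
```

The result can then be passed to the workflow, for example as `SEQ_COUNT="$(count_fasta_sequences my_sequences.fa)"`, where `my_sequences.fa` is a placeholder file name.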
Step 1: Run the following command to construct a panman from fragment assemblies.
cd $PANMAN_HOME/workflows
conda activate snakemake
snakemake --use-conda --cores 8 --config RUNTYPE="pangraph/gfa/msa" FASTA="None" SEQ_COUNT="Number of sequences" ASSEM="frag" REF="reference_file" TARGET="target.txt"
Here, target.txt includes a list of files that contain the fragmented assemblies.
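The exact layout of target.txt is not specified here. As a sketch under the assumption that it holds one file path per line, and that the assemblies use a `.fa` or `.fasta` extension, the list could be generated as follows. `make_target_list` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of the Snakemake workflow).
# Assumptions: target.txt lists one assembly path per line, and the
# assembly files end in .fa or .fasta.
make_target_list() {
    asm_dir="$1"    # directory holding the fragmented assemblies
    out_file="$2"   # list file to write, e.g. target.txt
    find "$asm_dir" -type f \( -name '*.fa' -o -name '*.fasta' \) \
        | sort > "$out_file"
}
```

For instance, `make_target_list ./assemblies target.txt` would write the sorted paths of all matching files under `./assemblies` (a placeholder directory) to target.txt.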
panmanUtils provides various functionalities such as summary, extraction (raw sequences, MSA, VCF, GFA), sub-network pruning, and many more. Please refer to the wiki for detailed information. Here we provide usage syntax and examples for summary and VCF extract.
The summary feature extracts node- and tree-level statistics of a PanMAN, summarizing its geometric and parsimony information.
- Usage Syntax
./panmanUtils -I <path to PanMAN file> --summary --output-file=<prefix of output file> (optional)
- Example
cd $PANMAN_HOME/build
./panmanUtils -I panman/sars_20.panman --summary --output-file=sars_20
Extract variations of all sequences from any PanMAT in a PanMAN in the form of a VCF file with respect to any reference sequence (ref) in the PanMAT.
- Usage syntax
./panmanUtils -I <path to PanMAN file> --vcf --reference=ref --output-file=<prefix of output file> (optional)
- Example
cd $PANMAN_HOME/build
./panmanUtils -I panman/sars_20.panman --vcf --reference="Switzerland/SO-ETHZ-500145/2020|OU000199.2|2020-11-12" --output-file=sars_20
We welcome contributions from the community to enhance the capabilities of PanMAN and panmanUtils. If you encounter any issues or have suggestions for improvement, please open an issue on the PanMAN GitHub page. For general inquiries and support, reach out to our team.
If you use PanMANs or panmanUtils in your research or publications, we kindly request that you cite the following paper:
- Sumit Walia, Harsh Motwani, Kyle Smith, Russell Corbett-Detig, Yatish Turakhia, "Compressive Pangenomics Using Mutation-Annotated Networks", bioRxiv 2024.07.02.601807; doi: 10.1101/2024.07.02.601807