Pangenome Mutation Annotated Network (PanMAN)

Introduction

Here we provide an overview of PanMAN and panmanUtils, along with installation methods and usage. For more information, please see our Wiki.

What is a PanMAN?

A PanMAN, or Pangenome Mutation-Annotated Network, is a novel data representation for pangenomes that provides massive leaps in both representative power and storage efficiency. Specifically, PanMANs are composed of mutation-annotated trees, called PanMATs, which annotate not only substitutions but also inferred indels (Fig. 1b) and even structural mutations (Fig. 1a) on the different branches. Multiple PanMATs are connected by edges to form a network, the PanMAN (Fig. 1c). PanMAN's representative power is compared against existing pangenomic formats in Fig. 1d. PanMANs are the most compressible pangenomic format for the different microbial datasets (SARS-CoV-2, RSV, HIV, Mycobacterium tuberculosis, E. coli, and Klebsiella pneumoniae), providing 2.9 to 559-fold compression over standard pangenomic formats.

Figure 1: Overview of the PanMAN data structure

panmanUtils

panmanUtils includes multiple algorithms to construct PanMANs and to support various functionalities to modify and extract useful information from PanMANs (Fig. 2).

Figure 2: Overview of panmanUtils' functionalities

Installation

Using installation script (requires sudo access)

Step 0: Dependencies

Git

Step 1: Clone the repository

git clone https://github.com/TurakhiaLab/panman.git
cd panman

Step 2: Run the installation script

chmod +x install/installationUbuntu.sh
./install/installationUbuntu.sh

Step 3: Run panmanUtils

cd build
./panmanUtils --help

Using Docker Image

To use panmanUtils in a Docker container, create a container from the Docker image by following these steps:

Step 0: Dependencies

Docker

Step 1: Pull the PanMAN docker image from DockerHub

docker pull swalia14/panman:latest

Step 2: Create and run the Docker container

docker run -it swalia14/panman:latest

Step 3: Run panmanUtils

# Inside the Docker container
cd /home/panman/build
./panmanUtils --help

Using Dockerfile

A Docker container with panmanUtils preinstalled can also be built from the Dockerfile by following these steps:

Step 0: Dependencies

Docker
Git

Step 1: Clone the repository

git clone https://github.com/TurakhiaLab/panman.git
cd panman

Step 2: Build a docker image

cd docker
docker build -t panman .

Step 3: Create and run the Docker container

docker run -it panman

Step 4: Run panmanUtils

# Inside the Docker container
cd /home/panman/build
./panmanUtils --help

PanMAN Construction

Once the package is installed, panmanUtils can construct a PanMAN from a PanGraph, GFA, or MSA file together with a tree topology (Newick format). Here we provide examples for constructing PanMANs from a PanGraph (JSON) file and from a custom dataset. For other methods, follow the instructions provided in the wiki.

Building PanMAN from PanGraph

Step 1: Check that the sars_20.json and sars_20.nwk files exist in the test directory.
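
The check in Step 1 can be scripted. The following is a minimal sketch, assuming PANMAN_HOME points at the root of the cloned repository (the source uses this variable without defining it); the helper name check_inputs is hypothetical and not part of panmanUtils.

```shell
# Hypothetical helper (not part of panmanUtils): report which expected
# input files are missing from a directory.
# Usage: check_inputs <directory> <file> [<file> ...]
check_inputs() {
    dir="$1"
    shift
    missing=0
    for f in "$@"; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    # Exit status 0 when every file exists, 1 otherwise.
    return "$missing"
}

# Example (assumes PANMAN_HOME is set to the repository root):
#   check_inputs "$PANMAN_HOME/test" sars_20.json sars_20.nwk && echo "inputs found"
```

If either file is reported missing, re-clone the repository or confirm that PANMAN_HOME points to the right directory before moving on to Step 2.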

Step 2: Run panmanUtils with the following command to build a PanMAN from the PanGraph:

cd $PANMAN_HOME/build
./panmanUtils -P $PANMAN_HOME/test/sars_20.json -N $PANMAN_HOME/test/sars_20.nwk -O sars_20

The above command runs panmanUtils and builds sars_20.panman in the $PANMAN_HOME/build/panman directory.

Building PanMAN from raw sequences or fragment assemblies using Snakemake Workflow

We provide a Snakemake workflow to construct PanMANs from raw sequences (FASTA format) or from fragment assemblies.

Note: The Snakemake workflow uses the PanGraph tool, PGGB, MAFFT, and MashTree to build the input PanGraph, GFA, MSA, and tree topology files, respectively. It is designed to be run in a Docker container built from either the provided Docker image or the Dockerfile (see the Docker installation instructions above).

Building PanMAN from raw genome sequences

Step 1: Run the following command to construct a PanMAN from raw sequences.

cd $PANMAN_HOME/workflows
conda activate snakemake
snakemake --use-conda --cores 8 --config RUNTYPE="pangraph/gfa/msa" FASTA="[user_input]" SEQ_COUNT="Number of sequences" ASSEM="NONE" REF="NONE" TARGET="NONE"
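
The SEQ_COUNT placeholder has to be filled in by hand. As a minimal sketch, assuming SEQ_COUNT is the number of records in the input FASTA file (our reading of the placeholder, not confirmed by the source), the value can be derived by counting header lines. The helper name count_fasta_records and the path in the example are hypothetical. The example also assumes that RUNTYPE takes exactly one of the three listed values.

```shell
# Hypothetical helper: count FASTA records by counting header lines,
# i.e. lines that start with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Example (placeholder path; RUNTYPE="pangraph" assumes one value is chosen):
#   SEQ_COUNT=$(count_fasta_records /path/to/sequences.fa)
#   snakemake --use-conda --cores 8 --config RUNTYPE="pangraph" \
#       FASTA="/path/to/sequences.fa" SEQ_COUNT="$SEQ_COUNT" \
#       ASSEM="NONE" REF="NONE" TARGET="NONE"
```

Counting header lines avoids miscounting sequences that are wrapped over several lines.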

Building PanMAN from fragment assemblies

Step 1: Run the following command to construct a PanMAN from fragment assemblies.

cd $PANMAN_HOME/workflows
conda activate snakemake
snakemake --use-conda --cores 8 --config RUNTYPE="pangraph/gfa/msa" FASTA="None" SEQ_COUNT="Number of sequences" ASSEM="frag" REF="reference_file" TARGET="target.txt"

Here, target.txt lists the files that contain the fragmented assemblies.
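
A file such as target.txt can be generated rather than written by hand. The sketch below assumes the workflow expects one file path per line and that the assemblies use the .fa extension; neither detail is stated in the source, so check the wiki for the exact format. The helper name make_target_list is hypothetical.

```shell
# Hypothetical helper: write the path of every *.fa file in a directory
# to an output file, one path per line (assumed format).
# Usage: make_target_list <assembly_dir> <output_file>
make_target_list() {
    assembly_dir="$1"
    out_file="$2"
    : > "$out_file"                      # create or truncate the output file
    for f in "$assembly_dir"/*.fa; do
        [ -e "$f" ] || continue          # skip the literal pattern if nothing matches
        printf '%s\n' "$f" >> "$out_file"
    done
}

# Example (placeholder directory):
#   make_target_list /path/to/assemblies target.txt
```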

panmanUtils functionalities

panmanUtils provides various functionalities, such as summary, extraction (raw sequences, MSA, VCF, GFA), sub-network pruning, and many more. Please refer to the wiki for detailed information. Here we provide usage syntax and examples for summary and VCF extract.

Summary extract

The summary feature extracts node-level and tree-level statistics of a PanMAN, summarizing its geometric and parsimony information.

  • Usage Syntax
./panmanUtils -I <path to PanMAN file> --summary --output-file=<prefix of output file> (optional)
  • Example
cd $PANMAN_HOME/build
./panmanUtils -I panman/sars_20.panman --summary --output-file=sars_20

Variant Call Format (VCF) extract

The VCF feature extracts the variations of all sequences in any PanMAT of a PanMAN as a VCF file, relative to any reference sequence (ref) in that PanMAT.

  • Usage syntax
./panmanUtils -I <path to PanMAN file> --vcf --reference=ref --output-file=<prefix of output file> (optional)
  • Example
cd $PANMAN_HOME/build
./panmanUtils -I panman/sars_20.panman --vcf --reference="Switzerland/SO-ETHZ-500145/2020|OU000199.2|2020-11-12" --output-file=sars_20
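
As a quick sanity check on an extracted VCF file, the variant records can be counted. This is a minimal sketch that relies only on the VCF convention that header lines start with '#'; the helper name count_vcf_records is hypothetical, and the output path in the example is an assumption, since the source gives only the output prefix.

```shell
# Hypothetical helper: count variant records in a VCF file by counting
# the lines that do not start with '#' (header and meta-information lines do).
count_vcf_records() {
    grep -vc '^#' "$1"
}

# Example (the output file name is an assumption based on the prefix):
#   count_vcf_records /path/to/sars_20.vcf
```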

Contribute

We welcome contributions from the community to enhance the capabilities of PanMAN and panmanUtils. If you encounter any issues or have suggestions for improvement, please open an issue on the PanMAN GitHub page. For general inquiries and support, reach out to our team.

Citing PanMAN

If you use PanMANs or panmanUtils in your research or publications, we kindly request that you cite the following paper:

  • Sumit Walia, Harsh Motwani, Kyle Smith, Russell Corbett-Detig, Yatish Turakhia, "Compressive Pangenomics Using Mutation-Annotated Networks", bioRxiv 2024.07.02.601807; doi: 10.1101/2024.07.02.601807