PanPhlAn mapping

Leonard Dubois edited this page May 18, 2020 · 1 revision

panphlan_map.py requires bowtie2 and samtools. It maps metagenomic samples against the pangenome using the data generated by panphlan_new_pangenome_generation.py (bowtie2 indexes and concatenated .fna file). The script must be called once for each sample file. The output can then be analyzed with panphlan_profile.py.

Example:
panphlan/panphlan_map.py -c erectale -i sample01.tar.gz -o map_results/sample01_erectale.csv
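
Since the script has to be called once per sample, a small wrapper can build one command per file. The Python sketch below is only an illustration and is not part of PanPhlAn: the helper names `build_map_commands` and `run_all`, the sample file names, and the way the sample ID is derived from the file name are assumptions. Only the `-c`, `-i`, and `-o` options are taken from this page.

```python
import subprocess
from pathlib import Path


def build_map_commands(samples, clade, out_dir="map_results",
                       script="panphlan/panphlan_map.py"):
    """Return one panphlan_map.py command (as an argument list) per sample."""
    commands = []
    for sample in samples:
        # Assumption: the sample ID is the file name up to the first dot,
        # e.g. sample01.tar.gz -> sample01, as in the example above.
        sample_id = Path(sample).name.split(".", 1)[0]
        output = (Path(out_dir) / f"{sample_id}_{clade}.csv").as_posix()
        commands.append([script, "-c", clade, "-i", str(sample), "-o", output])
    return commands


def run_all(commands):
    """Run the commands one after another and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical sample names; print the commands instead of running them.
    for cmd in build_map_commands(["sample01.tar.gz", "sample02.tar.gz"], "erectale"):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to check them before passing the list to `run_all`.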

Input

  • -c CLADE_NAME to specify the species database.
  • -i INPUT_FILE input path to a metagenomic sample. The following file formats are accepted: .fastq, .fastq.gz, .fastq.bz2, .tar.gz, .tar.bz2, and .sra.
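
The list of accepted formats can be checked before starting a long mapping run. This is a minimal sketch, assuming a case-sensitive match on the file ending. The function name `is_accepted_input` is hypothetical, and the check only looks at the file name, not at the file content.

```python
# File endings listed above as accepted by panphlan_map.py.
ACCEPTED_SUFFIXES = (
    ".fastq", ".fastq.gz", ".fastq.bz2",
    ".tar.gz", ".tar.bz2", ".sra",
)


def is_accepted_input(path):
    """Return True if the file name ends with one of the listed formats."""
    # str.endswith accepts a tuple and matches any of its entries.
    return str(path).endswith(ACCEPTED_SUFFIXES)


if __name__ == "__main__":
    for name in ["sample01.tar.gz", "sample02.fastq", "sample03.bam"]:
        print(name, is_accepted_input(name))
```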

Output

If no --output argument is provided, the default value map_results is used and a map_results/ folder is created. This folder contains:

  • a mapping result file named INPUT_FILE_CLADE_NAME.csv
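
Before moving on to panphlan_profile.py, it can help to check which samples already have a result file. The sketch below assumes that result files follow the sampleID_clade.csv naming described on this page. The function name `list_mapped_samples` is hypothetical and not part of PanPhlAn.

```python
from pathlib import Path


def list_mapped_samples(folder, clade):
    """Return the sorted sample IDs that have a <sampleID>_<clade>.csv file."""
    suffix = f"_{clade}.csv"
    # Strip the "_<clade>.csv" ending to recover the sample ID.
    return sorted(p.name[:-len(suffix)] for p in Path(folder).glob(f"*{suffix}"))


if __name__ == "__main__":
    # Assumes the default map_results/ folder and the erectale clade.
    print(list_mapped_samples("map_results", "erectale"))
```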

Help -h

./panphlan/panphlan_map.py -h
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input INPUT_FILE
                        File(s) containing the unpaired reads to be aligned
                        using Bowtie2. If not specified, Bowtie2 gets the read
                        from the stdin filehandle.
  --i_bowtie2_indexes INPUT_BOWTIE2_INDEXES
                        Input directory of bowtie2 indexes and pangenome
  --fastx FASTX_FORMAT  Read input format (fasta or fastq), default: fastq, if
                        not fasta recognized by file ending.
  -c CLADE_NAME, --clade CLADE_NAME
                        Name of the specie to consider, i.e. the basename of
                        the index for the reference genome used by Bowtie2 to
                        align reads.
  -o OUTPUT_FILE, --output OUTPUT_FILE
                        Mapping result output-file: path/sampleID_clade.csv
  --th_mismatches NUMOF_MISMATCHES
                        Number of mismatches to filter.
  -p NUMOF_PROCESSORS, --nproc NUMOF_PROCESSORS
                        Maximum number of processors to use. Default value is
                        the minimum between 12 and the number of available
                        processors.
  -b OUTPUT_BAM_FILE, --out_bam OUTPUT_BAM_FILE
                        Forces the name of the BAM file generated by the
                        Samtools pipeline.
  -m MEMORY_GIGABTES_FOR_SAMTOOLS, --mGB MEMORY_GIGABTES_FOR_SAMTOOLS
                        Maximum amount of memory we get available for
                        Samtools.
  --readLength READS_LENGTH
                        Minimum read length.
  --tmp TEMP_FOLDER     Alternative folder for temporary files.
  --verbose             Defines if the standard output must be verbose or not.
  -v, --version         Prints the current PanPhlAn version and exits.
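
The default for -p/--nproc described in the help text (the minimum between 12 and the number of available processors) can be written as a one-line rule. This is a sketch of that rule only, not the code used by panphlan_map.py. The function name `default_nproc` and the fallback to 1 when the processor count is unknown are assumptions.

```python
import os


def default_nproc(available=None, cap=12):
    """Return min(cap, available processors), as described for --nproc."""
    if available is None:
        # os.cpu_count() may return None; fall back to 1 in that case (assumption).
        available = os.cpu_count() or 1
    return min(cap, available)


if __name__ == "__main__":
    print(default_nproc())
```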