
7. Join separate exons

Tobias Hofmann edited this page Oct 26, 2015 · 5 revisions

join_exons.py
In this step we join the separate exon sequences of each gene. The script takes all exon alignments that belong to the same gene and combines them into a single alignment per gene. The alignment files need to be in fasta format and must be named in this manner: locusname_exon.fasta. For each sample, the script joins all available exon sequences of a gene and fills in missing data wherever an exon sequence is absent. This considerably reduces the number of alignment files, which may be preferable for further phylogenetic analyses. Each gene is usually considered one coherent locus with a single gene tree, since it is inherited as one unit, which makes it the smallest sensible unit for phylogenetic inference.
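To illustrate the joining and padding idea, here is a minimal sketch in Python 3. It is not the code of join_exons.py. The function name `join_exons`, the dictionary-based input, and the use of `N` as the missing-data character are all assumptions made for this example.

```python
def join_exons(exon_alignments, missing_char="N"):
    """Concatenate per-exon alignments of one gene into one alignment.

    exon_alignments: list of dicts {sample_name: aligned_sequence},
    one dict per exon, in the order the exons should be joined.
    missing_char is an assumed placeholder for missing data.
    """
    # Collect every sample that occurs in at least one exon alignment.
    samples = sorted({s for aln in exon_alignments for s in aln})
    joined = {s: "" for s in samples}
    for aln in exon_alignments:
        # All sequences in one alignment have the same length.
        length = len(next(iter(aln.values()))) if aln else 0
        for s in samples:
            # Pad samples that lack this exon with missing data.
            joined[s] += aln.get(s, missing_char * length)
    return joined


exon1 = {"sampleA": "ATG", "sampleB": "ATC"}
exon2 = {"sampleA": "GGTT"}  # sampleB has no sequence for this exon
joined = join_exons([exon1, exon2])
print(joined)
```

In this toy input, `sampleB` lacks the second exon, so its joined sequence is padded with four missing-data characters and both samples end up with the same length.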

Run the script

This script is very easy to run. Provide the path to the folder containing the alignment files in fasta format (--input) and specify where to store the joined alignments (--output), which will also be in fasta format.
##### Example
python2.7 join_exons.py --input path/to/alignment-folder/fasta --output path/to/alignment-folder/joined/fasta
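
Before running the script, it can be useful to check which files in the input folder would end up in the same gene. The following Python 3 sketch groups file names by locus. It assumes that the last underscore in `locusname_exon.fasta` separates the locus name from the exon label; the helper `group_by_locus` is hypothetical and is not part of join_exons.py.

```python
import os
from collections import defaultdict


def group_by_locus(filenames):
    """Group file names of the form locusname_exon.fasta by locus name.

    Assumption: the text after the last underscore is the exon label.
    Files that do not match the pattern are ignored.
    """
    groups = defaultdict(list)
    for name in filenames:
        stem, ext = os.path.splitext(name)
        if ext != ".fasta" or "_" not in stem:
            continue  # not an exon alignment in the expected naming scheme
        locus, _exon = stem.rsplit("_", 1)
        groups[locus].append(name)
    # Sort file names within each locus (plain string order).
    return {locus: sorted(files) for locus, files in groups.items()}


names = ["geneA_1.fasta", "geneA_2.fasta", "geneB_1.fasta", "notes.txt"]
groups = group_by_locus(names)
print(groups)
```

To inspect a real folder, pass `os.listdir("path/to/alignment-folder/fasta")` instead of the toy list. Note that plain string sorting places an exon labelled `10` before one labelled `2`, so this ordering is only a rough check.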