If I'm not mistaken, there is a problem with how Pfam domains are integrated into the final GFF file. When a transcript contains introns, the domain coordinates do not seem to be correctly spliced and offset: positions along the protein are not mapped onto the CDS segments with the introns skipped.
Below is an example:
There are three Pfam domains in a transcript with multiple exons. The first domain, PF02861, has coordinates 277134-277217, so it straddles the boundary between the first intron and the second CDS (277188-277561). The third domain, PF00004, lies entirely within an intron. As far as I understand, a protein domain should cover only CDS regions (possibly spanning more than one).
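For reference, here is a minimal sketch of the mapping I would expect: convert the domain's amino-acid range to nucleotide offsets along the concatenated CDS, then walk the CDS segments in transcript order so introns are skipped. It assumes 1-based inclusive GFF coordinates, domain positions given as 1-based residues of the translated protein, and a first CDS segment starting at phase 0. The function name `map_protein_to_genome` is hypothetical, not taken from any existing pipeline.

```python
def map_protein_to_genome(cds, strand, aa_start, aa_end):
    """Map a protein domain (1-based residues aa_start..aa_end) to genomic segments.

    cds: list of (start, end) CDS coordinates, 1-based inclusive, any order.
    strand: "+" or "-".
    Returns (start, end) genomic segments sorted by start; every segment
    lies inside a CDS, so no part of the domain can fall in an intron.
    Assumes the first CDS segment in transcript order has phase 0.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"unsupported strand: {strand!r}")
    # 0-based, half-open nucleotide offsets into the concatenated CDS.
    nt_lo = (aa_start - 1) * 3
    nt_hi = aa_end * 3
    if nt_hi > sum(end - start + 1 for start, end in cds):
        raise ValueError("domain extends beyond the CDS")

    # Transcript order: ascending on "+", descending on "-".
    ordered = sorted(cds, reverse=(strand == "-"))
    segments = []
    offset = 0  # CDS nucleotides consumed before the current segment
    for start, end in ordered:
        length = end - start + 1
        # Overlap between the domain and this segment, in CDS offsets.
        seg_lo = max(nt_lo, offset)
        seg_hi = min(nt_hi, offset + length)
        if seg_lo < seg_hi:
            if strand == "+":
                g_start = start + (seg_lo - offset)
                g_end = start + (seg_hi - offset) - 1
            else:
                # On "-" the transcript reads from `end` towards `start`.
                g_end = end - (seg_lo - offset)
                g_start = end - (seg_hi - offset) + 1
            segments.append((g_start, g_end))
        offset += length
    return sorted(segments)
```

With made-up CDS coordinates `[(100, 129), (200, 259)]` on the `+` strand, residues 8-15 map to `[(121, 129), (200, 214)]`: 9 nt at the end of the first CDS plus 15 nt at the start of the second, 24 nt in total, with the intron 130-199 skipped.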
These are all the features of this gene:
In case it is useful, this script https://github.com/glaParaBio/genomeAnnotationPipeline/blob/master/scripts/add_hmmsearch_to_gff.py should correctly integrate hmmsearch/hmmscan output into a GFF file (not extensively tested!).
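Once a domain has been mapped to genomic segments, one way to write it out is as a single discontinuous GFF3 feature: one line per segment, all sharing the same `ID`. The sketch below is an illustration and not necessarily what the linked script does. The function name `domain_to_gff3`, the default feature type `protein_match`, and the example IDs are assumptions.

```python
# Characters reserved in GFF3 column 9 must be percent-encoded.
_GFF3_ESCAPES = str.maketrans({
    "%": "%25", ";": "%3B", "=": "%3D", "&": "%26",
    ",": "%2C", "\t": "%09", "\n": "%0A", "\r": "%0D",
})


def _escape(value):
    """Percent-encode GFF3-reserved characters in an attribute value."""
    return value.translate(_GFF3_ESCAPES)


def domain_to_gff3(seqid, source, strand, segments, domain_id, name,
                   feature_type="protein_match", score="."):
    """Return one tab-separated GFF3 line per genomic segment of a domain.

    All lines share the same ID, which GFF3 uses to represent a single
    feature split across several CDS segments. Phase is "." because the
    feature is not itself a CDS. feature_type is an assumed default.
    """
    attributes = f"ID={_escape(domain_id)};Name={_escape(name)}"
    lines = []
    for start, end in sorted(segments):
        columns = [seqid, source, feature_type, str(start), str(end),
                   str(score), strand, ".", attributes]
        lines.append("\t".join(columns))
    return lines
```

For example, `print("\n".join(domain_to_gff3("contig_1", "hmmsearch", "+", [(121, 129), (200, 214)], "dom1", "PF02861")))` prints two lines that a GFF3-aware viewer should draw as one domain split by the intron.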