Hi, thank you for this awesome pipeline for pseudogene analysis. I wanted to know whether I can get the FASTA sequences categorized as short, long, fragmented, and intergenic. As far as I can tell, the log only reports the total number of short, long, fragmented, and intergenic pseudogenes. Is there any way to get the nucleotide sequences for each category, i.e. which sequences are short, long, or fragmented? I am interested in doing further analysis on the long ones.
Thanks.
The categorical information for each pseudogene is found in the GFF output file. From there you can identify the locus tags associated with the group of pseudogenes you are interested in analyzing further, and then pull the sequences that correspond to those locus tags from the fasta files.
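A minimal Python sketch of that workflow follows. It is not part of the pipeline; it assumes (check your own output) that column 9 of the GFF holds `key=value` pairs separated by `;`, that the locus tag sits in a `locus_tag` attribute, that the category text (the "Reason(s):" note) sits in one of the attribute values, and that each FASTA header starts with the locus tag. The script name and the keyword you pass are placeholders: pick a word that actually appears in the Reason(s) text for the category you want.

```python
"""Hypothetical helper: select FASTA records for pseudogenes of one category.

Assumptions (not confirmed by the pipeline docs): GFF3 column 9 holds
key=value pairs separated by ';', the tag is in a `locus_tag` attribute,
the category wording is in one of the attribute values (e.g. the note
with "Reason(s):"), and each FASTA header starts with the locus tag.
"""
import sys
from urllib.parse import unquote


def parse_attributes(field):
    """Turn a GFF3 column-9 string into a dict with lowercased keys."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 percent-encodes reserved characters such as ';' and '='
            attrs[key.strip().lower()] = unquote(value.strip())
    return attrs


def locus_tags_matching(gff_lines, keyword):
    """Return locus tags of features whose attribute text mentions `keyword`."""
    tags = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        tag = attrs.get("locus_tag")
        text = " ".join(attrs.values()).lower()
        if tag and keyword.lower() in text:
            tags.add(tag)
    return tags


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(fasta_lines, tags):
    """Keep records whose first header token is one of `tags`."""
    return [
        (h, s) for h, s in read_fasta(fasta_lines)
        if (h.split() or [""])[0] in tags
    ]


def main(argv):
    if len(argv) != 3:
        print("usage: select_pseudos.py FILE.gff FILE.fasta KEYWORD", file=sys.stderr)
        return 1
    gff_path, fasta_path, keyword = argv
    with open(gff_path) as gff:
        wanted = locus_tags_matching(gff, keyword)
    with open(fasta_path) as fasta:
        for header, seq in select_records(fasta, wanted):
            print(f">{header}\n{seq}")
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage would look like `python select_pseudos.py pseudos.gff genes.fasta KEYWORD > subset.fasta`, with all three arguments standing in for your own files and search term. Matching on free text keeps the sketch independent of the exact note format; if the GFF marks the category in a dedicated attribute, matching on that attribute instead would be stricter. Intergenic pseudogenes may have no matching FASTA record, in which case slicing the genome by the GFF coordinates would be needed instead.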
I also love the tool. The "Reason(s):" list appears to always be blank when the reason is that the feature was input as a pseudogene. The output is still relatively easy to parse because of the pseudogene vs. pseudogene candidate designation in the .gff. I bring it up because when I then ran reannotate, it always reported 0 input pseudogenes. Maybe that is just how reannotate works, but I thought it might have something to do with the missing annotation in the .gff file? In "annotate.py" the pseudogene reason strings seem to be saved sometimes in reason_dict, sometimes as pseudo_reasons, and sometimes as pseudo_candidate_reasons, so maybe these objects aren't all communicating with each other properly?