Sequences categorized into types of pseudogenes #56

Open
navkahlon240 opened this issue Apr 10, 2023 · 3 comments

Comments

@navkahlon240

Hi, thank you for this awesome pipeline for pseudogene analysis. Is there a way to get the FASTA sequences categorized as short, long, fragmented, and intergenic? The log appears to report only the total number of pseudogenes in each of these categories. I would like to know which nucleotide sequences fall into which category, because I am interested in doing further analysis on the long sequences.

Thanks.

@mitchso
Collaborator

mitchso commented Apr 10, 2023

Hi,

The categorical information for each pseudogene is found in the GFF output file. From there you can identify the locus tags associated with the group of pseudogenes you are interested in analyzing further, and then pull the sequences that correspond to those locus tags from the fasta files.

Hope this helps!
Mitch
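
For anyone landing here, the approach described above could be sketched roughly as follows. This is only an illustration under assumptions, not the tool's own code: it assumes the GFF stores a `locus_tag=` attribute in the ninth column, that the category can be recognized by a keyword somewhere in that column, and that each FASTA header contains the locus tag as a separate token. The function names and the keyword are hypothetical, so check them against your own output files.

```python
import re


def parse_attributes(field):
    """Split a GFF attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def locus_tags_matching(gff_handle, keyword, tag_key="locus_tag"):
    """Collect locus tags of features whose attribute text contains keyword.

    Assumption: the category of interest shows up as plain text in column 9.
    """
    tags = set()
    for line in gff_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete feature line
        if keyword.lower() in cols[8].lower():
            tag = parse_attributes(cols[8]).get(tag_key)
            if tag:
                tags.add(tag)
    return tags


def _header_has_tag(header, tags):
    """True if any token of the FASTA header is exactly one of the tags."""
    return any(token in tags for token in re.split(r"[\s|;=\[\]]+", header))


def filter_fasta(fasta_handle, tags):
    """Yield (header, sequence) for records whose header names a wanted tag."""
    header, chunks = None, []
    for line in fasta_handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None and _header_has_tag(header, tags):
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    # emit the final record, which has no following '>' line
    if header is not None and _header_has_tag(header, tags):
        yield header, "".join(chunks)
```

You would open the GFF and FASTA outputs with `open(...)`, pass the handles to these functions, and write the yielded records to a new file. The keyword match is a plain substring test, so pick a phrase specific enough that it does not also match other categories.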

@liamfriar

Hi,

I also love the tool. The "Reason(s):" list appears to always be blank when the reason is that the feature was input as a pseudogene. It is still relatively easy to parse because of the pseudogene vs. pseudogene candidate designation in the .gff. I bring it up because when I then called reannotate, it always reported 0 input pseudogenes. Maybe that is just how reannotate works, but I thought it might have something to do with the missing annotation in the .gff file. In "annotate.py", it looks like the pseudogene reason strings are sometimes saved in reason_dict, sometimes as pseudo_reasons, and sometimes as pseudo_candidate_reasons, so maybe these objects aren't all communicating with each other properly?

Thanks again. Great tool!
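
A quick way to check how widespread the blank reasons are is to tally them straight from the .gff. The sketch below is a rough diagnostic, not part of the tool: it assumes the literal text "Reason(s):" appears in the ninth GFF column and that the reason text runs up to the next ";". The function name `tally_reasons` is hypothetical.

```python
import re
from collections import Counter

# Assumption: the reason text follows the literal "Reason(s):" and ends at ';'
REASON_RE = re.compile(r"Reason\(s\):\s*([^;]*)")


def tally_reasons(gff_lines):
    """Count feature lines with a filled, blank, or missing Reason(s) field."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete feature line
        match = REASON_RE.search(cols[8])
        if match is None:
            counts["no Reason(s) field"] += 1
        elif match.group(1).strip():
            counts["reason given"] += 1
        else:
            counts["blank reason"] += 1
    return counts
```

Passing an open file handle, for example `tally_reasons(open("output.gff"))` with a placeholder file name, returns the three counts.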

@mitchso
Collaborator

mitchso commented Apr 16, 2023

Thanks for bringing this to my attention! I'll clean up the labelling and data structure soon.
Best,
Mitch
