feat(Execution workflows / Utils): making RefSeq annotations work for QAPA and PAQR. #457
Comments
Hi @mrgazzara, I tried to look into PAQR, but it's not yet clear to me what adapting RefSeq to GENCODE needs to look like.
which is for the GENCODE annotation and could be replaced by gbkey (=mRNA) or gene_biotype (but this is only on type, col 3 in GFF). Those seem to be the replacement keys.
But you mentioned that you would need to change code within the tool. Can you specify what exactly? What is assumed in the GTF that is not present in the RefSeq annotation? What I see in tandem pas is that it looks for
Hey @mrgazzara, could you specify in more detail what the problem seems to be? Or provide your RefSeq annotation somewhere?
Sorry for the delays. Getting back to APAeval now. The RefSeq annotations Farica and I used for running/re-running the other tools were from the UCSC Genome Browser. Taking another look, they were actually the "refGene" annotations, which seem to be RefSeq but restricted to annotated protein-coding and non-coding genes (accessions with NM_* and NR_*, see documentation here). These files can be downloaded through the UCSC FTP for human (hg38.refGene.gtf.gz) or mouse (mm10.refGene.gtf.gz). This file format seemed to most closely match the ones DaPars used as their recommended input. I can try the file CJ linked to and see if it works with PAQR / QAPA (I think it might), and we can either use it as is, subset it, or adapt the refGene GTF to have the necessary fields.
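To see which fields a given GTF actually carries before deciding between those options, something like the following could help. It is only a rough sketch: the function names are made up for illustration, and the parser assumes the usual `key "value";` attribute layout with no semicolons inside quoted values.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse a GTF attribute column (key "value"; ...) into a dict.

    Assumes no semicolons inside quoted values; repeated keys keep
    only the last value, which is enough for a key inventory.
    """
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def attribute_keys_by_feature(lines):
    """Map each feature type (column 3) to the set of attribute keys seen."""
    keys = defaultdict(set)
    for line in lines:
        # Skip comments, blank lines, and malformed rows.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        keys[cols[2]].update(parse_attributes(cols[8]))
    return dict(keys)
```

Running `attribute_keys_by_feature` over the refGene GTF (opened with `gzip.open(path, "rt")`) and over a GENCODE GTF, then comparing the two results, should show which keys are missing from the RefSeq file.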
In trying to run the more conservative RefSeq annotation on certain tools, I've run into an issue where the tools assume that specific attributes are present in order to build their annotation (e.g. Ensembl-type attributes for QAPA, or Ensembl/GENCODE attributes for PAQR).
I need to check a bit deeper to make sure I gather all the requirements. I'm not sure whether it is best to write a Util script that transforms a RefSeq GTF so it contains these required Ensembl/GENCODE attributes, or to add an option within each EWF to handle this.
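For the Util-script route, a minimal sketch could look like the one below. The assumptions are loud here: I have not yet confirmed which attributes QAPA and PAQR actually require, so `gene_type` and `transcript_type` (GENCODE-style keys) are placeholders, as are the function names and the `non_coding` label. The only thing taken from RefSeq itself is that `NM_` accessions are mRNAs and `NR_` accessions are non-coding RNAs.

```python
import re


def biotype_from_accession(transcript_id):
    """Guess a biotype from a RefSeq accession prefix."""
    if transcript_id.startswith("NM_"):
        return "protein_coding"
    if transcript_id.startswith("NR_"):
        return "non_coding"  # placeholder label, not a GENCODE biotype
    return "unknown"


def add_gencode_style_attributes(line):
    """Append gene_type/transcript_type to one GTF line.

    ASSUMPTION: these two keys stand in for whatever the tools need;
    the real requirement list still has to be gathered.
    """
    if line.startswith("#") or not line.strip():
        return line
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        return line
    match = re.search(r'transcript_id "([^"]*)"', cols[8])
    # Leave rows alone if there is no transcript_id or the key already exists.
    if match is None or 'transcript_type "' in cols[8]:
        return line
    biotype = biotype_from_accession(match.group(1))
    attrs = cols[8].rstrip()
    if not attrs.endswith(";"):
        attrs += ";"
    cols[8] = f'{attrs} gene_type "{biotype}"; transcript_type "{biotype}";'
    return "\t".join(cols) + newline
```

Applied line by line to the refGene GTF, this leaves coordinates and existing attributes untouched and only appends the two extra keys, so the same output could be fed to several EWFs if they turn out to need the same fields.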