Issues while analysing highly similar genomes #6

Open
mnshgl0110 opened this issue Nov 9, 2022 · 8 comments

Comments

@mnshgl0110
Member

I think pansyri.pansyn.find_overlaps does not handle every case yet: it raises an error when I try to get the pansyntenic regions for two highly similar (actually simulated) query genomes.
The files are here:
/srv/netscratch/dep_mercier/grp_schneeberger/projects/syri2/results/human/simulatedgenomes/chr22

import pansyri.util as util
syns, alns = util.parse_input_tsv('genomes.tsv')
df = util.coresyn_from_lists(syns, alns, SYNAL=False)
Traceback (most recent call last):
  File "/srv/netscratch/dep_mercier/grp_schneeberger/software/anaconda3_2021/envs/mgpy3.8/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3398, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-59-90a0de3ea250>", line 1, in <cell line: 1>
    df = util.coresyn_from_lists(syns, alns, SYNAL=False)
  File "pansyri/pyxfiles/util.pyx", line 75, in pansyri.util.coresyn_from_lists
  File "pansyri/pyxfiles/pansyn.pyx", line 262, in pansyri.pansyn.find_multisyn
  File "pansyri/pyxfiles/pansyn.pyx", line 132, in pansyri.pansyn.find_overlaps
  File "pansyri/pyxfiles/util.pyx", line 85, in pansyri.util.get_orgs_from_df
TypeError: reduce() of empty sequence with no initial value

We can discuss it when you have some time.
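
For reference, the final TypeError in the traceback is what `functools.reduce` raises when it is given an empty iterable and no initial value. Below is a minimal sketch, independent of pansyri, of that failure and of how an initial value avoids it. The `union_all` helper is hypothetical, and the idea that `get_orgs_from_df` folds sets in this way is only an assumption.

```python
from functools import reduce


def union_all(sets, initial=None):
    """Fold a sequence of sets into their union.

    Hypothetical helper for illustration only; this is not pansyri code.
    """
    if initial is None:
        # No initial value: reduce() raises TypeError if `sets` is empty.
        return reduce(lambda a, b: a | b, sets)
    # With an initial value, an empty input simply returns that value.
    return reduce(lambda a, b: a | b, sets, initial)


# Non-empty input works either way.
print(union_all([{"qry1"}, {"qry2"}]))

# Empty input without an initial value reproduces the kind of error seen above.
try:
    union_all([])
except TypeError as err:
    print("TypeError:", err)

# Empty input with an initial value returns the initial value instead.
print(union_all([], initial=set()))
```

If the real cause is an empty list of organisms, passing an initial value would only hide it, so the more useful question is probably why the list ends up empty for this input.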

@lrauschning
Collaborator

lrauschning commented Nov 9, 2022

(reposting here, as replying to the email seems not to have worked)

Hi Manish,
I got the same error message yesterday when I was fixing an issue related to the parallelization, which I discovered while benchmarking.
The error is fixed now and when I tried with the latest commit in the repo (branch leon, now also merged to master),
pansyri -i genomes.tsv --sp --syn
did not throw an error.
Let me know if there are still issues when running the current version!
Cheers,
Leon

@mnshgl0110
Member Author

Hi Leon,
So, this is the current status:

import pansyri.util as util
from pansyri.pansyn import find_multisyn
syns, alns = util.parse_input_tsv('genomes.tsv')
df = util.coresyn_from_lists(syns, alns, SYNAL=False) # Does not work
df = find_multisyn(syns, alns, SYNAL=False) # Works, but also returns crosssyn
df = find_multisyn(syns, alns, SYNAL=False, only_core=True) # Does not work

We need to ensure that this is working for all use cases.

@mnshgl0110
Member Author

mnshgl0110 commented Nov 11, 2022

It seems that the error depends on the input file names listed in genomes.tsv, specifically on how the bam/syri.out files are named.

@lrauschning
Collaborator

I can reproduce the error. It's weird that this only arises when calling core synteny.
On the ampril dataset, all combinations work.
I'll look more into this later.

@lrauschning
Collaborator

Okay, I think I might have fixed this in c565b85.
There was still some code specific to testing on the ampril dataset in there that was also causing some other issues.

@mnshgl0110
Member Author

Earlier, it seemed to work when the files were named `ref_qry1.bam` and `ref_qry2.bam`, but not when they were named differently. Were you able to reproduce and possibly fix that?

@lrauschning
Collaborator

I think this commit should remove the need for this filename format (it was hardcoded to match the names in the Ampril dataset).
I'll try to reproduce it and see if normal naming works in the next few days.
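
To illustrate the kind of problem a hardcoded filename format can cause, here is a rough sketch, not the actual pansyri code. The pattern, both helper functions and the example paths are hypothetical. A strict parser returns nothing for files that do not follow the `ref_<name>.bam` scheme, while a lenient one derives a name from any filename.

```python
import os
import re

# Hypothetical strict pattern: only accepts names such as "ref_qry1.bam".
STRICT = re.compile(r"^ref_(?P<org>[^_.]+)\.bam$")


def org_strict(path):
    """Return the organism name only if the filename matches the fixed pattern."""
    match = STRICT.match(os.path.basename(path))
    return match.group("org") if match else None


def org_lenient(path):
    """Use the filename up to its first dot as the organism name."""
    return os.path.basename(path).split(".", 1)[0]


for path in ["ref_qry1.bam", "/data/sampleA.bam", "sample_A.syri.out"]:
    print(path, org_strict(path), org_lenient(path))
```

If names that fail the strict match are silently dropped, the list of organisms can end up empty, which would be consistent with the reduce() error above, though that link is an assumption.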

@lrauschning
Collaborator

Ah, sorry, I forgot to test it again after the commit. My account for the HPC at Cologne has expired; I'll test it again once the account is renewed. Testing locally, everything works on the ampril dataset, but that's not really a surprise.
