Issues while analysing highly similar genomes #6

mnshgl0110 · 2022-11-09T16:59:56Z

I think pansyri.pansyn.find_overlaps does not handle all cases: it raises an error when I try to get pansyntenic regions for two highly similar (actually simulated) query genomes.
The files are here:
/srv/netscratch/dep_mercier/grp_schneeberger/projects/syri2/results/human/simulatedgenomes/chr22

syns, alns = util.parse_input_tsv('genomes.tsv')
df = util.coresyn_from_lists(syns, alns, SYNAL=False)
Traceback (most recent call last):
  File "/srv/netscratch/dep_mercier/grp_schneeberger/software/anaconda3_2021/envs/mgpy3.8/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3398, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-59-90a0de3ea250>", line 1, in <cell line: 1>
    df = util.coresyn_from_lists(syns, alns, SYNAL=False)
  File "pansyri/pyxfiles/util.pyx", line 75, in pansyri.util.coresyn_from_lists
  File "pansyri/pyxfiles/pansyn.pyx", line 262, in pansyri.pansyn.find_multisyn
  File "pansyri/pyxfiles/pansyn.pyx", line 132, in pansyri.pansyn.find_overlaps
  File "pansyri/pyxfiles/util.pyx", line 85, in pansyri.util.get_orgs_from_df
TypeError: reduce() of empty sequence with no initial value

We can discuss it when you have some time.
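
For context, the TypeError at the bottom of the traceback is what functools.reduce raises when it is given an empty iterable and no initial value. The following is a minimal sketch that is independent of pansyri: the helper name union_of_orgs and the sample data are hypothetical, and this is not the actual body of get_orgs_from_df. It only shows the failure mode and how passing an initial value avoids it.

```python
from functools import reduce


def union_of_orgs(org_sets):
    """Hypothetical helper: union of an iterable of sets of organism names."""
    # Passing set() as the initial value means an empty input yields an
    # empty set instead of raising TypeError.
    return reduce(lambda a, b: a | b, org_sets, set())


# Without an initial value, reducing an empty sequence raises TypeError.
try:
    reduce(lambda a, b: a | b, [])
    raised = False
except TypeError:
    raised = True

print(raised)
print(union_of_orgs([]))
print(union_of_orgs([{"qry1"}, {"qry1", "qry2"}]))
```

Whether an empty result is acceptable at that point, or whether the empty input should be reported earlier with a clearer message, depends on what find_overlaps expects downstream.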

lrauschning · 2022-11-09T23:19:59Z

(reposting here, as replying to the email seems not to have worked)

Hi Manish,
I got the same error message yesterday when I was fixing an issue related to the parallelization, which I discovered while benchmarking.
The error is fixed now and when I tried with the latest commit in the repo (branch leon, now also merged to master),
pansyri -i genomes.tsv --sp --syn
did not throw an error.
Let me know if there are still issues when running the current version!
Cheers,
Leon

mnshgl0110 · 2022-11-11T13:43:54Z

Hi Leon,
So, this is the current status:

import pansyri.util as util
from pansyri.pansyn import find_multisyn
syns, alns = util.parse_input_tsv('genomes.tsv')
df = util.coresyn_from_lists(syns, alns, SYNAL=False) # Does not work
df = find_multisyn(syns, alns, SYNAL=False) # Works, but gives crosssyn as well
df = find_multisyn(syns, alns, SYNAL=False, only_core=True) # Does not work

We need to ensure that this is working for all use cases.

mnshgl0110 · 2022-11-11T15:45:46Z

It seems that this issue arises when pansyri cannot handle the input file names in genomes.tsv, specifically how the bam/syri.out files are named.

lrauschning · 2022-11-11T16:15:05Z

I can reproduce the error. It's weird that this only arises when calling core synteny.
On the ampril dataset, all combinations work.
I'll look more into this later.

lrauschning · 2022-11-16T20:01:34Z

Okay, I think I might have fixed this in c565b85.
There was still some code specific to testing on the ampril dataset in there that was also causing some other issues.

mnshgl0110 · 2022-11-17T13:40:00Z

Earlier, it seemed to work when the filenames were ref_qry1.bam and ref_qry2.bam, but not when they were named differently. Were you able to reproduce and possibly fix that?

lrauschning · 2022-11-18T13:26:01Z

I think this commit should fix the need for this filename format (it was hardcoded to match the names in the Ampril dataset).
I'll try to reproduce it and see if normal naming works in the next few days.
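
As an illustration of name handling that does not depend on a ref_qry pattern, here is a minimal sketch. The helpers org_name_from_path and org_names are hypothetical and not part of pansyri; the sketch simply assumes a label can be taken from the file name up to the first dot, and checks that the labels are unique.

```python
from pathlib import Path


def org_name_from_path(path):
    """Hypothetical helper: label a file by its name up to the first dot."""
    # "dir/ref_qry1.bam" -> "ref_qry1", "dir/sampleA.syri.out" -> "sampleA"
    return Path(path).name.split(".", 1)[0]


def org_names(paths):
    """Hypothetical helper: derive labels and fail early if they collide."""
    names = [org_name_from_path(p) for p in paths]
    if len(set(names)) != len(names):
        raise ValueError(f"file names do not yield unique labels: {names}")
    return names


print(org_names(["a/ref_qry1.bam", "b/sampleA.syri.out"]))
```

Failing early on duplicate labels gives a clearer message than an error raised deep inside the synteny computation.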

lrauschning · 2022-11-25T10:49:14Z

Ah, sorry, I forgot to test it again after the commit. My account for the HPC at Cologne has expired; I'll test it again once the account is renewed. Testing locally, everything works on the ampril dataset, but that's not really a surprise.
