prune2df warning messages #106

Closed
TobiTekath opened this issue Nov 7, 2019 · 16 comments

@TobiTekath

Hi,
thanks for developing this very useful toolkit.

I am wondering whether it is normal/expected to get so many warning messages while running prune2df.
I get warning messages like

pyscenic.transform - WARNING - Less than 80% of the genes in some_gene could be mapped to hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather. Skipping this module.

or

pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for some_regulon could be mapped to hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather. Skipping this module.

I get these warnings with the two hg38-Database files as well as with the six hg19-Database files - so the hg-version of the db does not seem to be the cause. My data is annotated with gencode, so it should be hg38.

The results of prune2df() do look quite good; I am just not sure about the ~29000 warning messages I get in the process.


My prune2df call looks like this
df = prune2df(rnkdbs=dbs, modules=modules, motif_annotations_fname=MOTIF_ANNOTATIONS_FNAME, client_or_address="custom_multiprocessing", num_workers=30)

with dbs: [FeatherRankingDatabase(name="hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather"), FeatherRankingDatabase(name="hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather")]

and

[FeatherRankingDatabase(name="hg19-500bp-upstream-7species.mc9nr.feather"), FeatherRankingDatabase(name="hg19-500bp-upstream-10species.mc9nr.feather"), FeatherRankingDatabase(name="hg19-tss-centered-5kb-7species.mc9nr.feather"), FeatherRankingDatabase(name="hg19-tss-centered-5kb-10species.mc9nr.feather"), FeatherRankingDatabase(name="hg19-tss-centered-10kb-7species.mc9nr.feather"), FeatherRankingDatabase(name="hg19-tss-centered-10kb-10species.mc9nr.feather")]

respectively. I am using motifs-v9-nr.hgnc-m0.001-o0.0.tbl as motif annotation and hs_hgnc_curated_tfs.txt as tfs.

I am using pyscenic version 0.9.19.

Thanks in advance.
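
For anyone wanting to see how the ~29000 warnings are spread across the databases, here is a minimal sketch that tallies them from captured log output. The regular expression is modeled only on the two warning lines quoted above, so the exact wording in other pySCENIC versions is an assumption, and `tally_skipped_modules` is a hypothetical helper, not part of pySCENIC.

```python
import re
from collections import Counter

# Pattern modeled on the two warning lines quoted in this issue (assumption:
# other pySCENIC versions may word the message differently).
SKIP_PATTERN = re.compile(
    r"Less than (?P<pct>\d+)% of the genes in (?P<module>.+?) "
    r"could be mapped to (?P<db>\S+?)\. Skipping this module\."
)


def tally_skipped_modules(log_lines):
    """Count 'Skipping this module' warnings per ranking database name."""
    counts = Counter()
    for line in log_lines:
        match = SKIP_PATTERN.search(line)
        if match:
            counts[match.group("db")] += 1
    return counts


example_log = [
    "pyscenic.transform - WARNING - Less than 80% of the genes in some_gene "
    "could be mapped to hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather. "
    "Skipping this module.",
    "pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for "
    "some_regulon could be mapped to hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather. "
    "Skipping this module.",
    "some unrelated log line",
]

skipped = tally_skipped_modules(example_log)
for db_name, n in skipped.items():
    print(db_name, n)
```

Comparing the per-database counts with the total number of modules gives a rough idea of whether the warnings affect a small or a large share of the input.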

@alyamahmoud

I am getting the same message while using the Python version of pySCENIC (not the command-line version). Is this something that comes up specifically with hg38?

@TobiTekath
Author

Just a quick update: I see the same warning messages when using the CLI version as well as in Jupyter.

@bramvds It would be great if you could clarify whether so many warnings are expected. I see other people (#138) getting the same warning messages.

@cflerin
Contributor

cflerin commented May 18, 2020

It's normal to have 10s or 100s of these warnings when running the pruning step (and it's a warning, not an error). The cause is just what it states: for a given module that is being pruned, there are not enough genes present that overlap with the database. The module is then excluded from further analysis.
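
The check described above can be sketched roughly as follows. This is an illustration of the idea only, not pySCENIC's actual implementation: `fraction_mapped` and `keep_module` are hypothetical names, and treating a module at exactly 80% as kept is an assumption based on the "Less than 80%" wording of the warning.

```python
def fraction_mapped(module_genes, database_genes):
    """Fraction of a module's genes that are present in the ranking database."""
    module = set(module_genes)
    if not module:
        return 0.0
    return len(module & set(database_genes)) / len(module)


def keep_module(module_genes, database_genes, threshold=0.8):
    """Keep a module only if enough of its genes overlap with the database."""
    return fraction_mapped(module_genes, database_genes) >= threshold


# Toy gene names, purely illustrative.
database = {"G1", "G2", "G3", "G4"}
well_mapped = ["G1", "G2", "G3", "G4", "X1"]    # 4 of 5 genes found
poorly_mapped = ["G1", "G2", "G3", "X1", "X2"]  # 3 of 5 genes found

print(keep_module(well_mapped, database))    # module is kept
print(keep_module(poorly_mapped, database))  # module would be skipped
```

Running the same overlap check between the expression matrix gene names and the database gene names is a quick way to spot a gene-naming mismatch before pruning.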

@cflerin cflerin closed this as completed May 18, 2020
@prullens

prullens commented Jul 29, 2021

Doesn't 80% seem a bit rigid? I'd be happy if ~50% of the module targets have motif enrichment. Or is 80% a justified threshold in your experience? Could one, for example, manually lower this percentage so that more modules are included in further analysis? In my dataset an unfortunate number of interesting TFs are excluded as a consequence of this threshold.

Best,

@klprint

klprint commented Apr 8, 2022

I would like to chime in here and bump this issue. How can I lower the 80% cutoff? Because of this pruning step, I end up with only 26 transcription factors in the activity matrix at the end of pySCENIC.

@RinconFer

RinconFer commented Apr 11, 2022

I would also like to know this; in my dataset this step is pruning hundreds of regulons, some of them really interesting to me.

Thank you

@klprint

klprint commented Apr 12, 2022

@RinconFer I just created a pull request #387 for pySCENIC which will allow you to change the cutoff. I can obviously not 100% guarantee that it works as expected but in my test runs it did.

You can test it in your setting using the following conda environment:

conda create -n pyscenic-test python=3.7 pip git
conda activate pyscenic-test
pip install git+https://github.com/klprint/pySCENIC@relax_module2df

@RinconFer

Thank you very much!!

I'll test it as soon as possible and let you know how it goes.

@klprint

klprint commented Apr 13, 2022

FYI, the parameter for changing the cutoff is the following. I forgot to add it in my previous message.

pyscenic ctx \
    .... \
    --frac_mapping_module 0.8 \
    ....

@Beki-seq

Hello,
I have run into exactly the same issue. I also tried the option you gave, but my pyscenic says it cannot recognize --frac_mapping_module 0.8. Does the --frac_mapping_module option need to appear in a specific position in the command?

And my code is:
pyscenic ctx \
    adj.sample.tsv $feather \
    --annotations_fname $tbl \
    --frac_mapping_module 0.8 \
    --expression_mtx_fname $input_loom \
    --mode "dask_multiprocessing" \
    --output reg.csv \
    --num_workers 20 \
    --mask_dropouts

@li-xuyang28

Thanks for implementing the module cutoff, which is definitely much needed. However, I ran into the following error when setting the cutoff to 0.5:

IndexError: boolean index did not match indexed array along dimension 0; dimension is 55 but corresponding boolean dimension is 107

Could you please suggest/advise on how to resolve it? Thanks again.

@razorofockham

@li-xuyang28 I am finding the same error even when running with the default 0.8 cutoff, did you ever manage to get this to run? Cheers!

@klgoss

klgoss commented Nov 16, 2023

Hello, I wanted to bump this as I'm experiencing the same issue as @li-xuyang28 . I've installed the pyscenic-test environment as @klprint described above, but am met with the following error: pyscenic: error: unrecognized arguments: --frac_mapping_module 0.5

Any insight is greatly appreciated!

@LacquerHed

Having the same issue as @klgoss above: the test environment does not recognize the --frac_mapping_module argument.

@DiracZhu1998

Hey Xuyang, have you managed to solve this problem? Many thanks!

@DiracZhu1998

I just solved it. Check the source code and pay attention to the "annotated_features" variable, which will most likely contain duplicated motif IDs. In my case, I had converted all genes to Ensembl IDs based on the gene name and gene synonyms retrieved from Ensembl, since some of the genes SCENIC used were synonyms rather than gene names. But some genes, like Atf5, also carry the symbol Atf7, which produced two ENSMUSG00000038539 ~ cisbp__M0302 lines in the annotated_features variable and in turn caused the "dimension" bug in the pruning step.
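
A minimal sketch of the kind of check described above, for anyone hitting the same IndexError: it looks for repeated (gene ID, motif ID) pairs and drops the repeats. It works on plain tuples rather than on pySCENIC's real "annotated_features" object, whose exact structure is not shown in this thread; `find_duplicate_pairs` and `drop_duplicate_pairs` are hypothetical helpers, and the second gene and motif below are made-up placeholders.

```python
from collections import Counter


def find_duplicate_pairs(rows):
    """Return the (gene_id, motif_id) pairs that occur more than once."""
    counts = Counter(rows)
    return {pair: n for pair, n in counts.items() if n > 1}


def drop_duplicate_pairs(rows):
    """Remove repeated pairs while keeping the original row order."""
    return list(dict.fromkeys(rows))


rows = [
    ("ENSMUSG00000038539", "cisbp__M0302"),  # pair reported in this thread
    ("ENSMUSG00000038539", "cisbp__M0302"),  # repeat caused by a gene synonym
    ("PLACEHOLDER_GENE", "placeholder_motif"),  # made-up entry for illustration
]

duplicates = find_duplicate_pairs(rows)
cleaned = drop_duplicate_pairs(rows)
print(duplicates)
print(len(rows), "rows before,", len(cleaned), "rows after")
```

If the duplicates come from a gene-name to Ensembl ID conversion, deduplicating right after that conversion keeps the row count consistent for the later pruning step.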
