prune2df warning messages #106
Comments
I am getting the same error message while using the Python version of pySCENIC (not the command-line version). Is this something that pops up particularly with hg38?
It's normal to have tens or hundreds of these warnings when running the pruning step (and it's a warning, not an error). The cause is just what it states: for a given module that is being pruned, too few of its genes overlap with the database. The module is then excluded from further analysis.
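A minimal sketch of the check described above, assuming the cutoff is applied to the fraction of a module's genes that are found in the ranking database. The function names and the toy gene lists are hypothetical illustrations, not pySCENIC's internal code, and whether the real comparison is strict or inclusive at exactly 80% is an assumption here.

```python
def mapped_fraction(module_genes, database_genes):
    """Fraction of a module's genes that are present in the ranking database."""
    module_genes = set(module_genes)
    if not module_genes:
        return 0.0
    return len(module_genes & set(database_genes)) / len(module_genes)


def keep_module(module_genes, database_genes, threshold=0.8):
    """True if enough genes overlap for the module to be pruned rather than skipped."""
    return mapped_fraction(module_genes, database_genes) >= threshold


# Hypothetical toy data: 5 genes in the module, 3 of which are in the database.
database_genes = {"GATA1", "TAL1", "KLF1", "SPI1"}
module_genes = ["GATA1", "TAL1", "KLF1", "AC012345.1", "LINC00001"]

fraction = mapped_fraction(module_genes, database_genes)
print(f"{fraction:.0%} mapped, kept: {keep_module(module_genes, database_genes)}")
```

Running a check like this on your own modules before pruning can show whether the skipped modules are dominated by gene identifiers that the database simply does not contain.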
Doesn't 80% seem a bit rigid? I'd be happy if ~50% of the module targets have motif enrichment. Or is 80% a justified threshold in your experience? Could one, for example, manually lower this percentage so that more modules are included in further analysis? In my dataset an unfortunate number of interesting TFs are excluded as a consequence of this threshold.
I would like to chime in here and bump this issue. How can I lower the 80% cutoff? Because of this pruning step, I end up with only 26 transcription factors in the activity matrix at the end of the pySCENIC run.
I would also like to know this. In my dataset the pruning step removes hundreds of regulons, some of which are really interesting to me. Thank you.
@RinconFer I just created pull request #387 for pySCENIC, which will allow you to change the cutoff. I obviously cannot guarantee 100% that it works as expected, but in my test runs it did. You can test it in your setting using the following conda environment:
conda create -n pyscenic-test python=3.7 pip git
conda activate pyscenic-test
pip install git+https://github.com/klprint/pySCENIC@relax_module2df
Thank you very much!! I'll test it as soon as possible and let you know how it goes.
FYI, the parameter for changing the cutoff is the following. I forgot to add it in my previous message.
pyscenic ctx \
.... \
--frac_mapping_module 0.8 \
....
Hello, And my code is:
Thanks for implementing the module cutoff, which is definitely much needed. However, I ran into the following error when setting the cutoff to 0.5:
IndexError: boolean index did not match indexed array along dimension 0; dimension is 55 but corresponding boolean dimension is 107
Could you please advise on how to resolve it? Thanks again.
@li-xuyang28 I am getting the same error even when running with the default 0.8 cutoff. Did you ever manage to get this to run? Cheers!
Hello, I wanted to bump this as I'm experiencing the same issue as @li-xuyang28. I've installed the pyscenic-test environment as @klprint described above, but am met with the following error: Any insight is greatly appreciated!
Having the same issue as @klgoss above: the test environment does not contain the argument
Hey Xuyang, I was wondering whether you have solved this problem. Many thanks!
I just solved it. Check the source code and pay attention to the "annotated_features" variable, which most likely contains duplicated motif IDs. I had converted all genes to Ensembl IDs based on the gene names and gene synonyms retrieved from Ensembl, since some of the genes SCENIC uses are synonyms rather than gene names. However, some genes, such as Atf5, also have a synonym (Atf7), which produced two "ENSMUSG00000038539 ~ cisbp__M0302" lines in the annotated_features variable. This in turn caused the "dimension" bug in the pruning step.
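The duplicate check described above could be sketched as follows. This is plain Python on (gene ID, motif ID) pairs rather than pySCENIC's actual data structures. The first pair is the one quoted in the comment; the second gene ID and motif ID are made-up placeholders, and the helper names are hypothetical.

```python
from collections import Counter

# Rows mimicking (gene_id, motif_id) pairs after symbol-to-Ensembl conversion.
annotation_rows = [
    ("ENSMUSG00000038539", "cisbp__M0302"),  # mapped from the gene name
    ("ENSMUSG00000038539", "cisbp__M0302"),  # mapped again from a synonym
    ("ENSMUSG00000000001", "cisbp__M0001"),  # made-up placeholder row
]


def find_duplicates(rows):
    """Return every (gene_id, motif_id) pair that occurs more than once."""
    counts = Counter(rows)
    return [row for row, n in counts.items() if n > 1]


def drop_duplicates(rows):
    """Keep the first occurrence of each pair and preserve the original order."""
    return list(dict.fromkeys(rows))


print("duplicated pairs:", find_duplicates(annotation_rows))
print("rows after de-duplication:", len(drop_duplicates(annotation_rows)))
```

If duplicates show up after a symbol conversion of your own, removing them before the pruning step is one way to keep the annotation table and the feature table the same length, which is what the reported IndexError complains about.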
Hi,
thanks for developing this very useful toolkit.
I am wondering whether it is normal/expected to get so many warning messages while performing prune2df. I get warning messages like
pyscenic.transform - WARNING - Less than 80% of the genes in some_gene could be mapped to hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather. Skipping this module.
or
pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for some_regulon could be mapped to hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather. Skipping this module.
I get these warnings with the two hg38-Database files as well as with the six hg19-Database files - so the hg-version of the db does not seem to be the cause. My data is annotated with gencode, so it should be hg38.
The results of prune2df() do look quite good; I am just not sure about the ~29000 warning messages I get in the process.
My prune2df call looks like this:
df = prune2df(rnkdbs=dbs, modules=modules, motif_annotations_fname=MOTIF_ANNOTATIONS_FNAME, client_or_address="custom_multiprocessing", num_workers=30)
with dbs:
[FeatherRankingDatabase(name="hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather"), FeatherRankingDatabase(name="hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather")]
and
[FeatherRankingDatabase(name="hg19-500bp-upstream-7species.mc9nr.feather"), FeatherRankingDatabase(name="hg19-500bp-upstream-10species.mc9nr.feather"), FeatherRankingDatabase(name="hg19-tss-centered-5kb-7species.mc9nr.feather"), FeatherRankingDatabase(name="hg19-tss-centered-5kb-10species.mc9nr.feather"), FeatherRankingDatabase(name="hg19-tss-centered-10kb-7species.mc9nr.feather"), FeatherRankingDatabase(name="hg19-tss-centered-10kb-10species.mc9nr.feather")]
respectively. I am using motifs-v9-nr.hgnc-m0.001-o0.0.tbl as motif annotation and hs_hgnc_curated_tfs.txt as TFs.
I am using pyscenic version 0.9.19.
Thanks in advance.
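With tens of thousands of warnings, it can help to count them instead of reading them. The sketch below uses only Python's standard logging module and assumes that the warnings are emitted through a logger named "pyscenic.transform", as the message prefix above suggests. The WarningCounter class is hypothetical, and the logger.warning call is a stand-in for the real prune2df run. With "custom_multiprocessing" the worker processes log on their own, so a counter installed in the parent process would probably not see their messages; a single-process run is the safer setting for this kind of count.

```python
import logging


class WarningCounter(logging.Handler):
    """Counts 'Skipping this module' warnings instead of printing them."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.skipped = 0

    def emit(self, record):
        if "Skipping this module" in record.getMessage():
            self.skipped += 1


# Assumed logger name, taken from the prefix of the warning messages.
logger = logging.getLogger("pyscenic.transform")
counter = WarningCounter()
logger.addHandler(counter)
logger.propagate = False  # keep the individual messages off the console

# Stand-in for the prune2df call: emit one warning in the reported format.
logger.warning(
    "Less than 80% of the genes in some_gene could be mapped to "
    "hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather. "
    "Skipping this module."
)

print(f"{counter.skipped} module(s) skipped")
```

Comparing the count against the total number of modules gives a quick sense of how much of the input is lost to the mapping cutoff.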