Issues filtering for datasource (in Webservice, in R) and in Cytoscape? #33

Open
1 of 7 tasks
DeniseSl22 opened this issue Mar 8, 2023 · 11 comments

Comments

@DeniseSl22

DeniseSl22 commented Mar 8, 2023

@tabbassidaloii and I have been checking the issues with the xref batch query.

Issue reported by @egonw:

https://webservice.bridgedb.org/Human/xrefs/S/O14494?dataSource=L

This seems to ignore the ?dataSource=L parameter.

Issue reproduced by @tabbassidaloii and @DeniseSl22. We believe the parameter is not ignored; rather, the SystemCodes are present in the mapping files but are not read in correctly by the BridgeDb libraries.
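To make "ignored or not" checkable, here is a minimal Java sketch that builds the request URL from above and lists any data source names in a response that do not match the requested one. It rests on two assumptions that are not confirmed in this thread: that each response line has the form identifier, tab, data source name, and that system code L corresponds to the name "Entrez Gene". The class and method names are hypothetical, and the sample response is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class XrefFilterCheck {

    // Builds an xrefs URL with a dataSource filter, following the URL pattern quoted in this issue.
    static String buildUrl(String base, String organism, String sourceCode,
                           String identifier, String targetCode) {
        return base + "/" + organism + "/xrefs/" + sourceCode + "/" + identifier
                + "?dataSource=" + targetCode;
    }

    // Assumption: each non-empty response line is "<identifier>\t<data source name>".
    // Returns the data source names that do not match the expected one.
    static List<String> unexpectedSources(String responseBody, String expectedName) {
        List<String> unexpected = new ArrayList<>();
        for (String line : responseBody.split("\\R")) {
            String[] columns = line.split("\t");
            if (columns.length < 2) {
                continue; // skip blank or malformed lines
            }
            String name = columns[1].trim();
            if (!name.equals(expectedName)) {
                unexpected.add(name);
            }
        }
        return unexpected;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("https://webservice.bridgedb.org", "Human", "S", "O14494", "L"));
        // Made-up response body for illustration only; not real webservice output.
        String sample = "ID_A\tEntrez Gene\nID_B\tEnsembl\n";
        System.out.println(unexpectedSources(sample, "Entrez Gene"));
    }
}
```

If the filter is respected, the list of unexpected names should be empty for a real response body.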

@tabbassidaloii also tried out different mapping files in R (91, 104, 105, 107) through the BridgeDbR package (v2.8.0, rJava_1.0-6; according to GitHub it uses the BridgeDb libraries BridgeDb 3.0.19 and Derby 10.15.2), and all these versions give the same issue when the data source to map to is specified:

map(mapper, "H", "VGF", "L")
Error in .jcall("org/bridgedb/DataSource", "Lorg/bridgedb/DataSource;",  : 
  java.lang.IllegalArgumentException: No DataSource known for the Bioregistry.io prefix L

The first known "bioregistry" addition is mentioned in the BridgeDb 3.0.14 release.
@egonw reset the Webservice back to BridgeDb 3.0.13 (which seemed to solve another issue).

We don't know whether this issue is related to the other issues we're seeing for the GeneProtein_107 release (not being able to search for an HGNC symbol in PV; this works in 104 but not in 105). Our suggestions:

  • 1. Revert the webservice data back to version Ensembl104 (for now) @egonw
  • 2. Revert the GeneProtein mapping file download page of BridgeDb back to 104 @tabbassidaloii
  • 3. Download the BridgeDb Java code (v3.0.13), build it with Maven, update the GeneProtein mapping file code with these libraries (Ensembl 104, 107), run the code again, and check if the issue persists (check for Hs only for now!) @tabbassidaloii
  • 4. Revert the BridgeDbR code back to an older version (using BridgeDb libraries older than 3.0.13) to test if the issue above is gone @tabbassidaloii and/or @egonw
  • 5. If the previous point solves the issues, revert the BridgeDbR code back to BridgeDb library 3.0.13 @egonw
  • 6. Start making official releases on GitHub for the BridgeDbR code, so we can easily go back to an older version @egonw
  • 7. Check with @AlexanderPico if NCBI gene mappings to Ensembl in Cytoscape are (still) an issue, and if our suggestions above solve these @DeniseSl22
@egonw
Member

egonw commented Mar 8, 2023

@DeniseSl22, @tabbassidaloii, for the "BridgeDb back to 104" step, please update the JSON files accordingly in the https://github.com/bridgedb/data repository

@tabbassidaloii
Member

> BridgeDb back to 104

The PR has been sent.

@tabbassidaloii
Member

tabbassidaloii commented Mar 8, 2023

Regarding using an older version of BridgeDb (point 3):
I have checked the dependencies for creating the gene/protein Derby files, and I noticed that I have not updated them. They even use an older version of BridgeDb (3.0.6), and this has been the same for all the releases (v103 to v107), so that would not cause the problem. What do you think, @egonw @DeniseSl22?
I have opened different versions of the Hs Derby files (v103, 104, and 107) in SQuirreL, and their structures seem to be similar.
What else can be checked?

@DeniseSl22
Author

@tabbassidaloii: could you maybe run the script for Hs 107 again, and make sure all the Java libraries are version 3.0.13 (check in pom.xml?). I could create a local version of PV 3.3 with new BridgeDb libraries (also 3.0.13), and see if that resolves the issues. If not, we might be looking at this from the wrong perspective (and might need to check Ensembl?), or we would have to go back to an older version of BridgeDb, and then go from there to see what might be causing the issues. In the meantime, I can create a new metabolite mapping file (which uses BridgeDb 3.0.13), and see if I'm getting the same issue in PV regarding the lookup of names.

@tabbassidaloii
Member

@DeniseSl22, I tried to reproduce the v104 file again (as it was correct), but the new Derby file has the same issue (it cannot be searched with gene symbols in PV 3). I am checking all the steps one by one (reviewing all the minor changes) to find the issue. I will also try what you suggested. I am documenting all the checks so that we don't miss anything.

@DeniseSl22
Author

> @DeniseSl22, I tried to reproduce the v104 file again (as it was correct), but the new Derby file has the same issue (it cannot be searched with gene symbols in PV 3). I am checking all the steps one by one (reviewing all the minor changes) to find the issue. I will also try what you suggested. I am documenting all the checks so that we don't miss anything.

This is getting stranger and stranger... *sighs* Could you share the new 104 version that you just created with me? Then I can double-check whether I see the same behaviour. And maybe a zipped file of the source code for the GeneProtein generation?

@tabbassidaloii
Member

> This is getting stranger and stranger... *sighs* Could you share the new 104 version that you just created with me? Then I can double-check whether I see the same behaviour. And maybe a zipped file of the source code for the GeneProtein generation?

Indeed. I will share them on Slack.

@tabbassidaloii
Member

tabbassidaloii commented Mar 10, 2023

The issue of not being able to search the database in PV using gene names was caused by a minor change we made a while ago to fix an error, but we did not foresee the problem it might cause.

While generating the database for Zm (v52), we got the error below:

Attribute external_gene_name NOT FOUND

To solve this, we changed line 157 in QueryBioMart.java from geneId.setAttribute("name", "external_gene_id"); to
geneId.setAttribute("name", "ensembl_gene_id");

As a result, searching was only possible using the Ensembl gene ID.

Now I have changed it to:

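if (config.getSpecies().equals("zmays_eg_gene")) {
   // Zm has no external_gene_name attribute, so keep using the Ensembl gene ID
   geneId.setAttribute("name", "ensembl_gene_id");
} else {
   // all other species can be searched by gene name
   geneId.setAttribute("name", "external_gene_name");
}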

This way, databases for species that have the gene name (external_gene_name) attribute can be searched using gene names.
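If more species without external_gene_name turn up later, the same idea could be kept in one place. Below is a minimal, self-contained sketch, assuming (as the Zm error above suggests) that zmays_eg_gene is the dataset lacking external_gene_name. The class name, method name, and the set itself are hypothetical and not part of QueryBioMart.java.

```java
import java.util.Set;

class GeneNameAttribute {

    // Hypothetical list of BioMart datasets assumed to lack external_gene_name.
    private static final Set<String> WITHOUT_GENE_NAME = Set.of("zmays_eg_gene");

    // Returns the attribute to use as the searchable name for the given dataset.
    static String searchAttribute(String species) {
        return WITHOUT_GENE_NAME.contains(species)
                ? "ensembl_gene_id"      // fall back to the Ensembl gene ID
                : "external_gene_name";  // gene names are available
    }

    public static void main(String[] args) {
        System.out.println(searchAttribute("zmays_eg_gene"));
        System.out.println(searchAttribute("hsapiens_gene_ensembl"));
    }
}
```

The call site would then read geneId.setAttribute("name", GeneNameAttribute.searchAttribute(config.getSpecies())); adding another species becomes a one-line change to the set.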

@egonw
Member

egonw commented Mar 10, 2023

Thank you for debugging the issue!

@mkutmon

mkutmon commented Mar 10, 2023

Thanks, @tabbassidaloii!

@egonw transferred this issue from bridgedb/BridgeDb Jul 25, 2024
@egonw transferred this issue from bridgedb/BridgeDbWebservice Jul 25, 2024
@egonw
Member

egonw commented Jul 25, 2024

See also bridgedb/BridgeDbWebservice#29
