Globally set RDKit to include all properties when pickling an RDKit Mol #344

Merged
merged 17 commits into from
Dec 18, 2024

dotsdl
Member

@dotsdl dotsdl commented Sep 11, 2024

Closes #322.

Member

@IAlibay IAlibay left a comment

I would like to see more tests on this please - it's not immediately clear to me that this doesn't change the behaviour and I would like to be sure it doesn't.

Member

@IAlibay IAlibay left a comment

Meant this as a request changes.

codecov bot commented Oct 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.50%. Comparing base (128da4a) to head (84aa941).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #344   +/-   ##
=======================================
  Coverage   98.50%   98.50%           
=======================================
  Files          37       37           
  Lines        2143     2147    +4     
=======================================
+ Hits         2111     2115    +4     
  Misses         32       32           

Member

@IAlibay IAlibay left a comment

@dotsdl I think we can go with this one, but it would be good to a) make sure we aren't creating big files (especially with ProteinComponents) - I could see this becoming an issue when you have large PDBs (maybe a fully solvated system?), and b) make sure it doesn't break any backwards compatibility.

I'm not 100% sure on the latter, it shouldn't have an impact, but we'd need some kind of migration test (maybe something you already do in alchemiscale?). The former should be easy, we can take a big PDB file, serialize it, and see what breaks?

Member Author

dotsdl commented Oct 29, 2024

@IAlibay thanks for this! This should have no impact on files, since we don't/can't support pickle files as a stable form of serialization written to disk. It is important, however, when using tools like multiprocessing.Pool, concurrent.futures.ProcessPoolExecutor, or dask.distributed, as these often rely on pickling to transmit Python objects between processes.
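
To illustrate why pickling matters for inter-process transport, here is a minimal sketch using only the standard library. `TaggedMolecule` and `send_to_worker` are hypothetical names, not part of gufe or RDKit. The class mimics an object whose pickled form keeps only its core structure, and the helper performs the same pickle round trip that a process pool applies when it ships an argument to a worker.

```python
import pickle


class TaggedMolecule:
    """Hypothetical stand-in for an object whose pickled form drops metadata."""

    def __init__(self, smiles, atom_props=None):
        self.smiles = smiles
        self.atom_props = atom_props or {}

    def __getstate__(self):
        # Mimic a pickler that serializes the structure but not the properties.
        return {"smiles": self.smiles}

    def __setstate__(self, state):
        self.smiles = state["smiles"]
        self.atom_props = {}


def send_to_worker(obj):
    """Simulate process-pool transport: pickle on one side, unpickle on the other."""
    return pickle.loads(pickle.dumps(obj))


original = TaggedMolecule("CCO", {0: {"partial_charge": -0.5}})
received = send_to_worker(original)
print(original.atom_props)  # properties are still present in the sender
print(received.atom_props)  # properties are gone in the receiver's copy
```

The sender never sees an error here; the data is lost silently on the receiving side, which is the failure mode this PR is concerned with.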

Member Author

dotsdl commented Oct 29, 2024

As an alternative, we could switch ExplicitMoleculeComponent to use PropertyMol instead of Mol as the class for its _rdkit attribute. That will always serialize its properties, and avoids us setting this behavior globally on rdkit itself for all Mol objects in the Python session.

See this comment for explanation: rdkit/rdkit#6573 (comment)
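
A rough sketch of the difference, assuming RDKit's `PropertyMol` wrapper and the `SetDefaultPickleProperties` / `GetDefaultPickleProperties` functions in `rdkit.Chem`. The helper name `compare_mol_and_propertymol` is hypothetical. It only checks a molecule-level property; whether atom-level properties survive through `PropertyMol` may depend on the RDKit version and is not checked here.

```python
import pickle


def compare_mol_and_propertymol(smiles="CCO", key="name", value="ethanol"):
    """Return (plain_kept, wrapped_kept) for a molecule-level property."""
    # Imported lazily so this sketch can be loaded without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem.PropertyMol import PropertyMol

    previous = Chem.GetDefaultPickleProperties()
    # Use the least favourable global setting so the comparison is explicit.
    Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.NoProps)
    try:
        mol = Chem.MolFromSmiles(smiles)
        mol.SetProp(key, value)

        plain = pickle.loads(pickle.dumps(mol))
        wrapped = pickle.loads(pickle.dumps(PropertyMol(mol)))
        return bool(plain.HasProp(key)), bool(wrapped.HasProp(key))
    finally:
        # Restore whatever the session had configured before.
        Chem.SetDefaultPickleProperties(previous)
```

With the global setting at `NoProps`, the plain `Mol` should come back without the property while the `PropertyMol` copy should keep it, which is the appeal of the more targeted approach.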

Member

@IAlibay IAlibay left a comment

Probably needs a test.

@@ -9,6 +9,10 @@
# typing
from ..custom_typing import RDKitMol

# globally set RDKit to include all properties when pickling an RDKit Mol
# see issue #322: https://github.com/OpenFreeEnergy/gufe/issues/322
Member

Can you add a pickle test please? If that's the use case we're targeting then we should have a test for it.

Member Author

Will do! Good call.
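
A pickle test along these lines might look like the sketch below. It is not the test added in this PR; the function name and the property key are made up for illustration, and it assumes the global setting is applied with RDKit's `Chem.SetDefaultPickleProperties`.

```python
import pickle


def test_atom_props_survive_pickle():
    """Atom-level properties should round-trip once AllProps is enabled."""
    # Imported lazily so this sketch can be loaded without RDKit installed.
    from rdkit import Chem

    previous = Chem.GetDefaultPickleProperties()
    Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps)
    try:
        mol = Chem.MolFromSmiles("CCO")
        mol.GetAtomWithIdx(0).SetProp("custom_tag", "kept")

        restored = pickle.loads(pickle.dumps(mol))

        atom = restored.GetAtomWithIdx(0)
        assert atom.HasProp("custom_tag")
        assert atom.GetProp("custom_tag") == "kept"
    finally:
        # Leave the session-wide setting as it was found.
        Chem.SetDefaultPickleProperties(previous)
```

Restoring the previous setting in `finally` keeps the test from leaking global state into other tests in the same session.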

Member Author

dotsdl commented Oct 29, 2024

I will add a test and a logger warning to the global settings change, and draft a follow-up issue proposing the use of PropertyMols instead as a less-global, more targeted solution.

Member Author

dotsdl commented Dec 13, 2024

@IAlibay this is ready for another review. Instead of setting global state, we decided to only issue a warning when pickling is attempted on an ExplicitMoleculeComponent and the rdkit pickle behavior is not set to preserve all properties.
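
The warn-on-pickle approach can be sketched roughly as follows. This is an illustration, not the code merged in this PR: `MoleculeHolder` is a hypothetical stand-in for `ExplicitMoleculeComponent`, `rdkit_keeps_all_props` is a made-up helper, and the warning text is invented. Only `Chem.GetDefaultPickleProperties` and `Chem.PropertyPickleOptions` are assumed from RDKit.

```python
import pickle
import warnings


def rdkit_keeps_all_props() -> bool:
    """True if RDKit's session-wide pickle setting includes every property class."""
    # Imported lazily so this sketch can be loaded without RDKit installed.
    from rdkit import Chem

    wanted = int(Chem.PropertyPickleOptions.AllProps)
    return (int(Chem.GetDefaultPickleProperties()) & wanted) == wanted


class MoleculeHolder:
    """Hypothetical stand-in for ExplicitMoleculeComponent."""

    def __init__(self, rdkit_mol):
        self._rdkit = rdkit_mol

    def __getstate__(self):
        # pickle calls this hook; warn rather than change RDKit's global state.
        if not rdkit_keeps_all_props():
            warnings.warn(
                "RDKit is not set to pickle all properties; "
                "molecule properties may be lost in the pickled copy.",
                stacklevel=2,
            )
        return self.__dict__.copy()
```

A user who wants the properties preserved can call `Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps)` in their own session before pickling, which keeps the decision about global state in the user's hands.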

Member

@IAlibay IAlibay left a comment

LGTM @dotsdl - however, does it truly close #322? That issue remains active, so would it be worth keeping it open?

Member Author

dotsdl commented Dec 18, 2024

@IAlibay I'm in favor of letting this PR close #322, since it amounts to our answer to that issue at this time. If this problem rears its head again, we can consider another approach and reopen #322.

Member Author

dotsdl commented Dec 18, 2024

pre-commit.ci autofix

@dotsdl dotsdl merged commit 703f151 into main Dec 18, 2024
9 of 11 checks passed
@dotsdl dotsdl deleted the bugfix/rdkit-mol-pickle branch December 18, 2024 02:37
Successfully merging this pull request may close these issues.

RDKit Mol doesn't preserve atom properties when pickled; impacts ExplicitMoleculeComponent and subclasses
2 participants