Multistatereporter variable pos frequency #767

ianmkenney · 2025-01-16T18:33:37Z

Continuation of #712. ~~There are multiple avenues to address the current failing tests. I would like to address those concerns here before continuing in that PR.~~


ianmkenney · 2025-01-17T18:36:34Z

This PR will take over #712.

Accessing data from netCDF returns a masked array, even if no data is masked. For valid data, extract it and give back just the basic numpy array. For invalid data, give back a nan array with the proper shape.

codecov · 2025-01-17T19:21:42Z

Codecov Report

Attention: Patch coverage is 98.23009% with 2 lines in your changes missing coverage. Please review.

Project coverage is 84.91%. Comparing base (f3f355a) to head (c211252).


ianmkenney · 2025-01-17T19:24:22Z

@ijpulidos @IAlibay @mikemhenry I can't request reviewers, but I think this is ready for review.

ijpulidos

This looks good to me. Thanks a lot! Great work. Just a couple of non-blocking comments.

ijpulidos · 2025-01-17T20:12:10Z

openmmtools/multistate/multistatereporter.py

+                        raise IndexError
+                    # pull the valid data out of masked array
+                    positions = unit.Quantity(x.data, unit.nanometers)
+                except (IndexError, KeyError):


Non-blocking comment. Should we try outputting something here when an error is caught? Maybe something warning users that there's missing data or something similar. What are the reasons for the array to be masked?

On the topic of how MaskedArrays work here: assuming that a particular coordinate exists in a dimension (i.e. at least some data exists for a given iteration), we always get back a MaskedArray. The MaskedArray has a data attribute holding a numpy array, which is masked by the MaskedArray's mask attribute. If the data is all valid, we just pull that array out and expose it to the user. If no data exists at that dimension coordinate, we get an IndexError, so we give back a zero array in place of the nonsense data inside the MaskedArray. Doing it this way, a user never has to deal with a MaskedArray.

I think that if the data was masked, it's basically the same behavior as if the data didn't exist, and we would have expected a zero array anyway. I don't really see much reason to give a warning about this, since a giant array of zeros is kind of a warning already. I'd even be careful about saying the data is missing, since that would imply it should have been there in the first place.
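A minimal sketch of the unmasking behavior described above, assuming the netCDF read comes back as a numpy.ma.MaskedArray and that any masked entry means the frame was never written. The helper name unmask_or_zeros is hypothetical; this is not the actual openmmtools code, which raises and catches IndexError inside the reporter as shown in the diff.

```python
import numpy as np


def unmask_or_zeros(x, shape):
    """Return a plain ndarray from a masked read, or zeros if anything is masked.

    Hypothetical helper mirroring the behavior discussed in this thread,
    not the openmmtools implementation.
    """
    # netCDF reads come back as MaskedArrays even when nothing is masked.
    if np.ma.is_masked(x):
        # Masked entries only hold fill values; treat the frame as absent
        # (assumption: a partially masked frame is also treated as absent).
        return np.zeros(shape, dtype=np.float64)
    # All entries valid: hand back the underlying ndarray, not the MaskedArray.
    return np.asarray(np.ma.getdata(x), dtype=np.float64)


valid = np.ma.MaskedArray(np.ones((2, 3)), mask=False)
absent = np.ma.MaskedArray(np.ones((2, 3)), mask=True)
print(type(unmask_or_zeros(valid, (2, 3))))  # a plain numpy.ndarray of ones
print(unmask_or_zeros(absent, (2, 3)))       # zeros in place of the fill values
```

Returning zeros rather than NaN matches the "Give back zero arrays instead of nan arrays" commit in this PR.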

ijpulidos · 2025-01-17T20:14:18Z

openmmtools/multistate/multistatereporter.py

+        information, 0 would prevent information being written
+    velocity_interval : int, default 1
+        the frequency at which to write positions relative to analysis
+        information, 0 would prevent information being written


Non-blocking question. This might be too late to ask now, but why is it that we want positions and velocities to be stored at different intervals? Does it make sense to define a state (i.e. positions and velocities) from different intervals? Just thinking about ways to avoid the "complexity" of having to check the position and velocity intervals independently.

I think this would depend on the intent of the simulation. I could see that someone interested in velocity autocorrelation (VAC) calculations wouldn't care as much about the positions.

So the one case I can think of is that you might not want to store the velocities but you do want to store the positions.

I might be wrong, but as far as I remember the "state" is stored in the checkpoint? So you could recover a simulation from the checkpoint, and just store positions in the simulation file.
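To make the state question concrete, here is a sketch of which iterations end up with a full state (positions and velocities) when the two intervals are set independently. It assumes the docstring semantics quoted above (write every N-th analysis iteration, 0 means never) and that iteration 0 counts as written; is_written and full_state_interval are hypothetical names, and math.lcm needs Python 3.9+.

```python
import math


def is_written(iteration, interval):
    """True if a quantity with this interval is written at this iteration.

    Assumes interval counts analysis iterations and 0 means never write.
    """
    return interval > 0 and iteration % interval == 0


def full_state_interval(position_interval, velocity_interval):
    """Spacing between iterations holding both positions and velocities (0 = never)."""
    if position_interval == 0 or velocity_interval == 0:
        return 0
    # Both are written only where the two write schedules coincide.
    return math.lcm(position_interval, velocity_interval)


print(full_state_interval(2, 5))  # 10
print(full_state_interval(1, 0))  # 0: velocities are never written
```

For example, position_interval=2 with velocity_interval=5 only yields a full state every 10 iterations.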

IAlibay · 2025-01-17T21:39:18Z

I'll be testing this in a Protocol before I review :)

IAlibay · 2025-01-19T22:35:00Z

openmmtools/multistate/multistatereporter.py

@@ -100,6 +100,12 @@ class MultiStateReporter(object):
    analysis_particle_indices : tuple of ints, Optional. Default: () (empty tuple)
        If specified, it will serialize positions and velocities for the specified particles, at every iteration, in the
        reporter storage (.nc) file. If empty, no positions or velocities will be stored in this file for any atoms.
+    position_interval : int, default 1


Could we store these values inside of the NetCDF file as additional variables? This would make it a lot easier to know what the spacing is between frames, rather than having to iterate through the positions.

I'm being blind, sorry 🙈 - it's below
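The difference between scanning and storing the interval can be sketched with numpy. Assume a hypothetical per-iteration boolean mask (True = nothing written) derived from the masked positions: recovering the written frames means scanning the whole mask, while a stored interval gives the same indices directly. The function names and the mask layout are illustrative assumptions, not the reporter's API.

```python
import numpy as np


def frames_by_scanning(mask):
    """Written frames found by scanning a per-iteration mask (True = not written)."""
    return np.flatnonzero(~np.asarray(mask, dtype=bool))


def frames_from_interval(n_iterations, interval):
    """Written frames computed directly from a stored interval (0 = never)."""
    if interval == 0:
        return np.array([], dtype=np.intp)
    return np.arange(0, n_iterations, interval)


mask = np.ones(10, dtype=bool)
mask[::3] = False  # pretend positions were written every 3rd iteration
print(frames_by_scanning(mask))      # [0 3 6 9]
print(frames_from_interval(10, 3))   # [0 3 6 9]
```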

IAlibay · 2025-01-19T23:15:43Z

openmmtools/multistate/multistatereporter.py

@@ -100,6 +100,12 @@ class MultiStateReporter(object):
    analysis_particle_indices : tuple of ints, Optional. Default: () (empty tuple)
        If specified, it will serialize positions and velocities for the specified particles, at every iteration, in the
        reporter storage (.nc) file. If empty, no positions or velocities will be stored in this file for any atoms.
+    position_interval : int, default 1
+        the frequency at which to write positions relative to analysis


Could you also make it clear here that this controls whether or not box_vectors are written down? (it's self-evident, but it might be good to make it clear to users)

IAlibay · 2025-01-19T23:16:00Z

openmmtools/multistate/multistatereporter.py

@@ -202,6 +214,16 @@ def checkpoint_interval(self):
        """Returns the checkpoint interval"""
        return self._checkpoint_interval

+    @property
+    def position_interval(self):
+        """Interval relative to energies that positions are written at"""


And box_vectors

IAlibay · 2025-01-19T23:16:57Z

openmmtools/multistate/multistatereporter.py

                    # pass zeros as velocities when key is not found (<0.21.3 behavior)
                    velocities = np.zeros_like(positions)

                if 'box_vectors' in storage.variables:
                    # Restore box vectors.
                    x = storage.variables['box_vectors'][read_iteration, replica_index, :, :].astype(np.float64)
+                    # TODO: Are box vectors also variably saved?


From what I can tell, they seem to be saved at the same rate as positions - worth double checking though!
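One way to do that double check, sketched with in-memory numpy masked arrays rather than a real reporter file: collapse every axis except the leading iteration axis and compare which iterations hold any unmasked data. This assumes iteration is the first axis, as in the box_vectors read above; written_mask and same_write_pattern are hypothetical helpers, and against a real file you would pass the masked arrays read from storage (the positions variable name is an assumption).

```python
import numpy as np


def written_mask(arr):
    """Per-iteration boolean: True where an iteration has any unmasked data.

    Assumes iteration is the leading axis of the array.
    """
    # getmaskarray always returns a full boolean array, even with no mask set.
    mask = np.ma.getmaskarray(arr)
    # An iteration counts as unwritten only if every entry in it is masked.
    return ~mask.reshape(mask.shape[0], -1).all(axis=1)


def same_write_pattern(positions, box_vectors):
    """True if positions and box vectors were written at the same iterations."""
    return bool(np.array_equal(written_mask(positions), written_mask(box_vectors)))


# Toy data: 6 iterations, 2 replicas, 3 atoms; written every 2nd iteration.
positions = np.ma.masked_all((6, 2, 3, 3))
box_vectors = np.ma.masked_all((6, 2, 3, 3))
positions[::2] = 1.0
box_vectors[::2] = np.eye(3)
print(same_write_pattern(positions, box_vectors))  # True
```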

richardjgowers and others added 11 commits July 13, 2023 13:48

allow MultiStateReporter to write positions and velocities at a different frequency to energies data.

4805553

MultiStateReporter use 0 for do not write

59ca4d5

Merge branch 'main' into multistatereporter_variable_pos_frequency

7a3bd40

Merge branch 'main' into multistatereporter_variable_pos_frequency

d8f80b9

WIP of multistatereporter tests

8fba23f

test for variable position saving

e0555c5

more tests for smaller nc files

2d88a33

currently no pos test is failing...

catch IndexError when file had no position/velocity data ever stored

9a24479

Merge branch 'main' into multistatereporter_variable_pos_frequency

981fe1b

Merge branch 'main' into multistatereporter_variable_pos_frequency

542f341

Merge branch 'main' into multistatereporter_variable_pos_frequency

a262c2b

ianmkenney force-pushed the multistatereporter_variable_pos_frequency branch 2 times, most recently from a0e7dbb to a262c2b January 17, 2025 18:25

ianmkenney changed the title ~~Multistatereporter variable pos frequency and masked arrays~~ [DNM/WIP] Multistatereporter variable pos frequency and masked arrays Jan 17, 2025

ianmkenney changed the base branch from multistatereporter_variable_pos_frequency to main January 17, 2025 18:33

ijpulidos mentioned this pull request Jan 17, 2025

MultiStateReporter variable pos/vel save frequency #712

Closed

5 tasks

ianmkenney added 4 commits January 17, 2025 11:44

Improve clarity of boolean logic

7fd7639

Interpret masked data as invalid data

bae275c


Give back zero arrays instead of nan arrays

0b770d5

Allow CI to run for all branches

c211252

ianmkenney changed the title ~~[DNM/WIP] Multistatereporter variable pos frequency and masked arrays~~ [DNM/WIP] Multistatereporter variable pos frequency Jan 17, 2025

ianmkenney changed the title ~~[DNM/WIP] Multistatereporter variable pos frequency~~ Multistatereporter variable pos frequency Jan 17, 2025

ijpulidos requested review from ijpulidos and mikemhenry January 17, 2025 19:28

ijpulidos approved these changes Jan 17, 2025


IAlibay mentioned this pull request Jan 19, 2025

Add support for masked frames (i.e. positions missing at certain frames) OpenFreeEnergy/openfe_analysis#41

Open

IAlibay reviewed Jan 19, 2025


IAlibay mentioned this pull request Jan 20, 2025

[WIP] Add support for variable position/velocity trajectory writing OpenFreeEnergy/openfe#1083

Draft

9 tasks
