Multistatereporter variable pos frequency #767
base: main
…rent frequency to energies data.
currently no pos test is failing...
a0e7dbb to a262c2b
This PR will take over #712.
Accessing data from netCDF returns a masked array, even if no data is masked. For valid data, extract it and return a plain numpy array. For invalid data, return a NaN array with the proper shape.
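A minimal sketch of that extraction, using `numpy.ma` to stand in for what the netCDF4 library returns. The helper name `extract_or_nan` is hypothetical, and treating a frame with *any* masked element as invalid is an assumption, not something this PR states.

```python
import numpy as np


def extract_or_nan(masked, shape):
    """Return a plain ndarray from a netCDF-style masked array.

    Hypothetical helper: if every element is valid, return the underlying
    data; otherwise return a NaN-filled array of the requested shape.
    """
    # np.ma.getmaskarray always returns a full boolean array, even when the
    # mask is the scalar `nomask`, so `.any()` works in both cases.
    if np.ma.getmaskarray(masked).any():
        return np.full(shape, np.nan)
    return np.asarray(np.ma.getdata(masked), dtype=np.float64)


valid = np.ma.masked_array(np.ones((2, 3)), mask=False)
invalid = np.ma.masked_array(np.ones((2, 3)), mask=True)

print(type(extract_or_nan(valid, (2, 3))).__name__)  # ndarray
print(np.isnan(extract_or_nan(invalid, (2, 3))).all())  # True
```

Either way, callers only ever see a regular `ndarray`, never a `MaskedArray`.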
@ijpulidos @IAlibay @mikemhenry I can't request reviewers, but I think this is ready for review.
This looks good to me. Thanks a lot! Great work. Just a couple of non-blocking comments.
raise IndexError
# pull the valid data out of masked array
positions = unit.Quantity(x.data, unit.nanometers)
except (IndexError, KeyError):
Non-blocking comment. Should we output something here when an error is caught? Maybe a warning telling users that data is missing, or something similar. Why would the array be masked in the first place?
On the topic of how `MaskedArray`s work here: assuming that a particular coordinate exists in a dimension (e.g. at least some data exists for a given iteration), we always get back a `MaskedArray`. The `MaskedArray` has a `data` attribute holding a numpy array, which is masked by the `MaskedArray`'s `mask` attribute. If the data is all valid, we just pull it out and expose it to the user. If no data exists at that dimension coordinate, we get an `IndexError`, so we give back a zero array in place of the nonsense data inside the `MaskedArray`. This way, a user never has to deal with a `MaskedArray`.
I think that if the data is masked, the behavior is basically the same as if the data didn't exist, and we would have expected a zero array anyway. I don't see much reason to warn about this, since a giant array of zeros is already a warning of sorts. I'd also be careful about calling the data "missing", since that implies it should have been there in the first place.
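For illustration, a self-contained sketch of that read path, with a NumPy masked array standing in for the netCDF variable. `read_positions` is a hypothetical name; it mirrors the `raise IndexError` / `x.data` / `except` pattern in the diff above but is not the code in this PR.

```python
import numpy as np

n_atoms = 3
# Hypothetical stand-in for a netCDF positions variable with shape
# (n_iterations, n_atoms, 3); only iteration 0 has been written so far.
stored = np.ma.masked_array(np.arange(n_atoms * 3, dtype=float).reshape(1, n_atoms, 3))


def read_positions(storage, iteration, n_atoms):
    """Return a plain ndarray of positions, or zeros if the frame is absent."""
    try:
        x = storage[iteration]  # numpy raises IndexError past the last frame
        if np.ma.getmaskarray(x).any():
            raise IndexError  # treat a masked frame the same as a missing one
        # pull the valid data out of the masked array
        return np.array(x.data)
    except IndexError:
        return np.zeros((n_atoms, 3))


print(read_positions(stored, 0, n_atoms)[1])  # [3. 4. 5.]
print(read_positions(stored, 5, n_atoms).sum())  # 0.0
```

Folding "masked" and "out of range" into the same `IndexError` branch is what lets the caller receive a zero array in both cases.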
information, 0 would prevent information being written
velocity_interval : int, default 1
the frequency at which to write velocities relative to analysis
information, 0 would prevent information being written
Non-blocking question. It might be too late to ask now, but why is it that we want positions and velocities to be stored at different intervals? Does it make sense to define a state (i.e. positions and velocities) from different intervals? I'm just thinking about ways to avoid the "complexity" of checking the position and velocity intervals independently.
I think this would depend on the intent of the simulation. I could see someone interested in velocity autocorrelation (VAC) calculations not caring as much about the positions.
So the one case I can think of is that you might not want to store the velocities but you do want to store the positions.
I might be wrong, but as far as I remember the "state" is stored in the checkpoint? So you could recover a simulation from the checkpoint, and just store positions in the simulation file.
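To make the "state from different intervals" question concrete, here is a small sketch of which iterations would carry positions, velocities, or both. It assumes a quantity is written when `iteration % interval == 0` and that an interval of 0 disables writing; the docstrings only describe the interval as a frequency relative to the energies, so this exact rule is an assumption.

```python
from math import lcm  # Python 3.9+


def written_iterations(n_iterations, interval):
    """Iterations at which a quantity is written, assuming a write happens
    when `iteration % interval == 0` and an interval of 0 disables writing."""
    if interval == 0:
        return []
    return [i for i in range(n_iterations) if i % interval == 0]


position_interval, velocity_interval = 2, 3
pos = set(written_iterations(12, position_interval))
vel = set(written_iterations(12, velocity_interval))

# A full state (positions *and* velocities) exists only where both coincide,
# i.e. every lcm(position_interval, velocity_interval) iterations.
print(sorted(pos & vel))  # [0, 6]
print(lcm(position_interval, velocity_interval))  # 6
```

Under this assumption, a complete state is only available every `lcm` iterations, which is one argument for restarting from the checkpoint rather than from the reporter file.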
I'll be testing this in a Protocol before I review :)
@@ -100,6 +100,12 @@ class MultiStateReporter(object):
analysis_particle_indices : tuple of ints, Optional. Default: () (empty tuple)
If specified, it will serialize positions and velocities for the specified particles, at every iteration, in the
reporter storage (.nc) file. If empty, no positions or velocities will be stored in this file for any atoms.
position_interval : int, default 1
Could we store these values inside of the NetCDF file as additional variables? This would make it a lot easier to know what the spacing is between frames, rather than having to iterate through the positions.
I'm being blind sorry 🙈 - it's below
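Since the intervals are stored in the file, readers can use them directly. For comparison, this sketch shows the scan a reader would otherwise need: inferring the spacing from which iterations hold unmasked data. `infer_interval` is hypothetical and assumes evenly spaced writes starting at iteration 0.

```python
import numpy as np


def infer_interval(frame_written):
    """Guess a write interval from a per-iteration boolean 'was written' flag.

    Hypothetical fallback for files that do not record the interval
    explicitly; assumes writes start at iteration 0 and are evenly spaced.
    """
    written = np.flatnonzero(frame_written)
    if written.size < 2:
        return None  # not enough written frames to determine a spacing
    steps = np.diff(written)
    if not (steps == steps[0]).all():
        raise ValueError("frames are not evenly spaced")
    return int(steps[0])


# Positions masked on iterations 1, 2, 4, 5, 7, 8 (i.e. written every 3rd iteration)
positions_mask = np.array([False, True, True, False, True, True, False, True, True])
print(infer_interval(~positions_mask))  # 3
```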
@@ -100,6 +100,12 @@ class MultiStateReporter(object):
analysis_particle_indices : tuple of ints, Optional. Default: () (empty tuple)
If specified, it will serialize positions and velocities for the specified particles, at every iteration, in the
reporter storage (.nc) file. If empty, no positions or velocities will be stored in this file for any atoms.
position_interval : int, default 1
the frequency at which to write positions relative to analysis
Could you also make it clear here that this controls whether or not box_vectors are written? (It's self-evident, but it might be good to spell it out for users.)
@@ -202,6 +214,16 @@ def checkpoint_interval(self):
"""Returns the checkpoint interval"""
return self._checkpoint_interval

@property
def position_interval(self):
"""Interval relative to energies that positions are written at"""
And box_vectors
# pass zeros as velocities when key is not found (<0.21.3 behavior)
velocities = np.zeros_like(positions)

if 'box_vectors' in storage.variables:
# Restore box vectors.
x = storage.variables['box_vectors'][read_iteration, replica_index, :, :].astype(np.float64)
# TODO: Are box vectors also variably saved?
From what I can tell, they seem to be saved at the same rate as positions; worth double-checking, though!
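One way to do that double check, sketched with NumPy masked arrays standing in for the `positions` and `box_vectors` netCDF variables: compare, per iteration, which frames hold unmasked data. The shapes, the helper `frames_with_data`, and the assumption that unwritten frames come back fully masked are all hypothetical.

```python
import numpy as np


def frames_with_data(variable):
    """Per-iteration flag: True where no element of that frame is masked."""
    mask = np.ma.getmaskarray(variable)
    # collapse every axis except the leading iteration axis
    return ~mask.reshape(mask.shape[0], -1).any(axis=1)


n_iterations, n_replicas, n_atoms = 6, 2, 4
written = np.arange(n_iterations) % 2 == 0  # pretend position_interval = 2

# Hypothetical stand-ins for the 'positions' and 'box_vectors' variables,
# shaped (iteration, replica, atom, xyz) and (iteration, replica, 3, 3).
pos_mask = np.zeros((n_iterations, n_replicas, n_atoms, 3), dtype=bool)
box_mask = np.zeros((n_iterations, n_replicas, 3, 3), dtype=bool)
pos_mask[~written] = True  # unwritten iterations are fully masked
box_mask[~written] = True
positions = np.ma.masked_array(np.zeros(pos_mask.shape), mask=pos_mask)
box_vectors = np.ma.masked_array(np.zeros(box_mask.shape), mask=box_mask)

same_rate = np.array_equal(frames_with_data(positions), frames_with_data(box_vectors))
print(same_rate)  # True
```

Running the same comparison against a real reporter file would show directly whether box vectors follow the position interval.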
Continuation of #712.
There are multiple avenues to address the current failing tests. I would like to address those concerns here before continuing in that PR.