AlchemicalNonequilibriumLangevinIntegrator acceptance #201
Here's an example that highlights the issue we're seeing with the correction factor. In the attached correction.zip, the correction.py script takes a system with a ligand and turns on the electrostatics/sterics over the course of 50 steps. At each step of the NCMC integrator, the script outputs what the potentials would be for those positions in the normal system and in the alchemical system if it were fully interacting (lambda sterics/electrostatics = 1), as well as the difference between the potentials, to emulate what's done for the alchemical correction factor. (Just to note, this starting position was taken from the middle of a symmetric protocol where we turned off the sterics/electrostatics and performed a random rotation of the ligand. The correction.py script is basically emulating the other half: turning those interactions back on after the rotation.)
Also tagging @pgrinaway since we were planning on discussing these topics with him.
So for the
and the concern is that difference does not cancel with the
And this issue is separate from seeing
Tagging in @maxentile on the g-BAOAB stability issue, since he's looking into this right now. My suggestion is to modify your splitting string to use k=4 or k=5 inner steps for g-BAOAB, like
for k=4. This should improve stability. @maxentile is currently benchmarking DHFR to see how the choice of
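The family of splitting strings under discussion can be generated with a tiny helper. The function below is hypothetical (it is not part of openmmtools), but the strings it emits follow the g-BAOAB pattern described in this thread, where k is the number of geodesic drift (R) substeps on each side of the O step:

```python
# Hypothetical helper (not an openmmtools API): build a g-BAOAB splitting
# string with k geodesic drift (R) substeps on each side of the O step.
def gbaoab_splitting(k):
    inner = " ".join(["R"] * k)
    return "V {0} O {0} V".format(inner)

print(gbaoab_splitting(1))  # V R O R V  (standard BAOAB)
print(gbaoab_splitting(4))  # V R R R R O R R R R V
```

Increasing k adds constrained drift substeps, which is what improves stability for constrained systems.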
Yes. I guess when you just ran it the system didn't blow up, but when the system does blow up we can see numbers like
which can really skew the acceptance.
We think it's a separate issue, since we do see some NaNs following acceptance even when we turn off the alchemical correction factor.
I'll give the increased inner steps a try in the meantime, and I'll give an update after I test it out. Thanks!
How precisely are you computing the alchemical correction factor? The alchemically-modified PME system omits the reciprocal-space interactions, so it will always differ from the real system. The hope is that this difference is rather insensitive to exactly which orientation the ligand is in. So you'd compute something like
Is that what you're doing? If so, what does that look like, and how does the quantity computed in your example bear on this, since it's only half the quantity of interest? Is it just that you sometimes see very large discrepancies? This is probably only occurring when the integration is unstable and energies are blowing up anyway; the two systems will have very different numerical behaviors when the energies are insanely high. This is probably just an integrator stability issue, where the geodesic inner steps will help a lot.
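A plain-Python sketch of the kind of correction being described (the function name and sign convention here are assumptions, not the perses implementation; the four reduced potentials would come from evaluating the real and alchemically-modified contexts at the initial and final positions):

```python
# Sketch of the alchemical correction term: the (real - alchemical)
# potential discrepancy at the final configuration minus the same
# discrepancy at the initial configuration, entering the log acceptance
# probability with a minus sign. All inputs are reduced potentials (U/kT).
def log_acceptance_correction(u_real_init, u_alch_init,
                              u_real_final, u_alch_final):
    delta_final = u_real_final - u_alch_final
    delta_init = u_real_init - u_alch_init
    return -(delta_final - delta_init)
```

If the real/alchemical discrepancy really is insensitive to ligand orientation, the two deltas nearly cancel and the correction stays near zero.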
Hi @sgill2, a couple quick notes:
- This is actually not "BAOAB" -- BAOAB would be
- Regarding the number of geodesic drift steps
- Regarding the placement of the Hamiltonian update (should you use
@sgill2 - looks like the order of updates MAY be incorrect, then. Probably that's the first thing to check... (But, @maxentile - @jchodera had specifically recommended this g-BAOAB scheme to us here: MobleyLab/blues#50 ; did he mistype?)

There's also earlier background -- on April 7th @jchodera and I were in the same room together and @sgill2 had me ask him this: "The normal order for the BAOAB integrator is VRORV. If I were to use the AlchemicalLangevinSplittingIntegrator, would the order of moves then be VROHORV, where H is the alchemical perturbation step?" I asked John this in person and he told me yes -- my summary was "John says yes, or put the H's at the beginning and the end. It's half as expensive if they are at the beginning and the end, he thinks! So HVRORVH."

@maxentile - have you guys looked at this any more since then? Please do keep us posted. We had a protocol that was basically working for us with our original integrator, but we're trying to switch to the "better" integrator you guys have been recommending using
Whoops, my fault here!
We're working on quantifying the effect of H placement on shadow work accumulation, but my original estimate that one scheme is much faster than another may be incorrect. For now, we think that (as long as you are using the latest
What was your original protocol? We should be able to give you a splitting that you can compare against that should replicate your original behavior for reference.
We were using velocity Verlet (the integrator from perses).
Thanks! I'll test out using the actual BAOAB splitting, along with varying the number of geodesic drift steps, to see how that affects the integration.
If it was just velocity Verlet, then you should be able to use
I think the one difference is that we used to do steps of velocity Verlet on either end of the protocol with H updates in the middle, whereas this one is still symmetric but bookended by H updates. Can you compare that with k=4 BAOAB (
Geodesic BAOAB uses consecutive constrained drift steps, so those should have been
Note that multiple consecutive constrained
Also note that
@maxentile : Thanks for the clarifications, and apologies for my errors above!
I've tested the integrators with the actual BAOAB splitting, and both ran without accepting positions that resulted in NaNs, at least without the alchemical correction factor. When including the correction factor, I still get cases where moves are accepted that seemingly shouldn't be. Just to be sure I haven't messed up the alchemical correction (which is a correction for transforming the system to/from an alchemical system), I have it defined as:
To illustrate an example case, the attached alch_corr.zip contains the initial and final positions of such a case after some NCMC integration steps in which the move was accepted. If you run the enclosed sys_compare.py, you should see that at the initial positions the energies of the alchemical and normal systems are roughly the same. The final positions, however, are in a configuration that is nowhere near favorable, and the energy difference between the two systems at those positions is vastly dissimilar, which leads to an extremely large acceptance correction factor overall.
The final alchemical energies are really large. Can you store the potential energy after each timestep during your BLUES protocol and plot that? I'm wondering where the energy accumulation occurs. Where are you placing the Hamiltonian update in your scheme? Our more recent tests have not noted any major difference in stability with different numbers of geodesic steps, so it may be simplest to stick with standard BAOAB (
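The bookkeeping being requested here can be sketched in plain Python, assuming the per-timestep potential energies have already been collected into a list (the plotting itself is left out to keep the sketch self-contained):

```python
def cumulative_energy_changes(potentials):
    # Running sum of step-to-step potential-energy changes; a sharp jump
    # in this trace localizes where the energy starts to blow up.
    trace, total = [], 0.0
    for prev, cur in zip(potentials, potentials[1:]):
        total += cur - prev
        trace.append(total)
    return trace
```

Plotting this trace against step number should show whether the energy accumulates gradually or explodes at a particular point in the protocol.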
I'll get on that now!
Could you clarify for me what you mean by scheme here? I'm not exactly sure what you're referring to, but if you were referring to the splitting I used
It might be useful to also try
Okay, I'll go ahead and test out that splitting while I'm rerunning things too.
I'm still not sure what the alchemical correction factor has to do with this. If the energy is exploding, why would the constant added by your correction have anything to do with that? Isn't 100 steps a very, very small number for an NCMC switching protocol? Our protocols for changing just a single proton are typically ~40 ps in length.
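For scale, assuming a typical 2 fs timestep (an assumption; the actual timestep isn't stated in the thread), the two protocol lengths compare as follows:

```python
timestep_fs = 2.0                                  # assumed timestep
ncmc_steps = 100
protocol_ps = ncmc_steps * timestep_fs / 1000.0    # 0.2 ps of switching
reference_ps = 40.0                                # the ~40 ps proton protocols
ratio = reference_ps / protocol_ps                 # ~200x longer
print(protocol_ps, ratio)
```

So the 100-step protocol is roughly two orders of magnitude shorter than the proton-switching protocols mentioned here.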
@jchodera - Sam can comment further (I should be sleeping) but I can make two comments:
That's odd. You're saying that the alchemical correction factor can be highly favorable in some cases that you wouldn't expect? Could there be a sign error in your use of it? If you compute
then your
There's no way that could have been correct. 100 steps is far, far too short for an alchemical insertion of even a proton.
Do you have a plot of acceptance rate vs number of NCMC steps?
Actually, it seems to me that this could be significantly easier than a proton. Specifically, what we're looking at here is flipping toluene between orientations that (at least for successful moves) it could have been successfully docked into. It's nonpolar, so we're dealing with relatively weak interactions. Even STANDARD MC without any relaxation at all can sometimes yield accepted moves of this type (though much more rarely). It seems highly plausible to me that adding even a very small amount of relaxation -- such as 100 steps -- could dramatically increase acceptance, which is exactly what we've been seeing. To me, insertion of a proton sounds like a much more difficult transformation: there, you're changing the formal charge, so you have very strong long-range electrostatic interactions and relaxation will need to be much more significant, whereas in this particular case protein relaxation is almost minimal. @sgill2 will have to respond on the other issues.
Yes.
I'm still dubious that 100 steps would help much. I'd love to see the average acceptance vs number of steps for even just the alchemically-modified system, omitting the correction factors that are causing trouble. There must be something wrong with the correction factor if it is leading toward higher acceptance: since there's a slight difference between the fully-interacting and alchemically-modified systems, I would presume it would, on average, bias you toward rejection, not acceptance. The big differences in energies, as we noted earlier, appear only in cases where the energies are hugely unfavorable, but that indicates something else is going on when the insertion is nearly complete. Did you recently switch over to
Just to clarify a few things and give updates on what I'm running right now:
In the actual calculation I have the energies in kJ/mol, so I multiply by -1/kT to get the correction in the appropriate units of
so in the code it looks like
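Since the inline code was stripped from this comment, here is a hedged reconstruction of that unit handling (constant values are approximate, and the variable names and 300 K temperature are assumptions):

```python
# Convert an energy difference in kJ/mol to a dimensionless log-acceptance
# contribution by multiplying by -1/kT.
R_kJ_per_mol_K = 8.31446e-3            # molar gas constant, kJ/(mol*K)
temperature_K = 300.0                  # assumed temperature
kT = R_kJ_per_mol_K * temperature_K    # ~2.494 kJ/mol at 300 K

delta_U_kJ_per_mol = 10.0              # illustrative energy difference
correction = -delta_U_kJ_per_mol / kT  # dimensionless
```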
Using the
The number of steps vs acceptance ratio is definitely an important point to consider, and I don't disagree that 100 steps may not be ideal in this case, but I think it's drawing us away from the issue at hand, which at this point seems to be figuring out why the correction factor is leading to these obviously unfavorable NCMC moves being accepted, and how to address that. In general, though, the number of NCMC steps should vary greatly depending on the protocol and system. In the
We have actually recently switched to using alchemy from openmmtools. Currently I'm running the different integrators with the two different alchemy modules to see if I can spot any differences in their behavior.
Tangentially, how much wallclock time does it take to run those 20 ps of NCMC in the proton insertion case? I'd love to explore longer relaxation timescales, but running the alchemical integrators is quite slow, so running anything past 1000 steps repeatedly doesn't seem very feasible to me currently.
Also somewhat tangentially, I'll be doing some benchmarking runs on T4 lysozyme/toluene to see the acceptance as a function of NCMC steps, so we can use those as a reference (and for paper-related things).
That looks correct to me, provided you're careful about how you get all those quantities with the correct units. Can you point me to where you compute
I'm still dubious here, but I've been surprised before.
I think it's premature to even worry about the correction factor, since your plot is showing that the work accumulated for the purely alchemical system is blowing up. Do you have a bunch of work traces you could provide data for or plot? I think we need to figure out what the work distribution after 100 steps looks like, and make sure that a good fraction of these work values is less than a few kT, before we tackle what's wrong with the correction factor. The correction factor should be very small.
The alchemical integrator shouldn't be much slower than standard
What do you mean by "quite slow" here? I think our 20 ps switches took a few minutes.
You'll have to jump around a bit in the code, so let me contextualize the code references:
I don't have those handy, but I'll try to construct them when I'm running my benchmarking runs.
I did some simple benchmarking on my lysozyme/toluene system with different integrators, running on a GPU. The OpenMM
This is getting a little complicated, since I'm not sure at which time you pulled code from
I'd really recommend trying to use as much of the
That's very surprising. Do you have a minimal test script you could post? If there are only a few alchemically-modified atoms, it really shouldn't be so slow; we can certainly address that.
I'll definitely read through the sections of the code you highlight above on my flight today, @sgill2. I just wanted to set expectations, because there's a lot of code there and I might not be able to spot the issues easily because of the subtle divergence from perses and the lack of component tests.
Sounds good. I'll see about getting the tests moved over, or just use the integrator from perses directly. On that note, I'll add testing VV with the
I definitely appreciate any of your time spent on this, and duly noted regarding the expectations bit.
Here's a minimal example comparing the integration schemes on a T4 lysozyme/toluene explicit-solvent system over 10,000 steps. And I just wanted to say: thanks for all the help, everyone!
Something funny is going on in
For some reason, the step time grows by 20x around 100 steps in (where I've reduced the protocol length to 500 steps). Thanks for this! We'll take a look at what's going on here.
The slowdown seems to occur when
The issue was the long-range correction. Every time

Tagging @pgrinaway @bas-rustenburg @Lnaden @gregoryross, since this issue may be relevant to many of our other projects.

To avoid this, we should add some options to

Thanks again for bringing the slowdown to our attention and providing a simple script to reproduce it, @sgill2!
@jchodera - thanks so much for digging into that. Glad to hear there will be a simple fix! I'd been worried about the slowdown long-term, though the improvement in efficiency we get from incorporating MC moves made it still worthwhile. But having it NOT slow down will obviously be way better. :) Let us know if you have any thoughts on our bigger problem (blowing up/alchemical corrections) when you get a chance to look at the code; otherwise @sgill2 will get back to you once he's run those additional tests.
The bigger problem is harder to debug because it relies on a very complex codebase that was branched from ours but no longer contains the requisite component unit tests. This is probably going to be a hard-fought lesson in software-engineering best (and worst) practices, unfortunately. I would suggest we try to migrate useful common components between perses and blues into openmmtools, along with their unit tests. Ideally, blues would eventually be encapsulated as an openmmtools.mcmc.MCMCMove subclass so users can mix and match move sets with other kinds of ergodic sampling, but for now, at least making sure we can rule out where the issues might be by reusing common tested components will help narrow in on the real problem.
We've got a fix for the speed issue in the works: When I construct
I've been looking to use Velocity Verlet (VV) with the

```
File "/Users/samgill/anaconda/lib/python2.7/site-packages/openmmtools/integrators.py", line 1655, in __init__
  super(AlchemicalNonequilibriumLangevinIntegrator, self).__init__(*args, **kwargs)
File "/Users/samgill/anaconda/lib/python2.7/site-packages/openmmtools/integrators.py", line 1474, in __init__
  super(NonequilibriumLangevinIntegrator, self).__init__(*args, **kwargs)
File "/Users/samgill/anaconda/lib/python2.7/site-packages/openmmtools/integrators.py", line 998, in __init__
  ORV_counts, mts, force_group_nV = self._parse_splitting_string(splitting)
File "/Users/samgill/anaconda/lib/python2.7/site-packages/openmmtools/integrators.py", line 1386, in _parse_splitting_string
  self._sanity_check(splitting_string)
File "/Users/samgill/anaconda/lib/python2.7/site-packages/openmmtools/integrators.py", line 1254, in _sanity_check
  assert ("O" in splitting_no_space)
AssertionError
```

Could someone point me to the proper splitting for VV with the alchemical integrator? On that note, it might be helpful to include more alchemical splitting schemes in the
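The assertion in that traceback can be reproduced with a minimal reconstruction of the sanity check (simplified from the openmmtools source; the real method checks more than this): any splitting string without an O step, such as a pure velocity Verlet scheme, is rejected.

```python
def sanity_check(splitting):
    # Simplified version of the check in integrators.py _sanity_check:
    # the Langevin integrator requires at least one O (stochastic) step.
    splitting_no_space = splitting.replace(" ", "")
    assert "O" in splitting_no_space

sanity_check("V R O H O R V")   # passes: contains O steps
try:
    sanity_check("H V R V H")   # velocity-Verlet-like: no O step
except AssertionError:
    print("rejected: splitting has no O step")
```

This is why a velocity-Verlet-style splitting fails in the current integrator class even though the string otherwise parses.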
@pgrinaway and @maxentile : Can you handle this? Should we allow non-Langevin dynamics to be covered by
@sgill2 : Are you still getting very different behavior than before? Is this primarily for debugging purposes? Last I heard, we had solved the slowdown problem (use the latest
@jchodera I've mostly been working on paper writing, so I haven't had a chance to run those energy-trace tests yet. I was looking to move towards replacing the
I think it makes sense for this to be separately named (like
Another possibility is to add an optional argument
What about
Alternatively, I like the idea of an
Copying from the alchemistry Slack channel from @davidlmobley:
We're having some issues with our work on binding mode sampling with NCMC (in BLUES), especially as we try to switch to the AlchemicalNonequilibriumLangevinIntegrator, where we are having large energies/forces that lead to crashes as we are turning back on alchemical interactions between the ligand and the protein. Part of this seems to potentially be due to problems with the alchemical correction factor in certain cases, but we think that actually forces are getting large enough to cause the integrator to lose stability first. Do you have suggestions? We were wondering whether we ought to be checking for forces that exceed some sort of cutoff and automatically rejecting cases where that happens (naively it seems to me there ought to be some sort of force-to-timestep ratio which determines the largest force that can be tolerated without risking crashes, and that anything exceeding that value by many orders of magnitude should certainly be rejected).

Some notes:
- For the AlchemicalNonequilibriumLangevinIntegrator we were using R V O H O V R (g-BAOAB) splitting.
- The "alchemical correction factor" is the correction for transforming the real system to/from the alchemical system (it was in perses.annihilation.ncmc_switching at one point, https://github.com/choderalab/perses/blob/3c0e672728c0acb87a0efc9fa76bc893f1b5dfed/perses/annihilation/ncmc_switching.py#L162-L182 but glancing over the master branch I don't see it there now)
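The force-cutoff rejection idea in the note above can be sketched as a simple per-atom check. The data layout and threshold below are illustrative assumptions; in practice the forces would come from the simulation state rather than a Python list:

```python
import math

def exceeds_force_cutoff(forces, cutoff):
    # forces: iterable of (fx, fy, fz) components in consistent units.
    # Returns True if any per-atom force magnitude exceeds the cutoff,
    # signalling that the NCMC move should be rejected outright.
    return any(math.sqrt(fx * fx + fy * fy + fz * fz) > cutoff
               for fx, fy, fz in forces)

# Example with an assumed threshold; small forces do not trigger rejection.
forces = [(0.0, 0.0, 50.0), (3.0, 4.0, 0.0)]
print(exceeds_force_cutoff(forces, 1.0e4))
```

Choosing the cutoff is the hard part; as the note suggests, it should presumably scale with the timestep, so that forces large enough to destabilize the integrator are caught before they produce NaNs.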