Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-construct database files from primary data #20

Closed
davidlmobley opened this issue Apr 17, 2015 · 11 comments
Closed

Re-construct database files from primary data #20

davidlmobley opened this issue Apr 17, 2015 · 11 comments

Comments

@davidlmobley
Copy link
Member

This is implied by several other issues, but ought to exist as its own issue.

To complete this issue, we must first resolve Issue #14, #13 , and #19.

This issue must be completed before we can resolve Issues #15 and #16 .

Resolving this issue will also provide the best resolution of the issue with water molecules in topology files (# to be inserted here).

@kyleabeauchamp
Copy link
Contributor

I'm not claiming that my code is ready for prime time yet, but I do have a somewhat automated pipeline for building the AMBER files using OpenEye and Antechamber.

https://github.com/choderalab/gbff/blob/master/code/rebuild_freesolv.py

@jchodera
Copy link
Contributor

I think we decided to not use gaff2xml for this purpose until we have a stable public API, so my plan was to ingest the dependencies into a standalone script for FreeSolv until gaff2xml (or some other tool) stabilized and we could reduce complexity by using these external functions.

@kyleabeauchamp
Copy link
Contributor

Agreed. But we may still want to check that the standalone script does the same thing as the latest gaff2xml functions. A lot of the code we have floating around differs in the various steps used to prepare ligands, which leads to different charges. In particular, the setSampleHydrogens.

Another possibilty is for me to roll out a release of gaff2xml. Even if the API is not stable, we can still just specific a hard-coded version number for installation via conda. I'm happy with whatever here.

@davidlmobley
Copy link
Member Author

What's the issue with setSampleHydrogens? Is it an only an issue with
ambiguity about protonation state/tautomer state? The compounds here in
general are quite straightforward at this point; there are only a handful
where there would be any concern at all about tautomer state.

John, what's your preference on the approaches Kyle suggested?

Also, I should point out that Chris Bayly just pointed me to the
documentation on how to generate canonical charges using his multiconformer
procedure -
https://docs.eyesopen.com/toolkits/cookbook/python/modeling/am1-bcc.html.
This is what we should be using for charging.

On Sat, Apr 18, 2015 at 10:17 AM, Kyle Beauchamp [email protected]
wrote:

Agreed. But we may still want to check that the standalone script does the
same thing as the latest gaff2xml functions. A lot of the code we have
floating around differs in the various steps used to prepare ligands, which
leads to different charges. In particular, the setSampleHydrogens.

Another possibilty is for me to roll out a release of gaff2xml. Even if
the API is not stable, we can still just specific a hard-coded version
number for installation via conda. I'm happy with whatever here.


Reply to this email directly or view it on GitHub
#20 (comment).

David Mobley
[email protected]
949-385-2436

@jchodera
Copy link
Contributor

Christopher Bayly's canonical AM1-BCC recipe, which we have now adopted, uses

    omega.SetSampleHydrogens(True)

which @kyleabeauchamp has found leads to modest charge differences for ethanol due to placement of the proton.

@kyleabeauchamp has also found that

omega.SetRMSThreshold(rms)

seems to behave independently of the specified rms value, and omega.SetRMSThreshold(0) behaves completely differently from omitting a call to this method altogether. We're not yet sure if this is a bug. Using or omitting this line leads to charge differences for ethyl acrylate.

I think we can either use a specific gaff2xml version to regenerate these or ingest the methods that we are using into scripts here---either method will work from a technical standpoint.

@kyleabeauchamp makes a good point about testing, however---gaff2xml currently has an automated testing framework to ensure correct behavior of its methods, while this project does not yet have automated testing framework.

The biggest question might therefore be "What kind of testing do we need to ensure correctness of FreeSolv?". It seems that not much in the way of testing process existed previously (which is why all of these errors have come to light). Perhaps we want to figure out how we want things to be tested first, and that will make the path forward clearer?

@davidlmobley
Copy link
Member Author

What is being done about the omega.SetRMSThreshold issue you mentioned?
Have you talked with support & Bayly?

Can you elaborate on how gaff2xml is tested and how that might be extended
to FreeSolv?

There is no obvious way that I can tell to check that the
topology/coordinate files generated are in fact correct, though we can
certainly check a number of side issues:

  • Do the number of water molecules/water atoms in the topology file and
    coordinate file match, and match what's expected?
  • Do the charges in the topology file match the expected charges, within
    rounding?
  • Do the atom types in the topology file match those in the
    AMBER-atom-typed .mol2 file from antechamber?
  • others?
  • Do the number of atoms in the topology file match what's expected?

You could also do more complicated testing, such as:

  • If you turn the resulting topology file back into an OE molecule (which
    would require writing a parser) do you get the same isomeric SMILES as the
    original molecule?

On Mon, Apr 20, 2015 at 10:02 AM, John Chodera [email protected]
wrote:

Christopher Bayly's canonical AM1-BCC recipe
http://docs.eyesopen.com/toolkits/cookbook/python/modeling/am1-bcc.html,
which we have now adopted, uses

omega.SetSampleHydrogens(True)

which @kyleabeauchamp https://github.com/kyleabeauchamp has found leads
to modest charge differences for ethanol due to placement of the proton.

@kyleabeauchamp https://github.com/kyleabeauchamp has also found that

omega.SetRMSThreshold(rms)

seems to behave independently of the specified rms value, and
omega.SetRMSThreshold(0) behaves completely differently from omitting a
call to this method altogether. We're not yet sure if this is a bug. Using
or omitting this line leads to charge differences for ethyl acrylate.

I think we can either use a specific gaff2xml version to regenerate these
or ingest the methods that we are using into scripts here---either method
will work from a technical standpoint.

@kyleabeauchamp https://github.com/kyleabeauchamp makes a good point
about testing, however---gaff2xml currently has an automated testing
framework to ensure correct behavior of its methods, while this project
does not yet have automated testing framework.

The biggest question might therefore be "What kind of testing do we need
to ensure correctness of FreeSolv?". It seems that not much in the way of
testing process existed previously (which is why all of these errors have
come to light). Perhaps we want to figure out how we want things to be
tested first, and that will make the path forward clearer?


Reply to this email directly or view it on GitHub
#20 (comment).

David Mobley
[email protected]
949-385-2436

@jchodera
Copy link
Contributor

What is being done about the omega.SetRMSThreshold issue you mentioned?
Have you talked with support & Bayly?

@kyleabeauchamp emailed Christopher Bayly on 11 Apr 2015, and I've just pinged him again today. We can email OpenEye support for clarification if that doesn't work.

Can you elaborate on how gaff2xml is tested and how that might be extended
to FreeSolv?

There is a suite of nosetests used to test individual functions and run some datasets through them:
https://github.com/choderalab/gaff2xml/tree/master/gaff2xml/tests

There is no obvious way that I can tell to check that the topology/coordinate files generated are in fact correct,

Supposing technical constraints were not a problem, what would the true test of "correctness" be? Would it be identical free energies of hydration in each program (gromacs, AMBER, etc)? Identical potential energies for various conformations?

we can certainly check a number of side issues:

All of these things (many of which caused trouble in the past) would be good to test.

  • If you turn the resulting topology file back into an OE molecule (which would require writing a parser) do you get the same isomeric SMILES as the original molecule?

@swails has a neat tool called ParmEd that is very much getting close to being able to do a lot of these tests, so it's possible we can utilize it in a testing framework.

@davidlmobley
Copy link
Member Author

There is no obvious way that I can tell to check that the
topology/coordinate files generated are in fact correct,

Supposing technical constraints were not a problem, what would the true
test of "correctness" be? Would it be identical free energies of hydration
in each program (gromacs, AMBER, etc)? Identical potential energies for
various conformations?

When I said "correct", what I had in mind was actually a different issue -
that the topology in question contains the intended set of force field
parameters for the intended molecule with the intended protonation/tautomer
state. For the sake of terminology, let's call the issue you're raising
"consistent" rather than correct. That is, we would like FreeSolv to
contain topology/coordinate files which are both correct (containing the
intended parameters for the intended molecule as just described) AND
consistent (across file formats). I think "correctness" is probably harder
(in theory) to test than "consistent", and I don't know how to check it in
general except the specific aspects I already mentioned.

Consistency in principle ought to be able to be verified by identical free
energies of hydration and identical potential energies. However, this is
really still a research issue (i.e. we have a separate project trying to
get reproducibility of hydration free energies across several different
codes, and it is not straightforward. This also looks some at potential
energies, and a similar thing is true there). Potential energies are
certainly the easier of the two and probably ought to be our benchmark
here, at least for reference configurations between AMBER and GROMACS
files.

David Mobley
[email protected]
949-385-2436

@jchodera
Copy link
Contributor

I think "correctness" is probably harder (in theory) to test than "consistent", and I don't know how to check it in general except the specific aspects I already mentioned.

Excellent points.

I think it might be practical to check consistency via some potential energy comparisons (especially with tools the OpenMM app and its file readers or ParmEd to help), and correctness via Topology -> OpenEye OEMol -> canonical SMILES, at least for now.

@davidlmobley
Copy link
Member Author

Yes, I think that ought to be the goal.

On Mon, Apr 20, 2015 at 11:14 AM, John Chodera [email protected]
wrote:

I think "correctness" is probably harder (in theory) to test than
"consistent", and I don't know how to check it in general except the
specific aspects I already mentioned.

Excellent points.

I think it might be practical to check consistency via some potential
energy comparisons (especially with tools the OpenMM app and its file
readers or ParmEd to help), and correctness via Topology -> OpenEye
OEMol -> canonical SMILES, at least for now.


Reply to this email directly or view it on GitHub
#20 (comment).

David Mobley
[email protected]
949-385-2436

@davidlmobley
Copy link
Member Author

This is basically resolved by #28 but can be re-opened later if there is anything we need to reawaken here. A lot of the energetic consistency stuff has been handled elsewhere via ParmEd and in tests for openmoltools.

I intend to work with Michael Shirts separately to get in input files in other formats such as DESMOND.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants