Skip to content

Commit

Permalink
Merge pull request #60 from open-forcefield-group/smirff_energy_check
Browse files Browse the repository at this point in the history
Cross-check SMIRFF energies vs parm@frosst on full AlkEthOH set (finished, tests added)
  • Loading branch information
davidlmobley authored Jul 12, 2016
2 parents 4423c1a + 704d4e0 commit 81bccc8
Show file tree
Hide file tree
Showing 31 changed files with 1,940 additions and 23 deletions.
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,9 @@ var/
*.manifest
*.spec

# Extracted archive
AlkEthOH_inputfiles/

# Installer logs
pip-log.txt
pip-delete-this-directory.txt
Expand Down
13 changes: 12 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,10 +4,12 @@

This is a simple example of how Bayesian atom type sampling using reversible-jump Markov chain Monte Carlo (RJMCMC) [1] over SMARTS types might work.

This also provides a prototype and validation of the SMIRFF SMIRKS-based force field format, along with classes to parameterize OpenMM systems given SMIRFF `.ffxml` format files as provided here.

## Manifest

* `examples/` - some toy examples - look here to get started
* `smarty/` - simple toolkit illustrating the use of RJMCMC to sample over SMARTS-specified atom types
* `smarty/` - simple toolkit illustrating the use of RJMCMC to sample over SMARTS-specified atom types; also contains forcefield.py for handling SMIRFF forcefield format.
* `devtools/` - continuous integration and packaging scripts and utilities
* `oe_license.txt.enc` - encrypted OpenEye license for continuous integration testing
* `.travis.yml` - travis-ci continuous integration file
Expand Down Expand Up @@ -157,6 +159,15 @@ If a proposed type matches zero atoms, the RJMCMC move is rejected.

Currently, the acceptance criteria does not include the full Metropolis-Hastings acceptance criteria that would include the reverse probability. This needs to be added in.

## SMIRFF

The SMIRFF forcefield format is available in sample form under data/forcefield, and is handled by forcefield.py.
An example comparing SMIRFF versus AMBER energies for the parm@frosst forcefield is provided under
examples/SMIRFF_comparison, where two scripts can compare energies for a single molecule or for the entire AlkEthOH set.
Note that two forcefields are currently available in this format, `Fross_AlkEtOH.ffxml`,
the parm@frosst forcefield as it should have been for this set, and `Frosst_AlkEtOH_parmAtFrosst.ffxml`,
the forcefield as it was actually implemented (containing several bugs as noted in the file itself).

## References

[1] Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82:711, 1995.
Expand Down
Binary file modified examples/SMIRFF_comparison/AlkEthOH_inputfiles.tar.gz
Binary file not shown.
34 changes: 34 additions & 0 deletions examples/SMIRFF_comparison/compare_molecule_energies.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
#!/bin/env python

from smarty.forcefield_utils import *
import os

# Cross-check energies of molecules from AlkEthOH set using SMIRFF xml file
# versus energies from AMBER .prmtop and .crd files (parm@frosst params)

datapath = './AlkEthOH_inputfiles/AlkEthOH_rings_filt1'
#molname = 'AlkEthOH_r0' #That fails, but it's complicated. Try cyclobutane
molname = 'AlkEthOH_r51'
mol_filename = os.path.join( datapath, molname+'.mol2')

# Check if we have this data file; if not we have to extract the archive
if not os.path.isfile( mol_filename):
print "Extracting archived molecule files."
tarfile = 'AlkEthOH_inputfiles.tar.gz'
os.system('tar -xf AlkEthOH_inputfiles.tar.gz')

# Load OEMol
mol = oechem.OEGraphMol()
ifs = oechem.oemolistream(mol_filename)
flavor = oechem.OEIFlavor_Generic_Default | oechem.OEIFlavor_MOL2_Default | oechem.OEIFlavor_MOL2_Forcefield
ifs.SetFlavor( oechem.OEFormat_MOL2, flavor)
oechem.OEReadMolecule(ifs, mol )
oechem.OETriposAtomNames(mol)

# Load forcefield
forcefield = ForceField(get_data_filename('forcefield/Frosst_AlkEtOH_parmAtFrosst.ffxml'))

# Compare energies
prmtop = os.path.join( datapath, molname+'.top')
crd = os.path.join( datapath, molname+'.crd')
results = compare_molecule_energies( prmtop, crd, forcefield, mol)
47 changes: 28 additions & 19 deletions examples/SMIRFF_comparison/compare_set_energies.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,32 +2,41 @@

from smarty.forcefield_utils import *
import os
import glob

# Cross-check energies of molecules from AlkEthOH set using SMIRFF xml file
# versus energies from AMBER .prmtop and .crd files (parm@frosst params)

datapath = './AlkEthOH_inputfiles/AlkEthOH_chain_filt1'
molname = 'AlkEthOH_c100'
mol_filename = os.path.join( datapath, molname+'.mol2')
#datapath = './AlkEthOH_inputfiles/AlkEthOH_chain_filt1'
datapath = './AlkEthOH_inputfiles/AlkEthOH_rings_filt1'

#Check if this is a directory, if not, extract it
# Check if we have this data file; if not we have to extract the archive
if not os.path.isfile( mol_filename):
if not os.path.isdir( datapath ):
print "Extracting archived molecule files."
tarfile = 'AlkEthOH_inputfiles.tar.gz'
os.system('tar -xf AlkEthOH_inputfiles.tar.gz')

# Load OEMol
mol = oechem.OEGraphMol()
ifs = oechem.oemolistream(mol_filename)
flavor = oechem.OEIFlavor_Generic_Default | oechem.OEIFlavor_MOL2_Default | oechem.OEIFlavor_MOL2_Forcefield
ifs.SetFlavor( oechem.OEFormat_MOL2, flavor)
oechem.OEReadMolecule(ifs, mol )
oechem.OETriposAtomNames(mol)

# Load forcefield
forcefield = ForceField(get_data_filename('forcefield/Frosst_AlkEtOH.ffxml'))

# Compare energies
prmtop = os.path.join( datapath, molname+'.top')
crd = os.path.join( datapath, molname+'.crd')
results = compare_molecule_energies( prmtop, crd, forcefield, mol)
#Obtain list of molecules
mol_filenames = glob.glob( datapath+'/*.mol2')
mol_filenames = [ fnm for fnm in mol_filenames if not 'c1302' in fnm ] #Skip water

for mol_filename in mol_filenames:
molname = os.path.basename( mol_filename).replace('.mol2','')
print("Comparing %s (%s/%s)..." % (molname, mol_filenames.index(mol_filename), len(mol_filenames) ) )

# Load OEMol
mol = oechem.OEGraphMol()
ifs = oechem.oemolistream(mol_filename)
flavor = oechem.OEIFlavor_Generic_Default | oechem.OEIFlavor_MOL2_Default | oechem.OEIFlavor_MOL2_Forcefield
ifs.SetFlavor( oechem.OEFormat_MOL2, flavor)
oechem.OEReadMolecule(ifs, mol )
oechem.OETriposAtomNames(mol)

# Load forcefield
forcefield = ForceField(get_data_filename('forcefield/Frosst_AlkEtOH_parmAtFrosst.ffxml'))

# Compare energies
prmtop = os.path.join( datapath, molname+'.top')
crd = os.path.join( datapath, molname+'.crd')
results = compare_molecule_energies( prmtop, crd, forcefield, mol)
3 changes: 2 additions & 1 deletion smarty/data/forcefield/Frosst_AlkEtOH.ffxml
Original file line number Diff line number Diff line change
Expand Up @@ -35,9 +35,10 @@
<Proper smirks="[#6X4:1]-[#6X4:2]-[#8X2:3]-[#1:4]" idivf1='1' periodicity1="3" phase1="0.0" k1="0.16" idivf2="1" periodicity2="1" phase2="0.0" k2="0.25"/> <!-- HO-OH-CT-CT from frcmod.Frosst_AlkEthOH -->
<Proper smirks="[#6X4:1]-[#6X4:2]-[#6X4:3]-[#6X4:4]" idivf1="1" periodicity1="3" phase1="0.0" k1="0.18" idivf2="1" periodicity2="2" phase2="180.0" k2="0.25" idivf3="1" periodicity3="1" phase3="180.0" k3="0.20"/> <!-- CT-CT-CT-CT from frcmod.Frosst_AlkEthOH -->
<Proper smirks="[#6X4:1]-[#6X4:2]-[#8X2:3]-[#6X4:4]" idivf1="1" periodicity1="3" phase1="0.0" k1="0.383" idivf2="1" periodicity2="2" phase2="180.0" k2="0.1"/> <!-- CT-CT-OS-CT from frcmod.Frosst_AlkEthOH -->
<Proper smirks="[#6X4:1]-[#8X4:2]-[#6X4:3]-[O&amp;X2&amp;H0:4]" idivf1="1" periodicity1="3" phase1="0.0" k1="0.10" idivf2="1" periodicity2="2" phase2="180.0" k2="0.85" idivf3="1" periodicity3="1" phase3="180.0" k3="1.35"/> <!-- CT-OS-CT-OS from frcmod.Frosst_AlkEthOH -->
<Proper smirks="[#6X4:1]-[#8X2:2]-[#6X4:3]-[O&amp;X2&amp;H0:4]" idivf1="1" periodicity1="3" phase1="0.0" k1="0.10" idivf2="1" periodicity2="2" phase2="180.0" k2="0.85" idivf3="1" periodicity3="1" phase3="180.0" k3="1.35"/> <!-- CT-OS-CT-OS from frcmod.Frosst_AlkEthOH -->
<Proper smirks="[#8X2:1]-[#6X4:2]-[#6X4:3]-[#8X2:4]" idivf1="1" periodicity1="3" phase1="0.0" k1="0.144" idivf2="1" periodicity2="2" phase2="0.0" k2="1.175"/> <!-- O_-CT-CT-O_ from frcmod.Frosst_AlkEthOH -->
<Proper smirks="[#8X2:1]-[#6X4:2]-[#6X4:3]-[#1:4]" idivf1="1" periodicity1="3" phase1="0.0" k1="0.0" idivf2="1" periodicity2="1" phase2="0.0" k2="0.25"/> <!-- H_-CT-CT-O_ from frcmod.Frosst_AlkEthOH; discrepancy with parm@frosst with H2,H3-CT-CT-O_ per C Bayly -->
<Proper smirks="[#1:1]-[#6X4:2]-[#6X4:3]-[OX2:4]" idivf1="1" periodicity1="3" phase1="0.0" k1="0.0" idivf2="1" periodicity2="1" phase2="0.0" k2="0.25"/> <!-- HC,H1-CT-CT-OH,OS from frcmod.Frosst_AlkEthOH ; note corresponding H2 and H3 params missing so they presumably get generic parameters (another parm@frosst bug) -->
<Improper smirks="[a,A:1]~[#6X3:2]([a,A:3])~[OX1:4]" periodicity1="2" phase1="180.0" k1="10.5"/> <!-- X -X -C -O from frcmod.Frosst_AlkEthOH; none in set but here as format placeholder -->
</PeriodicTorsionForce>
<!-- WARNING: AMBER formats typically use r_0/2=r_min/2 to describe the relevant distance parameter, where r0 = 2^(1/6)*sigma. The difference is important, and the two conventions can be used here by specifying sigma or rmin_half. -->
Expand Down
4 changes: 3 additions & 1 deletion smarty/data/forcefield/Frosst_AlkEtOH_parmAtFrosst.ffxml
Original file line number Diff line number Diff line change
Expand Up @@ -35,11 +35,13 @@
<Proper smirks="[#6X4:1]-[#6X4:2]-[#8X2:3]-[#1:4]" idivf1='1' periodicity1="3" phase1="0.0" k1="0.16" idivf2="1" periodicity2="1" phase2="0.0" k2="0.25"/> <!-- HO-OH-CT-CT from frcmod.Frosst_AlkEthOH -->
<Proper smirks="[#6X4:1]-[#6X4:2]-[#6X4:3]-[#6X4:4]" idivf1="1" periodicity1="3" phase1="0.0" k1="0.18" idivf2="1" periodicity2="2" phase2="180.0" k2="0.25" idivf3="1" periodicity3="1" phase3="180.0" k3="0.20"/> <!-- CT-CT-CT-CT from frcmod.Frosst_AlkEthOH -->
<Proper smirks="[#6X4:1]-[#6X4:2]-[#8X2:3]-[#6X4:4]" idivf1="1" periodicity1="3" phase1="0.0" k1="0.383" idivf2="1" periodicity2="2" phase2="180.0" k2="0.1"/> <!-- CT-CT-OS-CT from frcmod.Frosst_AlkEthOH -->
<Proper smirks="[#6X4:1]-[#8X4:2]-[#6X4:3]-[O&amp;X2&amp;H0:4]" idivf1="1" periodicity1="3" phase1="0.0" k1="0.10" idivf2="1" periodicity2="2" phase2="180.0" k2="0.85" idivf3="1" periodicity3="1" phase3="180.0" k3="1.35"/> <!-- CT-OS-CT-OS from frcmod.Frosst_AlkEthOH -->
<Proper smirks="[#6X4:1]-[#8X2:2]-[#6X4:3]-[O&amp;X2&amp;H0:4]" idivf1="1" periodicity1="3" phase1="0.0" k1="0.10" idivf2="1" periodicity2="2" phase2="180.0" k2="0.85" idivf3="1" periodicity3="1" phase3="180.0" k3="1.35"/> <!-- CT-OS-CT-OS from frcmod.Frosst_AlkEthOH -->
<Proper smirks="[#8X2:1]-[#6X4:2]-[#6X4:3]-[#8X2:4]" idivf1="1" periodicity1="3" phase1="0.0" k1="0.144" idivf2="1" periodicity2="2" phase2="0.0" k2="1.175"/> <!-- O_-CT-CT-O_ from frcmod.Frosst_AlkEthOH -->
<Proper smirks="[#8X2:1]-[#6X4:2]-[#6X4:3]-[#1:4]" idivf1="1" periodicity1="3" phase1="0.0" k1="0.0" idivf2="1" periodicity2="1" phase2="0.0" k2="0.25"/> <!-- H_-CT-CT-O_ from frcmod.Frosst_AlkEthOH; discrepancy with parm@frosst with H2,H3-CT-CT-O_ per C Bayly -->
<Proper smirks="[#1:1]-[#6X4:2]-[#6X4:3](-[#8])-[#1:4]" idivf1="9" periodicity1="3" phase1="0.0" k1="1.40"/> <!-- X-CT-CT-X from parm99/parm@frosst applied to H1-CT-CT-H1, H2- and H3- torsions. This reproduces a parm99/parm@frosst bug where the generic X-CT-CT-X is erroneously applied to all H*-CT-CT-H* torsions EXCEPT HC-CT-CT-HC.. -->
<Proper smirks="[#1:1]-[#6X4:2](-[#8])-[#6X4:3]-[#6X4:4]" idivf1="9" periodicity1="3" phase1="0.0" k1="1.40"/> <!-- X -CT-CT-X from frcmod.Frosst_AlkEthOH applied to H1-CT-CT-CT, H2- and H3- torsions. This reproduces a parm99/parm@frosst bug where the generic X-CT-CT-X is erroneously applied to all H*-CT-CT-CT torsions EXCEPT HC-CT-CT-CT. -->
<Proper smirks="[#1:1]-[#6X4:2]-[#6X4:3]-[OX2:4]" idivf1="1" periodicity1="3" phase1="0.0" k1="0.0" idivf2="1" periodicity2="1" phase2="0.0" k2="0.25"/> <!-- HC,H1-CT-CT-OH,OS from frcmod.Frosst_AlkEthOH ; note corresponding H2 and H3 params missing so they presumably get generic parameters (another parm@frosst bug) -->
<Proper smirks="[#1:1]-[#6X4:2](-[#8])(-[#8])-[#6X4:3]-[OX2:4]" idivf1="9" periodicity1="3" phase1="0.0" k1="1.40"/> <!-- Reproduce parm@frosst bug involving application of generic (X -CT-CT-X ) parameter to (H2,H3)-CT-CT-(OH,OS). -->
<Improper smirks="[a,A:1]~[#6X3:2]([a,A:3])~[OX1:4]" periodicity1="2" phase1="180.0" k1="10.5"/> <!-- X -X -C -O from frcmod.Frosst_AlkEthOH; none in set but here as format placeholder -->
</PeriodicTorsionForce>
<!-- WARNING: AMBER formats typically use r_0/2=r_min/2 to describe the relevant distance parameter, where r0 = 2^(1/6)*sigma. The difference is important, and the two conventions can be used here by specifying sigma or rmin_half. -->
Expand Down
15 changes: 15 additions & 0 deletions smarty/data/molecules/AlkEthOH_c100.crd
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
AlkEthOH_c100
26
-0.9553000 0.4818000 -0.2077000 -0.3490000 1.5118000 -3.9233000
2.2817000 2.6277000 -0.7960000 1.2497000 4.6992000 -1.7325000
-1.2082000 0.6003000 -1.7062000 -0.0897000 1.3641000 -2.4285000
1.3968000 3.1877000 -1.9119000 -2.4504000 1.2717000 -1.9070000
2.0195000 2.9930000 -3.1820000 0.0656000 2.6662000 -1.8674000
-0.8584000 1.4672000 0.2604000 -0.0508000 -0.0987000 -0.0023000
-1.8018000 -0.0117000 0.2823000 -1.2912000 2.0347000 -4.1123000
0.4621000 2.0550000 -4.4201000 -0.3970000 0.5255000 -4.3979000
2.4059000 1.5440000 -0.8979000 3.2673000 3.1044000 -0.7929000
1.8318000 2.7934000 0.1888000 0.6177000 5.1189000 -2.5244000
2.2188000 5.2065000 -1.7918000 0.7741000 4.9438000 -0.7766000
-1.2996000 -0.4020000 -2.1404000 0.8600000 0.8372000 -2.2840000
-3.1291000 0.7293000 -1.4768000 2.8732000 3.4574000 -3.1770000
61 changes: 61 additions & 0 deletions smarty/data/molecules/AlkEthOH_c100.mol2
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
@<TRIPOS>MOLECULE
AlkEthOH_c100
26 25 1 0 0
SMALL
USER_CHARGES

@<TRIPOS>ATOM
1 C1 -0.9553 0.4818 -0.2077 CT 1 <0> -0.12819
2 C2 -0.3490 1.5118 -3.9233 CT 1 <0> -0.11278
3 C3 2.2817 2.6277 -0.7960 CT 1 <0> -0.13028
4 C4 1.2497 4.6992 -1.7325 CT 1 <0> -0.13028
5 C5 -1.2082 0.6003 -1.7062 CT 1 <0> 0.13828
6 C6 -0.0897 1.3641 -2.4285 CT 1 <0> 0.17039
7 C7 1.3968 3.1877 -1.9119 CT 1 <0> 0.34809
8 O1 -2.4504 1.2717 -1.9070 OH 1 <0> -0.58269
9 O2 2.0195 2.9930 -3.1820 OH 1 <0> -0.62548
10 O3 0.0656 2.6662 -1.8674 OS 1 <0> -0.40486
11 H1 -0.8584 1.4672 0.2604 HC 1 <0> 0.04431
12 H2 -0.0508 -0.0987 -0.0023 HC 1 <0> 0.04431
13 H3 -1.8018 -0.0117 0.2823 HC 1 <0> 0.04431
14 H4 -1.2912 2.0347 -4.1123 HC 1 <0> 0.04980
15 H5 0.4621 2.0550 -4.4201 HC 1 <0> 0.04980
16 H6 -0.3970 0.5255 -4.3979 HC 1 <0> 0.04980
17 H7 2.4059 1.5440 -0.8979 HC 1 <0> 0.05049
18 H8 3.2673 3.1044 -0.7929 HC 1 <0> 0.05049
19 H9 1.8318 2.7934 0.1888 HC 1 <0> 0.05049
20 H10 0.6177 5.1189 -2.5244 HC 1 <0> 0.05049
21 H11 2.2188 5.2065 -1.7918 HC 1 <0> 0.05049
22 H12 0.7741 4.9438 -0.7766 HC 1 <0> 0.05049
23 H13 -1.2996 -0.4020 -2.1404 H1 1 <0> 0.02625
24 H14 0.8600 0.8372 -2.2840 H1 1 <0> 0.04199
25 H15 -3.1291 0.7293 -1.4768 HO 1 <0> 0.39414
26 H16 2.8732 3.4574 -3.1770 HO 1 <0> 0.41015
@<TRIPOS>BOND
1 1 5 1
2 2 6 1
3 3 7 1
4 4 7 1
5 5 6 1
6 5 8 1
7 6 10 1
8 7 9 1
9 7 10 1
10 1 11 1
11 1 12 1
12 1 13 1
13 2 14 1
14 2 15 1
15 2 16 1
16 3 17 1
17 3 18 1
18 3 19 1
19 4 20 1
20 4 21 1
21 4 22 1
22 5 23 1
23 6 24 1
24 8 25 1
25 9 26 1
@<TRIPOS>SUBSTRUCTURE
1 <0> 1 GROUP 0 A ****
Loading

0 comments on commit 81bccc8

Please sign in to comment.