Merge pull request #39 from open-forcefield-group/binaries
Implement new decorator framework, construction of more decorated SMARTS (old behavior is retained via an optional argument)
davidlmobley authored Jul 6, 2016
2 parents ae633aa + b6ca359 commit d9b6e13
Showing 19 changed files with 10,271 additions and 26 deletions.
37 changes: 35 additions & 2 deletions README.md
@@ -71,8 +71,10 @@ Note that lines beginning with `%` are comment lines.

We also specify a number of starting types, "initial types" which can be the same or different from the base types. These follow the same format, and `atomtypes/basetypes.smarts` can be reused unless alternate behavior is desired (such as starting from more sophisticated initial types).

Atom type creation moves attempt to split off a new atom type from a parent atom type by combining (via an "and" operator, `&`) the parent atom type with a "decorator".
The decorators are listed in `atomtypes/decorators.smarts`:
Atom type creation moves have two options: simple decorators (`--decoratorbehavior=simple-decorators`) and combinatorial decorators (the default).

The first option (simple-decorators) attempts to split off a new atom type from a parent atom type by combining (via an "and" operator, `&`) the parent atom type with a "decorator".
The decorators are listed in `AlkEtOH/atomtypes/decorators.smarts` or `parm@frosst/atomtypes/decorators.smarts`:
```
% bond order
$([*]=[*]) double-bonded
@@ -115,6 +117,37 @@ Each decorator has a corresponding string token (no spaces allowed!) that is use

For example, we may find the atom type ```[#6]&H3``` which is `carbon total-h-count-3` for a C atom bonded to three hydrogens.

The second option (combinatorial-decorator) attempts to create a new atom type by adding one, two, or three randomly chosen decorators to a base atom type.
These decorators differ from those of the simple-decorator option: they contain no atom types, only bond information.
The new decorators are listed in `AlkEtOH2/atomtypes/decorators.smarts`:

```
% bonded to atoms
$(*-z) simply-bonded
$(*=z) doubly-bonded
$(*#z) triply-bonded
$(*:z) aromatic-bond
$(*~z) any-bond
% total connectivity
X1 connections-1
X2 connections-2
X3 connections-3
X4 connections-4
% total-h-count
H0 total-h-count-0
H1 total-h-count-1
H2 total-h-count-2
H3 total-h-count-3
% formal charge
+0 neutral
+1 cationic+1
-1 anionic-1
% aromatic/aliphatic
a aromatic
A aliphatic
```
Each decorator in this option also has a corresponding string token.
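
The combinatorial move described above can be sketched in a few lines of Python. This is an illustration only, not the SMARTY implementation: the function name `propose_combinatorial` is hypothetical, the decorators are assumed to be joined with `&` inside the atom brackets (the form the sampler log prints, e.g. `[#8&H1]`), and the bonded-atom decorators containing the `z` placeholder are left out because their handling is not described here.

```python
import random

def propose_combinatorial(base, decorators, rng=random):
    """Attach one, two, or three randomly chosen decorators to a base type.

    `base` and each decorator are (SMARTS, name) pairs. Joining with '&'
    inside the brackets is an assumption made for this sketch.
    """
    base_smarts, base_name = base
    count = rng.randint(1, 3)               # one, two, or three decorators
    chosen = rng.sample(decorators, count)  # no decorator is picked twice
    body = base_smarts[1:-1]                # strip the enclosing [ ]
    child_smarts = '[' + '&'.join([body] + [s for s, _ in chosen]) + ']'
    child_name = ' '.join([base_name] + [n for _, n in chosen])
    return child_smarts, child_name

# A subset of the decorators listed above (bond decorators omitted).
decorators = [
    ('X1', 'connections-1'), ('X2', 'connections-2'),
    ('H0', 'total-h-count-0'), ('H1', 'total-h-count-1'),
    ('+0', 'neutral'), ('a', 'aromatic'), ('A', 'aliphatic'),
]
child_smarts, child_name = propose_combinatorial(
    ('[#8]', 'oxygen'), decorators, random.Random(0))
print(child_smarts, '|', child_name)
```

A random draw can combine decorators that contradict each other (for example `H0` and `H1`); in this sketch such a child would simply match no atoms.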

Newly proposed atom types are added to the end of the list.
After a new atom type is proposed, all molecules are reparameterized using the new set of atom types.
Atom type matching proceeds by trying to see if each SMARTS match can be applied working from top to bottom of the list.
Binary file added dist/smarty-0.1.0-py2.7.egg
Binary file not shown.
46 changes: 46 additions & 0 deletions examples/AlkEtOH2/README.md
@@ -0,0 +1,46 @@
# Example application of SMARTY atom type sampler to recover parm99 typing of alkanes, ethers, and alcohols

In this example, the SMARTY `AtomTypeSampler` is used to attempt to recover SMARTS atom types that recapitulate the typing rules from a reference set of typed molecules.

## Manifest
* `smarty.py` - example command-line driver
* `atomtypes/` - input atom type sample specification files
* `molecules/` - typed molecule datasets
* `scripts/` - useful conversion scripts

## Usage

Example:
```
smarty --basetypes=atomtypes/basetypes-elemental.smarts --decorators=atomtypes/decorators.smarts --substitutions=atomtypes/substitutions.smarts \
--molecules=molecules/test_filt1_tripos.mol2 --reference=molecules/test_filt1_ff.mol2 \
--iterations 500 --temperature=0.1 >| smartyAlkEtOH.iter500.log
```

The results below are extracted from the log file `smartyAlkEtOH.iter500.log`.
Initially, the base atom types are added to the pool of current atom types, and the number of atoms and molecules matched by each atom type are shown:
```
INDEX ATOMS MOLECULES TYPE NAME SMARTS
1 : 464 42 | hydrogen [#1]
2 : 232 42 | carbon [#6]
3 : 0 0 | nitrogen [#7]
4 : 107 42 | oxygen [#8]
5 : 0 0 | fluorine [#9]
6 : 0 0 | phosphorous [#15]
7 : 0 0 | sulfur [#16]
8 : 0 0 | chlorine [#17]
9 : 0 0 | bromine [#35]
10 : 0 0 | iodine [#53]
TOTAL : 803 42
```
After a number of iterations, the pool of current atom types will have diverged, with a child having been added to the set and unused atom types removed from the original set.
```
INDEX ATOMS MOLECULES TYPE NAME SMARTS REF TYPE FRACTION OF REF TYPED MOLECULES MATCHED
1 : 464 42 | hydrogen [#1] HC 244 / 244 (100.000%)
2 : 232 42 | carbon [#6] CT 232 / 232 (100.000%)
3 : 39 30 | oxygen [#8] OS 39 / 39 (100.000%)
4 : 68 42 | oxygen total-h-count-1 [#8&H1] OH 68 / 68 (100.000%)
TOTAL : 803 42 | 583 / 803 match (72.603 %)
```
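
The `TOTAL` row can be reproduced from the per-type rows. The short check below assumes that the matched count in the total is the sum of the per-type matched counts; the figures in the table above agree with that reading. Note that the `hydrogen` type covers 464 atoms while its paired reference type `HC` covers only 244, which presumably accounts for the total falling short of 100%.

```python
# (reference type, atoms matched, atoms of that reference type), from the table
rows = [
    ('HC', 244, 244),
    ('CT', 232, 232),
    ('OS', 39, 39),
    ('OH', 68, 68),
]
total_atoms = 803  # from the TOTAL row

matched = sum(m for _, m, _ in rows)
percent = 100.0 * matched / total_atoms
summary = '%d / %d match (%.3f %%)' % (matched, total_atoms, percent)
print(summary)
```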
83 changes: 83 additions & 0 deletions examples/AlkEtOH2/atomtypes/README.md
@@ -0,0 +1,83 @@
# Atom type SMARTS components

## Formats

### Initial types

A `basetypes` file specifies the initial atom types used to initialize the sampler.

Comments beginning with `%` are ignored throughout the file.
Each line has the format
```
<SMARTS> <typename>
```
where `<SMARTS>` is an [OpenEye SMARTS string](https://docs.eyesopen.com/toolkits/cpp/oechemtk/SMARTS.html) and `<typename>` is a human-readable typename associated with that atom type.
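
A file in this format can be read with a few lines of Python. The sketch below is not the SMARTY parser: the function name `parse_smarts_file` is hypothetical, and it assumes that comments occupy whole lines and that the first whitespace-separated field is the SMARTS string.

```python
def parse_smarts_file(text):
    """Parse lines of the form '<SMARTS> <typename>', skipping '%' comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('%'):
            continue  # blank line or comment
        # Split on the first run of whitespace only.
        smarts, name = line.split(None, 1)
        entries.append((smarts, name.strip()))
    return entries

example = """\
% atom types
[#1] hydrogen
[#6] carbon
[#8] oxygen
"""
basetypes = parse_smarts_file(example)
print(basetypes)
```

A line with a single field raises `ValueError` in this sketch, which makes malformed entries easy to spot.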

Atom type definitions are hierarchical, with the last match in the file taking precedence over earlier matches.
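
The precedence rule can be illustrated without a cheminformatics toolkit. In the sketch below, atoms are plain dictionaries and each atom type is a Python predicate standing in for a SMARTS match; the names `assign_types`, `element`, and `hcount` are hypothetical and chosen only for this example.

```python
def assign_types(atoms, typelist):
    """Return one type name per atom; later entries override earlier matches."""
    assigned = []
    for atom in atoms:
        current = None
        for name, matches in typelist:  # walk the list top to bottom
            if matches(atom):
                current = name          # the last match takes precedence
        assigned.append(current)
    return assigned

typelist = [
    ('oxygen', lambda a: a['element'] == 8),
    ('oxygen total-h-count-1',
     lambda a: a['element'] == 8 and a['hcount'] == 1),
]
atoms = [
    {'element': 8, 'hcount': 0},  # ether-like oxygen
    {'element': 8, 'hcount': 1},  # hydroxyl-like oxygen
    {'element': 6, 'hcount': 3},  # carbon, matched by neither type
]
types = assign_types(atoms, typelist)
print(types)
```

Because the more specific type sits lower in the list, it claims the hydroxyl-like oxygen even though the generic `oxygen` type also matches it.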

For example, we could use the elemental base types:
```
% atom types
H hydrogen
C carbon
N nitrogen
O oxygen
F fluorine
P phosphorous
S sulfur
Cl chlorine
Br bromine
I iodine
```

### Decorators

A `decorators` file contains a list of SMARTS components used to decorate parent atom types.

Comments beginning with `%` are ignored throughout the file.
Each line has the format
```
<SMARTS> <decoratorname>
```
where `<SMARTS>` is an [OpenEye SMARTS string](https://docs.eyesopen.com/toolkits/cpp/oechemtk/SMARTS.html) and `<decoratorname>` is a human-readable name associated with that decorator.

The SMARTS component is ANDed together (using the `&` operator) with a parent atom type to create a new proposed child atom type.
The human-readable `<decoratorname>` is appended (with a space) to the parent name to keep a human-readable annotation of the proposed child atom type.
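
As a minimal sketch of this rule, the hypothetical helper `propose_child` below joins the two SMARTS strings with `&` and the two names with a space. The exact string layout produced by the sampler may differ (its log prints the combined form inside the brackets, e.g. `[#8&H1]`).

```python
def propose_child(parent, decorator):
    """Combine a (SMARTS, name) parent with a (SMARTS, name) decorator."""
    parent_smarts, parent_name = parent
    decorator_smarts, decorator_name = decorator
    child_smarts = parent_smarts + '&' + decorator_smarts  # logical AND
    child_name = parent_name + ' ' + decorator_name        # space-separated
    return child_smarts, child_name

child = propose_child(('[#6]', 'carbon'), ('H3', 'total-h-count-3'))
print(child)
```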

### Substitutions

It is often convenient to define various tokens that are substituted for more sophisticated SMARTS expressions.

    % Substitution definitions
    % Format:
    % <SMARTS> <replacement-string>

Comments beginning with `%` are ignored throughout the file.
Each line has the format
```
<SMARTS> <substitution-name>
```
where `<SMARTS>` is an [OpenEye SMARTS string](https://docs.eyesopen.com/toolkits/cpp/oechemtk/SMARTS.html) and `<substitution-name>` is the token that stands in for it; other SMARTS strings refer to the token as `$<substitution-name>`.

For example, we could define some elemental substitutions along with some substitutions for halogens:
```
% elements
[#9] fluorine
[#17] chlorine
[#35] bromine
[#53] iodine
% halogens
[$smallhals,$largehals] halogen
[$fluorine,$chlorine] smallhals
[$bromine,$iodine] largehals
```

The [`OESmartsLexReplace`](http://docs.eyesopen.com/toolkits/python/oechemtk/OEChemFunctions/OESmartsLexReplace.html) function is used to implement these replacements.
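
To show what nested substitutions amount to, here is a pure-Python sketch of token expansion. It is not `OESmartsLexReplace` and the string it produces may differ from that function's output: the helper `expand` is hypothetical, and wrapping each replacement in a recursive SMARTS `$(...)` is an assumption made for this illustration.

```python
import re

# A '$' followed by a name; '$(' (recursive SMARTS) is deliberately not matched.
TOKEN = re.compile(r'\$([A-Za-z][A-Za-z0-9_-]*)')

def expand(smarts, definitions, max_depth=10):
    """Repeatedly replace $name tokens with $(definition) until none remain."""
    for _ in range(max_depth):
        def replace(match):
            name = match.group(1)
            if name not in definitions:
                raise KeyError('undefined substitution: ' + name)
            return '$(' + definitions[name] + ')'
        expanded = TOKEN.sub(replace, smarts)
        if expanded == smarts:
            return expanded  # nothing left to substitute
        smarts = expanded
    raise ValueError('substitutions nested too deeply or circular')

definitions = {
    'fluorine': '[#9]', 'chlorine': '[#17]',
    'bromine': '[#35]', 'iodine': '[#53]',
    'halogen': '[$smallhals,$largehals]',
    'smallhals': '[$fluorine,$chlorine]',
    'largehals': '[$bromine,$iodine]',
}
halogen_adjacent = expand('$(*~[$halogen])', definitions)
print(halogen_adjacent)
```

The depth limit guards against definitions that refer to each other in a cycle, which would otherwise expand forever.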

## Manifest
* `basetypes-elemental.smarts` - basetypes file with elemental atom types - this is a good choice to begin with
* `basetypes.smarts` - basetypes file with more sophisticated atom types
* `decorators.smarts` - `decorators` file with a variety of decorators
* `decorators-simple.smarts` - minimal `decorators` file for testing
* `substitutions.smarts` - minimal `substitutions` file
11 changes: 11 additions & 0 deletions examples/AlkEtOH2/atomtypes/basetypes-elemental.smarts
@@ -0,0 +1,11 @@
% atom types
[#1] hydrogen
[#6] carbon
[#7] nitrogen
[#8] oxygen
[#9] fluorine
[#15] phosphorous
[#16] sulfur
[#17] chlorine
[#35] bromine
[#53] iodine
11 changes: 11 additions & 0 deletions examples/AlkEtOH2/atomtypes/basetypes.smarts
@@ -0,0 +1,11 @@
% atom types
[#1] hydrogen
[#6] carbon
[#7] nitrogen
[#8] oxygen
[#9] fluorine
[#15] phosphorous
[#16] sulfur
[#17] chlorine
[#35] bromine
[#53] iodine
5 changes: 5 additions & 0 deletions examples/AlkEtOH2/atomtypes/decorators-simple.smarts
@@ -0,0 +1,5 @@
% aromatic/aliphatic
a aromatic
A aliphatic
% halogens
$(*~[$halogen]) halogen-adjacent
23 changes: 23 additions & 0 deletions examples/AlkEtOH2/atomtypes/decorators.smarts
@@ -0,0 +1,23 @@
% bonded to atoms
$(*-z) simply-bonded
$(*=z) doubly-bonded
$(*#z) triply-bonded
$(*:z) aromatic-bond
$(*~z) any-bond
% total connectivity
X1 connections-1
X2 connections-2
X3 connections-3
X4 connections-4
% total-h-count
H0 total-h-count-0
H1 total-h-count-1
H2 total-h-count-2
H3 total-h-count-3
% formal charge
+0 neutral
+1 cationic+1
-1 anionic-1
% aromatic/aliphatic
a aromatic
A aliphatic
20 changes: 20 additions & 0 deletions examples/AlkEtOH2/atomtypes/substitutions.smarts
@@ -0,0 +1,20 @@
% Substitution definitions
% Format:
% <SMARTS> <replacement-string>

% elements
[#1] hydrogen
[#6] carbon
[#7] nitrogen
[#8] oxygen
[#9] fluorine
[#15] phosphorous
[#16] sulfur
[#17] chlorine
[#35] bromine
[#53] iodine

% halogens
[$smallhals,$largehals] halogen
[$fluorine,$chlorine] smallhals
[$bromine,$iodine] largehals