Merge pull request #39 from open-forcefield-group/binaries
Implement new decorator framework, construction of more decorated SMARTS (old behavior is retained via an optional argument)
Showing 19 changed files with 10,271 additions and 26 deletions.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
# Example application of SMARTY atom type sampler to recover parm99 typing of alkanes, ethers, and alcohols | ||
|
||
In this example, the SMARTY `AtomTypeSampler` is used to attempt to recover SMARTS atom types that recapitulate the typing rules from a reference set of typed molecules. | ||
|
||
## Manifest | ||
* `smarty.py` - example command-line driver | ||
* `atomtypes/` - input atom type sample specification files | ||
* `molecules/` - typed molecule datasets | ||
* `scripts/` - useful conversion scripts | ||
|
||
## Usage | ||
|
||
Example: | ||
``` | ||
smarty --basetypes=atomtypes/basetypes-elemental.smarts --decorators=atomtypes/decorators.smarts --substitutions=atomtypes/substitutions.smarts \ | ||
--molecules=molecules/test_filt1_tripos.mol2 --reference=molecules/test_filt1_ff.mol2 \ | ||
--iterations 500 --temperature=0.1 >| smartyAlkEtOH.iter500.log | ||
``` | ||
|
||
The results below are extracted from the log file smartyAlkEtOH.iter500.log. | ||
Initially, the base atom types are added to the pool of current atom types, and the number of atoms and molecules matched by each atom type are shown: | ||
``` | ||
INDEX ATOMS MOLECULES TYPE NAME SMARTS | ||
1 : 464 42 | hydrogen [#1] | ||
2 : 232 42 | carbon [#6] | ||
3 : 0 0 | nitrogen [#7] | ||
4 : 107 42 | oxygen [#8] | ||
5 : 0 0 | fluorine [#9] | ||
6 : 0 0 | phosphorous [#15] | ||
7 : 0 0 | sulfur [#16] | ||
8 : 0 0 | chlorine [#17] | ||
9 : 0 0 | bromine [#35] | ||
10 : 0 0 | iodine [#53] | ||
TOTAL : 803 42 | ||
``` | ||
After a number of iterations, the pool of current atom types will have diverged: a child atom type has been added to the set, and unused atom types have been removed from the original set. | ||
``` | ||
INDEX ATOMS MOLECULES TYPE NAME SMARTS REF TYPE FRACTION OF REF TYPED MOLECULES MATCHED | ||
1 : 464 42 | hydrogen [#1] HC 244 / 244 (100.000%) | ||
2 : 232 42 | carbon [#6] CT 232 / 232 (100.000%) | ||
3 : 39 30 | oxygen [#8] OS 39 / 39 (100.000%) | ||
4 : 68 42 | oxygen total-h-count-1 [#8&H1] OH 68 / 68 (100.000%) | ||
TOTAL : 803 42 | 583 / 803 match (72.603 %) | ||
``` |
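The `TOTAL` row of the table above can be checked by hand. The following is a minimal sketch that only reproduces the arithmetic from the numbers printed in the log; it makes no claim about how the sampler computes the score internally, and the variable names are illustrative.

```python
# Per-type reference matches copied from the table above:
# reference type -> (atoms matched, atoms carrying that reference type).
ref_matches = {
    "HC": (244, 244),
    "CT": (232, 232),
    "OS": (39, 39),
    "OH": (68, 68),
}
total_atoms = 803  # from the TOTAL row

# Sum the matched atoms over all current atom types and divide by all atoms.
matched = sum(m for m, _ in ref_matches.values())
fraction = matched / total_atoms
print(f"{matched} / {total_atoms} match ({100 * fraction:.3f} %)")
```

Every listed reference type is fully matched, so the overall score of roughly 72.6% reflects atoms whose reference types are not yet recovered by any current atom type.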
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
# Atom type SMARTS components | ||
|
||
## Formats | ||
|
||
### Initial types | ||
|
||
A `basetypes` file specifies the initial atom types used to initialize the sampler. | ||
|
||
Comments beginning with `%` are ignored throughout the file. | ||
Each line has the format | ||
``` | ||
<SMARTS> <typename> | ||
``` | ||
where `<SMARTS>` is an [OpenEye SMARTS string](https://docs.eyesopen.com/toolkits/cpp/oechemtk/SMARTS.html) and `<typename>` is a human-readable typename associated with that atom type. | ||
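As a rough illustration of this two-column format, the sketch below parses file contents into `(SMARTS, typename)` pairs. The function name `parse_smarts_file` is hypothetical and is not taken from SMARTY. It also assumes that only lines starting with `%` are comments, since `%` can legitimately appear inside a SMARTS string as a ring-closure prefix.

```python
def parse_smarts_file(text):
    """Return a list of (smarts, name) pairs from two-column file contents.

    Assumption: a comment is a line whose first non-blank character is '%'.
    """
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        fields = line.split(None, 1)  # SMARTS first, the rest is the name
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected '<SMARTS> <name>', got {raw!r}")
        smarts, name = fields
        entries.append((smarts, name.strip()))
    return entries


example = """\
% atom types
[#1] hydrogen
[#6] carbon
[#8] oxygen
"""
basetypes = parse_smarts_file(example)
print(basetypes)
```

The same sketch would apply to `decorators` and `substitutions` files, which share the layout.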
|
||
Atom type definitions are hierarchical, with the last match in the file taking precedence over earlier matches. | ||
|
||
For example, we could use the elemental base types: | ||
``` | ||
% atom types | ||
[#1] hydrogen | ||
[#6] carbon | ||
[#7] nitrogen | ||
[#8] oxygen | ||
[#9] fluorine | ||
[#15] phosphorous | ||
[#16] sulfur | ||
[#17] chlorine | ||
[#35] bromine | ||
[#53] iodine | ||
``` | ||
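The precedence rule described above (the last matching definition wins) can be sketched without a cheminformatics toolkit. In this toy example, atoms are plain dictionaries and each SMARTS pattern is paired with a hand-written predicate; both are hypothetical stand-ins for real SMARTS matching, which SMARTY delegates to the OpenEye toolkit.

```python
# Hypothetical stand-in for an atom: atomic number and attached hydrogen count.
ether_oxygen = {"element": 8, "hcount": 0}
hydroxyl_oxygen = {"element": 8, "hcount": 1}
carbon = {"element": 6, "hcount": 3}

# Ordered atom type list: (SMARTS, typename, predicate emulating the SMARTS).
atom_types = [
    ("[#1]", "hydrogen", lambda a: a["element"] == 1),
    ("[#8]", "oxygen", lambda a: a["element"] == 8),
    ("[#8&H1]", "oxygen total-h-count-1",
     lambda a: a["element"] == 8 and a["hcount"] == 1),
]


def assign_type(atom, atom_types):
    """Return the name of the LAST atom type in the list that matches the atom."""
    assigned = None
    for _smarts, name, matches in atom_types:
        if matches(atom):
            assigned = name  # a later match overrides any earlier one
    return assigned


print(assign_type(ether_oxygen, atom_types))
print(assign_type(hydroxyl_oxygen, atom_types))
print(assign_type(carbon, atom_types))
```

Because the more specific `[#8&H1]` entry comes after `[#8]`, hydroxyl oxygens receive the child type while ether oxygens keep the generic one; atoms matched by no entry stay untyped.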
|
||
### Decorators | ||
|
||
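A `decorators` file contains a list of SMARTS components that can be combined with existing atom types to propose more specific child atom types. | ||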
|
||
Comments beginning with `%` are ignored throughout the file. | ||
Each line has the format | ||
``` | ||
<SMARTS> <decoratorname> | ||
``` | ||
where `<SMARTS>` is an [OpenEye SMARTS string](https://docs.eyesopen.com/toolkits/cpp/oechemtk/SMARTS.html) and `<decoratorname>` is a human-readable name associated with that decorator. | ||
|
||
The SMARTS component is ANDed together (using the `&` operator) with a parent atom type to create a new proposed child atom type. | ||
The human-readable `<decoratorname>` is appended (with a space) to the parent name to keep a human-readable annotation of the proposed child atom type. | ||
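A minimal sketch of this combination step follows. The helper `decorate` is hypothetical and assumes the parent atom type is a single bracketed atom expression such as `[#8]`; SMARTY's own implementation may handle more cases.

```python
def decorate(parent_smarts, parent_name, decorator_smarts, decorator_name):
    """Build a proposed child atom type by ANDing a decorator onto a parent.

    Assumption: parent_smarts is one bracketed atom expression, e.g. "[#8]".
    """
    if not (parent_smarts.startswith("[") and parent_smarts.endswith("]")):
        raise ValueError(f"expected a bracketed atom expression, got {parent_smarts!r}")
    # Insert the decorator inside the brackets, joined with the '&' operator.
    child_smarts = f"[{parent_smarts[1:-1]}&{decorator_smarts}]"
    # Append the decorator name to the parent name, separated by a space.
    child_name = f"{parent_name} {decorator_name}"
    return child_smarts, child_name


child = decorate("[#8]", "oxygen", "H1", "total-h-count-1")
print(child)
```

With the `H1` decorator applied to the `[#8]` oxygen type, this yields the `[#8&H1]` type named `oxygen total-h-count-1`.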
|
||
### Substitutions | ||
|
||
It is often convenient to define various tokens that are substituted for more sophisticated SMARTS expressions. | ||
|
||
    % Substitution definitions | ||
    % Format: | ||
    % <SMARTS> <replacement-string> | ||
|
||
Comments beginning with `%` are ignored throughout the file. | ||
Each line has the format | ||
``` | ||
<SMARTS> <substitution-name> | ||
``` | ||
where `<SMARTS>` is an [OpenEye SMARTS string](https://docs.eyesopen.com/toolkits/cpp/oechemtk/SMARTS.html) and `<substitution-name>` is the token that, when written as `$<substitution-name>` in another SMARTS string, is replaced by this `<SMARTS>`. | ||
|
||
For example, we could define some elemental substitutions along with some substitutions for halogens: | ||
``` | ||
% elements | ||
[#9] fluorine | ||
[#17] chlorine | ||
[#35] bromine | ||
[#53] iodine | ||
% halogens | ||
[$smallhals,$largehals] halogen | ||
[$fluorine,$chlorine] smallhals | ||
[$bromine,$iodine] largehals | ||
``` | ||
|
||
The [`OESmartsLexReplace`](http://docs.eyesopen.com/toolkits/python/oechemtk/OEChemFunctions/OESmartsLexReplace.html) function is used to implement these replacements. | ||
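Because substitutions can refer to one another (for example, `halogen` is defined in terms of `smallhals` and `largehals`), it can help to check that every referenced token is actually defined. The sketch below does only that consistency check in plain Python; it does not perform the replacement and does not call `OESmartsLexReplace`. The helper names and the token pattern (a `$` followed by a letter or underscore, which skips recursive SMARTS of the form `$(...)`) are assumptions.

```python
import re

# '$name' tokens; '$(' (recursive SMARTS) is deliberately not matched.
TOKEN = re.compile(r"\$([A-Za-z_]\w*)")


def referenced_tokens(smarts):
    """Return the set of substitution names referenced in a SMARTS string."""
    return set(TOKEN.findall(smarts))


def undefined_tokens(substitutions):
    """Return tokens that are used in some definition but never defined."""
    defined = {name for _smarts, name in substitutions}
    used = set()
    for smarts, _name in substitutions:
        used |= referenced_tokens(smarts)
    return used - defined


substitutions = [
    ("[#9]", "fluorine"),
    ("[#17]", "chlorine"),
    ("[#35]", "bromine"),
    ("[#53]", "iodine"),
    ("[$smallhals,$largehals]", "halogen"),
    ("[$fluorine,$chlorine]", "smallhals"),
    ("[$bromine,$iodine]", "largehals"),
]
print(undefined_tokens(substitutions))           # empty set: all tokens are defined
print(referenced_tokens("$(*~[$halogen])"))      # tokens used by a decorator
```

A check like this could also be run over a `decorators` file to confirm that tokens such as `$halogen` have a matching entry in the `substitutions` file.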
|
||
## Manifest | ||
* `basetypes-elemental.smarts` - basetypes file with elemental atom types - this is a good choice to begin with | ||
* `basetypes.smarts` - basetypes file with more sophisticated atom types | ||
* `decorators.smarts` - `decorators` file with a variety of decorators | ||
* `decorators-simple.smarts` - minimal `decorators` file for testing | ||
* `substitutions.smarts` - minimal `substitutions` file |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
% atom types | ||
[#1] hydrogen | ||
[#6] carbon | ||
[#7] nitrogen | ||
[#8] oxygen | ||
[#9] fluorine | ||
[#15] phosphorous | ||
[#16] sulfur | ||
[#17] chlorine | ||
[#35] bromine | ||
[#53] iodine |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
% atom types | ||
[#1] hydrogen | ||
[#6] carbon | ||
[#7] nitrogen | ||
[#8] oxygen | ||
[#9] fluorine | ||
[#15] phosphorous | ||
[#16] sulfur | ||
[#17] chlorine | ||
[#35] bromine | ||
[#53] iodine |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
% aromatic/aliphatic | ||
a aromatic | ||
A aliphatic | ||
% halogens | ||
$(*~[$halogen]) halogen-adjacent |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
% bonded to atoms | ||
$(*-z) simply-bonded | ||
$(*=z) doubly-bonded | ||
$(*#z) triply-bonded | ||
$(*:z) aromatic-bond | ||
$(*~z) any-bond | ||
% total connectivity | ||
X1 connections-1 | ||
X2 connections-2 | ||
X3 connections-3 | ||
X4 connections-4 | ||
% total-h-count | ||
H0 total-h-count-0 | ||
H1 total-h-count-1 | ||
H2 total-h-count-2 | ||
H3 total-h-count-3 | ||
% formal charge | ||
+0 neutral | ||
+1 cationic+1 | ||
-1 anionic-1 | ||
% aromatic/aliphatic | ||
a aromatic | ||
A aliphatic |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
% Substitution definitions | ||
% Format: | ||
% <SMARTS> <replacement-string> | ||
|
||
% elements | ||
[#1] hydrogen | ||
[#6] carbon | ||
[#7] nitrogen | ||
[#8] oxygen | ||
[#9] fluorine | ||
[#15] phosphorous | ||
[#16] sulfur | ||
[#17] chlorine | ||
[#35] bromine | ||
[#53] iodine | ||
|
||
% halogens | ||
[$smallhals,$largehals] halogen | ||
[$fluorine,$chlorine] smallhals | ||
[$bromine,$iodine] largehals |