- A Python wrapper for the CDK (which is written in Java)
- Primary purpose:
  - Generate diverse chemical compound identifiers (SMILES, InChI)
  - Inter-convert between these identifiers
- Fully compatible with Python 3.x
The chemistry world has only a small number of open tools, e.g. OpenBabel and the Chemistry Development Kit (CDK, on GitHub).
I have been using OpenBabel for some time now. Although it is a great tool offering many options, I found several issues that make it hard to use:
- Generating InChI (keys) from SMILES often either does not work or struggles with stereochemistry.
- InChI cannot be used as input format.
git clone https://github.com/sebotic/cdk_pywrapper.git
cd cdk_pywrapper
pip install .
This installs the package on your local system, downloads the CDK, and builds cdk_bridge.java. After that, cdk_pywrapper should be ready to use, as in the example below.
Don't forget to use e.g. sudo for global installation or pip3 for Python 3.
I will also host this on PyPI soon, so no repo cloning will be required. I have tested it on Linux and macOS; I am not sure whether it works on Windows.
from cdk_pywrapper.cdk_pywrapper import Compound
smiles = 'CCN1C2=CC=CC=C2SC1=CC=CC=CC3=[N+](C4=CC=CC=C4S3)CC.[I-]'
cmpnd = Compound(compound_string=smiles, identifier_type='smiles')
ikey = cmpnd.get_inchi_key()
print(ikey)
Output: 'MNQDKWZEUULFPX-UHFFFAOYSA-M'
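
For more than one compound, the same two calls can be wrapped in a small helper. The sketch below is not part of cdk_pywrapper: the names `cdk_inchi_key`, `smiles_to_inchi_keys` and `INCHI_KEY_PATTERN` are hypothetical, and it assumes that a failed conversion either raises an exception or returns something that does not look like an InChIKey. It uses only the `Compound(...)` constructor and `get_inchi_key()` shown above, and checks each result against the standard InChIKey layout (14, 10 and 1 uppercase letters separated by hyphens).

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated.
INCHI_KEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def cdk_inchi_key(smiles):
    """Convert one SMILES string using the calls from the example above."""
    # Imported lazily so the helper below can be used (and tested)
    # with a different converter when the CDK bridge is not installed.
    from cdk_pywrapper.cdk_pywrapper import Compound
    cmpnd = Compound(compound_string=smiles, identifier_type='smiles')
    return cmpnd.get_inchi_key()


def smiles_to_inchi_keys(smiles_list, convert=cdk_inchi_key):
    """Map each SMILES string to its InChIKey, or to None on failure."""
    keys = {}
    for smiles in smiles_list:
        try:
            key = convert(smiles)
        except Exception:
            # Assumption: a failed conversion is recorded as None, not raised.
            key = None
        # Keep only results that have the InChIKey shape.
        if key is not None and INCHI_KEY_PATTERN.fullmatch(str(key)):
            keys[smiles] = str(key)
        else:
            keys[smiles] = None
    return keys
```

Calling `smiles_to_inchi_keys([smiles])` with the SMILES string from the example should then return a dictionary with that string as its only key. Passing a different `convert` function makes it possible to try the helper without a working Java setup.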