GraphPLI - Graph-based Prediction of Protein-Ligand Interactions

Download and Installation

Clone this GitHub repository to get a local copy of the code:

git clone git@github.com:kalininalab/SIPILG.git

All of the code runs in a single conda environment. To set it up, move into the project folder, create the environment from the environment.yaml file, and activate it:

cd SIPILG
conda env create -f environment.yaml
conda activate sipilg

Data Preparation

To prepare the data for training or inference, run

snakemake -j <no_cpus> --configfile config/snakemake/unilectin.yaml

in the root folder of the project. The snakemake pipelines expect each dataset to follow the layout described in the Dataset Structure section.

Training

To train a model, run

python -m src.train config/train/default.yaml

in the root folder of this project.

All training configs for the Master's thesis are included in the config/thesis_train folder.

Dataset Structure

Datasets used for training and testing must follow the structure shown here so that the snakemake pipeline can preprocess them.

dataset/
├── structures/
│       └── <pdb structures>
└── tables/
        ├── inter.tsv
        ├── lig.tsv
        └── prot.tsv

Each of the three tables needs a header line.

The lig.tsv file must contain the columns Drug_ID and SMILES, mapping each Drug_ID to its SMILES string. Similarly, prot.tsv must contain the columns Target_ID and Target, mapping each Target_ID to its FASTA sequence. The Target_IDs also have to match the filenames in the structures folder.

The inter.tsv file lists the actual pairwise interactions between Drug_IDs and Target_IDs in TSV format. In addition to the Drug_ID and Target_ID columns, it requires a third column, Y, which stores a proxy for the binding affinity. Y can hold either a regression value or a classification label.
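The rules above can be checked before starting the snakemake pipeline. The following is a minimal sketch and not part of this repository: the function name check_dataset and the constant REQUIRED are hypothetical, and it assumes that a Target_ID "matches" a structure file when it equals the filename without its extension.

```python
import csv
import sys
from pathlib import Path

# Required columns per table, taken from the description above.
REQUIRED = {
    "lig.tsv": {"Drug_ID", "SMILES"},
    "prot.tsv": {"Target_ID", "Target"},
    "inter.tsv": {"Drug_ID", "Target_ID", "Y"},
}


def read_tsv(path):
    """Return (rows, column names) of a tab-separated file with a header."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        return rows, set(reader.fieldnames or [])


def check_dataset(root):
    """Hypothetical helper: return a list of problems found in a dataset folder."""
    root = Path(root)
    problems = []
    tables = {}
    for name, required in REQUIRED.items():
        path = root / "tables" / name
        if not path.is_file():
            problems.append(f"missing file: tables/{name}")
            continue
        rows, columns = read_tsv(path)
        missing = required - columns
        if missing:
            problems.append(f"tables/{name} lacks columns: {sorted(missing)}")
            continue
        tables[name] = rows
    if len(tables) < len(REQUIRED):
        return problems  # cross-checks need all three tables

    drug_ids = {row["Drug_ID"] for row in tables["lig.tsv"]}
    target_ids = {row["Target_ID"] for row in tables["prot.tsv"]}

    # Assumption: a structure belongs to a Target_ID if the file stem equals it.
    struct_dir = root / "structures"
    stems = {p.stem for p in struct_dir.iterdir() if p.is_file()} if struct_dir.is_dir() else set()
    for target_id in sorted(target_ids - stems):
        problems.append(f"no structure file for Target_ID {target_id}")

    # Every interaction must refer to a known ligand and a known protein.
    # Line numbers start at 2 because line 1 is the header.
    for line_no, row in enumerate(tables["inter.tsv"], start=2):
        if row["Drug_ID"] not in drug_ids:
            problems.append(f"inter.tsv line {line_no}: unknown Drug_ID {row['Drug_ID']}")
        if row["Target_ID"] not in target_ids:
            problems.append(f"inter.tsv line {line_no}: unknown Target_ID {row['Target_ID']}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    for problem in check_dataset(sys.argv[1]):
        print(problem)
```

Saved under a name of your choice, for example check_dataset.py, the script takes the dataset folder as its only argument and prints one line per problem. No output means none of these checks failed. It does not validate the SMILES strings, the sequences, or the values in the Y column.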