This project uses deep learning models to predict CRISPR-Cas on-target efficiency and off-target specificity.
Below is the layout of the whole model.
This model includes four components:
- Embedding layer
- Transformer layer
- Convolutional neural network
- Fully connected layer
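The sketch below is a minimal NumPy forward pass for one sgRNA sequence, showing how data flows through the four components. It is not the project's implementation. The layer sizes, the 23-nt input, the random untrained weights, and the max-pooling step are assumptions chosen for illustration. The transformer layer is also simplified to a single attention head with no feed-forward sublayer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration, not the project's hyperparameters.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3}
SEQ_LEN, D_MODEL, N_FILTERS, KERNEL = 23, 16, 8, 3

# Randomly initialised parameters (an untrained sketch).
emb = rng.normal(0, 0.1, (len(VOCAB), D_MODEL))
pos = rng.normal(0, 0.1, (SEQ_LEN, D_MODEL))
Wq, Wk, Wv = (rng.normal(0, 0.1, (D_MODEL, D_MODEL)) for _ in range(3))
conv_w = rng.normal(0, 0.1, (N_FILTERS, KERNEL, D_MODEL))
conv_b = np.zeros(N_FILTERS)
fc_w = rng.normal(0, 0.1, (N_FILTERS, 1))
fc_b = np.zeros(1)


def layer_norm(x, eps=1e-5):
    """Normalise each position's feature vector to zero mean, unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def forward(seq):
    """Return one (untrained) score for a sequence of length KERNEL..SEQ_LEN."""
    # 1) Embedding layer: nucleotide lookup plus a learned positional embedding.
    idx = np.array([VOCAB[c] for c in seq])
    x = emb[idx] + pos[: len(seq)]                          # (L, D)
    # 2) Transformer layer (simplified): single-head self-attention,
    #    residual connection, then layer normalisation.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(D_MODEL))              # (L, L)
    x = layer_norm(x + attn @ v)                            # (L, D)
    # 3) CNN: 1D convolution over positions (no padding), ReLU, global max-pool.
    windows = np.stack([x[i:i + KERNEL] for i in range(len(seq) - KERNEL + 1)])
    conv = np.einsum("wkd,fkd->wf", windows, conv_w) + conv_b   # (L-K+1, F)
    pooled = np.maximum(conv, 0).max(axis=0)                    # (F,)
    # 4) Fully connected layer: map pooled features to a single score.
    return float((pooled @ fc_w + fc_b)[0])


if __name__ == "__main__":
    print(forward("GACGCATAAAGATGAGACGCTGG"))
```

Because the weights are random, the printed score carries no biological meaning. In a real model these parameters would be learned during training.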
- If conda is used, create the virtual environment with:
conda env create -f environment.yml
- Required packages:
  - keras
  - tensorflow
  - pytorch
  - sklearn
  - pandas
  - numpy
  - skorch
  - visdom
  - shap
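The snippet below reports which of the required packages are missing from the current environment without actually importing them. It uses only the standard library. The import names are assumptions based on each package's usual name: pytorch is imported as "torch", and sklearn is installed from the "scikit-learn" package.

```python
import importlib.util

# Import names assumed for the packages listed above.
REQUIRED_MODULES = ("keras", "tensorflow", "torch", "sklearn", "pandas",
                    "numpy", "skorch", "visdom", "shap")


def check_packages(modules=REQUIRED_MODULES):
    """Return a {module_name: is_installed} mapping without importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    missing = [name for name, ok in check_packages().items() if not ok]
    print("Missing packages:", ", ".join(missing) if missing else "none")
```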
To run the model on one of the provided datasets:
python ./attn_to_crispr.py <data/model>
where <data/model> is one of K562, A549, NB4, cpf1, cpf1_OT, or deepCrispr_OT.
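To run the script on several of the datasets listed above in sequence, a small wrapper like the one below could be used. It is a convenience sketch and not part of the project. The helper names build_commands and run_all are hypothetical, and the script path assumes you are in the repository root.

```python
import subprocess
import sys

# Dataset/model names accepted by attn_to_crispr.py, as listed above.
DATASETS = ["K562", "A549", "NB4", "cpf1", "cpf1_OT", "deepCrispr_OT"]


def build_commands(datasets=DATASETS, script="./attn_to_crispr.py"):
    """Build one command per dataset, using the current Python interpreter."""
    unknown = [d for d in datasets if d not in DATASETS]
    if unknown:
        raise ValueError(f"Unknown dataset(s): {unknown}")
    return [[sys.executable, script, d] for d in datasets]


def run_all(datasets=DATASETS):
    """Run the script for each dataset in turn, stopping at the first failure."""
    for cmd in build_commands(datasets):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

For example, run_all(["K562", "A549"]) runs the two cell-line datasets one after the other. Because check=True is set, a non-zero exit code raises CalledProcessError and stops the loop.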
To use a customized Cas9 off-target (OT) dataset:
- Format the dataset like the example dataset in dataset/customized_Cas9_OT
- Save the new dataset as dataset/customized_Cas9_OT/customized_Cas9_OT_data.csv
- Run:
python flexible_OT_crispr.py customized_Cas9_OT
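The formatting step for the off-target dataset can be partly automated. This sketch assumes that the header row of the example CSV defines the expected columns and their order. The exact column names are not listed here, so the helper only copies whatever the example header contains. It fails early if a column is missing and drops any extra columns. The helper name and the input file names in the example call are hypothetical.

```python
import csv


def align_to_example(example_csv, new_csv, out_csv):
    """Rewrite new_csv with exactly the columns (and column order) of example_csv.

    Raises ValueError if the new data lacks any column of the example file;
    extra columns in the new data are dropped. Returns the number of rows written.
    """
    with open(example_csv, newline="") as f:
        example_columns = next(csv.reader(f))
    with open(new_csv, newline="") as f:
        reader = csv.DictReader(f)
        missing = [c for c in example_columns if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"New dataset is missing columns: {missing}")
        rows = list(reader)
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=example_columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Example call (input file names are hypothetical):
# align_to_example("dataset/customized_Cas9_OT/example.csv", "my_ot_data.csv",
#                  "dataset/customized_Cas9_OT/customized_Cas9_OT_data.csv")
```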
- Optional: specify the training/testing split method by changing split_method in "models/customized_Cas9_OT/config.py":
  - "regular" for an n-fold split
  - "stratified" for a leave-sgRNA-out split
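The difference between the two split options can be sketched in plain Python. This illustrates the concepts only and is not the code behind split_method. The function names and the round-robin fold assignment are assumptions. In scikit-learn, the two ideas roughly correspond to KFold and to GroupKFold with the sgRNA as the group.

```python
import random


def regular_folds(n_samples, n_folds, seed=0):
    """Plain n-fold split: shuffle row indices, then deal them into n_folds folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[i::n_folds]) for i in range(n_folds)]


def leave_sgrna_out_folds(sgrnas, n_folds, seed=0):
    """Leave-sgRNA-out split: all rows of a given sgRNA land in the same fold,
    so each test fold contains only sgRNAs never seen during training."""
    unique = sorted(set(sgrnas))
    random.Random(seed).shuffle(unique)
    fold_of = {g: i % n_folds for i, g in enumerate(unique)}
    folds = [[] for _ in range(n_folds)]
    for idx, g in enumerate(sgrnas):
        folds[fold_of[g]].append(idx)
    return folds
```

With the regular split, off-target sites of the same sgRNA can appear in both training and test folds. The leave-sgRNA-out split prevents this, which gives a stricter estimate of how well the model generalizes to new guides.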
To use a customized Cas9 on-target dataset:
- Format the dataset like the example dataset in dataset/customized_Cas9_ontar
- Save the new dataset as dataset/customized_Cas9_ontar/customized_Cas9_ontar_data.csv
- Make sure the extra_numerical_features variable in "models/customized_Cas9_ontar/config.py" is set to extra_numerical_features = []; this indicates that no extra features are added besides the sgRNA sequence features
- Run:
python crispr_attn.py customized_Cas9_ontar
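To show what an empty extra_numerical_features list means, the sketch below separates per-row sequence features from optional numerical ones. It is not the project's data loader. The "sgRNA" column name, the nucleotide-to-token mapping, and the helper name build_features are all hypothetical.

```python
# Hypothetical encoding for illustration only: A/C/G/T mapped to integer tokens,
# the kind of input an embedding layer consumes.
TOKENS = {"A": 0, "C": 1, "G": 2, "T": 3}


def build_features(row, extra_numerical_features):
    """Return (sequence_tokens, extra_values) for one dataset row.

    row is a dict-like record; the "sgRNA" column name is an assumption.
    With extra_numerical_features = [], extra_values is empty, so only
    sequence features reach the model.
    """
    tokens = [TOKENS[base] for base in row["sgRNA"].upper()]
    extras = [float(row[name]) for name in extra_numerical_features]
    return tokens, extras
```

Adding a column name to the list (for example, a hypothetical "gc_content" column) would append that value to every row's feature set, so the CSV must then contain that column.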