Gene silencing through RNA interference (RNAi) has emerged as a powerful tool for studying gene function and developing therapeutics[1]. Small interfering RNA (siRNA) molecules play a crucial role in RNAi by targeting specific mRNA sequences for degradation. Identifying highly efficient siRNA molecules is essential for successful gene silencing experiments and therapeutic applications. Built on the transformer architecture[2], OligoFormer can capture multi-dimensional features and learn complex patterns of siRNA-mRNA interactions for siRNA efficacy prediction.
OligoFormer was trained on a dataset of mRNA-siRNA pairs with experimentally measured efficacy, published by Huesken et al.[3]. The training data consist of diverse mRNA sequences and the corresponding siRNA molecules with known efficacies.
dataset | siRNA number | cell line |
---|---|---|
Huesken | 2431 | H1299 |
Reynolds | 240 | HEK293 |
Vickers | 76 | T24 |
Harborth | 44 | HeLa |
Ui-Tei | 62 | HeLa |
Khvorova | 14 | HEK293 |
Hsieh | 108 | HEK293T |
Amarzguioui | 46 | Cos-1, HaCaT |
Takayuki | 702 | HeLa |
Implementation manual
Download the repository and create the environment of RNA-FM.
#Clone the OligoFormer repository from GitHub
git clone https://github.com/lulab/OligoFormer.git
cd ./OligoFormer
#Install the required dependencies
conda create -n oligoformer python=3.8
source 1: Download the packaged RNA-FM.
wget "https://cloud.tsinghua.edu.cn/f/46d71884ee8848b3a958/?dl=1" -O RNA-FM.tar.gz
tar -zxvf RNA-FM.tar.gz
source 2: Create the environment of RNA-FM[4].
git clone https://github.com/ml4bio/RNA-FM.git
cd ./RNA-FM
conda env create --name RNA-FM -f environment.yml
Download the pre-trained models from this gdrive link and place the .pth files into the pretrained folder.
You need at least one NVIDIA GPU and a working driver on your system to run training or inference.
source activate oligoformer
pip install -r requirements.txt
# The following command takes ~60 min on a V100 GPU
python scripts/main.py --datasets Hu Mix --cuda 0 --learning_rate 0.0001 --batch_size 16 --epoch 200 --early_stopping 30
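The training command above passes `--epoch 200 --early_stopping 30`. A minimal sketch of patience-based early stopping follows, assuming the flag means "stop after 30 epochs without improvement on a validation score"; this README does not confirm that interpretation. The class name `EarlyStopping` and the toy scores are hypothetical and are not taken from the OligoFormer code.

```python
class EarlyStopping:
    """Stop when the monitored score has not improved for `patience` epochs.

    Hypothetical helper for illustration; not part of OligoFormer.
    """

    def __init__(self, patience=30):
        self.patience = patience
        self.best = float("-inf")
        self.stale_epochs = 0

    def step(self, score):
        """Record one epoch's validation score; return True when training should stop."""
        if score > self.best:
            self.best = score
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


# Toy validation scores, made up purely for illustration.
scores = [0.50, 0.60, 0.65, 0.64, 0.63, 0.65, 0.62]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, score in enumerate(scores, start=1):
    if stopper.step(score):
        stopped_at = epoch
        break

print("stopped at epoch", stopped_at, "with best score", stopper.best)
```

With a patience of 30 and a cap of 200 epochs, a run under this assumption would end either at epoch 200 or 30 epochs after the last improvement, whichever comes first.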
Option 1: Input a FASTA file of the mRNA sequence (the mRNA is traversed with a 19-nt sliding window).
python scripts/main.py --infer 1 --infer_fasta ./data/example.fa --infer_output ./result/
Option 2: Input FASTA files of the mRNA and of specific siRNAs (predictions are made only for these siRNAs).
python scripts/main.py --infer 1 -i1 data/example.fa -i2 data/example_siRNA.fa
Option 3: Input the mRNA sequence manually.
python scripts/main.py --infer 2
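Option 1 above traverses the mRNA with a 19-nt window. The sketch below shows one way such a traversal could enumerate candidate target sites and their antisense (reverse-complement) strands. It illustrates the idea only and is not the OligoFormer implementation; the helpers `read_fasta` and `candidate_sirnas` and the demo sequence are hypothetical.

```python
# Hypothetical helpers for illustration; not taken from the OligoFormer code base.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def read_fasta(text):
    """Parse FASTA text into {header: sequence}, upper-casing and converting T to U."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line.upper().replace("T", "U")
    return records


def candidate_sirnas(mrna, window=19):
    """Yield (start, target_site, antisense) for every window along the mRNA."""
    for start in range(len(mrna) - window + 1):
        site = mrna[start:start + window]
        antisense = site.translate(COMPLEMENT)[::-1]  # reverse complement of the site
        yield start, site, antisense


# Made-up demo sequence, not from data/example.fa.
example = ">demo_mRNA\nAUGGCUAGCUAGGAUCCGAUCGAUCGGAUUA\n"
mrna = read_fasta(example)["demo_mRNA"]
candidates = list(candidate_sirnas(mrna))
print(len(mrna), "nt mRNA gives", len(candidates), "candidate sites")
```

An mRNA of length N yields N - 18 windows, which is why Option 2 (scoring only a supplied list of siRNAs) can be faster for long transcripts.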
- Perl dependencies
source 1: CPAN
cpan Statistics::Lite
cpan Bio::TreeIO
# You also need to install the ViennaRNA package, add it to your PATH, and set PERL5LIB to your own path.
# You need to provide the ORF and UTR FASTA files of the mRNA to predict off-target effects. The order of the sequences must be the same in both files. Refer to the example data.
source 2: Download
wget "https://cloud.tsinghua.edu.cn/f/cab2afdf951140a48fec/?dl=1" -O PerlLib.zip
unzip PerlLib.zip
export PERL5LIB=$(pwd)/PerlLib:$PERL5LIB
- Replace path
cd off-target/pita && make install && cd ../../
- Command
python scripts/main.py --infer 1 --infer_fasta ./data/example.fa --infer_output ./result/ -off -tox
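Off-target prediction expects the ORF and UTR FASTA files to list sequences in the same order. The sketch below is one possible pre-flight check, assuming both files use the same identifier as the first token of each header line, which may not hold for your data (compare with the example data). The function names and the inline records are hypothetical.

```python
def fasta_ids(text):
    """Return record identifiers in file order (first token of each header line)."""
    ids = []
    for line in text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            ids.append(tokens[0] if tokens else "")
    return ids


def same_order(orf_text, utr_text):
    """True when both FASTA texts list the same identifiers in the same order."""
    return fasta_ids(orf_text) == fasta_ids(utr_text)


# Made-up inline records; in practice, read your own files,
# e.g. with pathlib.Path(path).read_text().
orf = ">geneA\nAUGGCU\n>geneB\nAUGCCA\n"
utr_ok = ">geneA\nUUUAAA\n>geneB\nGGGCCC\n"
utr_swapped = ">geneB\nGGGCCC\n>geneA\nUUUAAA\n"

print(same_order(orf, utr_ok))
print(same_order(orf, utr_swapped))
```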
The Docker image simplifies the installation and setup process, making it easy for users to get started with OligoFormer without worrying about dependencies and environment configuration.
- Docker installed on your machine.
- Pull the Docker Image:
You just need to choose one source.
source 1: DockerHub
docker pull yilanbai/oligoformer:v1.0
source 2: Aliyun
docker pull registry.cn-hangzhou.aliyuncs.com/yilanbai/oligoformer:v1.0
source 3: Tsinghua Cloud
- Run the Docker Container:
docker run -it --name oligoformer-container -dt --restart unless-stopped yilanbai/oligoformer:v1.0 && docker exec -it oligoformer-container bash
- Access the OligoFormer Tool:
Once inside the container, you can start using OligoFormer with the following command:
oligoformer -h # help
oligoformer # infer
oligoformer -i 1 -i1 data/example.fa -i2 data/example_siRNA.fa # infer only the siRNAs of interest (faster)
oligoformer -off # infer with off-target prediction
oligoformer -tox # infer with toxicity prediction
oligoformer -off -tox # infer with off-target and toxicity prediction
oligoformer -m 2 # mismatch input 19nt siRNA
oligoformer -i 0 -t # test inter-dataset
oligoformer -i 0 -s -t # test intra-dataset
# We recommend running the following two commands on a platform with GPUs.
oligoformer -i 0 # train inter-dataset
oligoformer -i 0 -s # train intra-dataset
[1] Zamore, Phillip D., et al. "RNAi: double-stranded RNA directs the ATP-dependent cleavage of mRNA at 21 to 23 nucleotide intervals." Cell 101.1 (2000): 25-33.
[2] Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).
[3] Huesken, D., et al. "Design of a genome-wide siRNA library using an artificial neural network." Nature Biotechnology 23 (2005): 995-1001.
[4] Chen, Jiayang, et al. "Interpretable RNA foundation model from unannotated data for highly accurate RNA structure and function predictions." arXiv preprint arXiv:2204.00300 (2022).
This tool is for research purposes only and is not approved for clinical use. It shall not be used for commercial purposes without permission.