Using multi-encoder semi-implicit graph variational autoencoder to analyze single-cell RNA sequencing data
In this study, we propose MSVGAE, a framework based on the variational graph autoencoder and graph attention networks. Specifically, we introduce multiple encoders that learn features at different scales and control for uninformative features. In addition, different types of noise are added to the encoders to promote the propagation of graph structural information and distribution uncertainty. As a result, the model can capture complex posterior distributions. MSVGAE maps high-dimensional, noisy scRNA-seq data into a low-dimensional latent space, which benefits downstream tasks.
To install and run this project locally, follow these steps:
git clone https://github.com/tswstart/MSVGAE.git
cd ./MSVGAE/
You can set up the primary environment for MSVGAE by using the following command:
conda env create -f environment.yml
conda activate MSVGAE
Here is a step-by-step guide on how to use the project:
Ensure your scRNA-seq data is formatted correctly. You can use datasets from NCBI, the 10X Genomics website, or your own data.
Execute the main script to perform clustering on your data.
# use hdbscan
python main.py --X_path "./data/balanced/datasetName_counts.csv" --Y_path "./data/balanced/datasetName_labels.csv" --preprocess --save_graph --hdbscan --GAT
# use kmeans
python main.py --X_path "./data/balanced/datasetName_counts.csv" --Y_path "./data/balanced/datasetName_labels.csv" --preprocess --save_graph --kmeans --GAT
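If you want to run both clustering variants over several datasets, the two commands above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper name build_command and its arguments are hypothetical, and it assumes the datasetName_counts.csv / datasetName_labels.csv naming shown above. It only uses flags that appear in the commands above.

```python
import sys


def build_command(dataset_name, method="hdbscan", data_dir="./data/balanced"):
    """Build the argument list for one main.py run (hypothetical helper).

    Assumes the <name>_counts.csv / <name>_labels.csv naming used in the
    example commands; adjust data_dir if your files live elsewhere.
    """
    if method not in ("hdbscan", "kmeans"):
        raise ValueError("method must be 'hdbscan' or 'kmeans'")
    return [
        sys.executable, "main.py",
        "--X_path", f"{data_dir}/{dataset_name}_counts.csv",
        "--Y_path", f"{data_dir}/{dataset_name}_labels.csv",
        "--preprocess",
        "--save_graph",
        f"--{method}",  # selects the clustering algorithm
        "--GAT",
    ]


# Print the commands for both clustering methods without running them.
for method in ("hdbscan", "kmeans"):
    print(" ".join(build_command("datasetName", method)))
```

To actually launch a run from the repository root, pass the returned list to subprocess.run(cmd, check=True).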
Here we list commonly used MSVGAE parameters for your reference. For additional details, run python main.py -h.
Note: output files are saved in ./results, including embeddings (datasetName_MSVGAE_node_embeddings.npy), evaluation metrics (metric_MSVGAE.txt), cluster results (pd_label_MSVGAE_dataName_counts.csv), the KNN graph, and some log files (log_MSVGAE_dataName.txt).
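The saved embeddings can be reused for downstream analysis. The sketch below shows one way to load the .npy file with NumPy and look up the nearest cells in the latent space. It assumes the file holds a 2-D array with one row per cell, which this README does not state explicitly; the function names load_embeddings and nearest_cells are hypothetical and not part of MSVGAE.

```python
import numpy as np


def load_embeddings(path):
    """Load a saved embedding matrix (assumed shape: cells x latent dims)."""
    emb = np.load(path)
    if emb.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {emb.shape}")
    return emb


def nearest_cells(emb, cell_index, k=5):
    """Return the indices of the k cells closest to cell_index (Euclidean)."""
    dists = np.linalg.norm(emb - emb[cell_index], axis=1)
    order = np.argsort(dists)
    # Drop the query cell itself, then keep the k closest remaining cells.
    return [int(i) for i in order if i != cell_index][:k]


# Example usage (uncomment after a run has written the file):
# emb = load_embeddings("./results/datasetName_MSVGAE_node_embeddings.npy")
# print(emb.shape)
# print(nearest_cells(emb, cell_index=0, k=5))
```

The same array can be passed to any clustering or visualization tool that accepts a NumPy matrix.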
Our sample dataset is stored in the directory "data/".
python main.py --X_path "./data/imbalanced/data_-1c4_counts.csv" --Y_path "./data/imbalanced/data_-1c4_labels.csv" --preprocess --save_graph --hdbscan --GAT
Output files are saved in ./results.