scAFGCC: An Augmentation-Free Graph Contrastive Clustering Method for scRNA-Seq Data Analysis

Introduction

In this study, we propose scAFGCC, a novel augmentation-free graph contrastive clustering method that combines graph convolutional network (GCN) and contrastive learning to exploit inter-cell relationships. scAFGCC does not require data augmentations and negative samples to learn graph representations. Instead, we generate positive samples by exploring the local structural information and the global semantics of the target nodes. We integrate feature representation learning with clustering tasks. Additionally, we introduce a reconstruction module that pretrains the model, facilitating faster training and improved performance.

Installation

The package can be installed by git. The testing setup involves a Windows operating system with 16GB of RAM, powered by an NVIDIA GeForce GTX 1050 Ti GPU and an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz, running at a clock speed of 2.80 GHz.

1. Git clone from github

git clone https://github.com/tswstart/scAFGCC.git
cd ./scGCC/

2. Utilize a virtual environment using Anaconda

You can set up the primary environment for scGCC by using the following command:

conda env create -f environment.yml
conda activate scAFGCC

Running scGCC

1. Data Preprocessing

For those who require swift data preparation, we offer a convenient Python script named preprocess_data.py located within the preprocess directory. This script, built on the foundation of Scanpy, streamlines the process of format transformation and preprocessing. It supports three types of input file formats: H5AD, H5, and CSV data. Throughout the preprocessing procedure, there are a total of five operations, encompassing cell-gene filtering, normalization, logarithmic transformation, scaling, and the selection of highly variable genes.

# H5AD files
python preprocess/preprocess_data.py --input_h5ad_path=Path_to_input --save_h5ad_dir=Path_to_save --filter --norm --log --scale --select_hvg
# H5 files
python preprocess/preprocess_data.py --input_h5_path=Path_to_input --save_h5ad_dir=Path_to_save --filter --norm --log --scale --select_hvg
# CSV files
python preprocess/preprocess_data.py --count_csv_path=Path_to_input --label_csv_path=Path_to_input --save_h5ad_dir=Path_to_save --filter --norm --log --scale --select_hvg

2. Apply scAFGCC

By utilizing the preprocessed input data, you have the option to invoke the subsequent script for executing the scAFGCC method:

# balanced datasets
python main.py --dataset dataset/balanced_data/data0c4_preprocessed.h5ad --mbedder scAFGCC-Results --layers [512] --pred_hid 1024 --lr 0.001 --topk 3 --device 0
# imbalanced datasets
python main.py --dataset dataset/imbalanced_data/data-1c4_preprocessed.h5ad --mbedder scAFGCC-Results --layers [512] --pred_hid 1024 --lr 0.001 --topk 3 --device 0
# real datasets
python main.py --dataset dataset/real_data/Human3_preprocessed.h5ad --mbedder scAFGCC-Results --layers [512] --pred_hid 1024 --lr 0.001 --topk 3 --device 0

In this context, we offer a collection of commonly employed scAFGCC parameters for your reference. For additional details, you can execute python scAFGCC.py -h.

Note: output files are saved in ./results, including Evaluation index(scAFGCC-Results.csv) and run log(scAFGCC-Results.txt).

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
data		data
dataset		dataset
model_checkpoints		model_checkpoints
models		models
preprocess		preprocess
results		results
.gitignore		.gitignore
README.md		README.md
data.py		data.py
embedder.py		embedder.py
environment.yml		environment.yml
fig1.png		fig1.png
graph.py		graph.py
main.py		main.py
scRNAGraph.py		scRNAGraph.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

scAFGCC: An Augmentation-Free Graph Contrastive Clustering Method for scRNA-Seq Data Analysis

Introduction

Installation

1. Git clone from github

2. Utilize a virtual environment using Anaconda

Running scGCC

1. Data Preprocessing

2. Apply scAFGCC

About

Releases

Packages

Languages

tswstart/scAFGCC

Folders and files

Latest commit

History

Repository files navigation

scAFGCC: An Augmentation-Free Graph Contrastive Clustering Method for scRNA-Seq Data Analysis

Introduction

Installation

1. Git clone from github

2. Utilize a virtual environment using Anaconda

Running scGCC

1. Data Preprocessing

2. Apply scAFGCC

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages