This repository has been archived by the owner on Oct 9, 2024. It is now read-only.

Merge pull request #98 from YosefLab/jhong/methodrefactor
DE/DA method refactor
justjhong authored Apr 2, 2024
2 parents ae1f006 + 91abaa3 commit ee602c6
Showing 3 changed files with 476 additions and 351 deletions.
42 changes: 11 additions & 31 deletions README.md
@@ -40,7 +40,6 @@ pip install git+https://github.com/justjhong/mrvi.git@main

While a more comprehensive user guide is in the works, you can find here a brief overview of the main features of `mrvi`.


**Data preparation**:
MrVI relies on `scvi-tools` routines for model initialization and training.
In particular, `mrvi` assumes data to be stored in an AnnData object.
@@ -49,10 +48,10 @@ A first step is to load the data and register it, as follows:
```python
from mrvi import MrVI

-MrVI.setup_anndata(adata, sample_key="my_sample_key", batch_key="my_batch_key")
+MrVI.setup_anndata(adata, sample_key="my_sample_key", batch_key="my_batch_key")
```
-where here `'my_sample_key'` and `'my_batch_key'` are expected to be keys of `adata.obs` that contain the sample and batch assignments, respectively.
+
+where here `'my_sample_key'` and `'my_batch_key'` are expected to be keys of `adata.obs` that contain the sample and batch assignments, respectively.

**Model training**:
The next step is to initialize and train the model, which can be done via:
@@ -65,7 +64,6 @@ model.train()
Once the model is trained, we recommend visualizing the validation ELBO to assess convergence, which is stored in `model.history["elbo_validation"]`.
If the ELBO has not converged, you should consider training the model for more epochs.
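
One way to make "converged" concrete is a plateau check over the most recent epochs. The sketch below is only a heuristic and is not part of `mrvi` or `scvi-tools`: `has_plateaued`, `window`, and `rel_tol` are hypothetical names, the curve is synthetic, and it assumes the validation ELBO values have already been copied into a plain Python list.

```python
def has_plateaued(values, window=10, rel_tol=1e-3):
    """Heuristic plateau test (hypothetical helper, not an mrvi function).

    Compares the mean of the last `window` values with the mean of the
    `window` values before them, and reports a plateau when the relative
    change is at most `rel_tol`.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(values) < 2 * window:
        # Not enough epochs to compare two full windows.
        return False
    recent = sum(values[-window:]) / window
    previous = sum(values[-2 * window:-window]) / window
    return abs(recent - previous) <= rel_tol * abs(previous)


# Synthetic curve that decays toward 200; a stand-in for the recorded values.
elbo_curve = [200.0 + 1000.0 * 0.9**epoch for epoch in range(200)]

early_check = has_plateaued(elbo_curve[:20])  # still dropping quickly
late_check = has_plateaued(elbo_curve)        # flat over the last 20 epochs
```

If a check like this fails on the real curve, training for more epochs is the natural next step.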


**Latent space visualization**:
MrVI contains two latent spaces: `u`, which captures global cell-type variations, and `z`, which additionally captures sample-specific variations.
These two latent representations can be accessed via `model.get_latent_representation()` (with `give_z=True` to access `z`).
@@ -82,7 +80,6 @@ adata.obsm["u_mde"] = u_mde
sc.pl.embedding(adata, basis="u_mde")
```


**Computing sample-sample dissimilarities**:
MrVI can be used to predict sample-sample dissimilarities, using the following snippet:

@@ -94,49 +91,32 @@ dists = model.get_local_sample_distances(

# OR predict sample-sample dissimilarities for EACH cell
# WARNING: this can be slow and memory-intensive for large datasets
-dists = model.get_local_sample_distances(
-    adata, keep_cell=True, batch_size=32
-)
+dists = model.get_local_sample_distances(adata, keep_cell=True, batch_size=32)
```
-These dissimilarities can then be visualized via `seaborn.clustermap` or similar tools.
+
+These dissimilarities can then be visualized via `seaborn.clustermap` or similar tools.

**DE analysis**: MrVI can be used to identify differentially expressed genes (DEGs) between two groups of samples at the single-cell level.
Here, "samples" refer to the `sample_key` provided in `MrVI.setup_anndata`.
Identifying such genes can be done as follows,

```python
-donor_keys_ = ["Status"] # Here, Status is the donor covarate of interest
-multivariate_analysis_kwargs = {
-    "batch_size": 128,
-    "normalize_design_matrix": True,
-    "offset_design_matrix": False,
-    "store_lfc": True,
-    "eps_lfc": 1e-4,
-}
-res = model.perform_multivariate_analysis(
-    donor_keys=donor_keys_,
-    donor_subset=donor_subset,
-    **multivariate_analysis_kwargs,
+sample_cov_keys = ["Status"] # Here, Status is the sample covariate of interest
+de_res = model.differential_expression(
+    sample_cov_keys=sample_cov_keys,
)
```

**DA analysis**:
MrVI can also be used to assess differences in cell-type compositions across sample groups, using the following snippet:

```python
-da_res = model.get_outlier_cell_sample_pairs()
-gp1 = model.donor_info.query('Status == "A"').patient_id.values
-gp2 = model.donor_info.query('Status == "B"').patient_id.values
-log_p1 = da_res.log_probs.loc[{"sample": gp1}]
-log_p1 = logsumexp(log_p1, axis=1) - np.log(log_p1.shape[1])
-log_p2 = da_res.log_probs.loc[{"sample": gp2}]
-log_p2 = logsumexp(log_p2, axis=1) - np.log(log_p2.shape[1])
-log_prob_ratio = log_p1 - log_p2
+da_res = model.differential_abundance(sample_cov_keys=sample_cov_keys)
+A_log_probs = da_res.Status_log_probs.loc[{"Status": "A"}]
+B_log_probs = da_res.Status_log_probs.loc[{"Status": "B"}]
+A_B_log_prob_ratio = A_log_probs - B_log_probs
```
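
The older DA snippet in this diff averages per-sample probabilities within each group in log space (`logsumexp(...) - np.log(n)`) before taking the difference between groups. A dependency-free sketch of that log-mean-exp step follows; `log_mean_exp` and all numbers are illustrative assumptions, not `mrvi` API or real results.

```python
import math


def log_mean_exp(log_probs):
    """Return log(mean(exp(x) for x in log_probs)), computed stably.

    Hypothetical helper: subtracting the maximum before exponentiating
    avoids underflow when the log-probabilities are very negative.
    """
    if not log_probs:
        raise ValueError("log_probs must be non-empty")
    peak = max(log_probs)
    if peak == -math.inf:
        # Every probability is zero, so the mean is zero as well.
        return -math.inf
    total = sum(math.exp(lp - peak) for lp in log_probs)
    return peak + math.log(total) - math.log(len(log_probs))


# Made-up log-probabilities of one cell under the samples of each group.
group_a = [-2.0, -2.5, -1.8]
group_b = [-4.0, -3.5]

log_prob_ratio = log_mean_exp(group_a) - log_mean_exp(group_b)
```

With these values the ratio is positive, meaning the cell is on average more probable under the group A samples than under the group B samples.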



## Release notes

See the [changelog](https://github.com/YosefLab/mrvi/blob/main/CHANGELOG.md).
