Merge pull request #98 from YosefLab/jhong/methodrefactor

DE/DA method refactor
YosefLab · Apr 2, 2024 · ee602c6
2 parents ae1f006 + 91abaa3
commit ee602c6

Showing 3 changed files with 476 additions and 351 deletions.
diff --git a/README.md b/README.md
@@ -40,7 +40,6 @@ pip install git+https://github.com/justjhong/mrvi.git@main
 
 While a more comprehensive user guide is in the works, you can find here a brief overview of the main features of `mrvi`.
 
-
 **Data preparation**:
 MrVI relies on `scvi-tools` routines for model initialization and training.
 In particular, `mrvi` assumes data to be stored in an AnnData object.
@@ -49,10 +48,10 @@ A first step is to load the data and register it, as follows:
 ```python
 from mrvi import MrVI
 
-MrVI.setup_anndata(adata,  sample_key="my_sample_key", batch_key="my_batch_key")
+MrVI.setup_anndata(adata, sample_key="my_sample_key", batch_key="my_batch_key")
 ```
-where `'my_sample_key'` and `'my_batch_key'` are expected to be keys of `adata.obs` that contain the sample and batch assignments, respectively. 
 
+where `'my_sample_key'` and `'my_batch_key'` are expected to be keys of `adata.obs` that contain the sample and batch assignments, respectively.
 
 **Model training**:
 The next step is to initialize and train the model, which can be done via:
@@ -65,7 +64,6 @@ model.train()
 Once the model is trained, we recommend visualizing the validation ELBO to assess convergence, which is stored in `model.history["elbo_validation"]`.
 If the ELBO has not converged, you should consider training the model for more epochs.
 
-
 **Latent space visualization**:
 MrVI contains two latent spaces: `u`, which captures global cell-type variation, and `z`, which additionally captures sample-specific variation.
 These two latent representations can be accessed via `model.get_latent_representation()` (with `give_z=True` to access `z`).
@@ -82,7 +80,6 @@ adata.obsm["u_mde"] = u_mde
 sc.pl.embedding(adata, basis="u_mde")
 ```
 
-
 **Computing sample-sample dissimilarities**:
 MrVI can be used to predict sample-sample dissimilarities, using the following snippet:
 
@@ -94,49 +91,32 @@ dists = model.get_local_sample_distances(
 
 # OR predict sample-sample dissimilarities for EACH cell
 # WARNING: this can be slow and memory-intensive for large datasets
-dists = model.get_local_sample_distances(
-    adata, keep_cell=True, batch_size=32
-)
+dists = model.get_local_sample_distances(adata, keep_cell=True, batch_size=32)
 ```
-These dissimilarities can then be visualized via `seaborn.clustermap` or similar tools.
 
+These dissimilarities can then be visualized via `seaborn.clustermap` or similar tools.
 
 **DE analysis**: MrVI can be used to identify differentially expressed genes (DEGs) between two groups of samples at the single-cell level.
 Here, "samples" refers to the `sample_key` provided in `MrVI.setup_anndata`.
 Identifying such genes can be done as follows:
 
 ```python
-donor_keys_ = ["Status"]  # Here, Status is the donor covarate of interest
-multivariate_analysis_kwargs = {
-    "batch_size": 128,
-    "normalize_design_matrix": True,
-    "offset_design_matrix": False,
-    "store_lfc": True,
-    "eps_lfc": 1e-4,
-}
-res = model.perform_multivariate_analysis(
-    donor_keys=donor_keys_,
-    donor_subset=donor_subset,
-    **multivariate_analysis_kwargs,
+sample_cov_keys = ["Status"]  # Here, Status is the sample covariate of interest
+de_res = model.differential_expression(
+    sample_cov_keys=sample_cov_keys,
 )
 ```
 
 **DA analysis**:
 MrVI can also be used to assess differences in cell-type compositions across sample groups, using the following snippet:
 
 ```python
-da_res = model.get_outlier_cell_sample_pairs()
-gp1 = model.donor_info.query('Status == "A"').patient_id.values
-gp2 = model.donor_info.query('Status == "B"').patient_id.values
-log_p1 = da_res.log_probs.loc[{"sample": gp1}]
-log_p1 = logsumexp(log_p1, axis=1) - np.log(log_p1.shape[1])
-log_p2 = da_res.log_probs.loc[{"sample": gp2}]
-log_p2 = logsumexp(log_p2, axis=1) - np.log(log_p2.shape[1])
-log_prob_ratio = log_p1 - log_p2
+da_res = model.differential_abundance(sample_cov_keys=sample_cov_keys)
+A_log_probs = da_res.Status_log_probs.loc[{"Status": "A"}]
+B_log_probs = da_res.Status_log_probs.loc[{"Status": "B"}]
+A_B_log_prob_ratio = A_log_probs - B_log_probs
 ```
 
-
-
 ## Release notes
 
 See the [changelog](https://github.com/YosefLab/mrvi/blob/main/CHANGELOG.md).
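
Note on the DA change: the removed README snippet averaged per-sample probabilities within each group in log space (`logsumexp` minus the log of the group size) before taking the difference, while the new snippet subtracts two per-covariate log-probability arrays directly. The diff shown here does not confirm whether `differential_abundance` performs the same averaging internally. The sketch below only reproduces the arithmetic of the removed snippet on made-up numbers, using the standard library; `log_mean_exp`, `group_log_prob_ratio`, and the toy sample names are hypothetical and not part of `mrvi`.

```python
import math


def log_mean_exp(log_probs):
    """Return log(mean(exp(log_probs))), computed stably.

    Subtracting the maximum before exponentiating avoids underflow
    when the log-probabilities are very negative.
    """
    m = max(log_probs)
    total = sum(math.exp(lp - m) for lp in log_probs)
    return m + math.log(total) - math.log(len(log_probs))


def group_log_prob_ratio(log_probs_by_sample, group_a, group_b):
    """Log ratio of group-averaged probabilities for a single cell."""
    log_p_a = log_mean_exp([log_probs_by_sample[s] for s in group_a])
    log_p_b = log_mean_exp([log_probs_by_sample[s] for s in group_b])
    return log_p_a - log_p_b


# Toy, made-up log-probabilities of one cell under four samples (assumption).
toy_log_probs = {"s1": -1.0, "s2": -1.5, "s3": -4.0, "s4": -3.5}
ratio = group_log_prob_ratio(toy_log_probs, ["s1", "s2"], ["s3", "s4"])
print(f"log-prob ratio (A vs B): {ratio:.3f}")
```

A positive ratio means the cell is more probable under group A's samples than under group B's, which is how the `log_prob_ratio` in the removed snippet would be read.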