Commit

Update to use test.genes= in new trainSingleR() call.
LTLA committed Sep 7, 2024
1 parent 2b50a8d commit 3b17536
Showing 2 changed files with 6 additions and 14 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: SingleRBook
Title: The Book of SingleR
-Version: 1.15.0
-Date: 2023-11-29
+Version: 1.15.1
+Date: 2024-09-06
Authors@R: person('Aaron', 'Lun', role = c('aut', 'cre'), email="[email protected]")
Description:
Comprehensive guide to using the SingleR Bioconductor package
16 changes: 4 additions & 12 deletions inst/book/advanced.Rmd
@@ -37,20 +37,18 @@ sce <- TENxPBMCData("pbmc3k")
counts(sce) <- as(counts(sce), "dgCMatrix")
```

-We use the `trainSingleR()` function to do all the necessary calculations
-that are independent of the test dataset.
-(Almost; see comments below about `common`.)
+We use the `trainSingleR()` function to do all the necessary calculations that are independent of the test dataset.
This yields a list of various components that contains all identified marker genes
and precomputed rank indices to be used in the score calculation.
We can also turn on aggregation with `aggr.ref=TRUE` (Section \@ref(pseudo-bulk-aggregation))
to further reduce computational work.
+Note that we need the identities of the genes in the test dataset (hence, `test.genes=`) to ensure that our chosen markers will actually be present in the test.

```{r}
-common <- intersect(rownames(sce), rownames(dice))
library(SingleR)
set.seed(2000)
-trained <- trainSingleR(dice[common,], labels=dice$label.fine, aggr.ref=TRUE)
+trained <- trainSingleR(dice, labels=dice$label.fine,
+    test.genes=rownames(sce), aggr.ref=TRUE)
```
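
As a reading aid for this change, the following toy sketch shows in base R what restricting the reference to the test's genes amounts to. It makes no SingleR calls and is not a description of how `trainSingleR()` handles `test.genes=` internally; the matrices `ref` and `test` and their gene names are hypothetical stand-ins for `dice` and `sce`.

```r
# Toy illustration only: hypothetical matrices, base R, no SingleR calls.
set.seed(1)
ref <- matrix(rpois(5 * 4, lambda = 10), nrow = 5,
    dimnames = list(paste0("gene", 1:5), paste0("ref", 1:4)))
test <- matrix(rpois(4 * 3, lambda = 10), nrow = 4,
    dimnames = list(c("gene2", "gene3", "gene5", "gene9"), paste0("cell", 1:3)))

# Manual approach: intersect the gene names and subset the reference
# by hand before training.
common <- intersect(rownames(test), rownames(ref))
ref.subset <- ref[common, , drop = FALSE]

# Conceptual effect of passing the test gene names instead: only reference
# genes that also occur in the test remain eligible as markers.
eligible <- rownames(ref)[rownames(ref) %in% rownames(test)]

# Both routes arrive at the same gene universe.
stopifnot(setequal(common, eligible))
print(eligible)
```

In this sketch `gene1` and `gene4` drop out because the test lacks them, and `gene9` is ignored because the reference lacks it, so the caller no longer has to subset the reference before training.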

We then use the `trained` object to annotate our dataset of interest through the `classifySingleR()` function.
@@ -73,12 +71,6 @@ identical(pred$labels, direct$labels)
stopifnot(identical(pred$labels, direct$labels))
```

-The big caveat is that the universe of genes in the test dataset must be a superset of that the reference.
-This is the reason behind the intersection to `common` genes and the subsequent subsetting of `dice`.
-Practical use of preconstructed indices is best combined with some prior information about the gene-level annotation;
-for example, we might know that we always use a particular version of the Ensembl gene models,
-so we would filter out any genes in the reference dataset that are not in our test datasets.

## Parallelization

Parallelization is an obvious approach to increasing annotation throughput.