Dropulation - assignments.tsv.gz only contains 20% of filtered barcodes #29
Comments
Hi @Thapeachydude, thanks for reaching out! First, I'd like to get a bit more background on this experiment. That seems like a lot of barcodes in the original 10x file; can you tell me which 10x platform was used for capture? We typically aim for 20k on the normal platform and 40k on the newer high-throughput platform. If more were captured, it might result in more ambient RNA. Does the knee plot in the output look as expected for low amounts of ambient RNA?
Hi, the experiment was done on a nuclei prep using the 5-prime HT kit. I've found that with nuclei the cell count tends to be a bit higher than what one aims for (either due to some ambient contamination, or because they're just harder to count accurately due to their size; we use automated counting). Naturally, this will result in a higher doublet rate. But our donors tend to separate very clearly transcriptionally, making the identification of doublets possible even at a high doublet rate. This particular pool is one in a series we've recently done. The reads-in-droplet percentage varies a bit, ≈ 60-90% (most are around 80%). If some barcodes were lost during the process I would understand, but 13k is a bit too low, suggesting something is not going as intended. (btw. I've tried using souporcell in the
Thanks for the additional information. Yes, that all makes sense. We also applied dropulation to some HT experiments recently and found a similar pattern, with very few barcodes being called as singlets. I think it's worth getting @jamesnemesh's opinion here since he's the developer of Dropulation and provided me with the script for calling singlets and doublets. It may be that the thresholds have to be altered slightly for HT data or that some assumptions are not met with that many cells. @jamesnemesh any thoughts or recommendations to test? We also know that the performance of most of the methods decreases with higher ambient RNA. This decrease seems to be smaller for souporcell and vireo in pools with fewer than 10 donors. Since you've already run souporcell, could you check the ambient percent it estimated?
Hi, in the troublesome pools it's ≈ 45%, unfortunately. I guess that makes sense :/
We simulated up to 25% additional ambient RNA in Demuxafy (currently under review; the bioRxiv version hasn't been updated with the most recent changes), so we didn't get that high in the simulations. But on average I have noticed that nuclei data result in higher ambient RNA. Our single-cell datasets are usually ~5-15% depending on the experiment and design. That is interesting about removing the cells for souporcell. I'm wondering if the assumptions of the model for estimating ambient RNA are violated when you remove those cells, because you've removed some data that could be important for the continuity of the model. You may want to try vireo as an alternative to pair with souporcell since it is slightly more robust to ambient RNA than the other methods, but I still think you may end up with many unassigned cells. I would also recommend adding the
I can't speak to @drneavin's code, but the two donor assignment programs will emit a cell barcode in the output for every cell in the input, as long as there's at least one transcribed SNP. You do need something like 100 or so transcribed SNP UMI observations to have decent performance. If the original When we run these programs, we tend to focus on the cell barcodes we think are actually cells (or nuclei) in the experiment. We have a separate cell selection process that's slightly more useful than a knee plot: we use a combination of CellBender and visualization of the UMIs (log10) vs %intronic. CellBender emits a probability that each cell barcode is an empty or non-empty droplet, and the non-empty droplets become the superset from which we select cells. In the following plot, the (retained) on the X axis refers to the cell barcode library size after CellBender remove-background has been applied. We use AssignCellsToSamples I haven't seen any experiments where 60K true cells/nuclei work very well; the doublet rate would be very high at that loading.
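To put rough numbers on the doublet-rate concern above, here is a minimal Python sketch. It assumes a linear rule of thumb in which the multiplet fraction grows by a fixed amount per 1,000 recovered cells; the default `rate_per_1000=0.004` is only a placeholder assumption and should be replaced with the figure for the kit actually used. It also assumes donors are pooled in equal proportions, in which case a random pair of cells comes from two different donors with probability 1 - 1/N. The function name and its parameters are illustrative and are not part of Dropulation or Demuxafy.

```python
def expected_doublets(n_cells, n_donors, rate_per_1000=0.004):
    """Rough doublet estimate for a pooled capture.

    rate_per_1000 is an ASSUMED rule-of-thumb value (multiplet fraction
    gained per 1,000 recovered cells); substitute the figure for your kit.
    Donors are ASSUMED to be pooled in equal proportions.
    """
    # Linear approximation, capped at 100%.
    doublet_rate = min(1.0, rate_per_1000 * n_cells / 1000)
    # Probability that two randomly paired cells come from different donors.
    cross_donor_fraction = 1 - 1 / n_donors
    return {
        "doublet_rate": doublet_rate,
        # Doublets that genotype-based demultiplexing can in principle detect.
        "cross_donor_rate": doublet_rate * cross_donor_fraction,
        # Doublets from the same donor, invisible to genotype-based methods.
        "same_donor_rate": doublet_rate * (1 - cross_donor_fraction),
    }


for n in (20_000, 40_000, 60_000):
    estimate = expected_doublets(n, n_donors=10)
    print(n, {key: round(value, 3) for key, value in estimate.items()})
```

Under these assumptions most doublets in a 10-donor pool are cross-donor and therefore detectable from genotypes, but the overall fraction of droplets lost to doublets still grows with loading.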
@drneavin This might be the best place to put this: we've released both a much fuller set of documentation and an R library that generates a number of useful QC plots and evaluates the donor assignment and doublet detection outputs. For people having issues, it may be very helpful to look at the docs to see if the programs are running correctly, and to run the QC plots to have a common starting point for discussing issues.
Hi,
I'm dealing with 10x data from multiple pools with ≈ 10 donors each that I would like to demultiplex using WES/WGS reference data. Following your recommendations and your guide, I've run dropulation, but unfortunately most of the barcodes seem to be lost during the AssignCellsToSamples step. Specifically, the filtered barcodes 10x output contains ≈ 60k barcodes, but the assignments.tsv.gz file only has 13k. Happy about any feedback : )
Best,
M
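One way to see exactly which barcodes go missing is to compare the two files directly. The following is a minimal Python sketch, not part of Dropulation or Demuxafy. It assumes the 10x filtered barcode file is a headerless gzipped list with one barcode per line, and that assignments.tsv.gz is a gzipped TSV with a header row whose barcode column is named `cell`; that column name and the `#` comment prefix are assumptions, so check them against your own output and pass a different `cell_column` if needed.

```python
import csv
import gzip


def read_column(path, column=None, comment="#"):
    """Return the set of values from one column of a gzipped TSV.

    column=None means the file has no header and the first field is used
    (as in a 10x barcodes.tsv.gz). Lines starting with `comment` are skipped.
    """
    values = set()
    with gzip.open(path, "rt", newline="") as handle:
        data_lines = (line for line in handle if not line.startswith(comment))
        rows = csv.reader(data_lines, delimiter="\t")
        index = 0
        if column is not None:
            header = next(rows)  # first non-comment line is the header
            index = header.index(column)
        for row in rows:
            if row:  # ignore blank lines
                values.add(row[index])
    return values


def compare_barcodes(filtered_path, assignments_path, cell_column="cell"):
    """Summarise the overlap between filtered and assigned barcodes.

    cell_column is an ASSUMED header name for the barcode column.
    """
    filtered = read_column(filtered_path)
    assigned = read_column(assignments_path, column=cell_column)
    return {
        "filtered": len(filtered),
        "assigned": len(assigned),
        "shared": len(filtered & assigned),
        "missing_from_assignments": sorted(filtered - assigned),
    }
```

Calling `compare_barcodes` with the paths to the two files returns the counts plus the sorted list of barcodes absent from the assignments. If `shared` comes back as 0, the two files may simply format barcodes differently (for example, a `-1` suffix present in one file but not the other), which is worth ruling out before looking at thresholds.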